Text Normalizers

TTS Cleaning & Normalization


TTSTextNormalizer

 TTSTextNormalizer (language='en')

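Normalizes text for TTS. As the example below shows, an instance can be called on a single string or on a list of strings; it lowercases the input and spells out currency amounts and clock times.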

cleaner = TTSTextNormalizer()
print(cleaner.en_normalize_numbers("$350"))
print(cleaner.expand_time_english("12:05pm"))
print(cleaner("Oh my dear! this is $5 too soon... It's 1:04 am!"))
print(cleaner(["Oh my dear! this is $5 too soon...", "It's 1:04 am!"]))

three hundred fifty dollars
twelve oh five p m
oh my dear! this is five dollars too soon... it's one oh four a m!
['oh my dear! this is five dollars too soon...', "it's one oh four a m!"]
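
To make the time expansion above concrete, here is a minimal sketch of how a string such as "12:05pm" could be turned into "twelve oh five p m". It is not the code behind `expand_time_english`; the names `expand_time`, `_words`, and `_TIME_RE` are hypothetical, and the handling of whole hours (minutes are simply dropped) is an assumption.

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}

# 12-hour clock times such as "12:05pm" or "1:04 am" (hypothetical pattern).
_TIME_RE = re.compile(r"\b(1[0-2]|0?[1-9]):([0-5]\d)\s*([ap])m\b", re.IGNORECASE)


def _words(n: int) -> str:
    """Spell out an integer in the range 0..59."""
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] if ones == 0 else f"{_TENS[tens]} {_ONES[ones]}"


def expand_time(text: str) -> str:
    """Replace every 12-hour time in `text` with its spoken form."""
    def repl(match: re.Match) -> str:
        hour, minute = int(match.group(1)), int(match.group(2))
        parts = [_words(hour)]
        if 0 < minute < 10:
            parts += ["oh", _words(minute)]   # 12:05 -> "twelve oh five"
        elif minute >= 10:
            parts.append(_words(minute))      # 9:45 -> "nine forty five"
        # Assumption: for whole hours (minute == 0) only the hour is spoken.
        parts += [match.group(3).lower(), "m"]
        return " ".join(parts)

    return _TIME_RE.sub(repl, text)


print(expand_time("12:05pm"))
print(expand_time("It's 1:04 am!"))
```

The sketch leaves the surrounding text untouched, so lowercasing and currency expansion would be separate passes.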

Punctuation

 Punctuation (puncs:str=';:,.!?¡¿—…"«»“”')

Handle punctuation in text.

Either strip punctuation from the text outright, or strip it and restore it later.

Args:
    puncs (str): The punctuation characters to process. Defaults to _DEF_PUNCS.

Example:
>>> punc = Punctuation()
>>> punc.strip("This is. example !")
'This is example'

>>> text_striped, punc_map = punc.strip_to_restore("This is. example !")
>>> ' '.join(text_striped)
'This is example'

>>> text_restored = punc.restore(text_striped, punc_map)
>>> text_restored[0]
'This is. example !'

PuncPosition

 PuncPosition (value, names=None, module=None, qualname=None, type=None,
               start=1)

Enum for punctuation positions.

punc = Punctuation()
text = "This is. This is, example!"
print(punc.strip(text))
split_text, puncs = punc.strip_to_restore(text)
print(split_text, " ---- ", puncs)
restored_text = punc.restore(split_text, puncs)
print(restored_text)

This is This is example
['This is', 'This is', 'example']  ----  [_punc_index(punc='. ', position=<PuncPosition.MIDDLE: 2>), _punc_index(punc=', ', position=<PuncPosition.MIDDLE: 2>), _punc_index(punc='!', position=<PuncPosition.END: 1>)]
['This is. This is, example!']
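
The strip-and-restore round trip shown above can be sketched in a few lines. This is an illustration of the idea only, not the implementation of `Punctuation`: the names `Position`, `PuncMark`, `strip_to_restore`, and `restore` below are hypothetical stand-ins, the enum models just three positions, and `restore` returns a plain string rather than a list.

```python
import re
from enum import Enum
from typing import List, NamedTuple, Tuple

PUNCS = ';:,.!?¡¿—…"«»“”'  # same default set as in the signature above


class Position(Enum):
    """Where a punctuation run sits in the text (hypothetical stand-in)."""
    BEGIN = 0
    END = 1
    MIDDLE = 2


class PuncMark(NamedTuple):
    punc: str
    position: Position


_ESC = re.escape(PUNCS)
# One punctuation run, including any whitespace around or inside it.
_PUNC_RE = re.compile(rf"\s*[{_ESC}][{_ESC}\s]*")


def strip_to_restore(text: str) -> Tuple[List[str], List[PuncMark]]:
    """Split text into punctuation-free chunks plus the removed marks."""
    chunks, marks, last = [], [], 0
    for m in _PUNC_RE.finditer(text):
        if m.start() > last:
            chunks.append(text[last:m.start()])
        if m.start() == 0:
            position = Position.BEGIN
        elif m.end() == len(text):
            position = Position.END
        else:
            position = Position.MIDDLE
        marks.append(PuncMark(m.group(), position))
        last = m.end()
    if last < len(text):
        chunks.append(text[last:])
    return chunks, marks


def restore(chunks: List[str], marks: List[PuncMark]) -> str:
    """Interleave chunks and marks back into the original text."""
    pieces, remaining = [], iter(chunks)
    for mark in marks:
        if mark.position is not Position.BEGIN:
            pieces.append(next(remaining))  # every later mark follows a chunk
        pieces.append(mark.punc)
    pieces.extend(remaining)                # trailing text without punctuation
    return "".join(pieces)


chunks, marks = strip_to_restore("This is. This is, example!")
print(chunks)
print(restore(chunks, marks))
```

Storing the whitespace inside each mark is what lets the round trip reproduce the input exactly, including the odd spacing in "This is. example !".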