Aligners

A collection of aligner models.

Wav2Vec2.0 Aligner


AlignerWAV2VEC2

 AlignerWAV2VEC2 (text_normalizer, device='cuda')

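Aligner built on Wav2Vec2.0. text_normalizer is the text-cleaning callable passed in at construction (for example TTSTextNormalizer().english_cleaners), and device is the device the model runs on (default 'cuda').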


Point

 Point (token_index:int, time_index:int, score:float)

Segment

 Segment (label:str, start:int, end:int, score:float)
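The Point and Segment signatures match the helper types used in the torchaudio forced-alignment tutorial. The following is a minimal sketch, assuming they play the same role here: a Point is one step on the alignment path (token index, frame index, score), and a Segment is a run of frames that share one label. The field comments, the merge_repeats helper, and the toy path are illustrative assumptions, not code taken from this library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Point:
    token_index: int  # position of the token in the transcript
    time_index: int   # frame index (assumed to index the emission matrix)
    score: float      # frame-level score for this token


@dataclass
class Segment:
    label: str
    start: int        # first frame, inclusive
    end: int          # last frame, exclusive
    score: float      # mean score over the merged frames


def merge_repeats(path: List[Point], transcript: str) -> List[Segment]:
    """Collapse consecutive points that share a token into one Segment."""
    segments = []
    i = 0
    while i < len(path):
        j = i
        # Advance j to the first point that belongs to a different token.
        while j < len(path) and path[j].token_index == path[i].token_index:
            j += 1
        mean_score = sum(p.score for p in path[i:j]) / (j - i)
        segments.append(
            Segment(
                label=transcript[path[i].token_index],
                start=path[i].time_index,
                end=path[j - 1].time_index + 1,
                score=mean_score,
            )
        )
        i = j
    return segments


# Toy path: token 0 spans frames 0-1, token 1 spans frame 2.
path = [Point(0, 0, 1.0), Point(0, 1, 0.5), Point(1, 2, 1.0)]
segments = merge_repeats(path, "HE")
```

Character-level segments like these can then be grouped into words, which would give word-level entries of the kind shown in the usage output.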

Usage

import torchaudio
# TTSTextNormalizer and AlignerWAV2VEC2 are provided by this library

text_normalizer = TTSTextNormalizer().english_cleaners
aligner = AlignerWAV2VEC2(text_normalizer, device='cpu')  # CPU so the example also runs in CI
wav_path = "../data/en/LibriTTS/test-clean/1089/134686/1089_134686_000015_000001.wav"
txt_path = "../data/en/LibriTTS/test-clean/1089/134686/1089_134686_000015_000001.original.txt"
wav, sr = torchaudio.load(wav_path)  # waveform tensor and sample rate
with open(txt_path, 'r') as f:
    txt = f.read()
alignments = aligner.get_alignments(wav, txt)

Word: HE, Confidence: 1.00, Start:0.121,  End: 0.242 sec
Word: TRIED, Confidence: 0.91, Start:0.323,  End: 0.625 sec
Word: TO, Confidence: 1.00, Start:0.686,  End: 0.787 sec
Word: THINK, Confidence: 0.93, Start:0.948,  End: 1.311 sec
Word: HOW, Confidence: 0.91, Start:1.473,  End: 1.675 sec
Word: IT, Confidence: 0.71, Start:1.755,  End: 1.836 sec
Word: COULD, Confidence: 0.75, Start:1.917,  End: 2.118 sec
Word: BE, Confidence: 1.00, Start:2.239,  End: 2.461 sec
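
The start and end times above are in seconds, whereas Segment stores integer positions. Below is a minimal sketch of the conversion, assuming those positions are frame indices into the model's emission matrix, as in the torchaudio forced-alignment tutorial. The frames_to_seconds helper and the numbers in the example are hypothetical and do not come from this library.

```python
def frames_to_seconds(frame: int, num_samples: int, num_frames: int, sample_rate: int) -> float:
    """Map an emission-frame index to a time offset in seconds."""
    samples_per_frame = num_samples / num_frames
    return frame * samples_per_frame / sample_rate


# Toy numbers: 1 s of 16 kHz audio covered by 50 frames (20 ms per frame).
start = frames_to_seconds(6, num_samples=16000, num_frames=50, sample_rate=16000)
end = frames_to_seconds(12, num_samples=16000, num_frames=50, sample_rate=16000)
line = f"Start: {start:.3f}, End: {end:.3f} sec"
```

Using the ratio of samples to frames rather than a fixed stride keeps the conversion valid when the number of frames does not divide the waveform length exactly.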