RE: LeoThread 2024-10-28 03:27

You are viewing a single comment's thread from:

RE: LeoThread 2024-10-28 03:27

View the full context
View the direct parent

taskmaster4450le (81)in LeoFinance • last year

The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.

last year in LeoFinance by taskmaster4450le (81)

$0.00

Sort:

Trending