You can try routing the played audio to a recorder in software, running it through dictation.io or something similar, and then translating the output. The transcript might be incorrect in places, but I think it should still be understandable. It worked for me for some German vids.