RE: LeoThread 2024-10-24 11:17

You Only Need 32 Tokens to Represent a Video Even in VLMs

Salesforce's new method uses a novel video encoder that requires substantially fewer tokens for proper representation. This has been tried a number of times in the past with minimal success; the key seems to be an explicit temporal encoder alongside a spatial encoder.

#technology #ai #salesforce #vlm

Amazing what is taking place. In 18 months, it will take very few tokens to do anything.

Yeah, the efficiency gain is incredible.

Yeah. Makes Moore's Law look stagnant.

People are not aware of what is going to hit them.