RE: LeoThread 2024-10-24 11:17

tokenizedsociety (69)in LeoFinance • 3 months ago

You Only Need 32 Tokens to Represent a Video Even in VLMs

Salesforce's new method uses a novel video encoder that needs substantially fewer tokens to represent a video properly. This has been tried a number of times in the past with minimal success; the key seems to be an explicit temporal encoder alongside a spatial encoder.

#technology #ai #salesforce #vlm