Hacker News | Vily's comments

You're welcome to follow and discuss our work!


LLaVA-Mini is an efficient LMM for image/video understanding that uses a single vision token, offering: (1) fast response (40 ms per image) and (2) lower VRAM usage (supports 3-hour video understanding on a 24 GB GPU).

Paper: https://arxiv.org/abs/2501.03895

Code & Demo: https://github.com/ictnlp/LLaVA-Mini

