Category: Multimodal AI

Week 22, 2026 — Vision & Multimodal Systems

Vision-language models made strides in high-resolution perception, 3D reasoning, video efficiency, and unified digital human generation.

CVSearch: Cognitive Visual Search for High-Resolution MLLMs

CVSearch by Liupeng Li et al. addresses the coverage-efficiency dilemma in high-resolution image perception for MLLMs. It dynamically schedules search strategies: first trying expert-assisted search, and only triggering a novel Semantic Guided…

31 May 2026
Multimodal AI: The Year We Stopped Gluing Encoders to LLMs

54 papers surveyed | May 2025 – May 2026 — For years, multimodal AI was mostly a wiring problem: take a vision encoder, glue it to an LLM, add a projection layer, and call it a day. In 2025-2026, that era ended. The field stopped asking “how do we connect vision to language?” and started…

25 July 2025
