Qwen2-VL-72B-Instruct is a multimodal model that handles images, videos, and text in multiple languages. The 72B-parameter model performs strongly at visual analysis, understanding videos longer than 20 minutes, device control, and recognizing multilingual text in images.