Photos, voice notes, videos, audio files — just send them in Telegram and Ghali understands what you're sharing.
Real conversations aren't just words. You snap a photo of a menu. You record a quick voice note. You forward a video someone sent you. Ghali handles all of it — natively, through Telegram.
Send a photo and ask about it. Screenshots, documents, receipts, menus, signs — Ghali reads and describes them.
Too lazy to type? Just talk. Ghali transcribes your voice note and responds to what you said.
Forward a video and ask what's happening. Ghali watches it and gives you a summary or answers your questions.
Podcasts, recordings, audio messages — send them over and Ghali listens and responds.
No special commands. No "please analyze this image." Just send it the way you'd send it to a friend — drop the photo, add a question if you want, and Ghali figures out the rest.
Reply to a photo you sent earlier with a new question, and Ghali pulls it up and re-analyzes it. Context carries over naturally.
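How Ghali actually finds the earlier photo isn't described here, but the idea can be sketched in a few lines. This example assumes the incoming message is a Telegram Bot API message represented as a Python dict, with the documented `photo` and `reply_to_message` fields. The helper name `pick_photo_file_id` is hypothetical.

```python
from typing import Optional


def pick_photo_file_id(message: dict) -> Optional[str]:
    """Return the file_id of the photo a question refers to, if any.

    Checks the message itself first, then the message it replies to.
    Hypothetical helper: a sketch, not Ghali's actual lookup logic.
    """
    for candidate in (message, message.get("reply_to_message") or {}):
        sizes = candidate.get("photo") or []
        if sizes:
            # Telegram sends several sizes of the same photo; take the largest.
            largest = max(sizes, key=lambda s: s["width"] * s["height"])
            return largest["file_id"]
    return None
```

With a helper like this, a plain text reply to an older photo resolves to that photo's `file_id`, which the bot can then download and analyze again alongside the new question.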
Under the hood, Ghali uses Google Gemini's native multimodal capabilities. That means images, audio, and video aren't converted to text first. The model processes the media directly, so details that a transcript or caption would drop, such as tone of voice, layout, or what's on screen, can still inform the answer.
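To make "not converted to text first" concrete, here is a minimal sketch of a request body in the shape of Gemini's `generateContent` REST API, where the raw media travels as an `inline_data` part next to the text prompt. The function name, the default prompts, and the no-caption fallback are assumptions for illustration; Ghali's real request code isn't shown in this document.

```python
import base64
from typing import Optional

# Hypothetical fallback prompts for media sent without a caption.
DEFAULT_PROMPTS = {
    "image": "Describe this image.",
    "audio": "Listen to this audio and respond to what is said.",
    "video": "Summarize what happens in this video.",
}


def build_multimodal_request(
    media_bytes: bytes, mime_type: str, caption: Optional[str] = None
) -> dict:
    """Build a request body that carries the media itself, not a transcript."""
    kind = mime_type.split("/", 1)[0]  # "image/jpeg" -> "image"
    if kind not in DEFAULT_PROMPTS:
        raise ValueError(f"unsupported media type: {mime_type}")
    # Use the user's caption if there is one, otherwise a default prompt.
    prompt = caption.strip() if caption and caption.strip() else DEFAULT_PROMPTS[kind]
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # The bytes are base64-encoded for JSON transport.
                            "data": base64.b64encode(media_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
```

The same function covers a photo, a voice note, or a video: only the MIME type changes, and no speech-to-text or captioning step runs before the model sees the input.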