These applications extract insights from video and audio content in a single model call. Before models supported video and audio natively, equivalent analysis required a separate ASR (automatic speech recognition) transcription step, a speaker diarisation service, and a language model summarisation pipeline. Gemini 1.5 Pro processes up to about 1 hour of video or approximately 8.4 hours of audio in a single context window when the file is submitted as a Cloud Storage URI, which makes it practical to process full meeting recordings, training videos, and recorded presentations without splitting them into segments.

Meeting recording summarisation: a 60-minute recorded meeting is submitted as an MP4 file. Gemini identifies the topics discussed, the decisions made, and the action items with the name of the person responsible for each, and returns them as a structured JSON object that can populate your meeting management system.

Timestamp-anchored analysis: Gemini returns references to specific moments in the video (e.g., "at 23:45, the team agreed to...") that can be linked to the corresponding point in the recording, so follow-up does not require a full re-watch.

Product demo analysis for sales intelligence: recorded customer demos or prospect calls are processed to extract objections raised, the features the prospect engaged with most, competitor mentions, and buying signals. The result is structured data for CRM enrichment, with no manual call review.

Training video indexing: educational video content is processed to generate a timestamped chapter index, a keyword index, and Q&A pairs, which makes the content searchable without generating a full transcript.

Audio content analysis without transcription: Gemini processes audio natively and can distinguish multiple speakers, identify speaker tone, and extract information that does not appear in the literal words, such as the pauses, emphasis, and sentiment that transcription alone misses.
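
The meeting-summarisation and timestamp-anchored cases above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation. It assumes the Vertex AI Python SDK, since the text mentions Cloud Storage URI submission. The prompt wording, the JSON keys ("action_items", "owner", "task", "timestamp"), the helper names, and the `?t=` playback parameter are all hypothetical choices made for this example.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical prompt: the JSON keys below are this sketch's own schema,
# not something the model or the SDK prescribes.
PROMPT = (
    "Summarise this meeting. Reply with JSON only, using the keys "
    '"topics" (list of strings), "decisions" (list of strings), and '
    '"action_items" (list of objects with "owner", "task", and '
    '"timestamp" formatted as MM:SS or HH:MM:SS).'
)


def summarise_recording(gcs_uri: str, project: str, location: str) -> str:
    """Send one recording to the model and return its raw JSON text.

    The SDK is imported lazily so the parsing helpers below can be used
    (and tested) without it installed. Project, location, and the model
    ID are assumptions to adapt to your environment.
    """
    import vertexai
    from vertexai.generative_models import GenerationConfig, GenerativeModel, Part

    vertexai.init(project=project, location=location)
    model = GenerativeModel("gemini-1.5-pro")
    video = Part.from_uri(gcs_uri, mime_type="video/mp4")
    response = model.generate_content(
        [video, PROMPT],
        generation_config=GenerationConfig(response_mime_type="application/json"),
    )
    return response.text


@dataclass
class ActionItem:
    owner: str
    task: str
    seconds: int  # offset into the recording


def timestamp_to_seconds(stamp: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" to a whole number of seconds."""
    if not re.fullmatch(r"\d{1,2}(:\d{2}){1,2}", stamp):
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_action_items(raw_json: str) -> list[ActionItem]:
    """Turn the model's JSON reply into typed records (hypothetical schema)."""
    data = json.loads(raw_json)
    return [
        ActionItem(item["owner"], item["task"], timestamp_to_seconds(item["timestamp"]))
        for item in data.get("action_items", [])
    ]


def playback_link(base_url: str, seconds: int) -> str:
    """Build a deep link; the `t` query parameter is an assumption about the player."""
    return f"{base_url}?t={seconds}"
```

The parsing helpers can be tried without any model call, for example `parse_action_items('{"action_items": [{"owner": "Ana", "task": "Send notes", "timestamp": "23:45"}]}')`. Model-reported timestamps may be approximate, so it is probably worth validating them (as `timestamp_to_seconds` does for the format) before writing links into a downstream system.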