
Multimodal LLM Garlic is Google’s experimental AI detection system designed to identify AI-generated and manipulated images by analyzing visual and textual inconsistencies. Announced in early 2024, Garlic represents Google’s response to the proliferation of deepfakes and synthetic media across digital platforms.
Garlic cross-references an image’s visual elements with its associated metadata, captions, and contextual information. The system uses a dual-encoder architecture that processes image pixels and text tokens simultaneously, looking for semantic mismatches that human observers might miss. When an image claims to show a specific event but contains anachronistic elements or physically impossible lighting, Garlic flags the discrepancy. According to Google’s internal benchmarks, it does so with 87% accuracy.
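To make the dual-encoder idea concrete, here is a minimal sketch of how an image embedding and a text embedding could be compared for a semantic mismatch. It is not Garlic’s implementation, which is not public in this article: the function names, the toy three-dimensional vectors, and the 0.5 threshold are all assumptions chosen for illustration, and a real system would obtain the embeddings from trained image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero embedding as having no similarity
    return dot / (norm_a * norm_b)


def flag_mismatch(image_embedding, text_embedding, threshold=0.5):
    """Return (flagged, score); flagged is True when similarity is below threshold.

    The threshold is a hypothetical value, not one reported for Garlic.
    """
    score = cosine_similarity(image_embedding, text_embedding)
    return score < threshold, score


# Hypothetical toy embeddings standing in for real encoder outputs.
image_vec = [0.9, 0.1, 0.0]
matching_caption_vec = [0.8, 0.2, 0.1]
unrelated_caption_vec = [0.0, 0.1, 0.9]

print(flag_mismatch(image_vec, matching_caption_vec)[0])   # False
print(flag_mismatch(image_vec, unrelated_caption_vec)[0])  # True
```

In this sketch a caption whose embedding points in a very different direction from the image embedding is flagged, while a closely aligned caption is not.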
Unlike earlier detection methods that rely solely on artifact analysis or pixel-level forensics, Garlic uses language understanding to assess plausibility. It can check whether an image’s content is consistent with real-world knowledge, for instance by identifying that a supposed 2020 photograph contains technology released in 2023. This multimodal approach is particularly effective against images from sophisticated generative models such as Midjourney and DALL-E 3.
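The anachronism check described above can be sketched as a simple comparison between an image’s claimed year and the release years of objects recognized in it. This is only an illustration under stated assumptions: the `DetectedObject` type, the labels, and the release years are hypothetical, and the sketch does not show how Garlic recognizes objects or looks up dates.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str
    release_year: int  # hypothetical: would come from a knowledge source


def find_anachronisms(claimed_year, detected_objects):
    """Return objects released after the year the image claims to be from."""
    return [obj for obj in detected_objects if obj.release_year > claimed_year]


# Hypothetical detections for an image captioned as taken in 2020.
objects = [
    DetectedObject("older_phone", 2018),
    DetectedObject("newer_headset", 2023),
]

anachronisms = find_anachronisms(2020, objects)
print([obj.label for obj in anachronisms])  # ['newer_headset']
```

Any non-empty result means the image contains something that did not exist at the claimed date, which is the kind of plausibility signal the paragraph describes.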
Garlic also has notable limitations. It struggles with culturally specific content and images from non-Western contexts, showing a 23% higher false-positive rate in preliminary testing. The system also requires substantial computational resources, which makes real-time deployment at scale economically challenging for most organizations outside major tech companies.
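The text does not say whether the 23% figure is a relative increase or a difference in percentage points, and the two readings differ considerably. The sketch below shows the arithmetic for both, using a baseline false-positive rate computed from invented counts; none of these numbers come from Google’s testing.

```python
def false_positive_rate(false_positives, true_negatives):
    """FPR = FP / (FP + TN): the share of authentic images wrongly flagged."""
    total = false_positives + true_negatives
    if total == 0:
        raise ValueError("need at least one authentic image")
    return false_positives / total


# Hypothetical baseline: 50 of 1,000 authentic images wrongly flagged.
baseline_fpr = false_positive_rate(50, 950)

# Reading 1: "23% higher" as a relative increase over the baseline.
relative_reading = baseline_fpr * 1.23

# Reading 2: "23% higher" as 23 percentage points added to the baseline.
absolute_reading = baseline_fpr + 0.23

print(f"baseline: {baseline_fpr:.2%}")
print(f"relative reading: {relative_reading:.2%}")
print(f"percentage-point reading: {absolute_reading:.2%}")
```

With this invented baseline, the relative reading gives a modestly higher rate while the percentage-point reading gives a rate several times higher, so the distinction matters when interpreting the reported figure.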