Showing posts with the label Imagen

Google Imagen - A DALL-E 2 Killer and perfect AI Diffusion Model artist

Imagen: unprecedented photorealism × deep level of language understanding, by Google Research, Brain Team. Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. It builds on the power of large transformer language models for understanding text and on the strength of diffusion models for high-fidelity image generation. Our key discovery is that generic large language models (e.g., T5), pre-trained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen improves both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset without ever training on COCO, and human raters judge Imagen samples to be on par with the COCO data itself in image-text alignment. To more thoroughly evaluate text-to-image models, we present...
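For readers unfamiliar with the FID score quoted above: it is the Fréchet distance between two Gaussians, one fitted to features of real images and one to features of generated images, and lower is better. The sketch below is only a one-dimensional illustration of that distance. The real metric uses high-dimensional Inception network features and a matrix square root, and the function name and sample values here are hypothetical, not taken from the Imagen work.

```python
from statistics import fmean, pstdev


def frechet_distance_1d(real, generated):
    """Squared Frechet distance between two 1-D Gaussians fitted to the samples.

    Univariate special case of the FID formula:
        (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2
    Real FID replaces the scalars with mean vectors and covariance matrices.
    """
    mu_r, mu_g = fmean(real), fmean(generated)
    sigma_r, sigma_g = pstdev(real), pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sigma_r - sigma_g) ** 2


# Hypothetical scalar "features" standing in for image embeddings.
real_features = [0.0, 1.0, 2.0]
generated_features = [3.0, 4.0, 5.0]

print(frechet_distance_1d(real_features, generated_features))
```

Identical sample sets give a distance of 0, and the value grows as the generated distribution drifts from the real one in mean or spread, which is why a lower FID indicates samples closer to the reference data.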