Microsoft NUWA-Infinity takes on DALL-E: an artist AI that creates images and videos from text

With NUWA-Infinity, Microsoft joins the race to build tools based on Artificial Intelligence (AI) that produce visuals from text. These tools are currently enjoying great success, so much so that large technology corporations such as Google have entered the field with increasingly sophisticated and striking solutions. Microsoft has now presented a new proposal that aims to outperform its primary competitors, "DALL-E" from OpenAI and "Imagen" from Google.

NUWA-Infinity is a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily sized high-resolution images or long-duration videos. An autoregressive over autoregressive generation mechanism is proposed to deal with this variable-size generation task: a global patch-level autoregressive model considers the dependencies between patches, and a local token-level autoregressive model considers the dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related patches that have already been generated as the context for the patch currently being generated, which can significantly reduce computation costs without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and to learn order-aware positional embeddings. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation.
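The two nested loops and the context cache described above can be illustrated with a toy sketch. This is not Microsoft's implementation: the "model" here is a random stand-in, the raster generation order is a fixed assumption (the real ADC chooses the order per task), and all names and constants (`generate`, `nearby_context`, `next_token`, `CONTEXT_RADIUS`, and so on) are hypothetical. It only shows the control flow: an outer loop over patches, an inner loop over tokens within a patch, and a pool that keeps only nearby, already generated patches.

```python
import random
from typing import Dict, List, Tuple

Patch = List[int]        # a patch is a short list of visual-token ids
Coord = Tuple[int, int]  # (row, col) of a patch in the patch grid

# All constants below are illustrative assumptions, not values from NUWA-Infinity.
VOCAB_SIZE = 16
TOKENS_PER_PATCH = 4
CONTEXT_RADIUS = 1       # how far "nearby" reaches, measured in patches


def nearby_context(pool: Dict[Coord, Patch], pos: Coord) -> List[Patch]:
    """Return cached patches within CONTEXT_RADIUS of pos, in a fixed order."""
    r, c = pos
    return [pool[k] for k in sorted(pool)
            if abs(k[0] - r) <= CONTEXT_RADIUS and abs(k[1] - c) <= CONTEXT_RADIUS]


def next_token(context: List[Patch], prefix: Patch, rng: random.Random) -> int:
    """Stand-in for the local token-level model.

    A real model would predict the next token from the context and the
    tokens generated so far; here they are simply mixed with random noise.
    """
    mixed = sum(sum(p) for p in context) + sum(prefix)
    return (mixed + rng.randrange(VOCAB_SIZE)) % VOCAB_SIZE


def generate(rows: int, cols: int, seed: int = 0) -> Tuple[Dict[Coord, Patch], int]:
    """Generate a rows x cols grid of patches; also report the peak pool size."""
    rng = random.Random(seed)
    pool: Dict[Coord, Patch] = {}   # toy version of the Nearby Context Pool
    image: Dict[Coord, Patch] = {}
    peak_pool = 0
    # Outer (global, patch-level) loop: patches in raster order.
    for r in range(rows):
        for c in range(cols):
            # In raster order, rows above r - CONTEXT_RADIUS are never needed
            # again, so they are evicted from the pool.
            for k in [k for k in pool if k[0] < r - CONTEXT_RADIUS]:
                del pool[k]
            context = nearby_context(pool, (r, c))
            # Inner (local, token-level) loop: tokens within the patch.
            patch: Patch = []
            for _ in range(TOKENS_PER_PATCH):
                patch.append(next_token(context, patch, rng))
            pool[(r, c)] = patch
            image[(r, c)] = patch
            peak_pool = max(peak_pool, len(pool))
    return image, peak_pool


if __name__ == "__main__":
    image, peak = generate(rows=5, cols=3)
    print(f"patches generated: {len(image)}, peak pool size: {peak}")
```

The point of the sketch is that the pool stays small however many rows are generated, because each patch conditions only on a bounded neighborhood. That is the intuition behind the claim that caching nearby patches saves computation while keeping patch-level dependencies.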

NUWA-Infinity: Microsoft's New Artist AI

The model is called NUWA-Infinity. It is "a multimodal generative model designed to produce high-quality pictures and films from text, image, or video input," according to Microsoft. On that score, it does not differ much from OpenAI's DALL-E or Google's Imagen.

The key distinction of this new Microsoft AI is that it can generate long videos from a description, as well as high-resolution images of arbitrary size. Its main strength and distinguishing characteristic is that it can "stretch" any image beyond its original borders. On its official website, Microsoft uses Van Gogh's Starry Night as an example to demonstrate how the model adds new content around the picture while maintaining its original style and producing highly precise continuity. Microsoft also showed the classic Windows landscape image, for which the model imagines what lies beyond the frame, as well as videos generated from still images.

What sets Microsoft’s NUWA-Infinity apart from the competition?

Unlike the two competitors mentioned above, Microsoft's NUWA-Infinity is intended to produce high-quality images and videos from a given text, image, or video. As a result, it is presented as the only one of the three capable of producing a long video from an image that was itself generated from text. It also offers greater visual synthesis capabilities in terms of resolution and variable-size generation. In addition, NUWA-Infinity can bring static images to life with a highly realistic effect.

The sole disadvantage is that, like OpenAI's DALL-E, this new Microsoft AI is not yet available to all users. Currently, only a restricted number of people chosen by Microsoft may use it for specialised research activities.