Multimodal GPT-4 is on the way: it will work with text, images, video and music

ChatGPT is still the most sought-after artificial intelligence application and keeps attracting more users, but something new is brewing in the background. OpenAI and Microsoft have continued developing the GPT-3 language model and then GPT-3.5, the current version, and as early as next week the public could get a first look at the next iteration, GPT-4.
The news came, albeit unofficially, at the German conference “AI in Focus – Digital Kickoff”, where Andreas Braun, technical director of Microsoft for Germany, mentioned it in passing.

Versatile AI
According to him, GPT-4 will not only be an upgrade of the language model but will also gain multimodality, a capability that Microsoft recently demonstrated in its own Kosmos-1 system. This means that the new AI model will accept images and video as input alongside text. It will be able to combine them and understand the context, just as it now “understands” instructions given only in natural language, in almost all of the world's languages.

The system could also work in the opposite direction: instead of taking multimedia content as input, it will probably be able to produce images, video, and even music from text “prompts” alone. A publicly available AI system could then solve visual intelligence tests created for people, “read” any multimedia content and use the information obtained in further processing, autonomously narrate a video, talk about it, and the like.

500 times more powerful?
According to unofficial information, GPT-4 will have 500 times more parameters than the ChatGPT model, so they could be counted in the tens of trillions. Taking the 175 billion parameters published for GPT-3 as a baseline, that would be roughly 87.5 trillion. That something of this kind is “cooking” is supported by a paper published this week, which describes “Visual ChatGPT”, a combination of an advanced chatbot and visual generative models.

After the presentation of Kosmos-1, and given the already known capabilities of the DALL-E 2 system, it would not be unusual for these technologies to merge, giving us a single comprehensive, multimodal generative AI system under OpenAI.
