OpenAI's GPT-4, a highly anticipated AI model, is set to be released on March 14, 2023, according to a report from Search Engine Journal citing Andreas Braun, CTO of Microsoft Germany. The report states that GPT-4 will be a multimodal model, capable of processing multiple types of input, such as video, images, and sound.
Unlike its predecessors GPT-3 and GPT-3.5, which operated only in the text modality, GPT-4 will work across multiple modalities, processing text, speech, images, and video. These additional modalities are expected to significantly enhance the model's performance and versatility.
The report highlights that GPT-4 has undergone extensive testing and training to improve its overall performance. OpenAI claims that GPT-4 demonstrates human-level performance on various professional and academic benchmarks, with advancements in factuality, steerability, and adherence to specified guardrails. GPT-4 also surpasses GPT-3.5 on a simulated bar exam, achieving a notably higher score.
The introduction of multimodality is GPT-4's most significant development. Microsoft's Holger Kenn explained that multimodal AI can translate text into other forms, such as images, music, and video. This capability opens up new avenues for applying GPT-4 across different industries and sectors.
Microsoft has also released its own multimodal language model, Kosmos-1, which integrates text and images. While Kosmos-1 represents a step toward multimodality, GPT-4 goes further by incorporating additional modalities such as video and sound. GPT-4 is also reportedly multilingual, able to answer a question in whatever language it was asked.
The report suggests that Google may be falling behind in the race to incorporate multimodal AI into its search engine. While Google has integrated AI into various products like Google Lens and Google Maps, Microsoft’s implementation of multimodality has garnered more attention and raised questions about Google’s leadership in consumer-facing AI.
In conclusion, GPT-4's multimodal capabilities make its release one of the most anticipated in AI. By handling inputs ranging from text to images and video, and with its enhanced performance and versatility, GPT-4 has the potential to redefine the AI landscape and open up new possibilities across industries.