NExT-GPT: Open Source Multimodal AI Technology Challenges Big Tech Giants
The National University of Singapore (NUS) and Tsinghua University have joined forces to develop NExT-GPT, an open-source multimodal AI model that aims to rival offerings from industry giants like OpenAI and Google. By combining text, images, audio, and video in conversation, NExT-GPT supports more natural interactions than text-only models.
The team behind NExT-GPT describes it as an any-to-any system, meaning it can accept inputs in any of these modalities and respond in any of them as well. Because the model is open source, users can customize and extend it for their own needs, potentially pushing it well beyond its original capabilities.
So, how does NExT-GPT work? The model uses separate encoder modules to map inputs such as images and audio into representations the core language model can process alongside text. The researchers also apply a technique called modality-switching instruction tuning to strengthen cross-modal reasoning, enabling seamless transitions between different types of inputs during a conversation.
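The encoding step can be sketched roughly as follows. This is a minimal illustration, not NExT-GPT's actual code: the encoder stand-in, the dimensions, and the projection layer are all assumptions, standing in for a frozen multimodal encoder whose features are projected into the language model's embedding space.

```python
import numpy as np

ENCODER_DIM = 1024   # assumed size of the frozen encoder's feature vector
LLM_DIM = 4096       # assumed size of the language model's token embeddings

rng = np.random.default_rng(0)
# Small trainable projection from encoder space into LLM embedding space.
projection = rng.normal(scale=0.02, size=(ENCODER_DIM, LLM_DIM))

def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen multimodal encoder (e.g. ImageBind)."""
    # Collapse the image to a fixed-size, normalized feature vector.
    flat = image.reshape(-1)
    pooled = np.resize(flat, ENCODER_DIM)
    return pooled / (np.linalg.norm(pooled) + 1e-8)

def project_to_llm_space(features: np.ndarray) -> np.ndarray:
    """Map encoder features to a 'soft token' the LLM can consume."""
    return features @ projection

image = rng.random((224, 224, 3))   # a dummy RGB image
soft_token = project_to_llm_space(encode_image(image))
print(soft_token.shape)  # (4096,)
```

In a real system the projected vector would be interleaved with ordinary text-token embeddings in the LLM's input sequence, which is what lets one core model reason over mixed-modality conversations.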
NExT-GPT employs unique signal tokens for each input and output modality, allowing flexible any-to-any conversion. These tokens accompany ordinary text generation and trigger the production of non-text outputs such as images and videos. Dedicated decoders handle each output modality: Stable Diffusion for images, AudioLDM for audio, and Zeroscope for video. Vicuna serves as the base large language model (LLM), and ImageBind handles input encoding.
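The routing idea can be sketched like this. The token names (`<IMG>`, `<AUD>`, `<VID>`), the separator, and the decoder registry below are illustrative assumptions, not NExT-GPT's real tokens or interfaces; the point is only that special tokens in the LLM's output dispatch segments to the matching decoder.

```python
# Hypothetical registry mapping modality signal tokens to decoder calls.
# In a real system these would invoke Stable Diffusion, AudioLDM, and
# Zeroscope; here they return placeholder strings.
DECODERS = {
    "<IMG>": lambda prompt: f"[StableDiffusion image for: {prompt}]",
    "<AUD>": lambda prompt: f"[AudioLDM audio for: {prompt}]",
    "<VID>": lambda prompt: f"[Zeroscope video for: {prompt}]",
}

def dispatch(llm_output: str) -> list:
    """Split LLM output into plain text and decoder outputs."""
    results = []
    for segment in llm_output.split("|"):
        segment = segment.strip()
        for token, decoder in DECODERS.items():
            if segment.startswith(token):
                # Route the rest of the segment to the matching decoder.
                results.append(decoder(segment[len(token):].strip()))
                break
        else:
            results.append(segment)  # ordinary text passes through
    return results

out = dispatch("Here is your picture | <IMG> a cat on a surfboard")
print(out)
```

This mirrors the article's description: text responses flow through unchanged, while signal tokens trigger a non-text generator for that modality.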
Despite training only about 1% of its total parameters, NExT-GPT achieves remarkable flexibility in any-to-any conversion. The remaining parameters belong to frozen, pretrained modules, which keeps training efficient.
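A back-of-the-envelope sketch of that parameter budget: freeze the large pretrained modules and train only the small projection layers, so the trainable share stays around 1%. The module names and parameter counts below are made-up round numbers for illustration, not NExT-GPT's actual sizes.

```python
# (parameter_count, is_trainable) per module; counts are illustrative.
modules = {
    "imagebind_encoder":  (1_200_000_000, False),  # frozen
    "vicuna_llm":         (7_000_000_000, False),  # frozen
    "diffusion_decoders": (2_500_000_000, False),  # frozen
    "input_projections":  (60_000_000, True),      # trainable
    "output_projections": (40_000_000, True),      # trainable
}

total = sum(n for n, _ in modules.values())
trainable = sum(n for n, t in modules.values() if t)
print(f"trainable share: {trainable / total:.2%}")  # about 1%
```

With these assumed numbers, roughly 100 million of 10.8 billion parameters are updated during training, which is why the approach is so cheap relative to training a multimodal model from scratch.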
Although a demo site for NExT-GPT has been established, availability remains intermittent. Nonetheless, NExT-GPT presents itself as a compelling open-source alternative for creators seeking to harness the power of multimodal AI. Multimodality is crucial for enabling more natural interactions, and by open-sourcing NExT-GPT, researchers are providing a platform for the community to propel AI to new heights.
As tech giants like Google and OpenAI launch their own multimodal AI products, NExT-GPT introduces healthy competition in the field. Its ability to process multiple modalities and generate coherent responses holds great potential for advancing conversational AI. By embracing openness and collaboration, NExT-GPT brings researchers, developers, and enthusiasts together to shape the future of AI.