OpenAI continues to add new capabilities to its conversational AI, which has ushered humanity into a new era of genuine human-machine collaboration. Web connectivity, image analysis and generation, and speech recognition and synthesis are now also available to the chatbot in its paid version.
ChatGPT has been powered since the beginning of the year by the "GPT-4" LLM, which is known to be multimodal but whose visual and auditory capabilities have so far remained limited and locked away.
In recent days, OpenAI has decided to further exploit the potential of its generative, conversational AI, although these new features are currently reserved for paying users of ChatGPT Plus and ChatGPT Enterprise. As a reminder, for those who want to stay on the free tier, Microsoft’s Bing Chat offers most of these features.
The return of the web connection
It started with the reinstatement of a feature that appeared briefly this summer but was quickly withdrawn (after clever users discovered it gave free access to paywalled sites via ChatGPT): web connectivity for the AI! The GPT-4 model underlying ChatGPT was trained on documents dating from before the end of 2021. Without an internet connection, the AI could neither genuinely analyze web documents nor enrich its answers with up-to-date information. By enabling the “Search with Bing” option under Settings > Beta features, the conversational AI can now answer questions about current topics and events and connect to the web to refine its analyses.
From understanding to image generation
Another major new feature: OpenAI has finally decided to unleash the multimodal potential of GPT-4. ChatGPT now relies on the brand-new GPT-4V iteration of its base model, formalizing access to image analysis. ChatGPT Plus users will soon be able to (and already can, in the iOS and Android mobile versions) submit images, or questions illustrated with images, and have them analyzed and commented on by the AI. The AI can translate handwritten manuscripts, convert a hand-drawn sketch of an algorithm or a screen into computer code, analyze and describe a photo or a painting, analyze captchas, and much more.
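The article describes these capabilities through the ChatGPT interface, but the same kind of image analysis can be illustrated programmatically. Below is a minimal sketch using OpenAI's Python SDK, assuming access to a vision-capable GPT-4 model; the model name and the example image URL are illustrative assumptions, not details taken from OpenAI's announcement.

```python
# Minimal sketch: asking a vision-capable GPT-4 model to describe an image.
# Assumes the official `openai` Python SDK (v1.x) and an OPENAI_API_KEY set in
# the environment; the model name and image URL are illustrative, not prescriptive.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed name of a vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this painting and its style."},
                {"type": "image_url", "image_url": {"url": "https://example.com/painting.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```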
In addition, OpenAI will soon integrate its spectacular image generator “Dall-E 3” into ChatGPT (it is already available in Bing Image Creator, and its renders are far more impressive than Dall-E 2’s), putting it in serious competition with Midjourney while offering a greater variety of styles.
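For readers who want to experiment before the ChatGPT integration arrives, here is a minimal sketch of image generation through OpenAI's public image API in Python; the "dall-e-3" model name and parameters reflect that API, not necessarily the exact way ChatGPT will call the generator internally.

```python
# Minimal sketch: generating an image with DALL-E 3 via OpenAI's Python SDK.
# Treat this as an illustration of the image API, not of the ChatGPT integration.
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor illustration of a robot reading a newspaper",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```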
Voice to expand interactions
One of the great strengths of generative AI is that it is revolutionizing human-machine interaction by making natural language the medium of that interaction. The next step is to carry out these interactions by voice rather than in writing. Given the time it currently takes to analyze and understand spoken language, we will still have to wait a little before the conversation flows as it does with a human. But we are getting closer.
In the mobile version of Bing Chat, you can already ask questions by voice, and the AI can answer by voice as well, using speech models developed by Microsoft.
OpenAI, for its part, will soon integrate its speech-to-text model “Whisper” into ChatGPT Plus. Thanks to a new text-to-speech (TTS) model with five different voices, the chatbot will also be able to speak.
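As an illustration of the two building blocks just mentioned, here is a minimal sketch combining Whisper transcription and text-to-speech through OpenAI's Python SDK. The model names ("whisper-1", "tts-1") and the "alloy" voice are assumptions drawn from OpenAI's public API, not necessarily what ChatGPT uses internally.

```python
# Minimal sketch: speech-to-text with Whisper, then a spoken reply via TTS.
# Assumes the `openai` Python SDK (v1.x) and a local "question.mp3" recording.
from openai import OpenAI

client = OpenAI()

# 1) Transcribe a spoken question with Whisper.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print("You asked:", transcript.text)

# 2) Synthesize a spoken answer with a text-to-speech model.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # one of the preset voices offered by the API
    input="Here is the spoken answer to your question.",
)
speech.stream_to_file("answer.mp3")  # save the synthesized audio to disk
```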
In other words, ChatGPT can now browse, see, speak and hear. These are new capabilities that inventive users will no doubt put to unforeseen purposes, amusing themselves by bypassing the restrictions OpenAI has put in place to prevent malicious or inappropriate uses of its AI.