ChatGPT introduces voice and image capabilities for more intuitive user interaction

Editor

18:00, 25 September 2023

The latest enhancements to ChatGPT allow users to have voice conversations, making the interaction smoother and more natural. Whether you're on the move, craving a bedtime story, or engrossed in a dinner debate, ChatGPT is ready to converse.

To activate the feature in the iOS and Android apps, users can go to Settings, select New Features, and opt in to voice conversations. Users can choose from five distinct voices, each created in collaboration with professional voice actors. The voice capability is powered by a new text-to-speech model that can generate human-like audio from text and just a few seconds of sample speech. Whisper, OpenAI's open-source speech recognition system, transcribes the user's spoken words into text, keeping the conversation fluid.

Besides voice, ChatGPT can now process images. Whether users are troubleshooting an appliance, planning dinner around the ingredients they have on hand, or analyzing a complex graph for work, ChatGPT can help. In the mobile app, users can use the drawing tool to highlight the part of an image they want the chatbot to focus on.

Image understanding is powered by multimodal GPT-3.5 and GPT-4, which apply their language reasoning skills to a wide range of images, including photographs, screenshots, and documents that combine text and images.

OpenAI's mission of building safe and beneficial AGI shapes its approach of releasing tools gradually. This strategy allows the company to improve its products iteratively, refine its risk mitigations over time, and prepare for more powerful systems in the future.

The new voice technology, which can create realistic synthetic voices from just a few seconds of real speech, is a double-edged sword. It opens the door to many creative and accessibility-focused applications, but it also poses risks such as impersonation of public figures and fraud. This is why OpenAI is limiting its use to a specific case, voice chat, rather than releasing it broadly.

Vision-based models, while groundbreaking, present their own challenges, from misinterpreting images to raising privacy concerns, and therefore require careful calibration. OpenAI's collaboration with Be My Eyes, a free mobile app for blind and low-vision people, has been instrumental in understanding the practical uses and limitations of visual AI. Drawing on this real-world feedback, OpenAI has taken technical measures to significantly limit ChatGPT's ability to analyze and make direct statements about people, balancing usefulness with privacy.

OpenAI acknowledges ChatGPT's limitations, particularly when users rely on it in specialized domains. The company encourages users to treat the model's suggestions with caution and to verify information in high-risk applications. In addition, while ChatGPT is proficient at transcribing English text, it performs poorly with some other languages, especially those written in non-Roman scripts, and OpenAI advises non-English users against relying on it for this purpose.

OpenAI is rolling out voice and image capabilities to Plus and Enterprise users over the coming weeks and plans to expand access to other user groups, including developers, afterward. The roadmap signals an exciting future for digital assistants.

In summary, with its multi-sensory enhancements, ChatGPT is poised to redefine digital interactions, cementing its position as a frontrunner in the AI landscape.
