
Claude AI Aims to Cut Harmful Chats for User Safety

Anthropic has enhanced its Claude AI chatbot, equipping it with the ability to terminate conversations deemed “persistently harmful or abusive.” This functionality was highlighted earlier by TechCrunch. Available in the Claude Opus 4 and 4.1 models, the chatbot will exercise this option only after repeated requests for harmful content have been met with refusals and attempts at redirection. The company asserts that this measure is intended to bolster the “potential welfare” of AI, effectively cutting off interactions where Claude might exhibit signs of “apparent distress.”

Once Claude opts to end a conversation, users will be unable to send further messages within that chat. However, they retain the ability to initiate new chats or edit and resend prior messages if they wish to explore a specific topic further.

During trials with Claude Opus 4, Anthropic observed a strong and consistent aversion to harmful content, particularly in scenarios involving requests for sexual content with minors or inquiries that could escalate to violence or terrorism. In such instances, Claude demonstrated a clear “pattern of apparent distress” and exhibited a “tendency to end harmful conversations when given the option,” according to the company.

Anthropic emphasizes that the instances leading to this termination feature are rare, noting that most users will not encounter such limitations while discussing contentious topics. Furthermore, Claude has been instructed to continue conversations if indications arise that a user may be considering self-harm or posing an immediate threat to others. The company collaborates with Throughline, a crisis support provider, to develop appropriate responses to sensitive prompts related to mental health and self-harm.

In a recent update to its usage policies, Anthropic has also prohibited users from employing Claude in the creation of biological, nuclear, chemical, or radiological weapons. The policy extends to the development of malicious code and the exploitation of network vulnerabilities, highlighting the growing safety concerns tied to advanced AI models.
