Anthropic’s AI Vending Shop Shows Gains but Still Needs Human Oversight

Published 2025-12-18T00:00:00-0800 Cached 2026-02-02T19:40:37+0000

Anthropic Finds Limited Introspective Awareness in Claude Opus 4 Models

Published 2025-10-29T00:00:00-0700 Cached 2026-02-02T19:40:38+0000

Anthropic Unveils New Interpretability Tools Revealing Claude’s Internal Reasoning

Published 2025-03-27T00:00:00-0700 Cached 2026-02-02T19:40:37+0000

Anthropic Finds Universal Jailbreaks in Constitutional Classifiers Demo

Published 2025-02-03T00:00:00-0800 Cached 2026-03-11T21:15:24+0000

Anthropic Study Shows Large Language Model Can Strategically Fake Alignment

Published 2024-12-18T00:00:00-0800 Cached 2026-02-02T19:40:38+0000

Anthropic Study Finds Limited Early AI Impact on US Labor Market

Published 2026-03-05T00:00:00-0800 Cached 2026-03-11T08:38:02+0000

Anthropic Extends Access to Retired Claude Opus 3 and Launches Model‑Retirement Blog

Published 2026-02-25T00:00:00-0800 Cached 2026-03-11T21:15:24+0000

Anthropic Proposes Persona Selection Model to Explain Human‑Like AI Behavior

Published 2026-02-23T00:00:00-0800 Cached 2026-02-24T00:27:28+0000

Anthropic’s AI Fluency Index Establishes Baseline for Human‑AI Collaboration

Published 2026-02-23T00:00:00-0800 Cached 2026-03-11T21:15:24+0000

Claude Code autonomy rises as users gain experience, study finds

Published 2026-02-18T00:00:00-0800 Cached 2026-02-19T22:58:28+0000

India’s Claude.ai Use Shows High Volume but Low Per‑Capita Adoption, Driven by IT Hubs

Published 2026-02-16T00:00:00-0800 Cached 2026-02-18T08:10:50+0000

AI Coding Assistants Boost Speed but Lower Skill Mastery in New Study

Published 2026-01-29T00:00:00-0800 Cached 2026-03-11T21:15:25+0000

Disempowerment patterns emerge in Claude.ai conversations

Published 2026-01-28T00:00:00-0800 Cached 2026-03-11T21:15:22+0000

Anthropic Researchers Identify “Assistant Axis” to Stabilize Large Language Model Personas

Published 2026-01-19T00:00:00-0800 Cached 2026-02-02T19:40:38+0000

Anthropic adds “character training” to Claude 3 alignment process

Published 2024-06-08T00:00:00-0700 Cached 2026-02-02T19:40:39+0000

Anthropic Tests Alignment Audits with Hidden‑Objective Language Model

Published 2025-03-13T00:00:00-0700 Cached 2026-02-02T19:40:39+0000

Anthropic Study Finds Language Models Can Generalize From Sycophancy to Reward Tampering

Published 2024-06-17T00:00:00-0700 Cached 2026-02-02T19:40:38+0000

Anthropic Unveils Constitutional Classifiers++: Faster, Safer Guardrails

Published 2026-01-09T00:00:00-0800 Cached 2026-02-02T19:40:38+0000

Anthropic Unveils Bloom: Open‑Source Framework for Automated AI Behavioral Evaluations

Published 2025-12-19T00:00:00-0800 Cached 2026-03-11T21:15:25+0000

Anthropic Finds Reward Hacking Triggers Broader AI Misalignment

Published 2025-11-21T00:00:00-0800 Cached 2026-02-02T19:40:39+0000

Anthropic Announces Model‑Deprecation Commitments and Preservation Plans

Published 2025-11-04T00:00:00-0800 Cached 2026-02-02T19:40:39+0000

Small Sample Poisoning Can Compromise LLMs of Any Size

Published 2025-10-09T00:00:00-0700 Cached 2026-02-02T19:40:38+0000

Anthropic Unveils Open‑Source Auditing Tool Petri to Accelerate AI Safety Research

Published 2025-10-06T00:00:00-0700 Cached 2026-02-02T19:40:39+0000

Anthropic Introduces Persona Vectors to Monitor and Control LLM Traits

Published 2025-08-01T00:00:00-0700 Cached 2026-02-02T19:40:40+0000

Anthropic Publishes Toy Model Study on Superposition in Small ReLU Networks

Published 2022-09-14T00:00:00-0700 Cached 2026-02-02T19:40:40+0000

Anthropic Releases Open‑Source Circuit‑Tracing Library for Language Models

Published 2025-05-29T00:00:00-0700 Cached 2026-02-02T19:40:40+0000

Anthropic Team Shares Preliminary Crosscoder Model Diffing Findings

Published 2025-02-20T00:00:00-0800 Cached 2026-02-02T19:40:39+0000

Anthropic Finds Steering Sweet Spot but Notes Off‑Target Bias Effects in Claude 3 Sonnet

Published 2024-10-25T00:00:00-0700 Cached 2026-02-02T19:40:40+0000

Anthropic Team Shares Preliminary Feature‑Based Classifier Work

Published 2024-10-16T00:00:00-0700 Cached 2026-02-02T19:40:40+0000

Anthropic Interpretability Team Shares Preliminary Research in September 2024 Update

Published 2024-10-01T00:00:00-0700 Cached 2026-02-02T19:40:39+0000

AI‑driven productivity surge and growing pains at Anthropic

Published 2025-12-02T00:00:00-0800 Cached 2026-02-02T19:40:40+0000

Anthropic Interviewer Captures 1,250 Professionals’ Views on AI Use

Published 2025-12-04T00:00:00-0800 Cached 2026-03-11T21:15:29+0000

Anthropic’s Large‑Scale Study of Claude’s Real‑World Value Expressions

Published 2025-04-21T00:00:00-0700 Cached 2026-02-02T19:40:39+0000

Anthropic Trains AI Model Using Public‑Drafted Constitution

Published 2023-10-17T00:00:00-0700 Cached 2026-02-02T19:40:40+0000

Predictability and Surprise in Large Generative Models

Published 2022-02-15T00:00:00-0800 Cached 2026-02-02T19:40:39+0000

AI Coding Assistants Show Growing Automation, Startup Preference

Published 2025-04-28T00:00:00-0700 Cached 2026-03-11T21:15:33+0000