Anthropic Drops Flagship Safety Pledge

$$5874
https://infosec.pub/u/cm0002 posted on Mar 1, 2026 21:15

Anthropic, the wildly successful AI company that has cast itself as the most safety-conscious of the top research labs, is dropping the central pledge of its flagship safety policy, company officials tell TIME.

In 2023, Anthropic committed to never train an AI system unless it could guarantee in advance that the company’s safety measures were adequate. For years, its leaders touted that promise—the central pillar of their Responsible Scaling Policy (RSP)—as evidence that they are a responsible company that would withstand market incentives to rush to develop a potentially dangerous technology.

But in recent months the company decided to radically overhaul the RSP. That decision included scrapping the promise to not release AI models if Anthropic can’t guarantee proper risk mitigations in advance.

https://infosec.pub/post/42799756
$$5879
https://lemmy.world/u/certified_expert posted on Mar 1, 2026 21:19
In reply to: https://infosec.pub/post/42799756

What? How does this align with them dropping the Pentagon’s contract?

https://lemmy.world/comment/22420207
$$5899
https://sh.itjust.works/u/ThePantser posted on Mar 1, 2026 21:46
In reply to: https://lemmy.world/comment/22420207

Sounds like they got blacklisted by the US, decided that was bad for business, and flipped quickly. They’ll probably start sucking off Trump to get back in.

https://sh.itjust.works/comment/24048947
$$5913
https://reddthat.com/u/IAmYouButYouDontKnowYet posted on Mar 1, 2026 22:10
In reply to: https://infosec.pub/post/42799756

I hope the human species dies off completely this year.

https://reddthat.com/comment/25091396
$$5914
https://piefed.social/u/XLE posted on Mar 1, 2026 22:11
In reply to: https://lemmy.world/comment/22420207

I’m not sure if this change is entirely relevant, because the whole “AI safety” thing has been a sham from the beginning. It’s always been unverifiable and the promises have always been impossible to keep. LLMs just predict the next word with a little extra randomness, and there’s no way to guarantee that an LLM won’t predict a next word that ends up being bad. You can’t promise this without removing the randomness and then testing the infinite inputs and outputs that could happen.

It’s basically like when Google removed “don’t be evil.” It was a promise that was unfalsifiable and unquantifiable.

https://piefed.social/comment/10348896
$$5915
https://piefed.social/u/XLE posted on Mar 1, 2026 22:15
In reply to: https://infosec.pub/post/42799756

Funny timeline

  • February 24: Anthropic drops “responsible” policy
  • February 25: Defense Department gives Anthropic a deadline
  • February 27: Trump orders cutting ties

https://piefed.social/comment/10348939
$$5917
https://lemmy.world/u/certified_expert posted on Mar 1, 2026 22:17
In reply to: https://piefed.social/comment/10348896

Yeah, not at the prediction step, but they could analyze the generated output and filter it: the so-called “guardrails.”

https://lemmy.world/comment/22421088
$$5921
https://piefed.social/u/XLE posted on Mar 1, 2026 22:23
In reply to: https://lemmy.world/comment/22421088

The problem is the filtration algorithm is basically flaky in the same way as the LLM itself. And even if it does work, I’ve never heard a single soul say that Anthropic shut down their account due to questionable prompts. I even ran into somebody here who claims he uses AI to work on sexual abuse cases; he says that he’s been stalled by the chatbot, but he’s never been blocked even for review.

https://piefed.social/comment/10349036
$$5933
https://fedia.io/u/FaceDeer posted on Mar 1, 2026 22:36
In reply to: https://infosec.pub/post/42799756

Feb 24, 2026 1:00 PM MT

This happened days before Trump threw his toddler tantrum.

Just another example of how attempting to appease wannabe-autocrats doesn’t work. Best you can do is maybe distract or delay them a bit, but be ever ready for them to turn on you and demand more.

https://fedia.io/m/technology@piefed.social/t/3527344/-/comment/14236261
$$5938
https://piefed.social/u/Rekall_Incorporated posted on Mar 1, 2026 22:47
In reply to: https://infosec.pub/post/42799756

All American polemics and “pledges” are BS, at least with respect to anything substantial.

Not saying it was always like that or that it will always be like that, but it is reasonable to assume it will take another generation (20-30 years) before we see any positive developments with respect to the culture of corruption, criminality and dishonesty that has unfortunately come to dominate American society.

Doesn’t matter if a hypothetical Barack Obama II comes to power. From my time living in the US (several years with extensive travel across many different states), the impression I got is that on real matters an Obama is actually not too different from a Trump. The biggest difference is that Trump owns his corruption and criminality (with excellent electoral success).

Even in foreign policy, Obama de facto approved the annexation of Crimea (our new leadership asked for support to fight the russian invasion of Crimea and was rejected) and he went on to characterize russia as “a regional power making trouble with its neighbors.”

A comically stupid approach that’s not too different from Trump’s gibberish.

And if you think I am being uncharitable, ask yourself the following question:

Meta has been found to knowingly enable fraud to gain $16 B in 2024 alone. Meta was also reported to have developed a “playbook” to manage this fraudulent scheme; so the whole thing was premeditated and with clear intent.

Is anything going to happen to Meta (the entity) or Meta’s leadership (whether the far right or the centre right is in power)? Anyone who has lived in the US in the last ~30 years knows the answer!

https://piefed.social/comment/10349277
$$5943
https://piefed.social/u/aeiou posted on Mar 1, 2026 23:00
In reply to: https://infosec.pub/post/42799756

Am not fully used to Lemmy/PieFed yet - why does this 4-day-old post say it was posted an hour ago?

https://piefed.social/comment/10349379
$$5957
https://piefed.social/u/Rekall_Incorporated posted on Mar 1, 2026 23:46
In reply to: https://piefed.social/comment/10349379

It was posted in different communities.

In this one it was posted 4 days ago:

https://piefed.social/c/futurology/p/1815447/anthropic-drops-its-pledge-to-pause-ai-training-over-safety-concerns

In https://piefed.social/c/technology it was posted a few hours ago.

These are cross-posts, like on Reddit.

https://piefed.social/comment/10349771
$$5961
https://feddit.org/u/timestatic posted on Mar 2, 2026 00:00
In reply to: https://fedia.io/m/technology@piefed.social/t/3527344/-/comment/14236261

This wasn’t for Trump tho, this was for Anthropic themselves so they could develop AIs quicker in the AI race. So mainly a business incentive, mostly unrelated to Trump, I believe.

https://feddit.org/comment/11791980