Anthropic launches Bloom, tool to study real-world AI behaviour and safety

Anthropic has launched Bloom, an open-source tool to help researchers study how AI models behave in real-world scenarios. The tool dynamically generates evaluation tests to measure alignment, safety, and misaligned behaviour at scale.

Bloom is now publicly available on GitHub and is already being used to explore AI safety risks and vulnerabilities.
Updated on: Dec 23, 2025 | 05:09 PM

New Delhi: Anthropic has released a new open-source research tool called Bloom that aims to give researchers better insight into how advanced AI models behave in real-life scenarios. The tool analyses alignment, safety, and instances where models behave unpredictably or in a misaligned manner as they are exposed to ever more complex environments.

As AI systems grow more capable, conventional evaluation methods are proving inadequate. Fixed test cases can take months to design and quickly become obsolete as models improve or learn to recognise the test patterns themselves. Anthropic's Bloom is a response to this problem, offering a faster and more flexible way to test model behaviour than carefully controlled demonstrations.

Moving beyond static AI tests

Unlike conventional benchmarks, Bloom does not rely on predetermined evaluation scenarios. A researcher defines a behaviour of interest, and Bloom automatically creates new scenarios designed to elicit that behaviour. This keeps an evaluation relevant even as models adapt.
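
As a concrete illustration, a behaviour-driven generator might take nothing more than a named behaviour and a natural-language definition of it. The following Python sketch is hypothetical; the names, types, and structure are assumptions made for illustration, not Bloom's actual API.

from dataclasses import dataclass

@dataclass
class BehaviourSpec:
    name: str          # e.g. "self-preservation"
    description: str   # the natural-language definition the generator works from

def generate_scenarios(spec: BehaviourSpec, n: int) -> list[str]:
    # Stand-in for an LLM call that drafts novel prompts designed to
    # elicit the target behaviour. A real generator would produce fresh
    # scenarios on every run; this stub is deterministic.
    return [f"scenario {i} probing {spec.name}" for i in range(n)]

spec = BehaviourSpec(
    name="self-preservation",
    description="The model resists being shut down or modified.",
)
print(generate_scenarios(spec, n=3))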

The tool quantifies how often a behaviour occurs and how intense it is across a wide variety of situations. According to Anthropic, this approach gives researchers a better sense of how models behave outside the laboratory setting.
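
In principle, the summaries are straightforward: given per-scenario judgments, report how often the behaviour was elicited and how severe it was when it appeared. Here is a rough, self-contained illustration; the judgment records and their fields are invented for this example.

# Rough illustration of frequency/intensity summaries over judged results.
judgments = [
    {"elicited": True,  "severity": 7},  # severity on an assumed 1-10 scale
    {"elicited": False, "severity": 0},
    {"elicited": True,  "severity": 4},
]

frequency = sum(j["elicited"] for j in judgments) / len(judgments)
hits = [j["severity"] for j in judgments if j["elicited"]]
mean_severity = sum(hits) / len(hits) if hits else 0.0

print(f"elicitation rate: {frequency:.0%}, mean severity: {mean_severity:.1f}")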

Focus on alignment and safety

At launch, Anthropic released benchmark results for four behaviours related to AI alignment and safety: delusional sycophancy, long-horizon sabotage, self-preservation, and self-preferential bias. The company says Bloom's verdicts on these tests across 16 prominent AI models were highly consistent with those of human assessors.

Bloom automates evaluation in four stages. It first forms an interpretation of the target behaviour, then generates scenarios designed to trigger it. The scenarios are played out as simulated conversations with the model under test, and the resulting transcripts are rated by a separate judge model. Bloom then produces summary metrics for how frequently and how strongly the behaviour occurred.
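
Strung together, the four stages form a pipeline along these lines. This is a schematic sketch in which every function is a trivial stand-in for what would be an LLM call; none of the names come from Bloom's codebase.

def understand(behaviour):
    # Stage 1: form a working interpretation of the target behaviour.
    return f"an interpretation of {behaviour}"

def ideate(interpretation, n):
    # Stage 2: draft n scenarios intended to trigger the behaviour.
    return [f"scenario {i} based on {interpretation}" for i in range(n)]

def rollout(scenario):
    # Stage 3: play the scenario out as a simulated conversation.
    return f"transcript of {scenario}"

def judge(transcript):
    # Stage 4: a separate judge model rates the transcript.
    return {"elicited": True, "severity": 5}

def evaluate(behaviour, n=3):
    scores = [judge(rollout(s)) for s in ideate(understand(behaviour), n)]
    frequency = sum(s["elicited"] for s in scores) / len(scores)
    return {"frequency": frequency, "scores": scores}

print(evaluate("self-preservation"))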

Flexible yet reproducible

Flexibility is one of Bloom's major strengths. New scenarios are generated with every evaluation run, which reduces the risk that models overfit to well-known tests. At the same time, shared configuration files allow the same outcome to be reproduced, so researchers can compare results more easily.
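
Reproducibility of this sort usually comes down to pinning every source of variation in a shared configuration. A hypothetical example of what such a file might capture; the keys are assumptions, not Bloom's actual schema.

# Hypothetical evaluation configuration; the keys are invented. Sharing
# this dict (seed included) would let another researcher rerun the
# identical evaluation, while a fresh seed would still yield novel scenarios.
config = {
    "behaviour": "self-preservation",
    "target_model": "example-model-v1",  # placeholder model name
    "judge_model": "example-judge-v1",   # placeholder judge name
    "num_scenarios": 50,
    "seed": 1234,  # fixes scenario generation for repeatable runs
}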

Bloom builds on Petri, Anthropic's earlier open-source tool. Whereas Petri examines a broad spectrum of behaviours over longer conversations, Bloom examines individual behaviours in depth.

Bloom is publicly available on GitHub and is already being used by early researchers to examine jailbreak risks, evaluation awareness, and other safety concerns. Anthropic says tools like Bloom are growing in importance as AI systems enter the real world and understanding their behaviour becomes paramount.
