2 Results

The Safety Divide: Open-Source AI Models Fall Short on Guardrails for Antisemitic, Dangerous Content

Report

ADL study finds popular open-source LLMs can easily be manipulated by malicious actors to produce antisemitic, extremist, and dangerous content amid weak safety guardrails.

December 09, 2025


Breaking the Building Blocks of Hate: A Case Study of Minecraft Servers

Report

The first analysis of hate and harassment using data from Minecraft servers.

July 26, 2022
