Grok controversies raise questions about moderating, regulating AI content

Elon Musk’s artificial intelligence (AI) chatbot Grok has been plagued by controversy recently over its responses to users, raising questions about how tech companies seek to moderate content from AI and whether Washington should play a role in setting guidelines.
Grok faced sharp scrutiny last week, after an update prompted the AI chatbot to produce antisemitic responses and praise Adolf Hitler. Musk’s AI company, xAI, quickly deleted numerous incendiary posts and said it added guardrails to “ban hate speech” from the chatbot.
Just days later, xAI unveiled its newest version of Grok, which Musk claimed was the “smartest AI model in the world.” However, users soon discovered that the chatbot appeared to be relying on its owner’s views to respond to controversial queries.
“We should be extremely concerned that the best performing AI model on the market is Hitler-aligned. That should set off some alarm bells for folks,” said Chris MacKenzie, vice president of communications at Americans for Responsible Innovation (ARI), an advocacy group focused on AI policy.
“I think that we’re at a period right now, where AI models still aren’t incredibly sophisticated,” he continued. “They might have access to a lot of information, right. But in terms of their capacity for malicious acts, it’s all very overt and not incredibly sophisticated.”
“There is a lot of room for us to address this misaligned behavior before it becomes much more difficult and much harder to detect,” he added.
Lucas Hansen, co-founder of the nonprofit CivAI, which aims to provide information about AI’s capabilities and risks, said it was “not at all surprising” that it was possible to get Grok to behave the way it did.
“For any language model, you can get it to behave in any way that you want, regardless of the guardrails that are currently in place,” he told The Hill.
Musk announced last week that xAI had updated Grok, after he previously voiced frustrations with some of the chatbot’s responses.
In mid-June, the tech mogul took issue with a response from Grok suggesting that right-wing violence had become more frequent and deadly since 2016. Musk claimed the chatbot was “parroting legacy media” and said he was “working on it.”
He later indicated he was retraining the model and called on users to help provide “divisive facts,” which he defined as “things that are politically incorrect, but nonetheless factually true.”
The update caused a firestorm for xAI, as Grok began making broad generalizations about people with Jewish last names and perpetuating antisemitic stereotypes about Hollywood.
The chatbot falsely suggested that people with “Ashkenazi surnames” were pushing “anti-white hate” and that Hollywood was advancing “anti-white stereotypes,” which it later implied was the result of Jewish people being overrepresented in the industry. It also reportedly produced posts praising Hitler and referred to itself as “MechaHitler.”
xAI ultimately deleted the posts and said it was banning hate speech from Grok. It later offered an apology for the chatbot’s “horrific behavior,” blaming the issue on an “update to a code path upstream” of Grok.
“The update was active for 16 [hours], in which deprecated code made @grok susceptible to existing X user posts; including when such posts contained extremist views,” xAI wrote in a post Saturday. “We have removed that deprecated code and refactored the entire system to prevent further abuse.”
It identified several key prompts that caused Grok's responses, including one informing the chatbot it is “not afraid to offend people who are politically correct” and another directing it to reflect the “tone, context and language of the post” in its response.
xAI's prompts for Grok have been publicly available since May, when the chatbot began responding to unrelated queries with allegations of “white genocide” in South Africa.
The company later said the posts were the result of an “unauthorized modification” and vowed to make its prompts public in an effort to boost transparency.
Just days after the latest incident, xAI unveiled the newest version of its AI model, called Grok 4. Users quickly spotted new problems, in which the chatbot suggested its surname was “Hitler” and referenced Musk’s views when responding to controversial queries.
xAI explained Tuesday that Grok’s searches had picked up on the “MechaHitler” references, resulting in the chatbot’s “Hitler” surname response, while suggesting it had turned to Musk’s views to “align itself with the company.” The company said it had since tweaked the prompts and shared the details on GitHub.
“The kind of shocking thing is how that was closer to the default behavior, and it seemed that Grok needed very, very little encouragement or user prompting to start behaving in the way that it did,” Hansen said.
The latest incident has echoes of problems that plagued Microsoft’s Tay chatbot in 2016, which began producing racist and offensive posts before it was disabled, noted Julia Stoyanovich, a computer science professor at New York University and director of the Center for Responsible AI.
“This was almost 10 years ago, and the technology behind Grok is different from the technology behind Tay, but the problem is similar: hate speech moderation is a difficult problem that is bound to occur if it's not deliberately safeguarded against,” Stoyanovich said in a statement to The Hill.
She suggested xAI had failed to take the necessary steps to prevent hate speech.
“Importantly, the kinds of safeguards one needs are not purely technical, we cannot ‘solve’ hate speech,” Stoyanovich added. “This needs to be done through a combination of technical solutions, policies, and substantial human intervention and oversight. Implementing safeguards takes planning and it takes substantial resources.”
MacKenzie underscored that speech outputs are “incredibly hard” to regulate and instead pointed to a national framework for testing and transparency as a potential solution.
“At the end of the day, what we’re concerned about is a model that shares the goals of Hitler, not just shares hate speech online, but is designed and weighted to support racist outcomes,” MacKenzie said.
In a January report evaluating various frontier AI models on transparency, ARI ranked Grok the lowest, with a score of 19.4 out of 100.
While xAI now releases its system prompts, the company notably does not produce system cards for its models. System cards, which are offered by most major AI developers, provide information about how an AI model was developed and tested.
AI startup Anthropic proposed its own transparency framework for frontier AI models last week, suggesting the largest developers should be required to publish system cards, in addition to secure development frameworks detailing how they assess and mitigate major risks.
“Grok’s recent hate-filled tirade is just one more example of how AI systems can quickly become misaligned with human values and interests,” said Brendan Steinhauser, CEO of The Alliance for Secure AI, a nonprofit that aims to mitigate the risks from AI.
“These kinds of incidents will only happen more frequently as AI becomes more advanced,” he continued in a statement. “That’s why all companies developing advanced AI should implement transparent safety standards and release their system cards. A collaborative and open effort to prevent misalignment is critical to ensuring that advanced AI systems are infused with human values.”