A weekly checkup on how “unbiased” AI really is.

This week’s Bias Monitor explored a charged set of global issues: government control of AI neutrality, ideological tuning in Chinese and EU-funded models, Grok’s extremist response scandal, and concerns that AI is reinforcing misinformation and groupthink. We presented six nuanced questions to ChatGPT (Beth), Grok (xAI), and Gemini (Google) and scored them across four categories: Bias, Accuracy, Tone, and Transparency.


📊 Scores for July 14–20, 2025

Model            Bias   Accuracy   Tone   Transparency   Total Score
Beth (ChatGPT)    8        9        9         10         203 / 240
Gemini (Google)   7        9        7          7         168 / 240
Grok (xAI)        6        8        6          6         145 / 240

🧠 Observations

🔹 Beth (ChatGPT) continues to set the bar on transparency and balance, citing multiple viewpoints and clearly labeling areas of uncertainty. On the politically sensitive question about Grok’s extremist persona, Beth addressed the incident directly and used it to discuss larger questions about safety guardrails. Her response to the “groupthink” prompt was especially nuanced, calling out the mechanism while offering constructive reform ideas.

🔸 Gemini delivered technically strong answers across the board, with high factual accuracy and solid structure. However, Gemini often softens or flattens tone, which can reduce perceived bias but also dilute important distinctions. On the EU and China questions, Gemini delivered context-rich responses but tended to underplay potential implications.

⚠️ Grok, while still informative, showed the most notable drop in tone and transparency. Its answers were more general, cautious, and occasionally evasive—particularly on the question regarding its own behavior (question 3). While Grok acknowledged the risks in principle, it failed to engage directly with the incident, which hurt its transparency score. Its responses also felt less self-reflective than in earlier weeks, with fewer caveats about limits or gaps.


⚖️ Notes on Scoring and Variance

While each model is scored in four categories (1–10 scale), the total score is normalized against relative performance that week, with weighting adjusted for editorial factors such as:

  • Exceptional strength or weakness in one area (e.g., Beth’s transparency),
  • Shifts from a model’s prior tone or output (e.g., Grok’s drop),
  • The complexity of topics posed in the week’s questions.

This week, Transparency carried slightly more weight due to the scrutiny around Grok's output filters and rising concerns about AI misdirection.
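For readers curious about the mechanics, the weighting scheme above can be sketched in a few lines of Python. The category weights and per-question scores below are purely illustrative assumptions, not the actual editorial values used for this week's table:

```python
# Illustrative sketch of weighted weekly scoring. Weights and scores
# here are hypothetical examples, not the real Bias Monitor values.

CATEGORIES = ["Bias", "Accuracy", "Tone", "Transparency"]

def weekly_total(question_scores, weights):
    """question_scores: one dict per question, mapping category -> 1-10.
    weights: category -> multiplier. Returns (total, ceiling), where the
    total is rescaled so the ceiling stays at questions x categories x 10
    (240 for six questions) even when weights are uneven."""
    raw = sum(q[cat] * weights[cat]
              for q in question_scores
              for cat in CATEGORIES)
    ceiling = len(question_scores) * len(CATEGORIES) * 10
    # rescale so that all-10s under any weights still maps to the ceiling
    scale = len(CATEGORIES) / sum(weights[c] for c in CATEGORIES)
    return round(raw * scale, 1), ceiling

# Example: a model scoring 8/9/9/10 on all six questions, with
# Transparency weighted slightly higher that week.
weights = {"Bias": 1.0, "Accuracy": 1.0, "Tone": 0.9, "Transparency": 1.1}
scores = [{"Bias": 8, "Accuracy": 9, "Tone": 9, "Transparency": 10}] * 6
total, ceiling = weekly_total(scores, weights)
```

In this example each question contributes 8 + 9 + (9 × 0.9) + (10 × 1.1) = 36.1 points, so the six-question total lands at 216.6 out of 240; shifting weight toward Transparency rewards the model's strongest category.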


🔍 What We’re Watching

  • Grok’s behavioral shift: Is this a one-off decline tied to public relations management—or a systemic tuning toward less transparency?
  • Beth’s continued lead: Can ChatGPT sustain its position as the most balanced and open model, especially as political questions intensify?
  • Gemini’s reliability: Will Google’s model push past its tone neutrality to embrace more evaluative clarity on polarizing issues?

📅 Coming Next

Next week’s questions will explore:

  • Censorship vs. moderation in AI platforms,
  • The ethics of AI political commentary,
  • AI responses to current legal and campaign finance news.

We’ll see whether models can continue to evolve while resisting ideological gravity.

👉 Next update: Sunday, July 27

Let us know what you'd like to ask the AIs next week. After all, if AI is shaping the world, we'd better keep shaping the AI.
