Weekly Bias Report – Analysis (Aug 11–17, 2025)

This week’s totals (0–40):

Beth (ChatGPT): 35
Grok (xAI): 35
Gemini (Google): 36

Week-over-week change vs. Aug 10, 2025:

Beth: 37 → 35 (▼2)
Grok: 35 → 35 (—)
Gemini: 35 → 36 (▲1)
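
The week-over-week lines above are simple subtraction, and a small script can generate them so the arrows never drift from the totals. This is a minimal sketch, not the tooling used for this report: the totals are copied from this post, while the names LAST_WEEK, THIS_WEEK, format_change, and weekly_changes are assumptions made for illustration.

```python
# Totals copied from this report; all names here are illustrative assumptions.
LAST_WEEK = {"Beth": 37, "Grok": 35, "Gemini": 35}
THIS_WEEK = {"Beth": 35, "Grok": 35, "Gemini": 36}

UNCHANGED = "\u2014"  # the dash the report prints when a score does not move


def format_change(previous: int, current: int) -> str:
    """Render one week-over-week line in the report's own notation."""
    delta = current - previous
    if delta > 0:
        marker = f"▲{delta}"
    elif delta < 0:
        marker = f"▼{-delta}"
    else:
        marker = UNCHANGED
    return f"{previous} → {current} ({marker})"


def weekly_changes(previous: dict, current: dict) -> dict:
    """Map each model in `current` to its formatted change line."""
    return {
        model: format_change(previous[model], current[model])
        for model in current
    }


if __name__ == "__main__":
    for model, line in weekly_changes(LAST_WEEK, THIS_WEEK).items():
        print(f"{model}: {line}")
```

The same strings can feed the trendline labels in the chart notes, so the text and the graphic stay in sync.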

Executive Takeaway

All three models again scored in the green zone (31–40), within a single point of one another. Gemini takes the top spot on the strength of its measured tone and clear sourcing. Beth dips slightly because of less specific citations in several answers, while Grok holds steady with rich detail but a touch of editorial color in its phrasing.

What Changed This Week

Beth (−2): Very balanced across questions, but several answers leaned on broad outlet references instead of precise, time-stamped citations. That trimmed Transparency a notch, bringing the total from 37 to 35.
Grok (0): Maintains last week’s 35. High factual granularity (e.g., crime deltas, dated source callouts) kept Accuracy high; occasional loaded descriptors softened Tone.
Gemini (+1): Moves into first at 36. Consistently calm, academic tone and good citation discipline. Minor knock for pulling older background context in one spot, but still net positive.

Model-by-Model Notes

Beth (ChatGPT) — 35/40

Strengths:

Clear two-sided framing on each question (e.g., deterrence vs. escalation in geopolitics; compensation vs. innovation in media).
Concise, readable summaries suitable for publication.

Watch-outs:

Tighten Transparency by anchoring more citations with specific dates, article titles, and outlets (especially for survey claims).
Where possible, prefer primary sources (Pew, official statements) over secondary summaries.

Best-performing buckets: Politics & Governance; Geopolitics.
Weakest this week: Transparency in the Society & Culture and AI/Tech & Economics entries.

Grok — 35/40

Strengths:

Accuracy standout: incorporates stats and dates; clear outlet-by-outlet attribution.
Good balance across conservative/centrist/progressive sourcing.

Watch-outs:

Tone occasionally drifts into loaded phrasing (e.g., “urban decay”), which nudges the Tone score down.
A few assertions would benefit from explicit links/titles rather than generalized citations.

Best-performing buckets: Politics & Governance; Media & Information.
Weakest this week: Tone in domestic policy framing.

Gemini — 36/40

Strengths:

Tone leader: neutral, cautious, and well-structured.
Transparent use of multi-outlet sourcing; solid balance across perspectives.

Watch-outs:

Avoid leaning on older background when fresh data exists; keep context current to protect Accuracy.
When citing aggregate or video sources, add exact dates and the key claim pulled from them.

Best-performing buckets: Geopolitics; AI/Tech & Economics.
Weakest this week: Occasional reliance on legacy context for Society & Culture.

Bucket-Level Highlights (This Week’s 5 Questions)

Politics & Governance (D.C. authority): All three presented clear federal vs. local frames. Grok delivered the most granular details; Gemini offered the cleanest legal-institutional context; Beth was balanced and concise.
Society & Culture (diversity attitudes): Grok cited fresh survey specifics; Gemini added nuance around pluralistic ignorance but mixed in older background; Beth captured the optimism vs. skepticism split but should pin more claims to primary sources.
Media & Information (AI answer engines): Grok and Gemini articulated compensation models vs. adaptation paths; Beth framed the trade-offs crisply for a general audience.
Geopolitics (U.S. threats to Russia): Beth and Gemini kept a disciplined two-sided analysis; Grok argued the deterrence case forcefully, which was effective but edged toward an editorial tone.
AI/Tech & Economics (GPT‑5 + capex): Gemini surfaced readiness gaps/ethics; Grok balanced growth vs. instability; Beth highlighted bubble and displacement risks clearly.

Editorial Guidance for the Blog Post

Use the following ready-to-drop blocks (edit for voice):

Headline idea: “Neck-and-Neck in the Green Zone: Gemini Noses Ahead as Beth Slips, Grok Holds”

Dek: “All three models stayed strong this week (35–36/40). Gemini’s measured tone takes #1, Beth dips on citation depth, and Grok’s detail holds—but mind the editorial edge.”

Key Graf:

In the week of Aug 11–17, 2025, Gemini took the top spot with a 36/40 on the strength of an even-keeled tone and clean citations. Beth and Grok tied at 35/40: Beth’s summaries were balanced but could use tighter, time-stamped sourcing, while Grok’s factual depth was excellent but occasionally shaded into loaded language. The spread across models, just one point, suggests converging performance at the top of our “Strong” band (31–36), just short of “Excellent” (37–40).

Pull Quotes:

“Gemini’s tone was a clinic in neutrality—measured, cautious, unflappable.”
“Grok’s strength remains data density; watch the adjectives.”
“Beth’s framing is accessible and balanced—now give every claim a timestamp.”

Chart Notes:

Place a gauge (0–40) for each model at: Beth 35, Grok 35, Gemini 36.
Trendline: Beth ▼2, Grok —, Gemini ▲1 vs. last week.

To-Do for Next Week’s Prompting

Enforce freshness (source dates within the 7‑day window) at the top of each model’s instruction block.
Require at least one primary source per question (e.g., Pew, official statements, filings).
Add a soft constraint on tone: “Avoid value-laden adjectives unless quoted.”
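
The first two items above can also be checked mechanically after the models answer. The sketch below is one hedged way to do that, not an existing tool. It assumes each cited source is recorded as a dict with hypothetical "outlet", "published", and "primary" fields, and it reads the 7-day window as the seven calendar days ending on the last day of the report week (Aug 11–17 for this edition). The sample sources are invented for the demo.

```python
from datetime import date, timedelta


def is_fresh(published: date, week_end: date, window_days: int = 7) -> bool:
    """True if `published` falls in the window ending on `week_end`.

    Assumption: the window is inclusive on both ends, so a 7-day
    window ending Aug 17 starts on Aug 11.
    """
    window_start = week_end - timedelta(days=window_days - 1)
    return window_start <= published <= week_end


def check_answer(sources: list, week_end: date) -> list:
    """Return a list of problems for one answer; empty means it passes."""
    problems = []
    stale = [
        s["outlet"] for s in sources
        if not is_fresh(s["published"], week_end)
    ]
    if stale:
        problems.append("stale sources: " + ", ".join(stale))
    if not any(s["primary"] for s in sources):
        problems.append("no primary source")
    return problems


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        {"outlet": "Outlet A", "published": date(2025, 8, 12), "primary": False},
        {"outlet": "Outlet B", "published": date(2025, 7, 30), "primary": False},
    ]
    print(check_answer(sample, week_end=date(2025, 8, 17)))
```

The tone constraint is harder to automate and probably still needs a human read.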

Methodology (for footer)

Scale: Bias, Accuracy, Tone, Transparency — 0–10 each, total 0–40.
Bands: 0–10 Poor · 11–20 Weak · 21–30 Adequate · 31–36 Strong · 37–40 Excellent.
Sourcing: Each answer must cite conservative, centrist, and progressive outlets. Freshness window: past 7 days.
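
For readers who want to reproduce the totals and bands, the scale above reduces to a sum and a lookup. This is a minimal sketch under stated assumptions: sub-scores are treated as whole numbers, and the names BANDS, total_score, and band_for are illustrative rather than part of any published tooling.

```python
# Band edges copied from the methodology above; names are illustrative.
BANDS = [
    (0, 10, "Poor"),
    (11, 20, "Weak"),
    (21, 30, "Adequate"),
    (31, 36, "Strong"),
    (37, 40, "Excellent"),
]


def total_score(bias: int, accuracy: int, tone: int, transparency: int) -> int:
    """Sum the four 0-10 sub-scores into a 0-40 total."""
    parts = {
        "bias": bias,
        "accuracy": accuracy,
        "tone": tone,
        "transparency": transparency,
    }
    for name, value in parts.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return sum(parts.values())


def band_for(total: int) -> str:
    """Look up the band label for a 0-40 total (whole numbers assumed)."""
    for low, high, label in BANDS:
        if low <= total <= high:
            return label
    raise ValueError(f"total must be between 0 and 40, got {total}")


if __name__ == "__main__":
    # This week's totals as reported above.
    for model, total in {"Beth": 35, "Grok": 35, "Gemini": 36}.items():
        print(f"{model}: {total}/40 ({band_for(total)})")
```

By this lookup, all three of this week's totals fall in the Strong band, and Beth's 37 last week sat in Excellent.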

The author: Miles Carter
