Per Benchmark 14.04.2026

Peter Gostev, AI Capability Lead at Arena and creator of the "BullshitBench" benchmark, tests whether chatbots can recognize nonsensical or flawed questions. His main aim is to gauge the intelligence of language models by checking whether they notice when a question does not hold up logically, for example a request to calculate the expiration date of a unit test. Gostev works at Arena (formerly LMArena at UC Berkeley), where chatbots are compared in blind tests to produce a ranking of models from providers such as Anthropic (Claude), Google, and OpenAI. He stresses how important it is that even nonsensical questions are recognized as such, so that users do not receive wrong answers without noticing, and he warns of the pitfalls of AI benchmarks, since even Arena has already been gamed.

Heise Full Article
