20 Adversarial · collusion, consensus, ground truth

When agents disagree. On purpose, or by accident.

The "everyone is friendly" assumption is comforting and wrong. Real agent systems have to handle agents whose goals don't line up:

Where the classic distributed-systems algorithms fit

Two ideas from classic distributed systems show up here: Raft (a way to keep multiple servers in sync as long as they can crash but don't lie) and BFT, short for Byzantine Fault Tolerance (a way to keep the system working even when some servers do lie). Neither is a perfect fit for AI agents, but both are useful.

Raft: for the orchestrator infrastructure. Servers crash, but they don't lie, and a Raft-backed store keeps the orchestrator's own state replicated and consistent through those crashes.

BFT: for combining answers from multiple agents. Agents do lie, or at least confidently hallucinate, so the aggregation step needs Byzantine-style voting; a minimal sketch of the classic exact-match vote follows.
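
In its classic form, the vote demands verbatim agreement. Below is a minimal sketch of just that rule, nothing like a full BFT protocol (no signatures, no view changes): with at most f lying agents, f + 1 identical replies guarantee at least one honest voter behind the answer.

from collections import Counter

def exact_match_vote(answers: list[str], f: int) -> str | None:
    # With at most f Byzantine agents, f + 1 identical replies must
    # include at least one honest agent, so the answer can be trusted.
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= f + 1 else None   # None = no quorum

The verbatim-match requirement is exactly what breaks with LLM output: two honest agents rarely produce byte-identical prose. Hence the adaptation below.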

What works better in practice

from dataclasses import dataclass
import numpy as np
from sklearn.cluster import AgglomerativeClustering

@dataclass
class AgentResponse:
    agent_id: str
    answer: str
    confidence: float       # 0..1, agent's self-rating
    evidence: list[str]     # urls, db rows, tool outputs
    embedding: np.ndarray   # pre-computed

def semantic_consensus(responses: list[AgentResponse], sim_threshold=0.85):
    if not responses:
        raise ValueError("no responses to aggregate")
    # Degenerate case: clustering needs at least two samples.
    if len(responses) == 1:
        r = responses[0]
        return {
            "answer": r.answer,
            "support": 1.0,
            "agreeing_agents": [r.agent_id],
            "clusters": 1,
            "requires_human": True,  # one voice is not a consensus
        }

    # Group answers by meaning: cosine distance = 1 - cosine similarity,
    # so the similarity threshold maps directly onto a distance threshold.
    # (metric= needs scikit-learn >= 1.2; older versions call it affinity=.)
    embeddings = np.stack([r.embedding for r in responses])
    clusterer = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=1 - sim_threshold,
        metric="cosine", linkage="average",
    )
    labels = clusterer.fit_predict(embeddings)

    # Score each cluster: weighted by confidence × evidence count
    scores = {}
    weights = []
    for r, label in zip(responses, labels):
        weight = r.confidence * (1 + len(r.evidence))
        weights.append(weight)
        scores[label] = scores.get(label, 0) + weight

    winning = max(scores, key=scores.get)
    members = [(r, w) for r, w, l in zip(responses, weights, labels)
               if l == winning]
    total = sum(scores.values())
    # Represent the cluster by its highest-weight member, not an arbitrary one.
    best = max(members, key=lambda rw: rw[1])[0]

    return {
        "answer": best.answer,
        "support": scores[winning] / total,
        "agreeing_agents": [r.agent_id for r, _ in members],
        "clusters": len(set(labels)),
        "requires_human": scores[winning] / total < 0.6,
    }
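
A quick smoke test, with hand-crafted unit vectors standing in for real sentence embeddings (all names and numbers here are illustrative, not from any particular model):

responses = [
    AgentResponse("a1", "Paris", 0.9,
                  ["https://example.org/capitals"],
                  np.array([1.0, 0.0, 0.0])),
    AgentResponse("a2", "Paris.", 0.8, [],
                  np.array([0.99, 0.14, 0.0])),  # same meaning, near-identical vector
    AgentResponse("a3", "Lyon", 0.7, [],
                  np.array([0.0, 1.0, 0.0])),    # the outlier
]
result = semantic_consensus(responses)
# Two clusters form; "Paris"/"Paris." wins on weighted support (~0.79),
# so result["answer"] == "Paris" and result["requires_human"] is False.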

This is the same idea as classic Byzantine voting, but adapted for natural language: similarity instead of exact match, weighted votes instead of one-per-agent, groups of similar answers instead of identical-answer buckets.

Raft for keeping your servers in sync. Meaning-based voting for combining agent opinions. Real-world checks for facts. These don't compete; they layer.
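
Concretely, the layering might look like the sketch below. verify_against_sources is a hypothetical grounding hook (re-fetch the cited evidence, check that it actually supports the winning claim); the Raft layer lives underneath, in whatever replicated store the orchestrator writes its state to.

def resolve(responses: list[AgentResponse]) -> dict:
    # Layer 1 (not shown): orchestrator state sits in a Raft-backed
    # store, so a crashed coordinator can't lose or fork the run.
    # Layer 2: meaning-based voting over the agents' answers.
    verdict = semantic_consensus(responses)
    # Layer 3: ground-truth check on the winning answer.
    # verify_against_sources() is a hypothetical hook, not a real API.
    evidence = [e for r in responses
                if r.agent_id in verdict["agreeing_agents"]
                for e in r.evidence]
    if not verify_against_sources(verdict["answer"], evidence):
        verdict["requires_human"] = True
    return verdict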
Further reading: Du et al. (ICML 2024) introduced the idea of having multiple agents debate. Liu et al. (ICLR 2025) showed that mixing model families in the debate helps agents avoid converging on the same wrong answer. On accidental adversarial situations: Lee et al. (arXiv 2024) on how compromise spreads between agents, Shinn et al. (NeurIPS 2023) on agent self-reflection, and the follow-up MAR (arXiv 2025) extending the idea to teams of agents.