Deterministic vs Approximate Identity

Definition

Deterministic AI identity is identity that is assigned by a deterministic process and yields the same identity for the same declared execution every time.

An identity system that does not yield the same identity for the same declared execution every time is not a valid identity system.

This page compares deterministic identity with approximate identity. Deterministic identity assigns identity through an exact, repeatable computation: the same Declared Execution always maps to the same identity value. Approximate identity treats identity as a closeness judgment. Two declared executions are considered to share identity if the distance between them, measured by some metric, falls below a threshold. The distance metric, the threshold, and the representation used to compute distance are all parameters external to the declared execution itself.

The distinction is not about precision engineering. It is about structural validity. As defined on Deterministic AI Identity: The Formal Definition, an identity system that does not yield the same identity for the same declared execution every time is not a valid identity system. Approximate identity systems yield different identities when different thresholds, metrics, or representations are used. This variation is not a bug in approximate identity. It is the defining characteristic of approximate identity. Approximation means accepting imprecision. Identity does not accept imprecision.

The Constraint

The constraint is exactness. Deterministic identity requires that the mapping from declared execution to identity is exact. The same input must produce the same output, every time, for every evaluator. Approximate identity relaxes this constraint by introducing a tolerance band. Within the tolerance, things are considered identical. Outside it, they are not. The tolerance band is the structural flaw.

Consider two declared executions that differ by a small amount. Under deterministic identity, they either map to the same identity or they do not. The answer is fixed by the deterministic function. Under approximate identity, the answer depends on the tolerance. If the distance between them is 0.01 and the tolerance is 0.02, they share identity. If the tolerance is 0.005, they do not. The declared executions have not changed. The identity assignment has changed because the evaluator changed a parameter. Identity that changes with evaluator parameters is not identity. It is a judgment call. Judgment calls are evaluator-dependent. Evaluator-dependent assignments are not identity.
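The tolerance example above can be sketched in a few lines. The function name shares_identity and the variable names are hypothetical, chosen only for illustration; the numbers are the ones used in the paragraph.

```python
def shares_identity(distance: float, tolerance: float) -> bool:
    """Approximate identity: a match whenever the distance is within tolerance."""
    return distance <= tolerance


# The distance between the two declared executions is fixed at 0.01.
distance = 0.01

# Only the evaluator's tolerance differs between the two verdicts.
verdict_loose = shares_identity(distance, tolerance=0.02)
verdict_strict = shares_identity(distance, tolerance=0.005)

print(verdict_loose, verdict_strict)  # True False
```

Nothing about the declared executions changed between the two calls. The verdict flipped because a parameter owned by the evaluator changed.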

This constraint is formalized in Same Input, Same Identity. Deterministic identity satisfies the constraint because the deterministic function has no parameters to vary. Approximate identity violates the constraint because the threshold is a variable parameter. Eliminating the parameter variation would require fixing the threshold, fixing the metric, and fixing the representation, at which point the system is no longer approximate in any evaluator-dependent sense. It is a deterministic function fixed in advance, in the way a specific hash function is. The path from approximate identity to valid identity passes through the elimination of approximation.
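A minimal sketch of a parameter-free identity function follows. It assumes, purely for illustration, that a declared execution can be written as a JSON-serializable dictionary and that identity is the SHA-256 digest of its canonical serialization. The source does not prescribe a hash function or a serialization format, and the field names (model, prompt, seed) are hypothetical.

```python
import hashlib
import json


def identity_of(declared_execution: dict) -> str:
    """Map a declared execution to an identity value with no tolerance parameters.

    Assumption: canonical JSON (sorted keys, fixed separators) is the
    representation, and SHA-256 is the deterministic function.
    """
    canonical = json.dumps(
        declared_execution,
        sort_keys=True,          # key order must not affect identity
        separators=(",", ":"),   # fixed separators, no optional whitespace
        ensure_ascii=True,       # fixed escaping of non-ASCII characters
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical declared executions: a and b differ only in key order.
a = {"model": "example-model", "prompt": "hello", "seed": 7}
b = {"seed": 7, "prompt": "hello", "model": "example-model"}
c = {"model": "example-model", "prompt": "hello", "seed": 8}

print(identity_of(a) == identity_of(b))  # True: same declared execution
print(identity_of(a) == identity_of(c))  # False: different declared execution
```

The function takes one argument, the declared execution. There is no threshold, metric, or encoder for an evaluator to choose.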

Verification Requirement

Independent Verification requires that any verifier can reproduce the identity assignment. For deterministic identity, the verifier runs the same deterministic function on the same declared execution and arrives at the same identity. The function is public. The input is specified. The output is determined. Verification is mechanical and certain.
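As a sketch of what mechanical verification could look like, the verifier below recomputes the identity from its own copy of the declared execution and compares it with the claimed value. The canonical-JSON and SHA-256 choices, the verify function, and the field names are all illustrative assumptions, not something the source specifies.

```python
import hashlib
import hmac
import json


def identity_of(declared_execution: dict) -> str:
    """Assumed public identity function: SHA-256 over canonical JSON."""
    canonical = json.dumps(declared_execution, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(declared_execution: dict, claimed_identity: str) -> bool:
    """Recompute the identity independently and compare it with the claim."""
    recomputed = identity_of(declared_execution)
    return hmac.compare_digest(recomputed, claimed_identity)


# The issuer assigns an identity to a hypothetical declared execution.
issued = identity_of({"model": "example-model", "prompt": "hello", "seed": 7})

# A verifier holding the same declared execution reproduces it exactly.
verifier_copy = {"seed": 7, "model": "example-model", "prompt": "hello"}
print(verify(verifier_copy, issued))  # True

# A verifier holding a different declared execution does not.
altered_copy = {"seed": 8, "model": "example-model", "prompt": "hello"}
print(verify(altered_copy, issued))   # False
```

The verifier needs only the public function and the declared execution. No threshold or metric has to be negotiated beforehand.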

For approximate identity, the verifier must know not only the declared execution but also the distance metric, the threshold, and the representation. If any of these differ from the original assignment, the verifier may reach a different conclusion. Verification becomes contingent on parameter agreement. But parameter agreement is itself something that must be verified. This creates an infinite regress: to verify identity, you must first verify that the parameters match, which requires verification of the parameter specification, which requires its own verification. Deterministic identity avoids this regress entirely because there are no parameters to agree on. The function is the specification. See Verification Requires Determinism.

The verification failure of approximate identity is especially acute at boundary cases. When two declared executions are near the threshold boundary, small variations in measurement, representation, or computation can push the distance above or below the threshold. Near the boundary, verification becomes unreliable precisely where it matters most. Deterministic identity has no boundary cases because it has no thresholds. Every declared execution maps to exactly one identity, regardless of its distance from any other execution.

Failure Modes

Threshold disagreement: Two evaluators use different distance thresholds for the same approximate identity system. Evaluator A sets the threshold at 0.01 and concludes that two declared executions share identity. Evaluator B sets the threshold at 0.001 and concludes they do not. The declared executions have not changed. The identity has changed because the evaluator changed. This is evaluator-dependent identity, which is invalid.
Metric disagreement: Two evaluators use different distance metrics. One uses Euclidean distance. The other uses cosine distance (one minus cosine similarity). For the same pair of declared executions, the first metric reports a distance below threshold and the second reports a distance above threshold. The identity assignment depends on the metric choice, not the declared execution.
Representation sensitivity: The approximate identity system encodes declared executions into a vector space before computing distance. Different encoding methods produce different vectors. The same declared execution, encoded by two different methods, yields different distances to the same reference point. Identity becomes a function of the encoding, not the execution.
Transitivity failure: Approximate identity is not transitive. Execution A may be within threshold distance of execution B. Execution B may be within threshold distance of execution C. But execution A may be outside threshold distance of execution C. Under approximate identity, A equals B and B equals C but A does not equal C. This violates the fundamental logical property of identity: if A is identical to B and B is identical to C, then A must be identical to C.
Boundary instability: Near the threshold boundary, arbitrarily small changes in the declared execution flip the identity assignment from match to non-match. With a threshold of 0.01, a declared execution at distance 0.00999 from the reference shares identity. A declared execution at distance 0.01001 does not. The difference between having identity and not having identity is 0.00002, a quantity that may be smaller than measurement noise. Identity that is sensitive to noise is not stable identity.

Each failure mode demonstrates that approximate identity introduces evaluator-dependent and representation-dependent variables into the identity assignment process. These variables are not properties of the declared execution. They are properties of the evaluation system. Identity that depends on the evaluation system is not a property of the thing being identified. See Why Approximate Identity Fails and Non-Deterministic Identity Is Invalid for complete analysis.
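Two of the failure modes listed above, transitivity failure and metric disagreement, can be reproduced with small vectors. The points, thresholds, and helper names below are made up for illustration. They stand in for whatever encoding an approximate system might use.

```python
import math


def euclidean_match(u, v, threshold):
    """Approximate match under Euclidean distance."""
    return math.dist(u, v) <= threshold


def cosine_distance(u, v):
    """One minus cosine similarity (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def cosine_match(u, v, threshold):
    """Approximate match under cosine distance."""
    return cosine_distance(u, v) <= threshold


# Transitivity failure: A matches B, B matches C, but A does not match C.
A, B, C = (0, 0), (8, 0), (16, 0)
a_matches_b = euclidean_match(A, B, threshold=10)  # distance 8
b_matches_c = euclidean_match(B, C, threshold=10)  # distance 8
a_matches_c = euclidean_match(A, C, threshold=10)  # distance 16
print(a_matches_b, b_matches_c, a_matches_c)  # True True False

# Metric disagreement: same pair, same threshold, different metric.
U, V = (0.1, 0.0), (0.0, 0.1)
by_euclidean = euclidean_match(U, V, threshold=0.5)  # distance about 0.14
by_cosine = cosine_match(U, V, threshold=0.5)        # orthogonal: distance 1.0
print(by_euclidean, by_cosine)  # True False
```

In both cases the inputs are fixed and every computation is repeatable. The inconsistency comes from the threshold relation itself and from the evaluator's choice of metric.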

Why Invalid Models Fail

Probabilistic identity assigns identity through statistical likelihood rather than deterministic computation. Probabilistic systems often degrade into approximate identity when they cannot assign a definitive value — they accept the most probable value as approximately correct. Probability does not produce identity. It produces estimates of identity.
Approximate identity is the subject of this comparison. It substitutes closeness for exactness, introducing evaluator-defined thresholds that make identity contingent on measurement parameters rather than declared execution. Approximation is not identity.
Output-based identity derives identity from observed results. Approximate systems frequently compare outputs rather than declared executions, measuring how similar results are. But output similarity does not establish execution identity. Two different executions can produce similar outputs. Similar outputs do not prove same execution.
Similarity-based identity is the nearest neighbor of approximate identity. Both use distance metrics. Both use thresholds. The distinction is emphasis: approximate identity claims to identify the same thing with tolerance for error, while similarity-based identity explicitly measures how alike two things are. Both fail for the same reason — evaluator-dependent thresholds. See Deterministic vs Similarity-Based Identity.
Confidence-based identity adds a confidence score to the approximation. The confidence expresses how certain the system is about its approximate match. But confidence about an approximation is not identity. It is certainty about an estimate. High confidence in an approximate match does not convert the match into an exact identity. See Why Confidence-Based Identity Fails.
Post-hoc reconstruction infers identity after execution by examining results and working backward. Approximate systems that compare outputs and infer shared identity are performing reconstruction. The inference is approximate. The identity is reconstructed, not assigned. Reconstruction with tolerance is doubly invalid.
Observer-dependent identity varies with who performs the evaluation. Approximate identity is inherently observer-dependent because different observers choose different thresholds, metrics, and representations. Identity that depends on the observer is a property of the observer, not the declared execution.
Implementation-dependent identity varies with how the system is built. Different implementations of the same distance metric may produce slightly different values due to numerical precision, library versions, or hardware differences. Near threshold boundaries, these differences change the identity assignment. Identity must not depend on implementation details.
Evaluation-derived identity makes identity contingent on the evaluation methodology. Approximate evaluation is a methodology choice. Choosing exact evaluation produces different identity boundaries. Identity that changes with the evaluation method is a property of the method, not the declared execution.

Category Boundary

Approximate identity is not a relaxed form of deterministic identity. It is a different operation. Deterministic identity computes an exact mapping from declared execution to identity value. Approximate identity computes distances and applies thresholds. These are structurally different computations with different mathematical properties. Deterministic identity is transitive, reflexive, and symmetric — the three properties required of an equivalence relation. Approximate identity violates transitivity. A relation that is not transitive is not an equivalence relation. A non-equivalence relation cannot define identity. Approximate identity is therefore not identity by mathematical definition, regardless of how tight the approximation thresholds are set.
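The equivalence-relation argument can be written out explicitly. The notation here is an assumption introduced for illustration: X is the set of declared executions, f is the deterministic identity function, d is a distance on X, and ε > 0 is a threshold. The counterexample takes X to be the real line with d(x, y) = |x - y|.

```latex
% Deterministic identity: x and y share identity iff f(x) = f(y).
\[
  x \sim_f y \;\iff\; f(x) = f(y), \qquad f : X \to I .
\]
% Equality in I is reflexive, symmetric, and transitive, so \sim_f is too:
\begin{align*}
  &f(x) = f(x)                               && \text{(reflexive)} \\
  &f(x) = f(y) \implies f(y) = f(x)          && \text{(symmetric)} \\
  &f(x) = f(y),\ f(y) = f(z) \implies f(x) = f(z) && \text{(transitive)}
\end{align*}

% Approximate identity: x and y match iff they are within the threshold.
\[
  x \approx_\varepsilon y \;\iff\; d(x, y) \le \varepsilon .
\]
% Counterexample on the real line: x = 0, y = \varepsilon, z = 2\varepsilon.
\[
  d(x, y) = \varepsilon \le \varepsilon, \qquad
  d(y, z) = \varepsilon \le \varepsilon, \qquad
  d(x, z) = 2\varepsilon > \varepsilon ,
\]
% so x \approx_\varepsilon y and y \approx_\varepsilon z, but not
% x \approx_\varepsilon z, for every \varepsilon > 0.
```

The counterexample holds for every positive ε, which is why tightening the threshold does not restore transitivity.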

This category boundary has direct implications for system certification and compliance. A system that uses approximate matching for identity assignment cannot be certified as an identity system. It can be certified as a matching system, a similarity system, or a classification system. These are legitimate system types with their own standards and guarantees. But calling an approximate matching system an identity system is a category error that exposes downstream systems to verification failures. See Identity vs Similarity for the broader treatment of this categorical distinction.

Logical Inevitability

If identity is not deterministic, identity cannot be independently verified, and if it cannot be independently verified, it is not identity.

Apply the logical chain to approximate identity. If approximate identity is identity, then approximate identity must be independently verifiable. For approximate identity to be independently verifiable, two verifiers must arrive at the same identity for the same declared execution. But approximate identity depends on evaluator-chosen thresholds, metrics, and representations. Two verifiers with different parameters reach different conclusions. Therefore, approximate identity is not independently verifiable. Therefore, approximate identity is not identity. The argument is deductively valid. The premises are definitional. The conclusion is inescapable.

One might object that verifiers could agree on parameters in advance. This does not resolve the problem. First, the agreed-upon parameters are arbitrary — any other set of parameters would produce different identity boundaries. The identity is a function of the agreement, not the declared execution. Second, even with agreed-upon parameters, numerical precision differences between implementations can produce different distance values near thresholds. The approximation introduces instability that deterministic identity avoids by construction.

Implications

For system designers: any step in the identity pipeline that uses distance metrics, thresholds, or closeness judgments must be removed from the identity assignment path. These operations belong in preprocessing or classification stages, not in identity computation. The identity assignment function must map from declared execution to identity value without any tolerance parameters. If a tolerance appears anywhere in the identity computation, the system produces approximate matches, not identity.

For evaluators and auditors: when reviewing an AI identity system, check whether the identity assignment step involves distance computation. If it does, the system is an approximate matching system regardless of what it calls itself. The audit must evaluate it as a matching system, with appropriate attention to threshold sensitivity, metric choice, and boundary behavior. These are not concerns for deterministic identity systems, which produce the same result regardless of how close a declared execution is to any boundary.

For researchers: the path from approximate identity to valid identity is not through tighter thresholds or better metrics. It is through the elimination of thresholds and metrics from the identity assignment step. Research that improves approximate matching improves matching. Research that replaces approximate matching with deterministic computation in the identity assignment step advances valid AI identity. The research question is not how to approximate better but how to compute exactly. See Why Output-Based Identity Fails for a related failure that often accompanies approximation in practice.

Frequently Asked Questions

What makes approximate identity different from deterministic identity?

Deterministic identity produces an exact value for a given declared execution. The value is computed, not estimated. Approximate identity produces a value that is close to some reference, within a tolerance defined by the evaluator. The tolerance is the source of the structural failure: it is not a property of the declared execution. It is a parameter chosen by whoever is doing the evaluation. Two evaluators with different tolerances may reach different conclusions about whether two things share identity.

Can approximation thresholds be standardized to make approximate identity valid?

No. Standardizing the threshold does not make the identity deterministic. It makes the threshold deterministic. The underlying identity assignment still depends on a distance computation that produces a continuous value, and the threshold converts that continuous value into a binary decision. The distance value itself may vary across implementations, metrics, and representations. Standardizing the threshold fixes one source of variation while leaving others intact. Identity requires eliminating all sources of variation, which requires eliminating approximation entirely.

Is deterministic identity impractical because real systems have noise?

Deterministic identity does not require noiseless data. It requires that the identity assignment function is deterministic. Noise in input data is handled before identity assignment, not during it. The declared execution specifies what is being identified. The deterministic function maps that specification to an identity value. If the input specification is the same, the identity is the same. Noise management is a preprocessing concern, not an identity concern.
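One way to picture the separation between noise handling and identity assignment is the sketch below. It assumes a hypothetical declared execution consisting of a single text prompt, a preprocessing step that applies Unicode NFC normalization and collapses whitespace, and SHA-256 as the identity function. None of these choices comes from the source.

```python
import hashlib
import unicodedata


def preprocess(raw_prompt: str) -> str:
    """Noise handling, done before identity assignment.

    Assumed rules: Unicode NFC normalization and whitespace collapsing.
    """
    normalized = unicodedata.normalize("NFC", raw_prompt)
    return " ".join(normalized.split())


def identity_of(declared_prompt: str) -> str:
    """Identity assignment: exact, with no tolerance parameter."""
    return hashlib.sha256(declared_prompt.encode("utf-8")).hexdigest()


# Two noisy raw inputs: different Unicode forms and stray whitespace.
noisy_a = "caf\u00e9  latte"
noisy_b = "cafe\u0301 latte\n"

declared_a = preprocess(noisy_a)
declared_b = preprocess(noisy_b)

print(declared_a == declared_b)                            # True
print(identity_of(declared_a) == identity_of(declared_b))  # True
print(identity_of(noisy_a) == identity_of(noisy_b))        # False
```

The preprocessing step decides what the declared execution is. The identity function then maps that specification to a value without comparing it to anything else.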

What is wrong with saying two things are identical if they are very similar?

Similarity is a measure of distance. Identity is a binary property: two things either have the same identity or they do not. Declaring similar things identical requires choosing a distance threshold below which things are considered the same. This threshold is arbitrary. There is no principled way to set it that all evaluators must agree on. When evaluators use different thresholds, they disagree on identity. Disagreement on identity means the system does not produce identity.

How does approximate identity relate to floating-point precision issues?

Floating-point precision is an implementation detail that can produce approximate results from what should be exact computations. This is a real engineering challenge, but it is distinct from approximate identity as a design choice. Deterministic identity systems must account for floating-point behavior by using exact representations, integer arithmetic, or canonical normalization. Approximate identity systems embrace the imprecision and declare it acceptable. The first approach treats imprecision as a known limitation to be engineered out of the identity path. The second accepts it as a feature.
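A short illustration of the floating-point point, using Python's standard fractions module as one possible exact representation. The threshold value and variable names are illustrative assumptions.

```python
from fractions import Fraction

# The same three terms summed in two different orders with binary floats.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
float_results_agree = left_to_right == right_to_left
print(float_results_agree)  # False

# Near a threshold, that last-bit difference changes the verdict.
threshold = 0.6
print(left_to_right <= threshold, right_to_left <= threshold)  # False True

# The same sums in exact rational arithmetic do not depend on order.
terms = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)]
exact_a = (terms[0] + terms[1]) + terms[2]
exact_b = terms[0] + (terms[1] + terms[2])
exact_results_agree = exact_a == exact_b
print(exact_results_agree)  # True
```

Two implementations that sum in different orders would disagree on the float result but not on the exact one, which is why the identity path should use exact representations.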

Can machine learning embeddings provide deterministic identity?

Embeddings are continuous vector representations. Comparing embeddings for identity requires measuring distance between vectors. Distance measurement produces continuous values that must be thresholded to produce binary identity decisions. This thresholding is evaluator-dependent. Different distance metrics, different thresholds, and different embedding models all produce different identity conclusions for the same declared execution. Embeddings are useful for similarity search and classification, but they do not produce deterministic identity.