Vision-Encoder Fingerprints of Image-to-Image Generative Models: Detection, Survival, and Behavioral Classification of AI Reprocessing in the Pixel Domain — A Pilot Study

A pilot study by Akaeon. Hill Hunter, May 2026. hunter@akaeon.com

Abstract

We study three production image-to-image AI systems (OpenAI's gpt-image-1, Google's gemini-2.5-flash-image, and Black Forest Labs' Flux Kontext) under a content-adaptive sub-JND adversarial perturbation pipeline, scoring all outputs by frozen DINOv2 ViT-B/14 token distances against clean references. Across 435 API calls on roughly 150 images we find: (i) the three AIs occupy categorically distinct, image-invariant bands on a (patch_mean, ssim) plane, with Flux Kontext always tight (100% of calls), gpt-image-1 always drift (100%), and Gemini mostly tight with mixed behavior, yielding 76.6% three-way single-call attribution (95% cluster-bootstrap CI [0.703, 0.844]; chance 33.3%) and 82.2% per-image attribution with three-rep averaging (95% CI [0.733, 0.900]). (ii) Pixel-domain adversarial perturbations survive these architectures differentially: roughly 98% intact through Gemini, attenuated to roughly 20% of their input strength through Flux's diffusion denoiser despite SSIM 0.99 visual fidelity, and overwritten by gpt-image-1's autoregressive regeneration. (iii) Spatial encoder-response fingerprints fail to distinguish models when the perturbation is content-routed, because content dominates spatial response. (iv) The perturbation itself is detectable against benign processing (JPEG-Q85, resize 0.94) at AUROC = 1.0000 and TPR = 1.000 at FPR = 0 using DINOv2 patch_p99 on a fully-EXIF-corrected n=100 dataset (§9); DINOv2 CLS-token cosine distance extends this to AUROC ≈ 0.99 against aggressive benign processing including center crop. The results reframe pixel-domain perturbation systems from defenses into forensic primitives for AI-processing attribution when paired with a registered reference image, and they expose a non-obvious vulnerability (diffusion-based selective denoising) that may compromise downstream detection in deployed protection schemes.

1. Introduction

1.1 Background and motivation

Pixel-domain adversarial perturbations have a long history in classifier attacks (Goodfellow et al., 2015) and are now widely deployed as protective overlays against image-to-image AI systems (Salman et al., 2023; Liang et al., 2023). The Diffusion project under investigation here implements a content-adaptive perturbation pipeline that, for each 256×256 patch of a 768×768 input image, selects one of ten purpose-built attack functions (luminance, edge, frequency, CSF, ViT-patch-boundary, etc.) based on local pixel statistics, and then jointly optimizes attack weights against an ensemble of frozen CLIP/SigLIP/EVA02 surrogates within an L∞ ≤ 0.10 and JND-bounded perceptual budget.

We began this study with a specific hypothesis: that perturbed images, when round-tripped through different vision-language models, would produce pixel-level changes characteristic of the receiving model, namely a per-VLM fingerprint detectable in the output pixels. This frames the perturbation as both a defensive watermark and a forensic probe.

1.2 Hypothesis evolution

The hypothesis underwent four substantive revisions as data accumulated, each documented in the methodology below:

H0 (original): Different VLMs leave reproducible, model-specific pixel-level changes on the same perturbed image. → Tested in §4.2.
H1 (post-tokenizer reframe): The perturbation does not survive tokenizer-based VLMs at the pixel level, but encoder representation fingerprints might. → Tested in §4.3 and §4.7.
H2 (per-attack reframe): AI-induced changes might be reproducible when aggregated by attack class even if not spatially reproducible. → Tested in §4.8.
H3 (binary detection reframe): Restricting the question to "did AI process this image?" yields a useful operational detector. → Tested in §4.4–4.6.
H4 (final reframe): Different AIs occupy categorically distinct behavioral regimes; the right framing is per-AI behavioral classification, not generic AI detection. → Confirmed in §4.8–4.10.

1.3 Contributions

In order of novelty:

Categorical (not continuous) behavioral separation between three production img2img APIs. Each occupies a non-overlapping band on a 2D (patch_mean, ssim_clean) plane: Flux 100% tight, gpt-image-1 100% drift, Gemini mostly tight with mixed behavior (Figure 1, §4.9). Prior creative-fluidity work (Ramaswamy et al., 2024) characterizes generators along a fidelity-diversity continuum; the discrete-band finding for this set of APIs has not, as far as we can tell, been previously reported.
Differential perturbation survival across architecture classes (roughly 98% Gemini, roughly 20% Flux, overwritten by gpt-image-1), revealing a diffusion-denoiser vulnerability for pixel-domain protection schemes not previously characterized in the protection literature.
A reference-anchored 3-way AI attribution scheme with 76.6% single-call LOO accuracy (95% cluster-bootstrap CI [0.703, 0.844]; chance 33.3%) and 82.2% per-image accuracy (95% CI [0.733, 0.900]) on 30 images × 3 AIs × 3 reps = 270 calls, of which 269 yielded valid samples. Prior model-attribution work (Yu et al., 2019; Asnani et al., 2023; Corvi et al., 2023; Ricco et al., 2025) operates blind on text-to-image GANs/SD outputs; the reference-anchored variant on these commercial img2img APIs is novel. Per-AI accuracy is highly heterogeneous: Flux 0.978 [0.921, 1.000], gpt-image-1 0.822 [0.722, 0.911], Gemini 0.500 [0.378, 0.678]; Gemini's lower CI bound is only modestly above chance, reflecting its overlap with the other two bands.
A clean negative result on spatial encoder fingerprinting of content-routed perturbations. Image content dominates both perturbation placement and encoder response, eliminating spatial fingerprint signal.

We do not claim novelty for the DINOv2-based AI-rerendering detector itself (§4.4–4.6); concurrent work (Ricker et al., 2024; Doi et al., 2025; Shen et al., 2025; Choi et al., 2025) has established self-supervised vision encoders as strong detectors of AI-generated images. We use it as a measurement primitive, not as a contribution.

2. Related Work

We position this work at the intersection of five research strands.

Adversarial perturbations against generative AI. The Glaze (Shan et al., 2023), Mist (Liang et al., 2023), and PhotoGuard (Salman et al., 2023) systems demonstrated that sub-JND pixel modifications can disrupt the latent representations of text-to-image diffusion models, providing protection against unauthorized fine-tuning and editing. Subsequent evaluations have established that these protections degrade under realistic transit (Tang et al., 2025, "Is Perturbation-based Image Protection Disruptive to Image Editing?"; Hönig et al., 2024, "Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI"). Xue & Chen (2024, "Pixel is a Barrier") established that pixel-space diffusion is substantially more robust to such perturbations than latent diffusion, foreshadowing the architecture-class dependence we measure here. Our heuristic perturbation system extends this line by selecting attack types per patch from local statistics within an individual image, and it is designed to exploit key differences between human and VLM image perception. The present paper differs by treating perturbation outcomes as measurements of model behavior rather than purely as defenses.

Generative model attribution from output. A substantial body of work addresses "given an image, which generator produced it." Yu et al. (2019) established that GANs leave detectable fingerprints in spatial frequency patterns. Asnani et al. (2023, "Reverse Engineering of Generative Models") extended this to inferring generator hyperparameters from generated images across 116 generators. Corvi et al. (2023) showed that latent diffusion models leave characteristic spectral signatures. Ricco et al. (2025, PRISM) report 92% accuracy attributing synthetic outputs to their generator via radial phase signatures. Yang et al. (2025, Team NYCU's Defactify4 entry) is a recent challenge submission in this space. These methods are predominantly blind attribution (no reference image required) and operate on text-to-image outputs. Our contribution differs in two respects: we attribute image-to-image outputs (where a clean reference is available by construction), and we target previously unstudied commercial APIs (gpt-image-1, Gemini 2.5 Flash Image, Flux Kontext).

AI-image detection with DINO-family encoders. Recent 2025–2026 work has established self-supervised vision encoders (DINOv2 and DINOv3) as strong backbones for AI-generated image detection. AEROBLADE (Ricker et al., CVPR 2024) reports AP ≈ 0.992 via VAE reconstruction error using diffusion model autoencoders. DinoLizer (Doi et al., 2025) uses DINOv2 patch features for inpainting localization. DINO-Detect (Shen et al., 2025) uses a DINOv3 teacher–student distillation for blur-robust AI-image classification, and Huang et al. (2025) show that frozen DINOv3 features achieve strong cross-generator forgery detection. WaRPAD (Choi et al., NeurIPS 2025) explicitly addresses center-crop and JPEG robustness in training-free detectors via wavelet-perturbation consistency. HEDGE (Wu et al., 2026) ensembles DINOv3- and MetaCLIP2-based detectors and placed 4th in the NTIRE 2026 robust AIGC detection challenge. We do not claim novelty for the DINOv2-based detector itself; §4.6 is a sanity-check confirming that AEROBLADE-style reference-anchored detection extends to commercial img2img APIs and survives aggressive center-crop nulls in our specific setup.

Discrete attractors and multi-modal sampling in diffusion. The theoretical literature on diffusion samplers has characterized regime transitions (Biroli, Bonnaire, de Bortoli & Mézard, 2024, "Dynamical Regimes of Diffusion Models," Nature Communications) and shown that score functions are approximately affine-antisymmetric (Jia et al., 2025, "Antithetic Noise in Diffusion Models"). Xu et al. (2024, DisCo-Diff) introduced explicit discrete latents into diffusion to manage multimodality. An earlier version of this paper claimed an empirical observation in this space (a 27% luma-inversion mode in Gemini); we have since retracted that claim as an EXIF-orientation artifact (see §9). We retain this related-work paragraph because the theoretical literature is real and relevant to future work in the area.

Reference-image content forensics. The classic question of whether a target image is a copy of a reference has been studied via perceptual hashing (Venkatesan et al., 2000), structural similarity (Wang et al., 2004), and learned embeddings (LPIPS; Zhang et al., 2018). We use DINOv2 (Oquab et al., 2024) token shifts as the underlying similarity primitive, chosen because its 37×37 grid at 518×518 input gives substantially higher spatial resolution than 16×16-grid alternatives.

3. Methodology

3.1 Detector design

For all experiments in §4, the primary measurement primitive is the per-patch cosine distance between DINOv2 ViT-B/14 (Oquab et al., 2024) patch-token embeddings of two images. Concretely, for images A and B both resized to 518×518, we compute

patch_tokens_A = dinov2(A).last_layer_patch_features  ∈ ℝ^(37×37×768)
patch_tokens_B = dinov2(B).last_layer_patch_features  ∈ ℝ^(37×37×768)
shift_map(A, B)[i, j] = 1 − cos(patch_tokens_A[i, j], patch_tokens_B[i, j])

This yields a 37×37 token-shift map. We summarize it as four statistics: patch_mean (average over all cells), patch_p95, patch_p99 (the 95th/99th percentile, capturing the strongest local disturbance), and cls_shift (1 minus cosine similarity of the model's CLS token between A and B, capturing global semantic drift). Where cross-encoder comparison is required (§4.4), the 37×37 map is bilinearly resampled to a common 16×16 grid.
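The computation above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the vlm_roundtrip/ implementation: it starts from already-extracted token arrays (random stand-ins here, not real DINOv2 outputs), the names shift_map, cosine_distance and summarize are ours, and the percentiles use np.percentile's default linear interpolation, which the text does not specify.

```python
import numpy as np


def shift_map(tokens_a: np.ndarray, tokens_b: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-cell cosine distance between two (H, W, D) patch-token grids."""
    dot = np.sum(tokens_a * tokens_b, axis=-1)
    norm = np.linalg.norm(tokens_a, axis=-1) * np.linalg.norm(tokens_b, axis=-1)
    return 1.0 - dot / np.maximum(norm, eps)


def cosine_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-12) -> float:
    """1 - cosine similarity between two 1-D vectors (used for the CLS token)."""
    return float(1.0 - np.dot(u, v) / max(np.linalg.norm(u) * np.linalg.norm(v), eps))


def summarize(tokens_a, tokens_b, cls_a, cls_b) -> dict:
    """The four summary statistics named in the text."""
    m = shift_map(tokens_a, tokens_b)
    return {
        "patch_mean": float(m.mean()),
        "patch_p95": float(np.percentile(m, 95)),
        "patch_p99": float(np.percentile(m, 99)),
        "cls_shift": cosine_distance(cls_a, cls_b),
    }


# Toy usage: random arrays standing in for DINOv2 ViT-B/14 outputs at 518x518.
rng = np.random.default_rng(0)
tok_a = rng.normal(size=(37, 37, 768))
tok_b = tok_a + 0.1 * rng.normal(size=tok_a.shape)  # lightly disturbed copy
cls_a = rng.normal(size=768)
stats = summarize(tok_a, tok_b, cls_a, cls_a)
```

Because every statistic is a function of cosine distance, the scores are invariant to a global rescaling of either token grid.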

DINOv2 ViT-B/14 was selected through pilot evaluation against four alternatives (CLIP B/32, CLIP L/14, SigLIP B/16, EVA02-L/14): it produced the highest cross-image self-correlation and the largest token-shift magnitude on perturbed inputs, attributable to its 5× higher spatial sampling density.

3.2 Experimental conditions

The following stimulus conditions are used throughout the paper: clean source PNGs (768×768), adversarial PNGs produced by the content-adaptive pipeline at eps=0.10 / jnd_budget=2.0 / 120 PGD steps, JPEG Q85 of clean, resize-then-restore at 0.94×, center crop to 80%, and round-trip outputs from gpt-image-1, gemini-2.5-flash-image, and Flux Kontext under a fixed "reproduce this image faithfully" prompt. All AI outputs are normalized to PNG at 768×768 before scoring. All scoring is against the clean reference image, except where explicitly noted as "vs adv."

3.3 Statistical inference

Where this paper reports classifier accuracies on clustered data (multiple repetitions per base image), 95% confidence intervals are computed via nonparametric cluster bootstrap: we resample base image clusters with replacement (preserving all repetitions per image) for 2,000 iterations, recompute the full leave-one-out classification procedure on each resample, and report 2.5th/97.5th percentile bounds. This is the appropriate inference for clustered data; Wilson intervals that treat repetitions as independent give narrower bounds and would understate the uncertainty. The code path is in loo_bootstrap.py. Where data is genuinely binomial (e.g., single-image rate estimates), we report Wilson intervals explicitly.
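A minimal sketch of the cluster-bootstrap idea follows. It is not the contents of loo_bootstrap.py, which are not reproduced here. It simplifies in one important way: the statistic is a plain accuracy over precomputed per-sample correctness flags, whereas the procedure described above reruns the full leave-one-out classification inside every resample. The names cluster_bootstrap_ci, accuracy and percentile are ours, and the toy data is synthetic.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def cluster_bootstrap_ci(
    clusters: Dict[str, List],
    statistic: Callable[[List], float],
    n_boot: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """95% percentile CI, resampling whole clusters (images) with replacement."""
    rng = random.Random(seed)
    ids = list(clusters)
    stats = []
    for _ in range(n_boot):
        drawn = rng.choices(ids, k=len(ids))  # images, not samples
        sample = [rec for cid in drawn for rec in clusters[cid]]  # keep all reps
        stats.append(statistic(sample))
    stats.sort()
    return percentile(stats, 2.5), percentile(stats, 97.5)


def accuracy(records: List[bool]) -> float:
    """Fraction of correct classifications (simplified statistic)."""
    return sum(records) / len(records)


# Synthetic toy data: 30 image clusters x 9 samples, 8 clusters entirely wrong.
toy = {f"img{i:02d}": [i % 4 != 0] * 9 for i in range(30)}
ci_low, ci_high = cluster_bootstrap_ci(toy, accuracy)
```

Resampling at the image level keeps each image's repetitions together, so within-image correlation widens the interval instead of being ignored.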

3.4 Code, data, and reproducibility

All probes are implemented in the vlm_roundtrip/ package and produce JSONL result files at vlm_roundtrip/results/. Each probe is resumable: failed (image, condition) pairs are written to disk and retried on relaunch without re-running successful pairs. Reference and AI output PNGs are saved for visual audit.
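The resumability behavior described above can be illustrated with a short sketch. The record layout (image, condition, status, error) and the function names are hypothetical and chosen for illustration; the actual JSONL schema used by vlm_roundtrip/ is not given in this paper.

```python
import json
import os
import tempfile


def completed_pairs(path):
    """Return the (image, condition) pairs already recorded as successful."""
    done = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    rec = json.loads(line)
                    if rec.get("status") == "ok":
                        done.add((rec["image"], rec["condition"]))
    return done


def run_probe(path, pairs, score):
    """Score every pair not yet marked ok; append one JSON line per attempt."""
    done = completed_pairs(path)
    attempted = 0
    with open(path, "a", encoding="utf-8") as fh:
        for image, condition in pairs:
            if (image, condition) in done:
                continue  # successful pairs are never re-run
            attempted += 1
            rec = {"image": image, "condition": condition}
            try:
                rec.update(status="ok", **score(image, condition))
            except Exception as exc:  # keep the failure on disk; retried on relaunch
                rec.update(status="failed", error=str(exc))
            fh.write(json.dumps(rec) + "\n")
            fh.flush()
    return attempted


# Toy demo: the second pair fails on the first launch and succeeds on the second.
calls = {"n": 0}


def flaky_score(image, condition):
    calls["n"] += 1
    if condition == "flux" and calls["n"] <= 2:
        raise RuntimeError("simulated API error")
    return {"patch_p99": 0.5}  # placeholder score, not a measured value


pairs = [("a.png", "gemini"), ("a.png", "flux")]
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "results.jsonl")
    first_run = run_probe(out, pairs, flaky_score)   # attempts both pairs
    second_run = run_probe(out, pairs, flaky_score)  # retries only the failure
    final_done = completed_pairs(out)
```

Appending one line per attempt keeps the file an audit trail: a failed attempt and its later successful retry both remain visible.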

The 724 perturbed images available as positive samples in the corpus were produced by the existing batch_runner.py pipeline at default settings (eps=0.10, jnd_budget=2.0, pgd_steps=120). The roughly 150 images appearing across the experiments below are subsets drawn from this corpus and from 50 clean originals in test_images/.

4. Experiments and Findings

4.0 Reading guide

This section reports ten experiments in approximately chronological order to preserve the analytic chain by which findings were established and revised. Readers primarily interested in the central novelty results should consult §4.9 (cross-AI behavioral classification on a 2D feature space; see Figure 1) and §4.10 (differential perturbation survival across three architectures). Two experiments from an earlier draft (a Gemini within-image determinism study and a clean-input control on a purported luma-inversion mode) were retracted as artifacts of an EXIF-orientation handling bug discovered during figure preparation; the full retraction and audit are in §9.

Sections 4.1–4.6 are largely scaffolding: they establish DINOv2 as the detection primitive (§4.1), document a negative result for spatial encoder fingerprinting under content-routed perturbations (§4.2–4.3), and confirm, as a sanity check against prior work, that reference-anchored DINOv2 cosine distance reproduces AEROBLADE-class detection on our setup (§4.4–4.6). The novelty contributions begin in §4.7.

4.1 Encoder selection (n=15, 5 encoders × 2 conditions)

Which open-weights vision encoder gives the strongest per-patch signal for our perturbation? To answer this we extracted last-layer patch tokens from CLIP B/32, SigLIP B/16 (via web-pretrained ViT-B-16-SigLIP-256), CLIP L/14, EVA02-L/14, and DINOv2 ViT-B/14 for each of 15 (clean, adversarial) pairs, then computed the mean and 99th-percentile per-patch cosine distance between clean and adversarial. DINOv2 produced the highest cross-image self-correlation (0.115 versus 0.06 or less for the others) and the largest mean shift (0.13 to 0.18 at the per-image level) while preserving spatial resolution at 37×37. CLIP B/32 barely registered (mean shift roughly 0.05) due to its 7×7 grid coarsening. We therefore use DINOv2 as the detection primitive for all subsequent experiments; the 5× spatial-sampling advantage over ViT-L/14-class encoders is decisive at our perturbation budget.

4.2 Spatial fingerprint test (n=15 images × 5 encoders)

Hypothesis H0 predicted that if perturbed images leave model-specific signatures in encoder representation space, the spatial pattern of token shift should be reproducible across images for a given encoder. To test this, for each encoder we computed the 16×16 token-shift map between clean and adversarial, then the mean pairwise Pearson correlation across the 15 images' maps. We repeated this for each encoder and ran leave-one-out nearest-centroid classification on the per-image shift maps. All five encoders' cross-image spatial self-correlation fell in [−0.03, +0.12], with stdev exceeding mean in every case; LOO classification of encoder identity from spatial shift maps achieved 0% accuracy (5-way, chance 20%), which is worse than random. The spatial response of any encoder to a content-routed perturbation is dominated by the image content (where the perturbation was placed by the heuristic classifier) rather than by encoder identity. No spatial fingerprint exists at this scale and image diversity.
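The self-correlation statistic used in this test can be sketched as follows, assuming each shift map is a 16×16 nested list (any rectangular grid works). The function names are ours. A value near zero, as observed, means the spatial pattern of token shift in one image says almost nothing about the pattern in another.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mean_pairwise_correlation(maps):
    """Average Pearson r over all unordered pairs of flattened shift maps."""
    flat = [[v for row in m for v in row] for m in maps]
    rs = [pearson(a, b) for a, b in combinations(flat, 2)]
    return sum(rs) / len(rs)


# Toy 2x2 maps: the second is a scaled copy of the first, so r = 1.
toy_maps = [[[0.0, 1.0], [2.0, 3.0]], [[0.0, 2.0], [4.0, 6.0]]]
toy_r = mean_pairwise_correlation(toy_maps)
```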

4.3 Per-attack-class fingerprint test (n=15)

If the perturbation places different attack types at different image locations (hypothesis H1), then aggregating encoder responses by attack class should reveal model-specific sensitivity profiles. For each image we derived a 16×16 attack-class map by running the existing classify_patch over the same stride and patch parameters as the perturbation pipeline, then binned each encoder's 16×16 shift map by attack class and averaged within bins across all images, producing an (encoder × attack class) sensitivity matrix. Different encoders showed different per-class profiles but the signal was modest: LOO encoder classification on the per-class sensitivity vectors achieved 26.7% (5-way, chance 20%) at n=15. Excluding CLIP B/32 (architecturally similar to CLIP L/14) raised the result to 30% (4-way, chance 25%), with DINOv2 reaching 60% per-encoder accuracy. EVA02 was most "uniformly reactive" and DINOv2 most distinctive on low_freq and attention classes. A weak fingerprint exists at the per-attack-class level for architecturally distinct encoders, but CLIP-family encoders are nearly indistinguishable from one another. This is the limit of what spatial-perturbation-driven probes can extract given the current experimental constraints.
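The binning step can be sketched as below for a single encoder. The sketch assumes the shift maps and attack-class maps are same-shaped nested lists and that class labels are plain strings; the function name per_class_sensitivity is ours, and classify_patch itself is not reproduced.

```python
from collections import defaultdict


def per_class_sensitivity(shift_maps, class_maps):
    """Mean token shift within each attack-class bin, pooled over all images.

    shift_maps: one grid of floats per image.
    class_maps: one same-shaped grid of attack-class labels per image.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for shift, classes in zip(shift_maps, class_maps):
        for shift_row, class_row in zip(shift, classes):
            for s, c in zip(shift_row, class_row):
                totals[c] += s
                counts[c] += 1
    return {c: totals[c] / counts[c] for c in totals}


# Toy 2x2 example with two of the attack classes named in the text.
toy_shift = [[[0.1, 0.3], [0.2, 0.4]]]
toy_classes = [[["lumin", "fourier"], ["lumin", "fourier"]]]
toy_profile = per_class_sensitivity(toy_shift, toy_classes)
```

Running this once per encoder and stacking the resulting dictionaries gives one row of the (encoder × attack class) sensitivity matrix per encoder.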

4.4 Binary AI-rerender detection (n=100 images, mild vs harsh negatives)

Hypothesis H3 reframes the problem narrowly: is a binary detector achievable that answers "does this image contain our perturbation, after possible benign processing?" For each of 100 (clean, adversarial) pairs we scored (clean → adversarial) and (clean → null_processing) through DINOv2, using jpeg_q85 and resize_0p94 as the null conditions, and computed AUROC and TPR at FPR = 0 and 0.05. Using patch_p99 as the score, DINOv2 achieves AUROC = 1.0000 against the 200-sample mild benign null with TPR = 1.000 at FPR = 0 (n_pos = 100, n_neg = 200). These numbers are reported on the EXIF-corrected dataset (see §9); on the initial uncorrected dataset, with 28% of positives misoriented relative to their clean reference, the same metric scored AUROC = 0.9998 and TPR at FPR = 0 of 0.9900. The bug had been suppressing the detection signal rather than inflating it, because the orientation mismatch lowered some perturbed-vs-clean p99 scores toward the null distribution. Correction strengthens all DINOv2 detection metrics monotonically. The perturbation is robustly detectable against typical image transit operations, and a detection threshold of p99 > 0.224 separates perturbed and clean-but-mildly-processed images at zero false-positive rate on this null distribution.
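For reference, the two headline metrics can be computed directly from the score lists. The sketch below uses the pairwise (Mann-Whitney) form of AUROC and defines TPR at FPR = 0 as the fraction of positives strictly above the largest negative score. The function names and toy scores are ours; the toy scores are not measurements from this study.

```python
def auroc(pos, neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_zero_fpr(pos, neg):
    """Fraction of positives above the strictest threshold that admits no negative."""
    threshold = max(neg)
    return sum(p > threshold for p in pos) / len(pos)


# Toy scores: one positive (0.25) falls below the largest negative (0.27).
toy_pos = [0.30, 0.41, 0.52, 0.25]
toy_neg = [0.05, 0.10, 0.22, 0.27]
toy_auroc = auroc(toy_pos, toy_neg)
toy_tpr = tpr_at_zero_fpr(toy_pos, toy_neg)
```

An AUROC of exactly 1.0 corresponds to every positive outscoring every negative, which is the same condition under which TPR at FPR = 0 reaches 1.0.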

4.5 The center-crop pathology (n=25 images, harsh negatives)

Does the detector reject more aggressive but still legitimate image processing? We extended the null distribution to include jpeg_q50, resize_0p5, and center_crop_80, then re-evaluated AUROC against the positive class. Against harsh negatives, DINOv2 patch_p99 AUROC dropped to 0.75. The center_crop_80 condition produced mean p99 = 0.86, which is higher than gpt-image-1's median output p99 of 0.80; the detector incorrectly flagged 100% of center-cropped images as AI re-renders. The diagnosis is that patch-level p99 measures local content displacement: center cropping displaces every patch's content (shifted and zoomed), producing large per-patch shifts. The detector was measuring "content moved" rather than "AI processed."

4.6 CLS-token sanity check: replicating AEROBLADE-style detection (n=125)

We do not claim novelty for the DINOv2-based detector itself. Our purpose in this section is operational: confirm that AEROBLADE-style reference-anchored detection extends to commercial img2img APIs in our specific setup, and that the center-crop pathology of §4.5 is resolved by the standard fix, switching from per-patch extremes to a global semantic score. To test this we recomputed 1 − cos(cls_clean, cls_other) for every (clean, condition) pair across the 25 images × 5 conditions.

Against gpt-image-1 outputs, AUROC = 0.989 vs. all negatives, with 0/125 benign-processed images scoring above the AI minimum of 0.047. The detector cleanly separates AI rerendering from all five benign conditions including center crop, at performance consistent with the AEROBLADE benchmark. Reference-anchored DINOv2 CLS-cosine reproduces AEROBLADE-class detection performance on commercial img2img outputs and is robust to center-crop nulls; this serves as our operational detector for the downstream perturbation-survival analysis (§4.10), not as a claimed contribution. We note that the detector operates at the semantic rather than pixel level, so faithful AI reproductions (cls_shift near zero) are indistinguishable from no processing, a limitation that motivates §4.7's perturbation-survival analysis.

4.7 Perturbation survival through gpt-image-1 (n=25 × 3 conditions)

Does the perturbation we apply survive gpt-image-1's processing? For 25 images, three input variants (clean, adv, and jpeg_q85(adv)) were sent to gpt-image-1 with the reproduce prompt. We then computed the paired delta in DINOv2 CLS-shift between gpt-image-1(adv) and gpt-image-1(clean) outputs as a sensitivity check. Mean paired delta = −0.0085 (the adv-input outputs were slightly closer to the clean original than clean-input outputs were, on the CLS axis); paired t = −0.62, not significant. gpt-image-1's output CLS-shift distribution against clean is the same regardless of whether the input was perturbed. The interpretation is architectural: gpt-image-1 is a multimodal autoregressive model with a vector-quantized vision tokenizer, so sub-JND perturbations collapse to the same token IDs as the unperturbed image at the encoder stage. The autoregressive decoder then samples from its prior, producing outputs whose deviation from clean is dominated by the model's own generative variance, not by what was in the input. The perturbation does not survive gpt-image-1 at a detectable level.
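The paired comparison reduces to a one-sample t statistic on per-image differences. A minimal sketch, assuming two equal-length lists of CLS-shift scores (outputs from adv inputs and outputs from clean inputs, each scored against the clean original); the function name paired_t and the toy numbers are ours.

```python
import math
import statistics


def paired_t(a, b):
    """Mean paired difference (a - b) and its t statistic with n - 1 degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # sample std, standard error
    return mean_d, mean_d / se


# Toy numbers only: differences are [1, 0, 2, 1].
toy_delta, toy_t = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 1.0, 3.0])
```

A t statistic small in magnitude relative to the critical value for n − 1 degrees of freedom indicates that the two output distributions cannot be distinguished on this axis.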

4.8 Per-AI fingerprint reproducibility (n=25 images × 2 AIs)

If AIs differ in the kind of changes they introduce when given a perturbed image, per-attack-class shift profiles should form a reproducible fingerprint. For each (image, AI) pair we computed the 16×16 token-shift map between adv input and AI(adv) output, binned it by attack class, and computed the coefficient of variation (CV = std/mean) of mean-per-class shift across images, separately for Gemini and gpt-image-1. gpt-image-1 produced highly reproducible per-class signatures (CV = 0.04 for lumin, 0.11 for fourier, 0.16 for csf); Gemini was roughly 4 to 5 times more variable on the classes where both values are reported (CV = 0.19 for lumin, 0.46 for fourier). LOO classification of AI identity from per-class vectors gave 60% (2-way, chance 50%, so only slightly above chance), with gpt-image-1 the more identifiable at 68%. gpt-image-1 has a largely content-independent fingerprint: its modifications to a given attack-class region are stable across images. Gemini has a content-dependent response. The fingerprint hypothesis was therefore partly supported.
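The reproducibility measure can be sketched as follows. The sketch assumes one per-class profile (a dict from attack class to mean shift) per image and uses the sample standard deviation; whether the reported CVs use the sample or population form is not stated, so that choice is an assumption. Function names and toy values are ours.

```python
import statistics


def coefficient_of_variation(values):
    """CV = std / mean, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.fmean(values)


def per_class_cv(per_image_profiles):
    """CV of mean-per-class shift across images, for one AI.

    per_image_profiles: list of dicts mapping attack class -> mean shift.
    """
    classes = per_image_profiles[0].keys()
    return {c: coefficient_of_variation([p[c] for p in per_image_profiles]) for c in classes}


# Toy profiles for three images (illustrative values only).
toy_profiles = [
    {"lumin": 1.0, "fourier": 2.0},
    {"lumin": 2.0, "fourier": 2.0},
    {"lumin": 3.0, "fourier": 2.0},
]
toy_cv = per_class_cv(toy_profiles)
```

A low CV for a class means the AI changes that kind of region by a similar amount in every image, which is what makes the per-class profile usable as a fingerprint.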

4.9 Cross-AI fingerprint at scale (n=30 images × 3 AIs × 3 reps = 270 calls)

Hypothesis H4 is that different AI systems occupy categorically distinct behavioral regimes that can be detected from a single output image. To test this we sent 30 new perturbed images × 3 reps through gpt-image-1, Gemini, and Flux Kontext (via Replicate), and scored each output through DINOv2 against the clean original on patch_mean, patch_p99, cls_shift, and ssim_luma. The per-AI distributions on the (patch_mean, ssim_clean) plane fall into three bands: Flux Kontext occupies the high-SSIM, low-patch_mean corner (always tight), gpt-image-1 occupies the low-SSIM, high-patch_mean corner (always drift), and these two do not overlap; Gemini sits in between, with most calls tight but a subset drifting toward the gpt-image-1 region.

LOO nearest-centroid classification in the 2D (patch_mean, ssim) space gives:

Overall 3-way accuracy: 76.6% (95% cluster-bootstrap CI [0.703, 0.844]; n=269 samples across 30 image clusters; chance 33.3%)
Per-image accuracy with 3-rep median averaging: 82.2% (95% CI [0.733, 0.900])
Per-AI accuracy: Flux 0.978 [0.921, 1.000], gpt-image-1 0.822 [0.722, 0.911], Gemini 0.500 [0.378, 0.678]

Chance is 33.3%. Confidence intervals are bootstrap percentiles from 2,000 iterations, resampling at the image level (n=30 image clusters) rather than the sample level, to correctly handle correlation within an image's 9 reps (3 reps × 3 AIs). Wilson intervals treating samples as independent give narrower CIs ([0.712, 0.812] for overall) and would be inappropriate given the clustering structure.
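The Wilson comparison interval quoted above can be reproduced with the standard formula. The sketch below assumes 206 correct out of 269 samples; that count is our inference from the reported 76.6% and is not stated explicitly in the results.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# 206/269 is an inferred count (0.766 x 269 is approximately 206).
wilson_low, wilson_high = wilson_interval(206, 269)
```

Under this assumption the interval agrees with the [0.712, 0.812] quoted above to three decimals, and it is visibly narrower than the cluster-bootstrap interval [0.703, 0.844] because it ignores the correlation among an image's nine samples.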

Gemini's lower bound at 0.378 is barely above chance (0.333), reflecting that Gemini's distribution overlaps both Flux's and gpt-image-1's bands; it sits in the middle and gets confused with both. Flux and gpt-image-1 are individually identifiable at strong rates with the lower bound of each well separated from chance.
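A minimal sketch of leave-one-out nearest-centroid classification in a 2D feature space follows. Three details are assumptions rather than statements about our pipeline: distances are raw Euclidean with no feature scaling, the held-out unit is a single sample (not a whole image), and the held-out sample is removed only from its own class centroid. The function name and the toy points are ours; the points are synthetic and do not represent measured (patch_mean, ssim) values.

```python
import math
from collections import defaultdict


def loo_nearest_centroid(samples):
    """Leave-one-out accuracy; samples is a list of (label, (x, y)) tuples."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for label, (x, y) in samples:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1

    correct = 0
    for label, (x, y) in samples:
        best, best_dist = None, math.inf
        for other in sums:
            sx, sy = sums[other]
            n = counts[other]
            if other == label:  # exclude the held-out sample from its own centroid
                sx, sy, n = sx - x, sy - y, n - 1
            if n == 0:
                continue
            dist = math.hypot(x - sx / n, y - sy / n)
            if dist < best_dist:
                best, best_dist = other, dist
        correct += best == label
    return correct / len(samples)


# Synthetic, well-separated toy clusters.
toy_points = [
    ("A", (0.0, 0.0)), ("A", (0.1, 0.0)), ("A", (0.0, 0.1)),
    ("B", (5.0, 5.0)), ("B", (5.1, 5.0)), ("B", (5.0, 5.1)),
    ("C", (10.0, 0.0)), ("C", (10.1, 0.0)), ("C", (10.0, 0.1)),
]
toy_accuracy = loo_nearest_centroid(toy_points)
```

With this classifier, a class whose samples spread between two tighter classes loses accuracy in both directions, which is the pattern seen for Gemini.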

Beyond per-call classification, for 26 of 30 images the per-image median patch_mean ordering flux < gemini < gpt held; the four exceptions were images where Flux and Gemini were both extremely tight (both medians < 0.10) and their order swapped within measurement noise. The three AIs are behaviorally separable on a 2D feature space using only a single output image and the clean reference. The ordering reflects architectural and training differences: Flux's diffusion denoiser is most conservative, gpt-image-1's autoregressive vision-token-to-image decoder is most generative, and Gemini sits in between with bimodal behavior. Figure 1 shows the full data on the (patch-mean shift, luma SSIM) plane along with two representative example rows.

4.10 Perturbation survival across all three AIs

How much of the adversarial perturbation signal survives each AI's pipeline? We define survival ratio per (image, AI) as

survival_ratio = p99(clean, AI(adv)) / p99(clean, adv)

where the denominator is the original perturbation's p99 magnitude in the input, and aggregate medians per AI:

Gemini: ≈ 0.98 (perturbation largely intact)
Flux Kontext: ≈ 0.20 (perturbation attenuated by ~80%, despite SSIM 0.99 visual fidelity to input)
gpt-image-1: overwritten (output p99 dominated by model's own regeneration signature, not input perturbation)
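The aggregation above can be sketched as a per-AI median of per-image ratios. The record field names (ai, p99_clean_vs_ai_adv, p99_clean_vs_adv) are hypothetical and chosen for illustration, and the toy numbers are not measurements from this study.

```python
import statistics
from collections import defaultdict


def median_survival(records):
    """Median survival_ratio per AI.

    Each record needs: 'ai', 'p99_clean_vs_ai_adv' (numerator) and
    'p99_clean_vs_adv' (denominator, the input perturbation's p99).
    """
    ratios = defaultdict(list)
    for r in records:
        ratios[r["ai"]].append(r["p99_clean_vs_ai_adv"] / r["p99_clean_vs_adv"])
    return {ai: statistics.median(vals) for ai, vals in ratios.items()}


# Toy records for one hypothetical system "X".
toy_records = [
    {"ai": "X", "p99_clean_vs_ai_adv": 0.40, "p99_clean_vs_adv": 0.50},
    {"ai": "X", "p99_clean_vs_ai_adv": 0.50, "p99_clean_vs_adv": 0.50},
    {"ai": "X", "p99_clean_vs_ai_adv": 0.45, "p99_clean_vs_adv": 0.50},
]
toy_survival = median_survival(toy_records)
```

Note that a ratio near or above 1 does not by itself show the perturbation survived: for a heavy-regeneration system the numerator is dominated by the model's own changes, which is why gpt-image-1 is reported as overwritten and not assigned a survival figure.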

All three AIs produced outputs above the binary detection threshold (p99 > 0.224) in 100% of 269 calls, but the information content of that signal differs by AI: Gemini preserves the perturbation, Flux attenuates it by roughly 80%, and gpt-image-1 replaces it with its own regeneration signature. Flux Kontext's diffusion-based denoising acts as a selective filter that preserves the human-perceptible content (SSIM 0.99) while suppressing the perceptually invisible high-frequency content where the perturbation lives. This is a non-obvious vulnerability for any pixel-domain protection scheme: a Flux round trip can visually launder an image, leaving it near-identical to the eye while stripping enough of the perturbation to weaken downstream detection.

5. Combined Discussion

5.1 The fingerprint exists, at coarser granularity than initially probed

Our initial hypothesis (H0), spatial fingerprinting of encoder responses to content-routed perturbations, failed. We attribute this to a double content dependency: the perturbation placement is itself content-driven (by the heuristic classifier), and each encoder's response to a pixel disturbance is also content-driven (by what's locally encoded in a patch). The two cancel for spatial reproducibility.

The observed cancellation effect could be mitigated. The perturbation placement P(X) and the encoder's local response R(E, X, ℓ) are both functions of image content C(X)[ℓ], but they are functions in opposite directions: P selects which ℓ to perturb based on content, and R determines the response magnitude at that ℓ based on content. When we measure spatial correlation across images X₁ and X₂, we are implicitly asking whether R(E, X₁, ℓ) ≈ R(E, X₂, ℓ) at the same spatial coordinate ℓ. However, ℓ has no shared meaning between two different images, because the content at that coordinate is unrelated. The right comparison is not at matched coordinates but at matched content: across all (attack class, content statistics) bins, does encoder E produce a characteristic response magnitude that differs from encoder E'? This converts the failed spatial-correlation experiment into a tractable content-conditional regression problem, which we propose in §7.2 as the principled next step.

The fingerprint becomes detectable at the behavioral level: how much each AI modifies the image, what mode it operates in, and (for some) what per-attack-class profile it exhibits. This is a coarser granularity than "which patches does the model attend to," but it is more robust and more practically useful.

5.2 The detector is a forensic tool, not a defense

The original perturbation pipeline was designed as a defensive overlay. Our findings reframe its role: pixel-domain sub-JND perturbations cannot survive heavy-regeneration AIs like gpt-image-1, so they do not prevent unauthorized use by such models. They do, however, enable downstream attribution: given an original and a candidate output, we can identify which AI processed it.

For deployment, this suggests pairing the perturbation pipeline with a provenance database (the user retains the original, registers it, and later checks downstream copies via DINOv2 scoring). The perturbation is neither necessary nor sufficient for this scheme, since even unperturbed references work, but it provides a robust signal channel against benign processing.

5.3 Architectural implications

Based on the input–output behavior observed here, the three AIs we tested cluster cleanly into three regimes that we tentatively associate with three architectural classes:

Autoregressive vision-token generator (gpt-image-1, known): always drifts; most generative; produces the strongest own-signature.
Diffusion image-to-image (Flux Kontext, known): always tight; most preservative; selectively strips perceptually-invisible content.
Mixed or multi-mode generator (Gemini 2.5 Flash Image, architecture not publicly documented): mostly tight, occasionally drifts.

The classification labels for gpt-image-1 and Flux Kontext are consistent with their published architectures; for Gemini, we are inferring an architectural family from output behavior alone. If the three-band structure observed here generalizes to additional models, a small number of behavioral measurements (patch_mean, ssim_clean, possibly cls_shift) may suffice to place a previously-unseen image-to-image system in one of these regimes, though we caution that the framework was developed on exactly these three systems and validation on a held-out set of models is essential before drawing stronger conclusions.

6. Limitations

Image domain is narrow. All ~150 images across the experimental conditions are drawn from a single test_images/ directory, primarily phone photographs of mid-luminance outdoor and indoor scenes. Generalization to faces, text, fine typography, medical imagery, or abstract content is untested.
Single perturbation budget tested. The pipeline runs at default eps=0.10. Behavior of the detectors and AIs at weaker budgets (eps=0.04, sub-perceptible) or stronger budgets (eps=0.20, visible) is not characterized.
Three AIs only. Major image-to-image systems untested include Imagen 3/4, Stable Diffusion 3, Recraft V3, Ideogram, MidJourney's editing modes, and Adobe Firefly. The clean three-band structure may not generalize.
Reference image required. The detector and classifier both require the clean original. A "blind" detector that does not require the reference is not possible with this methodology.
No adversarial adaptation. We did not test whether an adversary with knowledge of the detector could craft inputs that bypass it. The 100% center-crop trigger of the patch-p99 detector (§4.5) shows the detector is not adversarially robust along even ordinary processing axes; a deliberate attacker could likely defeat the CLS-token detector too.
Tokenizer-aware perturbation untested. The "perturb at the tokenizer's natural granularity" suggestion (§5.3) is plausible but not validated. A patch-aligned, codebook-decision-boundary perturbation might survive gpt-image-1 where the current sub-JND content-routed perturbation does not.
Partial cross-AI re-run after the EXIF bug. While we fully corrected the §4.4 detection_n100 dataset (28/28 misoriented images re-perturbed and re-scored), we did not re-run the API-side experiments (§4.7, §4.9, §4.10) with the corrected adversarial PNGs. Their contamination rates (16%, 7%, 7% respectively) are modest, and the cross-AI bias direction cancels (both Gemini and gpt-image-1 saw the same misoriented inputs), but a strictly apples-to-apples re-run would require ~150 additional API calls and ~$25 in budget.
Architectural inferences are external. Gemini 2.5 Flash Image and gpt-image-1 are closed commercial systems whose internals we cannot inspect; the architectural categorization in §5.3 reflects our interpretation of observed input–output behavior and of public descriptions rather than verified architectural facts.

7. Possible Next Steps

7.1 Immediate extensions (within current methodology)

Validate the architectural taxonomy by adding Imagen 4, SDXL img2img, and Recraft V3 to the cross-AI study. If each model lands in one of the three existing bands (tight / drift / mixed) we have a working fingerprint primitive; if a new band appears, the taxonomy is incomplete but extensible.

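Test perturbation-budget sensitivity by running the basis pipeline at several eps values on a held-out set (for example the weaker eps=0.04 and stronger eps=0.20 budgets noted in §6, alongside the default eps=0.10) and tracking detector AUROC as a function of perturbation strength. This characterizes the perturbation-protection trade-off.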

7.2 Methodological extensions

Build a tokenizer-aware perturbation. Use an open-weights VQ tokenizer (Chameleon, Anole, MUSE-style) as a surrogate. Identify patches near codebook decision boundaries; perturb only those, in the direction that flips them. Test whether this perturbation survives gpt-image-1 where our current one does not.

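Train a learned detector head on DINOv2 features. Replace nearest-centroid with a small MLP on the (patch_mean, patch_p99, patch_p95, cls_shift, ssim) feature vector. With 269 labeled samples, a small MLP may push 3-way LOO accuracy above the current 76.6% [0.703, 0.844] and tighten the bootstrap CI. We treat 95+% as an aspirational target rather than a prediction, and any gain should be measured with image-level hold-out so that repetitions of the same image do not leak between training and evaluation.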
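For comparison with any learned head, the nearest-centroid baseline can be sketched as below. This is an illustrative sketch under stated assumptions: the hold-out is done at the image level (all repetitions of one image are held out together), which may differ from the LOO protocol used in the study, features are used unscaled, and the function names are hypothetical.

```python
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest_centroid_predict(train, x):
    """Return the label whose class centroid is closest to x.

    train is a list of (features, label) pairs; distance is squared Euclidean.
    """
    by_label = defaultdict(list)
    for feats, label in train:
        by_label[label].append(feats)
    best_label, best_dist = None, float("inf")
    for label, pts in sorted(by_label.items()):
        c = centroid(pts)
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_image_out_accuracy(samples):
    """samples is a list of (image_id, features, label) triples.

    All samples from one image are held out together, so repetitions of
    the same image never appear in both the training and the test split.
    """
    correct = 0
    for held_out in sorted({s[0] for s in samples}):
        train = [(f, y) for i, f, y in samples if i != held_out]
        test = [(f, y) for i, f, y in samples if i == held_out]
        correct += sum(nearest_centroid_predict(train, f) == y for f, y in test)
    return correct / len(samples)
```

A learned head would replace nearest_centroid_predict while keeping the same hold-out loop, so the two accuracies remain directly comparable.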

Develop a blind detector. The reference-image requirement is the largest practical limitation. Two possible paths: (1) compute the detector against a learned prior of "what natural images look like" through DINOv2 (i.e., detect outliers in DINOv2 feature space without a per-image reference); (2) train an AI-output classifier on raw AI-output PNGs against a clean-image distribution, learning AI-specific pixel artifacts (color banding, sub-pixel ringing, VAE-specific aliasing) that exist independent of the reference.

Content-conditional encoder fingerprinting. Replace per-pixel spatial maps with per-encoder regression surfaces r = f(attack_class, content_features; E) fit across all (image, location, attack) triples in the corpus. Compare encoders by divergence of fitted f's on a held-out (attack, content) grid. This sidesteps the spatial-coordinate-mismatch problem that caused the §4.2 cancellation and converts the content-routing of the perturbation from confound into sampling strategy.

7.3 Deeper investigations

Replicate cross-AI attribution on multiple Gemini snapshots. The current results reflect gemini-2.5-flash-image as of May 2026. Test whether the three-band (tight/drift/mixed) structure is stable across model versions, or whether the behavioral fingerprint drifts with checkpoint changes.

Test cross-AI attribution under recompression. What happens when an AI output is JPEG-recompressed before scoring? If the AI-specific bands collapse under Q70 JPEG, the attribution scheme is brittle in practice; if they survive, attribution is deployment-ready.

Re-run the API-side experiments on the corrected adversarial set. We fully corrected the §4.4 detection_n100 dataset but did not rerun the §4.7/§4.9/§4.10 API experiments on the corrected inputs (cost: ~$25). For a fully apples-to-apples corrected dataset, those runs remain outstanding.

7.4 Deployment artifacts

Package the (DINOv2 + (patch_mean, ssim_clean) + thresholds) into a single-image CLI tool:

python -m vlm_roundtrip.classify_ai output.png clean.png
-> {ai: "flux_kontext", confidence: 0.94,
     perturbation_survived: True, survival_ratio: 0.83}

This is the most direct path from research findings to a deployable forensic primitive. Document the detector thresholds (patch_p99 > 0.224 for binary detection against benign processing; cls_shift > 0.05 for AI rerender detection against all benign processing including center crop; patch_mean ∈ [0.05, 0.15] for Flux-band classification) as operating characteristics in a separate technical note for downstream users.
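The three documented thresholds can be expressed as a small decision helper. This is a sketch only: the numeric values are the ones quoted above, while the function name, the returned keys, and the choice to report the three checks independently (rather than combining them into a single verdict) are assumptions made for illustration.

```python
# Operating points quoted in the text above.
PATCH_P99_PERTURBATION = 0.224   # binary detection against benign processing
CLS_SHIFT_AI_RERENDER = 0.05     # AI rerender detection, robust to center crop
FLUX_PATCH_MEAN_BAND = (0.05, 0.15)


def apply_thresholds(patch_p99, cls_shift, patch_mean):
    """Evaluate each documented threshold independently.

    Returns three booleans; how they are combined into a final
    attribution is left to the caller and is not specified here.
    """
    lo, hi = FLUX_PATCH_MEAN_BAND
    return {
        "perturbation_detected": patch_p99 > PATCH_P99_PERTURBATION,
        "ai_rerender_detected": cls_shift > CLS_SHIFT_AI_RERENDER,
        "in_flux_band": lo <= patch_mean <= hi,
    }
```

Keeping the thresholds as named module-level constants makes it straightforward to revise them if the operating characteristics are re-estimated on a larger or more diverse image set.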

8. Conclusion

We report two primary findings, a series of negative and replication results, and an explicit bug retraction.

We established that three production image-to-image AI systems (gpt-image-1, Gemini 2.5 Flash Image, Flux Kontext) occupy categorically distinct, image-invariant behavioral regimes: Flux is always tight (100% of calls), gpt-image-1 always drifts (100%), and Gemini is mixed. These regimes are identifiable from a single output image plus a clean reference at 76.6% LOO 3-way accuracy (95% cluster-bootstrap CI [0.703, 0.844]; n=269 samples across 30 image clusters; chance 33.3%). We also quantified differential perturbation survival across these architectures: roughly 98% intact through Gemini, roughly 20% attenuated through Flux Kontext despite SSIM 0.99 visual fidelity (revealing a non-obvious diffusion-denoiser vulnerability for pixel-domain protection schemes), and overwritten by gpt-image-1's autoregressive regeneration.
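A cluster-bootstrap interval of the kind quoted above can be sketched as follows. This is an illustrative percentile bootstrap, not necessarily the exact procedure used in the study: it resamples whole image clusters with replacement so that repeated calls on the same image stay together, and the function name, the number of resamples, and the seed are assumptions.

```python
import random


def cluster_bootstrap_ci(correct_by_cluster, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy, resampling whole clusters.

    correct_by_cluster maps a cluster id (for example an image id) to a
    non-empty list of 0/1 outcomes, one per classified sample.
    """
    rng = random.Random(seed)
    clusters = list(correct_by_cluster.values())
    stats = []
    for _ in range(n_boot):
        # Draw as many clusters as there are in the data, with replacement.
        resampled = [rng.choice(clusters) for _ in clusters]
        hits = sum(sum(c) for c in resampled)
        total = sum(len(c) for c in resampled)
        stats.append(hits / total)
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Resampling at the cluster level generally gives a wider and more honest interval than resampling individual calls, because outcomes for the same image are correlated.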

Spatial encoder-response fingerprints fail to distinguish AI models when the perturbation is itself content-routed, because image content dominates spatial response. As a sanity check confirming concurrent work (AEROBLADE; DinoLizer; DINO-Detect; WaRPAD), reference-anchored DINOv2 cosine distance separates the perturbation from benign processing at AUROC = 1.0000 / TPR=1.000 at FPR=0 (patch_p99, on EXIF-corrected n=100 dataset) and from aggressive benign processing including center crop at AUROC ≈ 0.99 (cls_shift).

An earlier draft of this paper led with a claim of reproducible discrete-attractor sampling in Gemini, including a 27%-incidence "luma-inversion" mode on one probe image. We discovered during figure preparation that this was an EXIF-orientation artifact: 28 of 100 studied images carried non-identity EXIF rotation that our perturbation pipeline was not applying at load. The "inversion" outputs were Gemini auto-rotating the mis-oriented input back to upright; the SSIM mismatch was caused by our reference being loaded in the raw landscape orientation. We have retracted the inversion claim, fixed the bug throughout the codebase, re-perturbed all 28 affected images, and re-run the §4.4 detector experiment on the corrected dataset. The §4.9 cross-AI fingerprint and §4.10 survival findings are robust to the bug; the §4.4 detection AUROC is strengthened (0.9998 → 1.0000) by the correction. See §9 for the full audit and re-run protocol.

The original "protect-the-image" perturbation pipeline does not survive the most aggressive of the AI systems we tested (gpt-image-1) and is partially attenuated by the most preservative (Flux Kontext). It therefore cannot serve as a content-blind defense. It does, however, function as a forensic primitive: paired with a registered reference image, it enables both AI-processing detection and AI-system attribution. We suggest this is the operationally useful framing for sub-JND perturbation systems in 2026, and provide deployment-ready thresholds and a 2D feature space for the three AI systems studied.

9. Reproducibility Note: EXIF Orientation Bug and Full Correction

9.1 The bug

We discovered that the probe image used for an earlier draft's discrete-attractor analysis (20200626_174420) carried an EXIF Orientation tag of 6, indicating that the photograph was taken in portrait orientation but stored landscape on disk, which is the standard iPhone behavior. The original perturbation pipeline (batch_runner.py:load_image) called PIL.Image.open(path).convert("RGB") without applying ImageOps.exif_transpose, and therefore consumed the raw landscape pixels rather than the photographer-intended upright orientation. The resulting adversarial PNG was stored in the wrong orientation. Downstream tooling that later loaded the same source JPEG with the EXIF rotation applied (for example, viewers and loaders that honor the Orientation tag; PIL itself does not apply it unless ImageOps.exif_transpose is called) compared a correctly-oriented clean reference against a sideways adversarial, producing artifactual signal patterns including the "luma inversion" we initially reported as a discrete-attractor mode.
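To make the orientation mismatch concrete, the sketch below applies the rotation-only EXIF Orientation values to a tiny pixel grid in pure Python. It is an illustration of what an EXIF-aware loader such as ImageOps.exif_transpose does for these cases, not a replacement for it; the function names are hypothetical and the mirrored orientations (2, 4, 5, 7) are deliberately left out.

```python
def rotate_cw(grid):
    """Rotate a row-major 2-D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def apply_exif_orientation(grid, orientation):
    """Return the grid in display orientation for EXIF values 1, 3, 6, 8.

    Orientation 6 means the stored pixels must be rotated 90 degrees
    clockwise to appear upright; 3 needs 180 degrees; 8 needs 90 degrees
    counter-clockwise (three clockwise quarter turns).
    """
    quarter_turns = {1: 0, 6: 1, 3: 2, 8: 3}
    if orientation not in quarter_turns:
        raise ValueError(f"unsupported orientation in this sketch: {orientation}")
    out = [list(row) for row in grid]
    for _ in range(quarter_turns[orientation]):
        out = rotate_cw(out)
    return out


# A 2x3 "landscape" buffer tagged Orientation=6 becomes a 3x2 "portrait" image.
stored = [[1, 2, 3],
          [4, 5, 6]]
upright = apply_exif_orientation(stored, 6)
```

A loader that skips this step hands the pipeline the stored buffer, while any tool that honors the tag sees the upright version, so the two disagree in shape as well as content.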

9.2 What was claimed and retracted

An earlier draft of this paper included two experiments built around image 20200626_174420, a portrait-orientation iPhone photo with EXIF Orientation = 6. In the first (originally numbered §4.9, the gemini_determinism experiment), we ran 15 Gemini repetitions on this single image and reported three discrete output attractors: a "tight reproduce" mode at 8/15 incidence (patch_mean ≈ 0.10, SSIM ≈ +0.85), a "drift" mode at 3/15 (patch_mean ≈ 0.30, SSIM ≈ +0.55), and a "luma-inversion" mode at 4/15 (patch_mean ≈ 0.71, SSIM ≈ −0.10), with the inversion repetitions agreeing within 0.005 in patch_mean. We interpreted this as evidence of discrete-attractor sampling in a commercial multimodal image generator and treated it as the headline finding. In the second experiment (originally §4.10, inversion_clean_control), we ran 15 additional Gemini repetitions on the clean (unperturbed) version of the same image and observed similar inversion rates, which we interpreted as showing that the inversion was an intrinsic property of Gemini's response to this image content rather than a consequence of our adversarial perturbation.

On visual inspection of the per-repetition output PNGs during Figure 1 preparation, it became clear that the "inversion" mode was not a luminance inversion at all. It was Gemini auto-rotating the mis-oriented adversarial input back to its semantically upright portrait orientation. The 90° rotation between Gemini's correctly-oriented output and our raw-loaded landscape clean reference produced a near-zero or negative SSIM on the luminance channel, which we had misinterpreted as a photometric inversion. The "discrete attractors" were not attractors. They were a binary outcome (Gemini rotates or does not rotate) sampled across repetitions, with the SSIM mismatch entirely created by our reference-loading code, not by anything Gemini did. The clean-input control was consistent with this artifactual interpretation: Gemini auto-rotates the input in either case with similar image-conditional probability, and the SSIM signature comes from the reference, not the model. We therefore retract both experiments in their entirety.
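The way a 90° rotation collapses SSIM can be reproduced on a toy image. The sketch below uses a simplified single-window SSIM computed over the whole image, which is an assumption made for brevity: the standard metric of Wang et al. (2004) averages over local windows, so absolute values will differ, but the qualitative effect is the same. The constants follow the usual K1=0.01, K2=0.03 convention.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equal-size grayscale grids (lists of rows)."""
    a = [v for row in x for v in row]
    b = [v for row in y for v in row]
    if len(a) != len(b):
        raise ValueError("images must contain the same number of pixels")
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def rotate_cw(grid):
    """Rotate a row-major 2-D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


# Horizontal luminance ramp: brightness depends only on the column.
ramp = [[col * 32 for col in range(8)] for _ in range(8)]
ssim_same = global_ssim(ramp, ramp)
ssim_rotated = global_ssim(ramp, rotate_cw(ramp))
```

For the ramp, the rotated copy varies along rows instead of columns, so the covariance term vanishes and the score falls close to zero even though no pixel value has been photometrically inverted.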

One observation from the original §4.9 does survive the retraction. Within-image spatial reproducibility of Gemini's perturbation-residual maps varies considerably across our 15 probe images, with pairwise correlations ranging from 0.26 to 0.93. This observation does not depend on the inverted SSIM artifact and is true, but it is not novel and we do not build any claim on it.

These retractions leave the paper structurally cleaner than the original draft. The discrete-attractor framing was always orthogonal to the cross-AI behavioral classification (now §4.9) and the perturbation-survival analysis (now §4.10), which are the two findings that carry the paper. Both are robust to the bug, as the audit in §9.3 and the impact analysis in §9.5 document.

9.3 Audit scope

We audited all 726 files in test_images/ and the 100 unique images referenced across our experiments (vlm_roundtrip/audit_exif.py). Of the 100 images in the n=100 detection dataset, 28 carried non-identity EXIF orientation; of the 25 images in the mode_expansion / cross-AI dataset, 4 were affected; of the 30 images in the cross-AI fingerprint at scale, 2 were affected. Contamination rates per experiment: §4.4 detection_n100, 28% (28/100); §4.7 perturbation-survival, 16% (4/25); §4.9 cross-AI fingerprint, 7% (2/30); §4.10 survival, 7% (2/30).

9.4 Correction protocol

We applied the following corrections:

Source fix. Added ImageOps.exif_transpose to batch_runner.py:load_image and to a new vlm_roundtrip/_image_io.py:load_pix_768 helper, then refactored all 13 image-loading sites across the vlm_roundtrip/ package to use the shared EXIF-aware loader.
Re-perturbation. Re-ran the full PGD pipeline (eps=0.10, jnd_budget=2.0, 120 steps) on all 28 misoriented images using the patched loader. Outputs landed in batch_results_corrected/. Initial runs hit MPS out-of-memory kills after ~14 sequential images due to accumulated GPU state; we added explicit torch.mps.empty_cache() + gc.collect() calls between images and switched to a chunked driver (vlm_roundtrip/reperturb_chunked.py) that processes ≤2 images per Python invocation and then exits, releasing all memory. With these fixes, all 28 images were successfully re-perturbed.
Detector re-run. Re-ran the §4.4 detection_n100 experiment on the fully-corrected dataset using vlm_roundtrip/rerun_corrected.py. The pairing helper (_pair_images_corrected.py) prefers a corrected adversarial PNG when available and uses the EXIF-aware loader for all clean references.
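The chunked re-perturbation strategy in the protocol above can be sketched as a small driver that launches one fresh interpreter per chunk. This is a hedged illustration rather than the contents of reperturb_chunked.py: the worker module name passed in is a placeholder, and the convention that the worker receives image paths as command-line arguments is an assumption.

```python
import subprocess
import sys


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_commands(image_paths, worker_module, chunk_size=2):
    """Build one command line per chunk.

    worker_module is a hypothetical module that perturbs the images
    named on its command line and then exits.
    """
    return [
        [sys.executable, "-m", worker_module, *chunk]
        for chunk in chunked(image_paths, chunk_size)
    ]


def run_all(commands):
    """Run each chunk in its own process so memory is released on exit."""
    for cmd in commands:
        # check=True stops the driver at the first failing chunk.
        subprocess.run(cmd, check=True)
```

Process exit is the most reliable way to return accelerator memory to the system, which is why the driver relies on short-lived workers in addition to the in-process cache clearing described above.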

9.5 Impact on findings

The bug acted in a direction opposite to what we initially feared: the orientation mismatch had been suppressing the §4.4 detection metric rather than inflating it. Comparing the original (28% misoriented) and fully-corrected (0% misoriented) DINOv2 patch_p99 scores on the n=100 binary detection task, AUROC moved from 0.9998 to 1.0000, and TPR at FPR=0 moved from 0.9900 to 1.000.
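For readers who want to recompute the two metrics from the preserved scores, the sketch below gives straightforward definitions. It assumes higher scores indicate the perturbed class, treats ties as half a win for AUROC, and defines TPR at FPR=0 as the fraction of positives scoring strictly above the highest negative; the function names are hypothetical and the quadratic AUROC loop is chosen for clarity over speed.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Equivalent to the area under the ROC curve; ties count as half.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def tpr_at_zero_fpr(pos_scores, neg_scores):
    """Fraction of positives strictly above every negative score."""
    ceiling = max(neg_scores)
    return sum(p > ceiling for p in pos_scores) / len(pos_scores)
```

Under these definitions an AUROC of exactly 1.0 and a TPR of 1.0 at FPR=0 are equivalent statements: both hold only when every positive score exceeds every negative score.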

The 72 unchanged-source images produced byte-identical detection scores in both runs (Δ = 0.0000 on every reported metric), confirming the bookkeeping. The 28 corrected images shifted monotonically toward stronger detection. The §4.4 result is therefore confirmed at slightly higher confidence than originally reported; we have updated the §4.4 numerical claim accordingly.

The gemini_determinism "discrete-attractor multi-modal sampling" finding (originally numbered §4.9 in the earlier draft) is fully retracted as an artifact of orientation mismatch between Gemini's auto-rotated output and our raw-loaded clean reference. The inversion_clean_control experiment (originally §4.10) is moot as a consequence and is also retracted.

We did not re-run the AI-API experiments in §4.7, §4.9, and §4.10 because their contamination rates (16%, 7%, 7% respectively) are low enough that we expect the qualitative findings, particularly the cross-AI behavioral separation in §4.9, to be robust to the bug. The perturbed images in mode_expansion that overlap with our 28-image misoriented set are documented in vlm_roundtrip/results/exif_audit.json; the cross-AI classification result was originally computed against adversarial PNGs that included some misoriented instances on both the Gemini and gpt-image-1 sides, so the systematic bias acts in the same direction for both systems and largely cancels in the comparison.

9.6 What we changed in the codebase

Concretely: batch_runner.py and a new vlm_roundtrip/_image_io.py now route every image load through an EXIF-aware helper; 13 ad-hoc Image.open(...).convert("RGB") sites across the vlm_roundtrip/ package were refactored onto that helper; a chunked re-perturbation driver was added at vlm_roundtrip/reperturb_chunked.py to handle MPS memory pressure; and the detector re-run lives at vlm_roundtrip/rerun_corrected.py with paired-image lookup at _pair_images_corrected.py.

9.7 Lessons

The bug was caught not by code review or unit tests but by visual inspection. We had been comparing images at the metric level only (pairs of luminance correlations, token-shift maps, AUROC scores) and never looked at the images themselves until we needed publication-quality panels. This is a generalizable methodological warning: image-processing pipelines should include visual spot-checks of intermediate artifacts, not just numerical sanity checks, particularly when source data includes EXIF metadata from heterogeneous devices.

References

Asnani, V., Yin, X., Hassner, T., & Liu, X. (2023). Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12), 15477–15493. arXiv:2106.07873.

Bai, J., Yuan, L., Xia, S.-T., Yan, S., Li, Z., & Liu, W. (2022). Improving Vision Transformers by Revisiting High-frequency Components. European Conference on Computer Vision (ECCV). arXiv:2204.00993.

Biroli, G., Bonnaire, T., de Bortoli, V., & Mézard, M. (2024). Dynamical Regimes of Diffusion Models. Nature Communications, 15, 9957. arXiv:2402.18491.

Black Forest Labs, Batifol, S., Blattmann, A., Boesel, F., Consul, S., Diagne, C., Dockhorn, T., English, J., English, Z., Esser, P., Kulal, S., Lacey, K., Levi, Y., Li, C., Lorenz, D., Müller, J., Podell, D., Rombach, R., Saini, H., Sauer, A., & Smith, L. (2025). FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space. arXiv:2506.15742.

Choi, S., Lee, H., & Lee, M. (2025). Training-free Detection of AI-generated Images via Cropping Robustness. Advances in Neural Information Processing Systems (NeurIPS). arXiv:2511.14030.

Corvi, R., Cozzolino, D., Zingarini, G., Poggi, G., Nagano, K., & Verdoliva, L. (2023). On the Detection of Synthetic Images Generated by Diffusion Models. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. arXiv:2211.00680.

Doi, M. T., Butora, J., Itier, V., Boulanger, J., & Bas, P. (2025). DinoLizer: Learning from the Best for Generative Inpainting Localization. arXiv:2511.20722.

Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and Harnessing Adversarial Examples. International Conference on Learning Representations (ICLR). arXiv:1412.6572.

Hönig, R., Rando, J., Carlini, N., & Tramèr, F. (2024). Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI. arXiv:2406.12027.

Huang, Z., Li, J., Wen, H., Li, T., Yang, X., Qi, L., Peng, B., Huang, X., Yang, M.-H., & Cheng, G. (2025). Rethinking Cross-Generator Image Forgery Detection through DINOv3. arXiv:2511.22471.

Jia, J., Liu, S., Song, B., Yuan, W., Shen, L., & Wang, G. (2025). Antithetic Noise in Diffusion Models. arXiv:2506.06185.

Liang, C., Wu, X., Hua, Y., Zhang, J., Xue, Y., Song, T., Xue, Z., Ma, R., & Guan, H. (2023). Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples. International Conference on Machine Learning (ICML), 20763–20786. arXiv:2302.04578.

Oquab, M., Darcet, T., Moutakanni, T., Vo, H. V., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al. (2024). DINOv2: Learning Robust Visual Features without Supervision. Transactions on Machine Learning Research (TMLR). arXiv:2304.07193.

Ramaswamy, A., Navaratnarajah, M., & Chockler, H. (2024). It's a Feature, Not a Bug: Measuring Creative Fluidity in Image Generators. arXiv:2406.18570.

Ricco, E., Onofri, E., Cima, L., Cresci, S., & Di Pietro, R. (2025). PRISM: Phase-enhanced Radial-based Image Signature Mapping framework for fingerprinting AI-generated images. arXiv:2509.15270.

Ricker, J., Lukovnikov, D., & Fischer, A. (2024). AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9130–9140.

Salman, H., Khaddaj, A., Leclerc, G., Ilyas, A., & Mądry, A. (2023). Raising the Cost of Malicious AI-Powered Image Editing. International Conference on Machine Learning (ICML), 29894–29918. arXiv:2302.06588.

Shan, S., Cryan, J., Wenger, E., Zheng, H., Hanocka, R., & Zhao, B. Y. (2023). Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models. 32nd USENIX Security Symposium, 2187–2204.

Shao, R., Shi, Z., Yi, J., Chen, P.-Y., & Hsieh, C.-J. (2022). On the Adversarial Robustness of Vision Transformers. Transactions on Machine Learning Research (TMLR). arXiv:2103.15670.

Shen, J., Zheng, J., Xue, Y., Chen, H., Yao, Y., Kang, H., Liu, R., Gong, H., Yang, Y., Wang, D., & Liu, T. (2025). DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection. arXiv:2511.12511.

Tang, Q., Ayambem, B., Chuah, M. C., & Bharati, A. (2025). Is Perturbation-based Image Protection Disruptive to Image Editing? IEEE International Conference on Image Processing (ICIP). arXiv:2506.04394.

Venkatesan, R., Koon, S.-M., Jakubowski, M. H., & Moulin, P. (2000). Robust Image Hashing. IEEE International Conference on Image Processing (ICIP), 664–666.

Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 13(4), 600–612.

Wu, F., Lu, D., Yao, M., Xu, X., & Guo, F. (2026). HEDGE: Heterogeneous Ensemble for Detection of AI-GEnerated Images in the Wild. NTIRE 2026 Challenge entry. arXiv:2604.03555.

Xu, Y., Corso, G., Jaakkola, T., Vahdat, A., & Kreis, K. (2024). DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents. International Conference on Machine Learning (ICML). arXiv:2407.03300.

Xue, H., & Chen, Y. (2024). Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think. arXiv:2404.13320.

Yang, T.-T., Chen, I.-W., Chen, K.-T., Chiang, S.-H., & Peng, W.-C. (2025). Team NYCU at Defactify4: Robust Detection and Source Identification of AI-Generated Images Using CNN and CLIP-Based Models. arXiv:2503.10718.

Yu, N., Davis, L., & Fritz, M. (2019). Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints. IEEE/CVF International Conference on Computer Vision (ICCV), 7556–7566.

Yu, N., Skripniuk, V., Abdelnabi, S., & Fritz, M. (2021). Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data. IEEE/CVF International Conference on Computer Vision (ICCV), 14428–14437.

Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 586–595.

Appendix A: Data Summary

The full study spans 435 API calls across three commercial image-to-image systems (gpt-image-1, gemini-2.5-flash-image, Flux Kontext) on roughly 150 unique source images drawn from test_images/. The §4.4 binary detection task uses n=100 with mild benign nulls (jpeg_q85, resize_0p94); the §4.5–4.6 harsh-null analysis uses n=25 across five benign conditions including center_crop_80; the §4.7 gpt-image-1 survival probe uses n=25 across three input variants; the §4.8 per-AI fingerprint reproducibility study uses n=25 across two AIs; the §4.9 cross-AI fingerprint at scale uses n=30 × 3 AIs × 3 reps = 270 calls (269 valid after one Gemini API failure); the §4.10 survival comparison aggregates the §4.9 outputs by AI. All result JSONLs and AI-output PNGs are preserved under vlm_roundtrip/results/ for re-analysis.

Appendix B: Reproducibility

All scripts in vlm_roundtrip/ are CLI-invocable with deterministic local inference. All AI provider calls are resumable on (image, condition) keys. The DINOv2 ViT-B/14 model checkpoint is cached at ~/.cache/torch/hub/dinov2_vitb14/. The full result corpus is preserved as JSONL under vlm_roundtrip/results/ for re-analysis at any DINOv2 checkpoint. See §9 for the EXIF-correction audit and the corrected dataset paths.
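Resumability on (image, condition) keys can be sketched with an append-only JSONL log. This is a minimal illustration under assumptions: the field names "image" and "condition", the function names, and the call_fn stand-in for a provider call are all hypothetical and are not taken from the vlm_roundtrip/ package.

```python
import json
import os


def load_done_keys(jsonl_path):
    """Return the set of (image, condition) keys already recorded."""
    done = set()
    if not os.path.exists(jsonl_path):
        return done
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                record = json.loads(line)
                done.add((record["image"], record["condition"]))
    return done


def run_resumable(jobs, jsonl_path, call_fn):
    """Run only the jobs whose key is not yet in the log.

    jobs is an iterable of (image, condition) pairs; call_fn is a
    stand-in for a provider call and must return a JSON-serializable dict.
    Returns the number of calls actually executed.
    """
    done = load_done_keys(jsonl_path)
    executed = 0
    with open(jsonl_path, "a", encoding="utf-8") as fh:
        for image, condition in jobs:
            if (image, condition) in done:
                continue
            result = call_fn(image, condition)
            record = {"image": image, "condition": condition, **result}
            fh.write(json.dumps(record) + "\n")
            fh.flush()  # persist each result before the next paid call
            done.add((image, condition))
            executed += 1
    return executed
```

Flushing after every record means an interrupted run loses at most the call in flight, which matters when each call carries an API cost.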