
SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection

Yifan Li, Mehrdad Salimitari, Taiyu Zhang, Guang Li, David Dreizin · March 7, 2026

#eess.IV #cs.AI #cs.CV #cs.LG

arXiv:2602.23447v1 · Announce Type: cross

Abstract: Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is computationally expensive, and existing mask-conditioned approaches lack controllable attribute-level regulation and paired supervision for accountable training. We introduce SALIENT, a mask-conditioned wavelet-domain diffusion framework that synthesizes paired lesion-masking volumes for controllable CT augmentation under long-tail regimes. Instead of denoising in pixel space, SALIENT performs structured diffusion over discrete wavelet coefficients, explicitly separating low-frequency brightness from high-frequency structural detail. Learnable frequency-aware objectives disentangle target and background attributes (structure, contrast, edge fidelity), enabling interpretable and stable optimization. A 3D VAE generates diverse volumetric lesion masks, and a semi-supervised teacher produces paired slice-level pseudo-labels for downstream mask-guided detection. SALIENT improves generative realism, as reflected by higher MS-SSIM (0.63 to 0.83) and lower FID (118.4 to 46.5). In a separate downstream evaluation, SALIENT-augmented training improves long-tail detection performance, yielding disproportionate AUPRC gains across low prevalences and target-to-volume ratios. Optimal synthetic ratios shift from 2x to 4x as labeled seed size decreases, indicating a seed-dependent augmentation regime under low-label conditions. SALIENT demonstrates that frequency-aware diffusion enables controllable, computationally efficient precision rescue in long-tail CT detection.

Executive Summary

This study introduces SALIENT, a mask-conditioned wavelet-domain diffusion framework that synthesizes paired lesion-masking volumes for controllable CT augmentation under long-tail regimes. Diffusion runs over discrete wavelet coefficients rather than pixels, and learnable frequency-aware objectives disentangle target and background attributes (structure, contrast, edge fidelity), which the authors report makes optimization interpretable and stable. Generative realism improves (MS-SSIM 0.63 to 0.83, FID 118.4 to 46.5), and in a separate downstream evaluation SALIENT-augmented training yields disproportionate AUPRC gains across low prevalences and target-to-volume ratios. The optimal synthetic ratio shifts from 2x to 4x as the labeled seed size decreases. The results are relevant to medical imaging analysis, particularly the detection of rare lesions.

Key Points

▸ SALIENT is a mask-conditioned wavelet-domain diffusion framework for controllable CT augmentation; it denoises discrete wavelet coefficients instead of pixels.
▸ Learnable frequency-aware objectives disentangle target and background attributes (structure, contrast, edge fidelity), which the authors report yields interpretable and stable optimization.
▸ Generative realism improves (MS-SSIM 0.63 to 0.83, FID 118.4 to 46.5), and SALIENT-augmented training yields disproportionate AUPRC gains in long-tail detection.
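
The abstract's "precision collapse despite high AUROC" can be made concrete with a little arithmetic. Sensitivity and specificity, which determine the ROC curve, do not depend on prevalence, but precision does. The sketch below is illustrative only: the operating point (90% sensitivity, 95% specificity) and the prevalence values are assumptions chosen for the example, not figures from the paper.

```python
def precision_at(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value (precision) via Bayes' rule."""
    true_pos = prevalence * sensitivity                     # fraction of all cases that are true positives
    false_pos = (1.0 - prevalence) * (1.0 - specificity)    # fraction that are false positives
    return true_pos / (true_pos + false_pos)


# Same detector (fixed sensitivity and specificity), rarer and rarer targets.
# The operating point and prevalences are hypothetical.
for prevalence in (0.5, 0.1, 0.01, 0.001):
    p = precision_at(prevalence, sensitivity=0.90, specificity=0.95)
    print(f"prevalence={prevalence:<6} precision={p:.3f}")
```

With the detector held fixed, precision falls from roughly 0.95 at 50% prevalence to roughly 0.15 at 1% prevalence. This is why a precision-based metric such as AUPRC is the more informative one in long-tail regimes.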

Merits

Strength in Frequency-Aware Diffusion

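By diffusing over discrete wavelet coefficients, SALIENT explicitly separates low-frequency brightness from high-frequency structural detail. Its learnable frequency-aware objectives act on that separation to regulate target and background attributes, which the abstract presents as the source of controllable, computationally efficient precision rescue in long-tail CT detection.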
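
To illustrate what separating low-frequency brightness from high-frequency detail means, here is a minimal sketch of a single-level 2D Haar wavelet transform in NumPy. It is not the authors' implementation: the abstract does not state which wavelet, how many decomposition levels, or whether the transform is 2D or 3D, so the Haar basis, the single level, and the function names `haar_dwt2` and `haar_idwt2` are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform of an array with even height and width."""
    image = np.asarray(image, dtype=np.float64)
    a = image[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = image[0::2, 1::2]  # top-right
    c = image[1::2, 0::2]  # bottom-left
    d = image[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass in both directions: local brightness
    lh = (a - b + c - d) / 2.0  # differences between neighbouring columns
    hl = (a + b - c - d) / 2.0  # differences between neighbouring rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the original array exactly."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=np.float64)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


# A uniform brightness shift changes only the low-frequency band.
rng = np.random.default_rng(0)
slice_2d = rng.normal(size=(8, 8))            # stand-in for a CT slice (hypothetical data)
bands = haar_dwt2(slice_2d)
shifted = haar_dwt2(slice_2d + 10.0)
print(np.allclose(bands[1:], shifted[1:]))    # detail bands are unchanged by the shift
print(np.allclose(haar_idwt2(*bands), slice_2d))  # the transform is invertible
```

Because the transform is invertible, a model can operate on the four sub-bands and still recover an image, and a loss can weight the brightness band and the detail bands differently. Sub-band naming conventions (which band is called LH versus HL) vary between libraries.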

Improvement in Generative Realism

SALIENT raises MS-SSIM from 0.63 to 0.83 and lowers FID from 118.4 to 46.5 (about a 61% reduction), indicating more realistic synthetic volumes. The abstract does not name the baseline behind the starting values.

Enhanced Controllability

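The optimal synthetic ratio is not fixed: it shifts from 2x to 4x as the labeled seed size decreases, which the authors describe as a seed-dependent augmentation regime under low-label conditions. Mask conditioning and attribute-level objectives give control over what is synthesized, so the augmentation can be adjusted to the labels available.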
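
As a rough illustration of what a 2x versus 4x synthetic ratio implies for the training mix, the sketch below assumes that "Nx" means N synthetic samples per labeled real sample. The abstract does not define the ratio, so this reading, the helper name `synthetic_budget`, and the seed sizes are all hypothetical.

```python
def synthetic_budget(num_real: int, ratio: float) -> dict:
    """Training-set composition when adding `ratio` synthetic samples per real sample (assumed definition)."""
    num_synthetic = round(num_real * ratio)
    total = num_real + num_synthetic
    return {
        "real": num_real,
        "synthetic": num_synthetic,
        "synthetic_fraction": num_synthetic / total,
    }


# Hypothetical seed sizes: a larger seed at 2x and a smaller seed at 4x.
print(synthetic_budget(200, 2))
print(synthetic_budget(50, 4))
```

Under this reading, a 2x ratio makes two thirds of the training set synthetic and a 4x ratio makes it four fifths, so the smaller the seed, the more the detector relies on generated data.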

Demerits

Computational Complexity

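The abstract positions wavelet-domain diffusion as cheaper than pixel-space diffusion but reports no training or inference cost. The full pipeline (diffusion model, 3D VAE, semi-supervised teacher) may still require significant computational resources, which could limit adoption in real-world medical imaging applications.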

Limited Generalizability

The study focuses on long-tail CT detection, and it is unclear whether SALIENT's performance extends to other medical imaging tasks or modalities.

Expert Commentary

SALIENT targets a critical challenge in medical imaging analysis: detecting rare lesions when class imbalance and low target-to-volume ratios collapse precision even at high AUROC. The reported gains (better MS-SSIM and FID, disproportionate AUPRC improvements at low prevalence) suggest that wavelet-domain diffusion with frequency-aware objectives is a viable route to controllable augmentation. The finding that the optimal synthetic ratio shifts from 2x to 4x as the labeled seed size decreases is practically useful, because it implies the augmentation budget should be tuned to the amount of labeled data rather than fixed. Two caveats remain: the evaluation is confined to CT detection, and the abstract claims computational efficiency without reporting cost figures. Within those limits, the frequency-aware objectives and controllable augmentation regime offer a promising direction for future research.

Recommendations

✓ Future research should explore the generalizability of SALIENT's performance to other medical imaging tasks and modalities.
✓ Developing more efficient and scalable wavelet-domain diffusion frameworks is crucial for large-scale medical imaging applications.

Sources

arXiv - cs.AI

