SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
arXiv:2602.23447v1 (announce type: cross)
Abstract: Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is computationally expensive, and existing mask-conditioned approaches lack controllable attribute-level regulation and paired supervision for accountable training. We introduce SALIENT, a mask-conditioned wavelet-domain diffusion framework that synthesizes paired lesion-masking volumes for controllable CT augmentation under long-tail regimes. Instead of denoising in pixel space, SALIENT performs structured diffusion over discrete wavelet coefficients, explicitly separating low-frequency brightness from high-frequency structural detail. Learnable frequency-aware objectives disentangle target and background attributes (structure, contrast, edge fidelity), enabling interpretable and stable optimization. A 3D VAE generates diverse volumetric lesion masks, and a semi-supervised teacher produces paired slice-level pseudo-labels for downstream mask-guided detection. SALIENT improves generative realism, as reflected by higher MS-SSIM (0.63 to 0.83) and lower FID (118.4 to 46.5). In a separate downstream evaluation, SALIENT-augmented training improves long-tail detection performance, yielding disproportionate AUPRC gains across low prevalences and target-to-volume ratios. Optimal synthetic ratios shift from 2x to 4x as labeled seed size decreases, indicating a seed-dependent augmentation regime under low-label conditions. SALIENT demonstrates that frequency-aware diffusion enables controllable, computationally efficient precision rescue in long-tail CT detection.
Executive Summary
This study introduces SALIENT, a mask-conditioned wavelet-domain diffusion framework that synthesizes paired lesion-masking volumes for controllable CT augmentation under long-tail regimes. Diffusion runs over discrete wavelet coefficients rather than pixels, and learnable frequency-aware objectives disentangle target and background attributes (structure, contrast, edge fidelity), enabling interpretable and stable optimization. The authors report improved generative realism (MS-SSIM 0.63 to 0.83, FID 118.4 to 46.5) and, in a separate downstream evaluation, disproportionate AUPRC gains across low prevalences and target-to-volume ratios. The optimal synthetic ratio shifts from 2x to 4x as the labeled seed size decreases. The approach targets rare-lesion detection, where precision collapses despite high AUROC.
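The abstract's remark that precision can collapse despite high AUROC follows from Bayes' rule: sensitivity and specificity (which determine the ROC curve) do not depend on prevalence, but precision does. The sketch below is a worked illustration only; the function name `precision_at` and the operating point of 90% sensitivity and 95% specificity are assumptions chosen for the example, not values from the paper.

```python
def precision_at(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Precision (positive predictive value) from Bayes' rule.

    All three inputs are fractions in (0, 1). The operating point used
    below is illustrative and does not come from the SALIENT paper.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same detector (fixed sensitivity and specificity), rarer and rarer targets.
for prevalence in (0.5, 0.05, 0.005):
    ppv = precision_at(prevalence, sensitivity=0.90, specificity=0.95)
    print(f"prevalence={prevalence} precision={ppv:.3f}")
```

With this fixed operating point, precision falls from about 0.95 at 50% prevalence to about 0.49 at 5% and about 0.08 at 0.5%, even though the detector itself has not changed. This is why AUPRC, rather than AUROC, is the more informative metric in long-tail settings.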
Key Points
- ▸ SALIENT introduces a novel mask-conditioned wavelet-domain diffusion framework for controllable CT augmentation.
- ▸ Frequency-aware objectives enable interpretable and stable optimization, disentangling target and background attributes.
- ▸ SALIENT improves generative realism and long-tail detection performance, yielding disproportionate AUPRC gains.
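To illustrate what separating low-frequency brightness from high-frequency detail means in a wavelet domain, here is a minimal NumPy sketch of a single-level 2D Haar transform on a toy slice. It is not the authors' implementation: the abstract does not state the wavelet family, the number of decomposition levels, or how 3D volumes are handled, so the Haar choice, the 2D setting, and the helper names `haar_dwt2` and `haar_idwt2` are all assumptions.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt2(x: np.ndarray):
    """Single-level orthonormal 2D Haar transform (even height and width).

    Returns (ll, lh, hl, hh): one low-frequency approximation band and
    three detail bands. Band naming conventions vary between libraries.
    """
    # Pairwise sums and differences of adjacent rows.
    lo = (x[0::2, :] + x[1::2, :]) / SQRT2
    hi = (x[0::2, :] - x[1::2, :]) / SQRT2
    # Pairwise sums and differences of adjacent columns.
    ll = (lo[:, 0::2] + lo[:, 1::2]) / SQRT2
    lh = (lo[:, 0::2] - lo[:, 1::2]) / SQRT2
    hl = (hi[:, 0::2] + hi[:, 1::2]) / SQRT2
    hh = (hi[:, 0::2] - hi[:, 1::2]) / SQRT2
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    rows, cols = ll.shape
    lo = np.empty((rows, 2 * cols))
    hi = np.empty((rows, 2 * cols))
    lo[:, 0::2], lo[:, 1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[:, 0::2], hi[:, 1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((2 * rows, 2 * cols))
    x[0::2, :], x[1::2, :] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return x


# Toy "slice": flat background with a brighter square standing in for a lesion.
slice_ = np.full((8, 8), 100.0)
slice_[3:6, 3:6] = 160.0

bands = haar_dwt2(slice_)
brighter = haar_dwt2(slice_ + 20.0)  # uniform brightness shift

# The shift shows up only in the approximation band; detail bands are unchanged.
print("LL changed by:", float(np.mean(brighter[0] - bands[0])))
print("detail bands unchanged:",
      all(np.allclose(b, a) for a, b in zip(bands[1:], brighter[1:])))
print("perfect reconstruction:", np.allclose(haar_idwt2(*bands), slice_))
```

Because the transform is linear, a uniform brightness offset lands entirely in the approximation band while edges and texture live in the detail bands. That separation is what makes it plausible to attach different objectives to different bands, although the specific objectives SALIENT uses are not given in the abstract.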
Merits
Strength in Frequency-Aware Diffusion
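By running diffusion over discrete wavelet coefficients, SALIENT separates low-frequency brightness from high-frequency structural detail, and its learnable frequency-aware objectives give attribute-level control over structure, contrast, and edge fidelity. The abstract presents this as the basis for controllable, computationally efficient precision rescue in long-tail CT detection.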
Improvement in Generative Realism
MS-SSIM rises from 0.63 to 0.83 and FID falls from 118.4 to 46.5, both indicating more realistic synthetic volumes. The abstract does not name the baseline behind the starting values.
Enhanced Controllability
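Mask conditioning and attribute-level frequency objectives let users regulate what is synthesized. The study also finds that the optimal synthetic ratio shifts from 2x to 4x as the labeled seed size decreases, which suggests that the amount of augmentation should be tuned to the available labels rather than fixed.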
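To make the 2x and 4x synthetic ratios concrete, the sketch below assembles a training set under one reading of the term: N synthetic cases added per real labeled seed case. The abstract does not define the ratio, so this interpretation is an assumption, and `build_training_set`, `seed_cases`, and `synthetic_pool` are hypothetical names used only for illustration.

```python
import random


def build_training_set(seed_cases, synthetic_pool, ratio, rng_seed=0):
    """Return all real seed cases plus `ratio` synthetic cases per seed case.

    Assumes "Nx" means N synthetic cases for every real labeled case;
    the paper may define the ratio differently.
    """
    n_synthetic = int(round(ratio * len(seed_cases)))
    if n_synthetic > len(synthetic_pool):
        raise ValueError("synthetic pool is too small for the requested ratio")
    rng = random.Random(rng_seed)  # fixed seed so the draw is reproducible
    chosen = rng.sample(synthetic_pool, n_synthetic)
    return list(seed_cases) + chosen


seed_cases = [f"real_{i}" for i in range(10)]
synthetic_pool = [f"synthetic_{i}" for i in range(100)]

for ratio in (2, 4):
    train = build_training_set(seed_cases, synthetic_pool, ratio)
    print(f"ratio={ratio}x -> {len(train)} training cases")
```

Under this reading, a 10-case seed set grows to 30 cases at 2x and 50 at 4x. In practice the ratio would be treated as a hyperparameter and selected on held-out real data, consistent with the reported dependence on seed size.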
Demerits
Computational Complexity
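The abstract describes wavelet-domain diffusion as computationally efficient relative to pixel-space diffusion but reports no training or inference costs. The pipeline also includes a 3D VAE for mask generation and a semi-supervised teacher for pseudo-labels, so the total resource requirement is unclear and may limit adoption in real-world medical imaging applications.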
Limited Generalizability
The study focuses on long-tail CT detection, and it is unclear whether SALIENT's performance extends to other medical imaging tasks or modalities.
Expert Commentary
SALIENT addresses a practical problem in medical imaging analysis: detecting rare lesions when class imbalance and low target-to-volume ratios cause precision to collapse despite high AUROC. Its main contribution is moving mask-conditioned diffusion into the wavelet domain, where frequency-aware objectives give attribute-level control and paired lesion-masking volumes supply supervision for downstream detectors. The finding that the optimal synthetic ratio shifts from 2x to 4x as labeled seed size decreases is useful guidance for low-label settings. Two caveats remain: the evaluation is confined to CT detection, and the abstract gives no compute figures to support the efficiency claim. Even so, frequency-aware objectives and seed-dependent augmentation are a promising direction for future research in medical imaging analysis.
Recommendations
- ✓ Future research should explore the generalizability of SALIENT's performance to other medical imaging tasks and modalities.
- ✓ Future work should report training and inference costs and pursue more efficient, scalable wavelet-domain diffusion frameworks for large-scale medical imaging applications.