Problem

Spectral-Doppler velocity depends on the beam-to-vessel angle $\theta$ through $f_d = 2 f_0 v \cos\theta / c$, where $f_d$ is the Doppler shift, $f_0$ the transmit frequency, $v$ the blood velocity, and $c$ the speed of sound. The angle correction is set by hand, so any error in $\theta$ propagates directly into $v$. Patil & Anand (EMBC 2019) learn $\theta$ directly from a single grayscale B-mode carotid image, with no color Doppler and no segmentation. This project rebuilds that pipeline from scratch, replicates it cleanly, explains why it works, and pushes the estimator as far as it will go.
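
To make the angle dependence concrete, the Doppler equation can be inverted for $v$ and the effect of a wrong angle computed directly. The following is a minimal sketch, not code from the project: the function names are hypothetical, and the default sound speed of 1540 m/s is the conventional soft-tissue value, assumed here for illustration.

```python
import math


def doppler_velocity(f_d_hz, f0_hz, theta_deg, c_m_s=1540.0):
    """Invert f_d = 2 * f0 * v * cos(theta) / c for the velocity v in m/s.

    c_m_s defaults to 1540 m/s (conventional soft-tissue value, an assumption).
    """
    return f_d_hz * c_m_s / (2.0 * f0_hz * math.cos(math.radians(theta_deg)))


def relative_velocity_error(theta_true_deg, theta_used_deg):
    """Fractional velocity error when theta_used is applied but theta_true is correct.

    The measured shift is fixed by the true angle, so
    v_est / v_true = cos(theta_true) / cos(theta_used).
    """
    ratio = math.cos(math.radians(theta_true_deg)) / math.cos(math.radians(theta_used_deg))
    return ratio - 1.0


if __name__ == "__main__":
    # The same 5 degree overestimate of the angle at two different true angles.
    for theta_true in (30.0, 60.0):
        err = relative_velocity_error(theta_true, theta_true + 5.0)
        print(f"true angle {theta_true:.0f} deg, +5 deg error -> {100 * err:.1f}% velocity error")
```

Because the error scales with $\cos\theta_{\text{true}} / \cos\theta_{\text{used}}$, the same angular mistake costs more at steeper beam-to-vessel angles, which is why an accurate $\theta$ estimate matters.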

Approach

  • One typed, test-first library (Keras 3 / JAX, pixi), with the model written once and the backend chosen per machine.
  • Orientation-preserving grid pooling instead of global average pooling. Global pooling discards spatial layout and is therefore partly rotation-invariant, which is wrong for an orientation target. This is the load-bearing design choice that makes a frozen backbone reproduce the paper.
  • Two sampling protocols behind a config flag: image-level sampling (the paper’s standard augmented-corpus protocol) and patient-level sampling (cross-subject, holding out whole volunteers). The two are complementary views; each is reported separately and each has its own tuned hyperparameters.
  • Optuna TPE hyperparameter search against cached frozen features (each trial a shallow head fit; one extraction per backbone serves both protocols), then a stacked ensemble of the tuned backbones.
  • Clinical-grade, post-hoc evaluation, all Keras-free: split-conformal intervals, Bland–Altman, calibration curves, patient-level nested CV, test-time augmentation, a classical structure-tensor prior + fusion, and Grad-CAM.
  • Every figure is regenerated from results/ by script; the whole thing is reproducible with pixi run all.
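
The grid-pooling argument in the list above can be illustrated on a toy feature map. This is a sketch under stated assumptions, not the project's layer: the function names, the 2x2 grid size, and the NumPy implementation are all hypothetical, chosen only to show why averaging per cell keeps orientation information that a global average throws away.

```python
import numpy as np


def global_average_pool(fmap):
    """(H, W, C) -> (C,). Averages over all positions, so spatial layout is lost."""
    return fmap.mean(axis=(0, 1))


def grid_pool(fmap, grid=2):
    """(H, W, C) -> (grid * grid * C,). Averages inside each cell of a grid x grid
    partition and concatenates the cells in row-major order, keeping coarse layout.

    Assumes grid <= min(H, W); np.array_split handles sizes not divisible by grid.
    """
    h, w, _ = fmap.shape
    row_groups = np.array_split(np.arange(h), grid)
    col_groups = np.array_split(np.arange(w), grid)
    cells = [
        fmap[r[0]:r[-1] + 1, q[0]:q[-1] + 1].mean(axis=(0, 1))
        for r in row_groups
        for q in col_groups
    ]
    return np.concatenate(cells)


if __name__ == "__main__":
    # A one-channel "vessel" along the main diagonal, and the same map rotated 90 degrees.
    diagonal = np.eye(4)[..., None]
    rotated = np.rot90(diagonal)

    # Global pooling cannot tell the two orientations apart.
    print(np.allclose(global_average_pool(diagonal), global_average_pool(rotated)))
    # Grid pooling produces different descriptors for the two orientations.
    print(np.allclose(grid_pool(diagonal), grid_pool(rotated)))
```

A regression head fed the grid-pooled vector can therefore separate the two orientations, while a head fed the globally pooled vector sees identical inputs for both.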

Headline results

  • Replication: a frozen DenseNet201 + grid pooling reproduces Table I at 5.84% MAPE (3.77° MAE). The fix was the pooling, not the backbone: switching to grid pooling alone lifts the frozen model from ~14% to 5.84% MAPE.
  • Best estimator, image-level sampling: an Optuna-tuned 5-model ensemble reaches 2.79% MAPE / 1.96° MAE ($R^2$ 0.995) — better than the paper’s best single model.
  • Best estimator, patient-level sampling: the tuned ensemble reaches 8.53% MAPE / 5.93° MAE ($R^2$ 0.952) on the stricter cross-subject regime.
  • Architecture bake-off: frozen DenseNet201 beats ConvNeXt and EfficientNetV2 — newer is not better for small-data frozen transfer.
  • Clinical-grade evaluation: split-conformal intervals at the nominal 90% level are ±20.5° and reach 95% empirical coverage; Bland–Altman shows a +4.3° bias against the single reference reading (a method-vs-reference comparison, not inter-observer agreement, and flagged as such); test-time augmentation cuts per-image MAE from 7.8° to 4.7°.
  • Known ceiling: end-to-end fine-tuning and modern self-supervised encoders (DINOv2, USFM) are deferred to a CUDA machine; this limitation is documented rather than hidden.
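
The split-conformal intervals reported above can be sketched in a few lines. This is a minimal illustration, assuming a symmetric absolute-residual score and a held-out calibration set; the function names are hypothetical and the project's actual implementation is not shown here.

```python
import math


def conformal_half_width(cal_true, cal_pred, alpha=0.10):
    """Half-width q for intervals [pred - q, pred + q] at nominal level 1 - alpha.

    Uses the standard split-conformal rank: the ceil((n + 1) * (1 - alpha))-th
    smallest absolute residual on the calibration set. Under exchangeability this
    gives marginal coverage of at least 1 - alpha.
    """
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-indexed rank into sorted scores
    if rank > n:
        # Too few calibration points to certify this level with a finite interval.
        return math.inf
    return scores[rank - 1]


def empirical_coverage(test_true, test_pred, q):
    """Fraction of test targets that fall inside [pred - q, pred + q]."""
    hits = sum(abs(t - p) <= q for t, p in zip(test_true, test_pred))
    return hits / len(test_true)
```

The guarantee is a lower bound, so empirical coverage above the nominal level (such as 95% at a nominal 90%) is consistent with the method; it means the intervals are somewhat conservative on that test set.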