
Anchor-Free Detection

Anchor-free detectors eliminate predefined anchor boxes by directly predicting object locations as per-pixel classifications (FCOS) or center-point heatmaps (CenterNet), removing a major source of hyperparameter tuning while matching or exceeding anchor-based accuracy.

Prerequisites | Convolutional neural networks, feature pyramid network, bounding box regression, non-maximum suppression, focal loss

What Is Anchor-Free Detection?

Anchor-based detectors like Faster R-CNN and SSD tile thousands of predefined boxes across the image and ask, “Is there an object in this box? If so, how should I adjust the box?” Anchor-free detectors take a fundamentally different approach. Imagine laying a transparent grid over an image: instead of pre-placing boxes, you simply ask each grid point, “Are you inside an object? If so, how far is it to the object’s edges?” – or, even simpler, “Are you the center of an object? If so, how big is it?”

Technically, anchor-free detectors predict object locations without relying on a predefined set of anchor boxes. The two main families are per-pixel prediction (e.g., FCOS, which classifies every feature map location and regresses distances to box edges) and keypoint-based detection (e.g., CenterNet, which detects objects as center-point heatmap peaks and regresses size from those points).

How It Works

FCOS: Fully Convolutional One-Stage Detection (2019)

FCOS treats every location on the feature map as a potential detection point.

Per-pixel prediction: For a location $(x, y)$ on feature map level $P_l$, if it falls inside a ground-truth box, FCOS predicts:

  • Classification: $C$-dimensional vector of class scores.
  • Regression: 4 distances from the location to the box edges: $\mathbf{t}^* = (l^*, t^*, r^*, b^*)$, where $l^* = x - x_0$, $t^* = y - y_0$, $r^* = x_1 - x$, $b^* = y_1 - y$, and $(x_0, y_0)$ and $(x_1, y_1)$ are the top-left and bottom-right corners of the ground-truth box.
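The regression targets can be computed directly from a location and a box. The following is a minimal sketch, not the FCOS reference code: the function name `fcos_regression_targets` and the `(x0, y0, x1, y1)` box layout are illustrative assumptions, and treating locations exactly on the box border as outside is a simplifying choice.

```python
def fcos_regression_targets(x, y, box):
    """Return (l, t, r, b) distances from location (x, y) to the box edges.

    box is (x0, y0, x1, y1): top-left and bottom-right corners (assumed layout).
    Returns None when the location is not strictly inside the box.
    """
    x0, y0, x1, y1 = box
    l = x - x0  # distance to the left edge
    t = y - y0  # distance to the top edge
    r = x1 - x  # distance to the right edge
    b = y1 - y  # distance to the bottom edge
    if min(l, t, r, b) <= 0:
        return None  # outside the box (or on its border): not a positive location
    return (l, t, r, b)


# A location inside a 100x80 box.
print(fcos_regression_targets(50, 40, (10, 20, 110, 100)))
```

Because all four distances are positive for any location inside the box, the network can regress them without reference to any box template.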

Centerness branch: A scalar indicating how close the location is to the object center:

$$\text{centerness}^* = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}}$$

This down-weights detections from peripheral locations, reducing low-quality predictions. At inference, the classification score is multiplied by the predicted centerness, so off-center detections rank lower before NMS.
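As a worked illustration of the centerness target and its use at inference, here is a small sketch. The function names `centerness` and `final_score` are hypothetical, and the code assumes all four distances are strictly positive (that is, the location is inside the box).

```python
import math


def centerness(l, t, r, b):
    """Centerness target: 1.0 at the box center, approaching 0 near the edges."""
    horizontal = min(l, r) / max(l, r)
    vertical = min(t, b) / max(t, b)
    return math.sqrt(horizontal * vertical)


def final_score(class_score, l, t, r, b):
    """Ranking score at inference: class score scaled by centerness."""
    return class_score * centerness(l, t, r, b)


# Same class score, two locations in a 100x80 box: the center and a point near the left edge.
print(final_score(0.9, 50, 40, 50, 40))
print(final_score(0.9, 10, 40, 90, 40))
```

In this sketch the ground-truth formula stands in for the network's predicted centerness; in the detector the value comes from the centerness branch.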

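Multi-level assignment: Objects of different sizes are assigned to different FPN levels according to the magnitude of their regression targets: each level handles only locations whose largest target distance, $\max(l^*, t^*, r^*, b^*)$, falls within that level's range. Because overlapping objects usually differ in size, they tend to land on different levels, which resolves most of the ambiguity about which box a location should regress.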
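The level assignment can be sketched as a range lookup. The ranges below (0, 64, 128, 256, 512, and infinity for levels P3 to P7) are the values reported in the FCOS paper and are an assumption here, since the text above does not state them; the half-open boundary handling and the name `assign_level` are illustrative choices.

```python
import math

# Assumed per-level regression ranges (lower bound exclusive, upper bound inclusive).
REGRESS_RANGES = {
    "P3": (0, 64),
    "P4": (64, 128),
    "P5": (128, 256),
    "P6": (256, 512),
    "P7": (512, math.inf),
}


def assign_level(l, t, r, b):
    """Return the FPN level responsible for a location with targets (l, t, r, b)."""
    largest = max(l, t, r, b)
    for level, (low, high) in REGRESS_RANGES.items():
        if low < largest <= high:
            return level
    return None  # no level matches (e.g. non-positive targets)


print(assign_level(20, 10, 30, 25))    # small object
print(assign_level(100, 50, 90, 60))   # medium object
```

A location inside a box is a positive sample only on the level this lookup returns; on every other level it is treated as background.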

CenterNet: Objects as Points (2019)

CenterNet (Zhou et al.) models each object as a single point – its bounding box center.

  1. Heatmap prediction: For each class $c$, predict a heatmap $\hat{Y}_c \in [0, 1]^{H/R \times W/R}$ where $R$ is the output stride (typically 4). Peaks correspond to object centers.
  2. Size regression: At each center point, predict the object width and height $(w, h)$.
  3. Offset regression: Predict a sub-pixel offset to recover discretization error from downsampling.
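To make the three outputs concrete, the sketch below derives the training targets for one ground-truth box. The function name `centernet_targets` and the box layout are assumptions, and expressing the size in input pixels is also an assumption: implementations differ on whether size is stored in input or output-stride units.

```python
import math


def centernet_targets(box, stride=4):
    """Return (heatmap cell, sub-pixel offset, size) for one box.

    box is (x0, y0, x1, y1) in input-image pixels (assumed layout).
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2      # box center in input pixels
    px, py = cx / stride, cy / stride          # center at output resolution
    ix, iy = math.floor(px), math.floor(py)    # integer heatmap cell
    offset = (px - ix, py - iy)                # what the offset head regresses
    size = (x1 - x0, y1 - y0)                  # what the size head regresses
    return (ix, iy), offset, size


print(centernet_targets((10, 20, 110, 101)))
```

The offset is the fractional part lost when the center is snapped to the low-resolution grid, which is why it is always in the range [0, 1).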

Training: Ground-truth heatmaps are generated by placing a 2D Gaussian at each object center:

$$Y_{xyc} = \exp\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$$

where $(\tilde{p}_x, \tilde{p}_y)$ is the object center on the low-resolution heatmap and $\sigma_p$ is proportional to the object size. The loss is a modified focal loss on the heatmap.
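A ground-truth heatmap for one class can be rendered as follows. This is a pure-Python sketch for clarity rather than speed: `gaussian_heatmap` is a hypothetical name, `sigma` is passed in directly instead of being derived from object size, and overlapping Gaussians are merged with an element-wise maximum, as described in the CenterNet paper.

```python
import math


def gaussian_heatmap(height, width, centers):
    """Render one class heatmap.

    centers is a list of (cx, cy, sigma) tuples in heatmap coordinates.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cx, cy, sigma in centers:
        for y in range(height):
            for x in range(width):
                dist_sq = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-dist_sq / (2 * sigma ** 2))
                # Keep the larger value where two objects' Gaussians overlap.
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


hm = gaussian_heatmap(5, 5, [(2, 2, 1.0)])
for row in hm:
    print(" ".join(f"{v:.2f}" for v in row))
```

The value is exactly 1 at the center cell and decays smoothly around it, so near-center predictions are penalized less than distant ones under the focal loss.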

Inference: Extract peaks from the heatmap via $3 \times 3$ max pooling (a simple form of NMS), take the top-$K$ peaks (e.g., $K = 100$), and read off the size and offset predictions at those locations. No traditional NMS post-processing is needed.
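The peak-extraction step can be sketched without any deep learning framework. The function below keeps a cell only if it equals the maximum of its 3×3 neighbourhood, which is what comparing a heatmap with its max-pooled copy achieves; `extract_peaks` is a hypothetical name, and a real pipeline would usually also apply a score threshold.

```python
def extract_peaks(heatmap, k=100):
    """Return up to k local maxima as (score, x, y), highest score first."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            # Maximum over the 3x3 window, clipped at the heatmap borders.
            window_max = max(
                heatmap[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            )
            if heatmap[y][x] >= window_max:
                peaks.append((heatmap[y][x], x, y))
    peaks.sort(reverse=True)
    return peaks[:k]


hm = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.1],
]
print(extract_peaks(hm, k=2))
```

Each surviving peak gives a center cell; adding the predicted offset and multiplying by the output stride recovers the center in input pixels, and the predicted size completes the box.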

CornerNet (2018)

An earlier anchor-free approach by Law and Deng that detects objects as pairs of top-left and bottom-right corner keypoints, grouped by an associative embedding. It introduced the idea of keypoint-based detection but required complex corner pooling and grouping.

Why It Matters

  1. Anchor hyperparameter elimination: Anchor-based detectors require tuning scales, aspect ratios, IoU thresholds for matching, and sampling strategies. Anchor-free methods remove this entire design space.
  2. FCOS with a ResNeXt-64x4d-101-FPN backbone achieved 44.7% AP on COCO, matching or exceeding Faster R-CNN and RetinaNet without any anchor-related hyperparameters.
  3. CenterNet achieved 45.1% AP on COCO (Hourglass-104 backbone, multi-scale testing) with a simpler architecture and no IoU-based NMS.
  4. Anchor-free designs influenced subsequent work: YOLOv8 adopted an anchor-free head, and DETR can be viewed as an anchor-free detector.

Key Technical Details

  • FCOS (ResNeXt-64x4d-101-FPN): 44.7% AP on COCO, ~18 FPS on a V100 GPU.
  • CenterNet (Hourglass-104): 45.1% AP on COCO with multi-scale testing, at ~1.4 FPS. With DLA-34 backbone: 37.4% AP at ~52 FPS.
  • CenterNet (ResNet-18): 28.1% AP on COCO at ~142 FPS – suitable for real-time edge deployment.
  • Centerness branch in FCOS: Adds ~2-3% AP over the base model by suppressing low-quality detections from peripheral locations.
  • FCOS positive sample definition: A location is positive if it falls within a ground-truth box AND the regression targets $(l^*, t^*, r^*, b^*)$ are within the allowed range for that FPN level.
  • CenterNet uses no anchors and no IoU-based NMS, relying solely on heatmap peak extraction, which makes it one of the architecturally simplest modern detectors.

Common Misconceptions

  • “Anchor-free means no predefined spatial structure.” FCOS still uses FPN levels with fixed strides and per-level regression ranges. CenterNet uses a fixed output stride. The term “anchor-free” specifically means no predefined bounding box templates.
  • “Anchor-free detectors are always faster.” The speed depends on the backbone and head complexity. CenterNet with Hourglass-104 is slower than many anchor-based detectors. The benefit is simpler design, not guaranteed speed improvement.
  • “CenterNet completely eliminates NMS.” CenterNet replaces traditional IoU-based NMS with a simple $3 \times 3$ max-pooling operation on the heatmap, which is a form of local non-maximum suppression. However, it avoids the iterative greedy NMS used by other detectors.

Connections to Other Concepts

  • Fast and Faster R-CNN: The anchor-based two-stage paradigm that anchor-free methods seek to simplify.
  • Feature Pyramid Network: FCOS and many CenterNet variants use FPN for multi-scale feature extraction.
  • Focal Loss: FCOS uses focal loss for classification; CenterNet uses a modified focal loss for heatmap training.
  • DETR: Another anchor-free approach, but one that uses transformers and set-based prediction rather than per-pixel classification.
  • Non-Maximum Suppression: FCOS still requires NMS; CenterNet’s heatmap peak extraction largely replaces it.
  • YOLO: YOLOv8 adopted anchor-free prediction heads inspired by FCOS.

Further Reading

  • Tian et al., “FCOS: Fully Convolutional One-Stage Object Detection” (2019) – Per-pixel anchor-free detection with centerness.
  • Zhou et al., “Objects as Points” (2019) – CenterNet’s keypoint-based detection framework.
  • Law and Deng, “CornerNet: Detecting Objects as Paired Keypoints” (2018) – Pioneering keypoint-based anchor-free detection.
  • Yang et al., “RepPoints: Point Set Representation for Object Detection” (2019) – Represents objects as deformable point sets instead of boxes.
  • Zhang et al., “Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection” (2020) – ATSS, showing that sample selection strategy matters more than anchors vs. anchor-free.