Image Segmentation
Semantic, instance, and panoptic segmentation.
Conditional Random Fields
Conditional Random Fields (CRFs) are probabilistic graphical models often used as a post-processing step for segmentation networks. Their pairwise terms encourage nearby, similar-looking pixels to share a label, which enforces spatial consistency and refines noisy pixel-level predictions into sharper, boundary-respecting outputs.
DeepLab and Atrous Convolution
DeepLab uses dilated (atrous) convolutions and Atrous Spatial Pyramid Pooling (ASPP) to expand the receptive field without reducing spatial resolution, achieving dense prediction with multi-scale context.
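A minimal sketch of the dilation idea in one dimension, using plain Python lists. The function names `dilated_conv1d` and `effective_kernel_size` are illustrative and not part of DeepLab. Padding is omitted for brevity, so the output here gets shorter; in practice padding is added to keep the resolution.

```python
def dilated_conv1d(signal, kernel, rate=1):
    """Valid 1D cross-correlation whose taps are spaced `rate` samples apart."""
    span = (len(kernel) - 1) * rate + 1  # samples covered by one output value
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * rate] for i, w in enumerate(kernel)))
    return out


def effective_kernel_size(k, rate):
    """Receptive field of a k-tap kernel with dilation `rate`."""
    return k + (k - 1) * (rate - 1)


signal = list(range(10))
kernel = [1, 0, -1]  # same three weights in both calls

dense = dilated_conv1d(signal, kernel, rate=1)   # each output sees 3 samples
atrous = dilated_conv1d(signal, kernel, rate=3)  # each output sees 7 samples
print(effective_kernel_size(3, 1), effective_kernel_size(3, 3))
```

The parameter count stays at three weights while the span grows from 3 to 7 samples. ASPP can be thought of as running several such rates in parallel and combining the results.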
Fully Convolutional Networks
Fully Convolutional Networks (FCNs) replace the fully connected layers of classification CNNs with convolutional layers, enabling dense, pixel-wise prediction on inputs of arbitrary spatial size.
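The equivalence behind this can be sketched in one dimension: a fully connected layer over a fixed-length input is the same computation as a convolution whose kernel covers that whole input. The helper names `dense` and `conv1d_valid` and the weights below are made up for illustration.

```python
def dense(x, w, b):
    """Fully connected unit: one dot product over a fixed-size input."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def conv1d_valid(x, w, b):
    """The same weights applied as a sliding window over any input length."""
    k = len(w)
    return [dense(x[i:i + k], w, b) for i in range(len(x) - k + 1)]


w, b = [0.5, -1.0, 2.0], 0.25
train_sized = [1.0, 2.0, 3.0]        # the size the dense unit expects
larger = [1.0, 2.0, 3.0, 4.0, 5.0]   # a larger input

single = dense(train_sized, w, b)
as_conv = conv1d_valid(train_sized, w, b)   # one output, same value
score_map = conv1d_valid(larger, w, b)      # one output per window position
```

On the original input size the convolution reproduces the dense output; on a larger input it yields a map of outputs instead of failing on a shape mismatch.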
Instance Segmentation
Instance segmentation combines object detection and semantic segmentation to produce pixel-level masks for each individual object instance in an image, distinguishing between separate objects of the same class.
Mask R-CNN
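Mask R-CNN extends Faster R-CNN with a parallel mask prediction branch and introduces RoIAlign, which samples features at fractional coordinates using bilinear interpolation instead of rounding region boundaries to the feature grid. The design became a widely used baseline for instance segmentation.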
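A rough sketch of the sampling step RoIAlign relies on: bilinear interpolation at fractional coordinates, averaged over a few points per output bin. This is a simplified illustration, not the Mask R-CNN implementation. It assumes integer coordinates sit at pixel centers and clamps samples to the map; `bilinear_sample` and `roi_align_bin` are hypothetical names.

```python
import math


def bilinear_sample(feature, y, x):
    """Interpolate a 2D feature map (list of rows) at fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1)  # clamp to the valid range (an assumption)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_bin(feature, y_start, x_start, y_end, x_end, samples=2):
    """Average samples x samples regularly spaced points inside one bin."""
    total = 0.0
    for iy in range(samples):
        for ix in range(samples):
            y = y_start + (iy + 0.5) * (y_end - y_start) / samples
            x = x_start + (ix + 0.5) * (x_end - x_start) / samples
            total += bilinear_sample(feature, y, x)
    return total / (samples * samples)


feature = [[0.0, 1.0],
           [2.0, 3.0]]
center = bilinear_sample(feature, 0.5, 0.5)        # between all four cells
bin_value = roi_align_bin(feature, 0.0, 0.0, 1.0, 1.0)
```

Because no coordinate is rounded, a region whose edges fall between feature cells still receives values that reflect its exact position.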
Panoptic Segmentation
Panoptic segmentation unifies semantic segmentation and instance segmentation into a single coherent output, assigning every pixel a class label and, for countable objects, an instance ID. It covers both “stuff” (amorphous regions such as sky or road) and “things” (countable objects such as cars or people).
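One common way to store such an output is a single integer per pixel that packs the class and the instance together. The sketch below assumes a `class * LABEL_DIVISOR + instance` encoding with a divisor of 1000; the divisor, the class IDs, and the helper names are illustrative choices, not a fixed standard.

```python
LABEL_DIVISOR = 1000  # assumed upper bound on instances per class


def merge_panoptic(semantic, instance, thing_classes):
    """Combine per-pixel class and instance maps into one panoptic ID map."""
    panoptic = []
    for sem_row, inst_row in zip(semantic, instance):
        row = []
        for cls, inst in zip(sem_row, inst_row):
            # Stuff pixels carry no instance identity, so their ID is 0.
            inst_id = inst if cls in thing_classes else 0
            row.append(cls * LABEL_DIVISOR + inst_id)
        panoptic.append(row)
    return panoptic


def decode(pan_id):
    """Recover (class, instance) from a packed panoptic ID."""
    return divmod(pan_id, LABEL_DIVISOR)


# Hypothetical classes: 0 = sky (stuff), 1 = car (thing).
semantic = [[0, 1, 1, 0, 1]]
instance = [[0, 1, 1, 0, 2]]
panoptic = merge_panoptic(semantic, instance, thing_classes={1})
```

Here the two cars receive different IDs (1001 and 1002) while every sky pixel shares the ID 0, so each pixel has exactly one unambiguous label.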
Segment Anything
The Segment Anything Model (SAM) is a foundation model for image segmentation trained on the SA-1B dataset of over 1 billion masks across 11 million images. Given prompts such as points, boxes, or coarse masks, it segments objects zero-shot, without task-specific fine-tuning.
Semantic Segmentation
Semantic segmentation assigns a class label to every pixel in an image, producing a dense prediction map that tells you what is at each spatial location.
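As a small illustration, the final step of most semantic segmentation pipelines is a per-pixel argmax over class scores. The scores below are invented, and `argmax_labels` is a hypothetical helper written with plain nested lists.

```python
def argmax_labels(scores):
    """Turn scores[c][y][x] into labels[y][x] holding the best class per pixel."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


# Made-up scores for a 2x2 image and three classes.
scores = [
    [[0.9, 0.2], [0.1, 0.3]],  # class 0
    [[0.1, 0.7], [0.2, 0.3]],  # class 1
    [[0.0, 0.1], [0.7, 0.4]],  # class 2
]
labels = argmax_labels(scores)
```

The result has the same height and width as the input and holds one class index per pixel, which is the dense prediction map described above.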
U-Net
U-Net is a symmetric encoder-decoder architecture with skip connections that concatenate encoder features to the decoder layers at matching resolution, enabling precise localization from very few training images. It is especially widely used in medical image segmentation.
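A shape-level sketch of one skip connection, using nested lists in [channels][height][width] layout. Nearest-neighbour upsampling stands in for the learned up-convolution of the original design, and all names and sizes here are illustrative assumptions.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested list."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def concat_channels(a, b):
    """Stack two feature maps along the channel axis; H and W must match."""
    assert len(a[0]) == len(b[0]) and len(a[0][0]) == len(b[0][0])
    return a + b


def shape(fmap):
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))


# Encoder features at high resolution: 2 channels, 4x4.
encoder_feat = [[[float(c)] * 4 for _ in range(4)] for c in range(2)]
# Decoder features one level deeper: 3 channels, 2x2.
decoder_feat = [[[9.0] * 2 for _ in range(2)] for _ in range(3)]

upsampled = upsample2x(decoder_feat)
merged = concat_channels(encoder_feat, upsampled)
```

After concatenation the decoder sees both the coarse, context-rich channels and the fine-grained encoder channels at the same resolution, which is what lets it recover precise boundaries.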