Quality control of segmentation results

Quality control of segmentation results#

Why Segmentation Quality Matters in Biological Imaging#

Segmentation is a foundational step in microscopy image analysis, whether it’s to quantify cell morphology, assess protein localization, or classify phenotypes. Poor segmentation cascades into poor science:

Biological Inference Depends on Accuracy:#

Cell shape and area are used as proxies for health, cell cycle stage, or treatment effects.
Subcellular segmentation informs protein distribution and co-localization analysis.

Downstream Analysis Tasks Affected:#

Counting: Number of cells, nuclei, or organelles.
Morphometrics: Size, shape, and orientation distributions.
Spatial Context: Distance to neighbors, clustering, localization to boundaries.
Time-lapse Analysis: Tracking errors arise from poor initial segmentation.

Experimental Reproducibility and Interpretability:#

Segmentations must be robust across batches, stains, and acquisition settings.
Quality control is essential for reproducible quantitative biology.

Types of Segmentation Errors in Microscopy#

Over-Segmentation (Fragmentation)#

Single nucleus or cell is split into multiple smaller fragments.
Common in: Aggressive watershed or Cellpose when boundary signals are strong.
Impact: Inflates cell counts; distorts shape descriptors like solidity or eccentricity.

overseg

Under-Segmentation (Merging)#

Adjacent cells or nuclei are grouped into a single blob.
Common in: Dense tissues or images with overlapping/poor contrast.
Impact: Loss of individual cell-level information; affects downstream tracking and phenotype calling.

underseg

Spurious Detections (False Positives)#

Non-cell regions (debris, staining artifacts, air bubbles) incorrectly labeled as cells.
Impact: Skewed statistics, especially when looking at low-frequency cell types or rare events.

Missed Detections (False Negatives)#

True cells/nuclei are not detected at all.
Common in: Dim or out-of-focus regions, shadowed objects.
Impact: Underestimation in population size; missed biological insights.

Boundary Errors#

Poor delineation of the cell membrane or nuclear edge.
Impact: Affects quantification of membrane-localized proteins, shape analysis, or proximity-based measurements (e.g., cell-cell contact).

boundary error

Topological Errors#

Holes in the nuclear region, disconnected fragments, broken membranes.
Impact: Miscalculates features like convexity, number of lobes, or cell polarity.

topology error

Real-World Impact of QC Failures in Microscopy#

Biological Task	Example Error	Consequence
Cell Counting (e.g., DAPI)	Over-segmentation of elongated nuclei	Overestimated cell counts
Organelle Localization	Missed mitochondria in low-SNR regions	Incorrect spatial distribution assessment
Drug Response Screening	Artifacts misclassified as apoptotic bodies	False positives in treatment effect
Tissue Analysis (Histology)	Domain shift between scanners	Unusable predictions; poor reproducibility

Qualitative Evaluation#

Quantitative metrics can miss:#

Systematic biases (e.g., always missing small regions)
Localization errors
Visual artifacts
Useful during development, debugging, and annotation correction

Visual Inspection Techniques#

Overlays

Overlay predicted mask onto the original image (use color-coded transparency)
Helpful to assess boundary alignment and mis-segmentation

Example: cellpose overlay

Tools and Viewers#

Napari:

Python-based interactive viewer
Supports image layers + masks
Plugins for segmentation QC (e.g., napari-annotator, napari-skimage-regionprops)

Fiji/ImageJ:

Overlay images and ROIs
Histogram/statistics on masks

QuPath:

For histopathology; robust QC on large slides

3D Slicer / ITK-SNAP:

For 3D volumetric medical image segmentation review

Native Python Libraries:

Matplotlib, scikit-image, open-cv

Best Practices for Visual QC#

Randomly sample across dataset and classes
Use stratified samples for small object classes
Run qualitative QC especially after augmentation or domain adaptation
Always log QC outputs during training for reproducibility

Quantative Evaluation#

Quantitative evaluation provides objective numerical measures of how well a segmentation model performs. These are essential for:

Comparing models
Performing hyperparameter tuning
Ensuring generalizability

Pixel-level#

1. Dice Similarity Coefficient (DSC) / F1 Score Measures overlap between ground truth 𝐺 and prediction 𝑃.

\( DSC = \frac{|G| + |P|}{2|G∩P|} \)

Value range: 0 (no overlap) to 1 (perfect match)

2. Jaccard Index (Intersection over Union - IoU) Measures the intersection (overlap) over the union.

\(IoU= \frac{∣G∪P∣}{∣G∩P∣}\)

Often reported for multiple thresholds (e.g., IoU@0.5, IoU@0.75)

3. Pixel-wise Accuracy Accuracy of correctly classifying the pixel class.

\(Accuracy = \frac{TP + TN}{TP + TN + FP + FN}\)

Not reliable for imbalanced datasets

Object-level and Instance Metrics#

True Positive (TP), False Positive (FP), False Negative (FN)

TP: predicted object matches a ground truth
FP: extra prediction
FN: missed object

Metric Examples:

Precision: \(\frac{TP+FP}{TP}\)

- Recall: \(\frac{TP+FN}{TP}\)

- F1 Score at object level

Mean Average Precision (mAP)

Considers overlap between matched instances using IoU thresholds (e.g., @0.5)

Code Tools for Evaluation#

scikit-learn: precision_score, recall_score, f1_score
monai.metrics: for HD, ASD, Dice, IoU
medpy.metric: specialized medical metrics
torchmetrics: ready-to-use PyTorch metrics

Coding Examples#

Qualitatie Evaluation#

import matplotlib.pyplot as plt
from skimage.io import imread
from skimage.measure import label
from skimage.segmentation import mark_boundaries
from skimage.color import label2rgb
import numpy as np

# File names
images = ['1GRAY.tif', '2GRAY.tif']
masks  = ['1.tif',  '2.tif']

fig, axes = plt.subplots(nrows=2, ncols=4, figsize=(16, 8))

for i in range(len(images)):
    # Load image and mask
    img = imread(f'data/{images[i]}', as_gray=True)
    mask = imread(f'data/{masks[i]}', as_gray=True) > 0.5

    # Normalize image for blending
    img_norm = (img - img.min()) / (img.max() - img.min())

    # Create overlay
    labeled_mask = label(mask)
    overlay = label2rgb(labeled_mask, image=img_norm, bg_label=0, alpha=0.4)

    # Create boundary-marked image
    boundary = mark_boundaries(img_norm, mask.astype(int), color=(1, 0, 0))  # Red boundaries


    # Plot original, mask, overlay
    axes[i, 0].imshow(img, cmap='gray')
    axes[i, 0].set_title(f'Image {i+1}')
    
    axes[i, 1].imshow(mask, cmap='gray')
    axes[i, 1].set_title(f'Mask {i+1}')
    
    axes[i, 2].imshow(overlay)
    axes[i, 2].set_title(f'Overlay {i+1}')

    axes[i, 3].imshow(boundary)
    axes[i, 3].set_title(f'Boundary {i+1}')

# Clean up axes
for ax in axes.flatten():
    ax.axis('off')

plt.tight_layout()
plt.show()

../../_images/4c461f515d66a9c4c8aa92eb600d2b6525fa96ee7642a04da2e16385ad2995b9.png

Quantitative Evaluation#

import matplotlib.pyplot as plt
from skimage.io import imread
from skimage.measure import label
from skimage.segmentation import mark_boundaries
from skimage.color import label2rgb
import numpy as np

# File names
images = ['1GRAY.tif', '2GRAY.tif']
masks  = ['1.tif',  '2.tif']
preds = ['1PRED.tif', '2PRED.tif']

fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 8))

for i in range(len(images)):
    # Load image and mask
    img = imread(f'data/{images[i]}', as_gray=True)
    mask = imread(f'data/{masks[i]}', as_gray=True) > 0.5
    pred = imread(f'data/{preds[i]}', as_gray=True) > 0.5
    
    # Normalize image for blending
    img_norm = (img - img.min()) / (img.max() - img.min())

    
    # Plot original, mask, overlay
    axes[i, 0].imshow(img, cmap='gray')
    axes[i, 0].set_title(f'Image {i+1}')
    
    axes[i, 1].imshow(mask, cmap='gray')
    axes[i, 1].set_title(f'Mask {i+1}')
    
    axes[i, 2].imshow(pred, cmap='gray')
    axes[i, 2].set_title(f'Predicted {i+1}')


# Clean up axes
for ax in axes.flatten():
    ax.axis('off')

plt.tight_layout()
plt.show()

../../_images/af12ac2573ff3390e6c67be5c0524851cac45cfba50e027290dbacaee9ba37f9.png

import numpy as np
from skimage.io import imread

# Load masks (convert 0/255 to binary 0/1)
gt = imread('data/1.tif', as_gray=True) > 0.5
pred = imread('data/1PRED.tif', as_gray=True) > 0.5

# Flatten for metric calculation
gt_flat = gt.flatten()
pred_flat = pred.flatten()

# Dice Coefficient
intersection = np.sum(gt_flat * pred_flat)
dice = (2. * intersection) / (np.sum(gt_flat) + np.sum(pred_flat))
print(f"Dice Coefficient: {dice:.4f}")

# IoU
union = np.sum(np.logical_or(gt_flat, pred_flat))
iou = intersection / union
print(f"IoU: {iou:.4f}")

Dice Coefficient: 0.8110
IoU: 0.6821