6 Bayesian Optimization of Segmentation Thresholds
6.1 Introduction
After preparing and partitioning our dataset in Chapter 5, we now face a critical challenge: finding the optimal threshold values for cell segmentation. Rather than using fixed, predetermined thresholds or exhaustively searching through all possibilities, we’ll employ Bayesian Optimization to intelligently search the threshold parameter space.
In this chapter, we:
- Learn how Bayesian Optimization works through intuitive examples and visualizations
- Define the optimization problem: minimizing segmentation error on the validation set
- Implement Bayesian Optimization to find optimal threshold values (nucleus minimum and cytoplasm maximum)
- Apply optimized thresholds to the test set
- Comprehensively evaluate results with metrics and visualizations
6.2 Parameters, Pre-parameters, and Hyperparameters
Before diving into the implementation, let’s clarify the different types of parameters we’ll encounter in this pipeline. Understanding this distinction is crucial for proper experimental design.
6.2.1 Pre-parameters: Image Processing Pipeline
These are parameters that control the preprocessing pipeline—the transformation applied to images before they reach the prediction model. They affect the quality and characteristics of the predictions themselves.
Pre-parameters in a typical segmentation pipeline:
- Denoising (Non-Local Means / NLM)
  - h: Filter strength (higher = more denoising, but loses detail)
  - templateWindowSize: Size of the template patch
  - searchWindowSize: Size of the search area
- Unsharp Masking
  - sigma: Gaussian blur sigma for the unsharp mask
  - strength: Amount of sharpening (0.5 - 2.0)
  - threshold: Pixel difference threshold to apply sharpening
- Gaussian Blur
  - sigma: Standard deviation of the Gaussian kernel
  - kernelSize: Size of the convolution kernel
- Morphological Operations
  - kernelSize: Size of the structural element
  - kernelShape: Shape (rectangular, elliptical, cross)
  - iterations: Number of dilation/erosion iterations
- Contrast Enhancement
  - clipLimit: Threshold for contrast limiting (CLAHE)
  - tileGridSize: Size of the grid for local histogram equalization
- Normalization
  - targetMean: Desired mean pixel intensity
  - targetStd: Desired standard deviation
Key point: These pre-parameters affect the input to the thresholding step. Poor preprocessing choices here can make threshold optimization impossible—garbage in, garbage out.
6.2.2 Parameters: Threshold Optimization
These are the parameters being directly optimized by Bayesian Optimization. They are applied after predictions are generated.
Parameters in Chapter 6:
- Nucleus Minimum Threshold (nucleus_min)
  - Range: [0.3, 0.7]
  - Meaning: Minimum probability to classify a pixel as nucleus
- Cytoplasm Maximum Threshold (cytoplasm_max)
  - Range: [0.6, 0.95]
  - Meaning: Maximum probability threshold for cytoplasm classification
Terminology note: Should we call these “parameters”?
- Yes, “parameters” is appropriate here: they parameterize the decision boundary (the threshold function).
- They’re distinct from “hyperparameters” because they’re not part of the training procedure—they’re part of the inference (prediction application) procedure.
- They’re distinct from “pre-parameters” because they operate on already-generated predictions, not on raw images.
- Alternative names: “post-processing parameters” or “decision threshold parameters”
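A minimal sketch can make the role of these two thresholds concrete: two cut points split one per-pixel map into three classes. The band rule below (nucleus below the first threshold, cytoplasm between the two) mirrors the intensity-based variant the pipeline uses later in this chapter, and every number here is illustrative, not a dataset value.

```python
import numpy as np

# Hypothetical 2x2 intensity map and threshold values (all numbers illustrative)
intensity = np.array([[0.10, 0.45],
                      [0.70, 0.98]])
nucleus_max, cytoplasm_max = 0.35, 0.90   # cytoplasm_min is tied to nucleus_max

labels = np.zeros_like(intensity, dtype=int)                  # 0 = background
labels[intensity < nucleus_max] = 2                           # darkest band -> nucleus
labels[(intensity >= nucleus_max) & (intensity <= cytoplasm_max)] = 1  # middle band -> cytoplasm
print(labels)   # [[2 1]
                #  [1 0]]
```

Moving either cut point redraws the three-class map without touching the underlying intensities—exactly the knob Bayesian Optimization will turn.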
6.2.3 Hyperparameters: Bayesian Optimization Configuration
These parameters control the optimization procedure itself—how Bayesian Optimization searches the parameter space. They don’t directly affect segmentation quality, but rather control the search strategy.
Hyperparameters for Bayesian Optimization:
- Gaussian Process Kernel Parameters
  - length_scale: Controls how quickly correlation decays with distance
    - Low (0.05): Assumes the function changes rapidly
    - High (0.5): Assumes the function is smooth
  - kernel_variance: Overall scale of the kernel
- Acquisition Function Parameters
  - kappa: Exploration vs. exploitation trade-off
    - Low (0.5): Exploit known good regions
    - High (3.0): Explore uncertain regions
    - A common schedule: higher kappa early on, lower kappa later
- Search Strategy
  - n_initial: Number of random initial evaluations (default: 5)
  - n_iterations: Number of Bayesian Optimization iterations (default: 15)
  - n_restarts: Number of L-BFGS-B restarts when optimizing the acquisition function (default: 5)
- Optimization Algorithm Parameters
  - optimizer: Which algorithm to use for acquisition maximization (L-BFGS-B, differential evolution, etc.)
  - tolerance: Convergence tolerance for the optimizer
Key distinction: Changing kappa doesn’t change what thresholds are optimal—it only changes how fast Bayesian Optimization finds them.
6.3 Summary: The Three Parameter Types
| Type | Controls | Example | Optimized By | Affects |
|---|---|---|---|---|
| Pre-parameters | Image preprocessing | Gaussian blur sigma | Manual tuning on validation set | Prediction quality |
| Parameters | Threshold decision boundaries | nucleus_min, cytoplasm_max | Bayesian Optimization (on validation set) | How predictions are converted to labels |
| Hyperparameters | Optimization procedure | kappa, length_scale, n_iterations | Manual or grid search | Speed/efficiency of finding optimal parameters |
6.4 Introduction to Bayesian Optimization
6.4.1 The Problem
We have a black-box function that we want to optimize:
\[f(\theta) = \text{Segmentation Error given threshold parameters } \theta\]
where \(\theta = (\theta_{\text{nucleus}}, \theta_{\text{cytoplasm}})\) are our two threshold values.
Key challenges:
- Each function evaluation is expensive (it requires processing all validation images)
- The function landscape is unknown (black-box)
- We cannot compute gradients directly
- We want to find optimal thresholds with as few evaluations as possible
Traditional approaches fall short:
- Grid search: exhaustive but inefficient (testing 20×20 = 400 combinations means 400 full passes over the validation set)
- Random search: unguided and wasteful
- Gradient descent: requires derivatives, which are unavailable for a black-box objective
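To put the cost in numbers, using the grid size mentioned above and the default search budget listed in the hyperparameter section (5 initial points plus 15 iterations):

```python
# Full-dataset evaluations required by each strategy
grid_evals = 20 * 20           # 20 candidate values per threshold -> 400 evaluations
bo_evals = 5 + 15              # 5 random initial points + 15 guided BO iterations
print(grid_evals // bo_evals)  # grid search costs 20x more evaluations
```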
Bayesian Optimization succeeds because it learns from each evaluation and intelligently selects the next point to test, balancing exploration (testing uncertain regions) with exploitation (refining promising areas).
6.4.2 How Bayesian Optimization Works
Bayesian Optimization operates in a loop:
- Build a probabilistic model (Gaussian Process) from past observations
- Use an acquisition function (Upper Confidence Bound) to decide where to sample next
- Evaluate the function at the selected point
- Repeat until convergence
Gaussian Process (GP): Our Belief about the Function
The GP provides two pieces of information at any point:
- Mean \(\mu(x)\): Our best estimate of \(f(x)\)
- Variance \(\sigma^2(x)\): Our uncertainty about \(f(x)\)
- Near observed points: low uncertainty (we know what the function is)
- Far from observations: high uncertainty (complete ignorance)
The GP uses an RBF (Radial Basis Function) kernel to correlate nearby points:
\[k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)\]
where \(\ell\) is the length scale controlling how quickly correlation decays with distance.
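A quick numeric check of this kernel, using the two length-scale settings from the hyperparameter list (0.05 and 0.5) and a pair of points 0.1 apart:

```python
import numpy as np

def rbf(x, xp, length_scale):
    # k(x, x') = exp(-||x - x'||^2 / (2 * l^2))
    return np.exp(-(x - xp) ** 2 / (2 * length_scale ** 2))

short = rbf(0.0, 0.1, length_scale=0.05)  # exp(-2)    ~ 0.135: correlation decays fast
smooth = rbf(0.0, 0.1, length_scale=0.5)  # exp(-0.02) ~ 0.980: function assumed smooth
print(short, smooth)
```

With the short length scale, an observation 0.1 away tells the GP almost nothing; with the long one, the two points are nearly interchangeable.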
Acquisition Function: Where to Sample Next?
The Upper Confidence Bound (UCB) acquisition function balances two competing goals:
\[\text{UCB}(x) = \mu(x) + \kappa \cdot \sigma(x)\]
- First term (\(\mu(x)\)): Exploitation—favors regions with high predicted function values
- Second term (\(\kappa \cdot \sigma(x)\)): Exploration—favors uncertain regions where we might find something better
The parameter \(\kappa\) controls the trade-off:
- Low \(\kappa\): Exploit known good regions
- High \(\kappa\): Explore uncertain areas (especially useful early on)
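Putting the pieces of this subsection together, here is a minimal, self-contained sketch of the GP + UCB loop on a hypothetical 1D objective. The objective, grid, \(\kappa\), and length scale below are illustrative assumptions, not the chapter’s pipeline:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    # Hypothetical black-box objective to MAXIMIZE (peak at x = 0.6)
    return -(x - 0.6) ** 2 + 1.0

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(3, 1))       # a few random initial evaluations
y = f(X).ravel()

grid = np.linspace(0, 1, 201).reshape(-1, 1)
kappa = 2.0                              # exploration/exploitation trade-off
for _ in range(10):
    # 1. Fit the GP surrogate to all observations so far (fixed length scale)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2),
                                  optimizer=None, alpha=1e-6)
    gp.fit(X, y)
    # 2. Score every candidate with UCB = mu + kappa * sigma
    mu, sigma = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(mu + kappa * sigma)]
    # 3. Evaluate the true function at the chosen point and record it
    X = np.vstack([X, [x_next]])
    y = np.append(y, f(x_next[0]))

best_x = X[np.argmax(y), 0]
print(best_x)   # close to the true peak at 0.6
```

Each pass through the loop is one “iteration” in the sense used above: fit, score with UCB, evaluate, repeat.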
6.4.3 Illustrative Example: The Step Function
To build intuition, imagine optimizing a continuous step function with multiple plateaus and a peak around x* ≈ 1.08.
Initial observations (at x = -2.5, -2.0):
- Very limited information about the function
- The GP is highly uncertain everywhere except the observed points
- The uncertainty “sausage” is very wide
Iteration 1: First exploration
- UCB is maximized around x = 0.246 (shown with a green star)
- Why x = 0.246 and not x = 1.0 or beyond?
  - At x = 1.0: the mean has decayed nearly to zero (far from observations)
  - At x = 0.246: the mean is still positive (the RBF kernel maintains correlation with nearby observations)
  - The “sweet spot” balances the mean (which is extrapolating toward zero) against the variance (which is high everywhere)
Result: We discover f(0.246) = 3.398, a huge improvement over the previous best value of 0.5.
Key insight: Repeating this loop, we locate the optimum in just 6 evaluations instead of blindly sampling hundreds of candidate points.
6.4.4 Why Bayesian Optimization Works for Threshold Tuning
In our cell segmentation problem:
- Evaluation cost: Each threshold combination requires processing ~100-1000 validation images and computing segmentation metrics
- Dimension: Only 2 parameters (nucleus min, cytoplasm max)—perfect for Bayesian Optimization
- Smoothness: The error landscape is likely smooth and unimodal (one dominant peak)
- Efficiency goal: We want to find optimal thresholds with minimal computational cost
Bayesian Optimization will strategically explore this 2D parameter space, converging to optimal thresholds much faster than grid search.
6.5 Complete Implementation
Let’s implement Bayesian Optimization using scikit-optimize’s gp_minimize() function. This section is executable: run it cell by cell in Colab or Jupyter.
6.5.1 Preliminary Functions
The pipeline requires several groups of helper functions: metrics, data-cleaning functions, and the main pipeline functions.
Loading of Libraries and the Data
from skimage.io import imread
import matplotlib.pyplot as plt
import numpy as np, pandas as pd
from skimage.color import rgb2gray
from skimage.filters import sobel, scharr, apply_hysteresis_threshold, gaussian
from skimage.feature import canny
from skimage.measure import regionprops,regionprops_table
from skimage.morphology import binary_opening, disk, binary_closing, binary_dilation, binary_erosion
from skimage.draw import rectangle_perimeter
from scipy.ndimage import label as scilabel
import cv2
import matplotlib; matplotlib.rcParams['figure.dpi']=300
from scipy.ndimage import gaussian_filter
from skimage.restoration import denoise_nl_means
from skimage.filters import unsharp_mask
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from scipy.optimize import minimize
from sklearn.model_selection import train_test_split
# ============================================================================
# LOAD CEDARS SINAI UROTHELIAL CELL DATA
# ============================================================================
# Note: If running in Jupyter/Colab, uncomment and run:
# !git clone https://github.com/emilsar/Cedars.git
# %cd Cedars/Project3
# !python 1_prepdata.py
# Load the preprocessed data — tries Windows path first, falls back to local Cedars clone
try:
urothelial_cells = pd.read_pickle("C:/Cedars/Project3/urothelial_cell_toy_data.pkl")
except FileNotFoundError:
urothelial_cells = pd.read_pickle("Cedars/Project3/urothelial_cell_toy_data.pkl")
# Convert to uint8 image format (0-255)
# Original shape: (batch, channels, height, width)
# Target shape: (batch, height, width, channels)
images = np.transpose(urothelial_cells["X"].numpy() * 255, (0, 2, 3, 1)).astype(np.uint8)
labels = urothelial_cells["y"]
Metrics
import numpy as np
def dice_coefficient(pred, target, class_label):
"""
Calculate Dice coefficient for a specific class.
Parameters:
-----------
pred : np.ndarray
Predicted segmentation (integer labels)
target : np.ndarray
Ground truth segmentation (integer labels)
class_label : int
The class to evaluate (0, 1, or 2)
Returns:
--------
dice : float
Dice coefficient for this class (0 to 1)
"""
pred_binary = (pred == class_label).astype(float)
target_binary = (target == class_label).astype(float)
intersection = np.sum(pred_binary * target_binary)
sum_pred = np.sum(pred_binary)
sum_target = np.sum(target_binary)
# Handle edge case where both pred and target are empty
if sum_pred == 0 and sum_target == 0:
return 1.0
if sum_pred + sum_target == 0:
return 0.0
dice = 2 * intersection / (sum_pred + sum_target)
return dice
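As a quick sanity check of the formula dice_coefficient implements, consider a pair of hypothetical 4×4 masks whose nucleus regions overlap in 3 of the 4 predicted pixels:

```python
import numpy as np

# Hypothetical toy masks, class 2 = nucleus (values illustrative)
pred = np.array([[2, 2, 0, 0],
                 [2, 2, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[2, 2, 0, 0],
                   [2, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])

p = (pred == 2).astype(float)             # 4 predicted nucleus pixels
t = (target == 2).astype(float)           # 3 true nucleus pixels
intersection = np.sum(p * t)              # 3 pixels overlap
dice = 2 * intersection / (np.sum(p) + np.sum(t))
print(dice)  # 2*3 / (4+3) = 6/7 ~ 0.857
```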
def iou_coefficient(pred, target, class_label):
"""
Calculate Intersection over Union (IoU) for a specific class.
Parameters:
-----------
pred : np.ndarray
Predicted segmentation (integer labels)
target : np.ndarray
Ground truth segmentation (integer labels)
class_label : int
The class to evaluate (0, 1, or 2)
Returns:
--------
iou : float
IoU for this class (0 to 1)
"""
pred_binary = (pred == class_label).astype(float)
target_binary = (target == class_label).astype(float)
intersection = np.sum(pred_binary * target_binary)
union = np.sum(np.logical_or(pred_binary, target_binary))
# Handle edge case where union is empty
if union == 0:
return 1.0 if intersection == 0 else 0.0
iou = intersection / union
return iou
def calculate_all_metrics(pred, target, class_labels=[0, 1, 2]):
"""
Calculate Dice and IoU for all classes plus mean metrics.
Returns:
--------
metrics : dict
Dictionary containing per-class and mean metrics
"""
metrics = {}
dice_scores = []
iou_scores = []
for class_label in class_labels:
dice = dice_coefficient(pred, target, class_label)
iou = iou_coefficient(pred, target, class_label)
class_name = ['Background', 'Cytoplasm', 'Nucleus'][class_label]
metrics[f'Dice_{class_name}'] = dice
metrics[f'IoU_{class_name}'] = iou
dice_scores.append(dice)
iou_scores.append(iou)
metrics['Dice_Mean'] = np.mean(dice_scores)
metrics['IoU_Mean'] = np.mean(iou_scores)
return metrics
def print_metrics(metrics):
"""Pretty-print metrics table."""
print("=" * 70)
print(f"{'Class':<15} {'Dice':<15} {'IoU':<15}")
print("=" * 70)
class_names = ['Background', 'Cytoplasm', 'Nucleus']
for i, class_name in enumerate(class_names):
dice = metrics[f'Dice_{class_name}']
iou = metrics[f'IoU_{class_name}']
print(f"{class_name:<15} {dice:<15.4f} {iou:<15.4f}")
print("-" * 70)
print(f"{'Mean':<15} {metrics['Dice_Mean']:<15.4f} {metrics['IoU_Mean']:<15.4f}")
print("=" * 70)
def compute_aggregate_metrics(all_metrics):
"""
Compute mean and standard deviation of metrics across all images.
Parameters:
-----------
all_metrics : list of dict
Metrics for each image
Returns:
--------
aggregate : dict
Mean and std for each metric
"""
aggregate = {}
# Get all metric keys from first image
if not all_metrics:
return aggregate
metric_keys = all_metrics[0].keys()
for key in metric_keys:
values = [m[key] for m in all_metrics]
aggregate[f'{key}_mean'] = np.mean(values)
aggregate[f'{key}_std'] = np.std(values)
return aggregate
def print_aggregate_metrics(aggregate_metrics):
"""Pretty-print aggregate metrics."""
print("\n" + "="*85)
print(f"{'Metric':<25} {'Mean':<20} {'Std Dev':<20}")
print("="*85)
class_names = ['Background', 'Cytoplasm', 'Nucleus']
# Print per-class Dice
for class_name in class_names:
key = f'Dice_{class_name}'
mean = aggregate_metrics.get(f'{key}_mean', 0)
std = aggregate_metrics.get(f'{key}_std', 0)
print(f"{key:<25} {mean:<20.4f} {std:<20.4f}")
print("-"*85)
# Print Mean Dice
key = 'Dice_Mean'
mean = aggregate_metrics.get(f'{key}_mean', 0)
std = aggregate_metrics.get(f'{key}_std', 0)
print(f"{key:<25} {mean:<20.4f} {std:<20.4f}")
print("-"*85)
# Print per-class IoU
for class_name in class_names:
key = f'IoU_{class_name}'
mean = aggregate_metrics.get(f'{key}_mean', 0)
std = aggregate_metrics.get(f'{key}_std', 0)
print(f"{key:<25} {mean:<20.4f} {std:<20.4f}")
print("-"*85)
# Print Mean IoU
key = 'IoU_Mean'
mean = aggregate_metrics.get(f'{key}_mean', 0)
std = aggregate_metrics.get(f'{key}_std', 0)
print(f"{key:<25} {mean:<20.4f} {std:<20.4f}")
print("="*85)
def visualize_metric_distributions(all_metrics):
"""
Create histograms showing distribution of metrics across dataset.
Parameters:
-----------
all_metrics : list of dict
Metrics for each image
"""
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
fig.suptitle('Distribution of Segmentation Metrics Across Dataset',
fontsize=14, fontweight='bold')
# Extract metric values
dice_bg = [m['Dice_Background'] for m in all_metrics]
dice_cyto = [m['Dice_Cytoplasm'] for m in all_metrics]
dice_nuc = [m['Dice_Nucleus'] for m in all_metrics]
iou_bg = [m['IoU_Background'] for m in all_metrics]
iou_cyto = [m['IoU_Cytoplasm'] for m in all_metrics]
iou_nuc = [m['IoU_Nucleus'] for m in all_metrics]
# Dice histograms
axes[0, 0].hist(dice_bg, bins=30, alpha=0.7, color='blue', edgecolor='black')
axes[0, 0].set_title('Dice - Background', fontweight='bold')
axes[0, 0].set_xlabel('Dice Score')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].grid(alpha=0.3)
axes[0, 0].axvline(np.mean(dice_bg), color='red', linestyle='--', linewidth=2,
label=f'Mean: {np.mean(dice_bg):.3f}')
axes[0, 0].legend()
axes[0, 1].hist(dice_cyto, bins=30, alpha=0.7, color='orange', edgecolor='black')
axes[0, 1].set_title('Dice - Cytoplasm', fontweight='bold')
axes[0, 1].set_xlabel('Dice Score')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].grid(alpha=0.3)
axes[0, 1].axvline(np.mean(dice_cyto), color='red', linestyle='--', linewidth=2,
label=f'Mean: {np.mean(dice_cyto):.3f}')
axes[0, 1].legend()
axes[0, 2].hist(dice_nuc, bins=30, alpha=0.7, color='green', edgecolor='black')
axes[0, 2].set_title('Dice - Nucleus', fontweight='bold')
axes[0, 2].set_xlabel('Dice Score')
axes[0, 2].set_ylabel('Frequency')
axes[0, 2].grid(alpha=0.3)
axes[0, 2].axvline(np.mean(dice_nuc), color='red', linestyle='--', linewidth=2,
label=f'Mean: {np.mean(dice_nuc):.3f}')
axes[0, 2].legend()
# IoU histograms
axes[1, 0].hist(iou_bg, bins=30, alpha=0.7, color='blue', edgecolor='black')
axes[1, 0].set_title('IoU - Background', fontweight='bold')
axes[1, 0].set_xlabel('IoU Score')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].grid(alpha=0.3)
axes[1, 0].axvline(np.mean(iou_bg), color='red', linestyle='--', linewidth=2,
label=f'Mean: {np.mean(iou_bg):.3f}')
axes[1, 0].legend()
axes[1, 1].hist(iou_cyto, bins=30, alpha=0.7, color='orange', edgecolor='black')
axes[1, 1].set_title('IoU - Cytoplasm', fontweight='bold')
axes[1, 1].set_xlabel('IoU Score')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].grid(alpha=0.3)
axes[1, 1].axvline(np.mean(iou_cyto), color='red', linestyle='--', linewidth=2,
label=f'Mean: {np.mean(iou_cyto):.3f}')
axes[1, 1].legend()
axes[1, 2].hist(iou_nuc, bins=30, alpha=0.7, color='green', edgecolor='black')
axes[1, 2].set_title('IoU - Nucleus', fontweight='bold')
axes[1, 2].set_xlabel('IoU Score')
axes[1, 2].set_ylabel('Frequency')
axes[1, 2].grid(alpha=0.3)
axes[1, 2].axvline(np.mean(iou_nuc), color='red', linestyle='--', linewidth=2,
label=f'Mean: {np.mean(iou_nuc):.3f}')
axes[1, 2].legend()
plt.tight_layout()
plt.show()
def calculate_nc_ratio(label_image):
"""
Calculate the nucleus-to-cytoplasm area ratio.
Parameters:
-----------
label_image : np.ndarray
Segmented label image (0=background, 1=cytoplasm, 2=nucleus)
Returns:
--------
nc_ratio : float
Ratio of nucleus area to cytoplasm area
Returns 0 if cytoplasm area is 0 (to avoid division by zero)
"""
nucleus_area = np.sum(label_image == 2)
cytoplasm_area = np.sum(label_image == 1)
if cytoplasm_area == 0:
return 0.0
nc_ratio = nucleus_area / cytoplasm_area
return nc_ratio
def compute_nc_ratios(labels_list):
"""
Compute nucleus-to-cytoplasm ratios for all images.
Parameters:
-----------
labels_list : list of np.ndarray
List of label images
Returns:
--------
nc_ratios : np.ndarray
Array of NC ratios, one per image
"""
nc_ratios = np.array([calculate_nc_ratio(label_img) for label_img in labels_list])
return nc_ratios
def analyze_nc_ratios(labels_true, labels_pred, dataset_name="Dataset"):
"""
Compute and visualize N/C ratio metrics.
Parameters:
-----------
labels_true : list of np.ndarray
Ground truth label images
labels_pred : list of np.ndarray
Predicted segmentation images
dataset_name : str
Name for output printing
"""
# Compute N/C ratios
nc_ratio_true = compute_nc_ratios(labels_true)
nc_ratio_pred = compute_nc_ratios(labels_pred)
# Calculate errors
absolute_errors = np.abs(nc_ratio_pred - nc_ratio_true)
relative_errors = np.abs(nc_ratio_pred - nc_ratio_true) / (nc_ratio_true + 1e-6)
# Print statistics
print(f"N/C Ratio Statistics:")
print(f" True N/C Ratio - Mean: {np.mean(nc_ratio_true):.4f}, Std: {np.std(nc_ratio_true):.4f}")
print(f" Pred N/C Ratio - Mean: {np.mean(nc_ratio_pred):.4f}, Std: {np.std(nc_ratio_pred):.4f}")
print(f" Absolute Error - Mean: {np.mean(absolute_errors):.4f}, Std: {np.std(absolute_errors):.4f}")
print(f" Relative Error - Mean: {np.mean(relative_errors):.4f}, Std: {np.std(relative_errors):.4f}")
# Create scatter plot
fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(nc_ratio_pred, nc_ratio_true, alpha=0.6, s=8)
ax.plot([0, 5], [0, 5], ls="--", color='red', linewidth=2, label='Perfect Prediction')
ax.set_xlabel("Predicted N/C Ratio", fontsize=5, fontweight='bold')
ax.set_ylabel("True N/C Ratio", fontsize=5, fontweight='bold')
ax.set_title(f"N/C Ratio: Predicted vs. True ({dataset_name})", fontsize=5, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.legend(fontsize=11)
max_val = max(nc_ratio_pred.max(), nc_ratio_true.max()) * 1.1
ax.set_xlim([0, max_val])
ax.set_ylim([0, max_val])
plt.tight_layout()
    plt.show()
Data Cleaning Functions
import numpy as np
from skimage.measure import label
def count_nuclei(label_image, nucleus_value=2):
"""
Count the number of distinct nuclei (connected components) in a labeled image.
Parameters:
-----------
label_image : np.ndarray
2D array where each pixel is labeled with integer values
(0=background, 1=cytoplasm, 2=nucleus)
nucleus_value : int
The intensity value representing nucleus pixels (default: 2)
Returns:
--------
num_nuclei : int
The number of connected components with nucleus label
component_map : np.ndarray
2D array where each nucleus component is assigned a unique ID
(starting from 1; background is 0)
"""
# Extract only the nucleus pixels (create binary mask)
nucleus_mask = (label_image == nucleus_value).astype(int)
# Apply connected component labeling
# Create 2-connectivity structure (8-connected neighborhood)
from scipy.ndimage import generate_binary_structure
structure = generate_binary_structure(2, 2)
component_map, num_nuclei = scilabel(nucleus_mask, structure=structure)
return num_nuclei, component_map
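The connectivity choice here matters: generate_binary_structure(2, 2) gives the 8-connected neighborhood, so diagonally touching nucleus pixels merge into one component, whereas the default 4-connectivity keeps them separate. A tiny self-contained check on a toy mask (not dataset values):

```python
import numpy as np
from scipy.ndimage import label, generate_binary_structure

# Two nucleus pixels touching only at a corner
mask = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0]])

# 8-connectivity: diagonal neighbors count as one connected nucleus
_, n8 = label(mask, structure=generate_binary_structure(2, 2))
# default 4-connectivity: diagonals are separate components
_, n4 = label(mask)
print(n8, n4)  # 1 2
```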
def analyze_nucleus_distribution(labels_list):
"""
Analyze the distribution of nucleus counts across the entire dataset.
Parameters:
-----------
labels_list : list of np.ndarray
List of all ground truth label images in the dataset
Returns:
--------
nucleus_counts : np.ndarray
Array where index i contains the count of images with i nuclei
image_nucleus_info : list of tuples
List of (image_index, num_nuclei) for each image
"""
image_nucleus_info = []
max_nuclei = 0
for idx, label_image in enumerate(labels_list):
num_nuclei, _ = count_nuclei(label_image, nucleus_value=2)
image_nucleus_info.append((idx, num_nuclei))
max_nuclei = max(max_nuclei, num_nuclei)
# Create histogram
nucleus_counts = np.zeros(max_nuclei + 1, dtype=int)
for idx, num_nuclei in image_nucleus_info:
nucleus_counts[num_nuclei] += 1
return nucleus_counts, image_nucleus_info
def filter_images_by_nucleus_count(images, labels, min_nuclei=1, max_nuclei=1):
"""
Filter images to keep only those with an acceptable number of nuclei.
Parameters:
-----------
images : list of np.ndarray
List of raw cell images
labels : list of np.ndarray
List of corresponding label images
min_nuclei : int
Minimum acceptable number of nuclei (default: 1)
max_nuclei : int
Maximum acceptable number of nuclei (default: 1)
Returns:
--------
filtered_images : list of np.ndarray
Filtered image list
filtered_labels : list of np.ndarray
Filtered label list
valid_indices : list of int
Indices of images that passed the filter in original dataset
filtering_report : dict
Detailed statistics about the filtering
"""
valid_indices = []
excluded_indices = []
excluded_reasons = {
'no_nucleus': [],
'too_few_nuclei': [],
'too_many_nuclei': []
}
for idx, label_img in enumerate(labels):
num_nuclei, _ = count_nuclei(label_img, nucleus_value=2)
if num_nuclei == 0:
excluded_indices.append(idx)
excluded_reasons['no_nucleus'].append(idx)
elif num_nuclei < min_nuclei:
excluded_indices.append(idx)
excluded_reasons['too_few_nuclei'].append(idx)
elif num_nuclei > max_nuclei:
excluded_indices.append(idx)
excluded_reasons['too_many_nuclei'].append(idx)
else:
valid_indices.append(idx)
filtered_images = [images[i] for i in valid_indices]
filtered_labels = [labels[i] for i in valid_indices]
filtering_report = {
'original_count': len(images),
'filtered_count': len(filtered_images),
'excluded_count': len(excluded_indices),
'no_nucleus': len(excluded_reasons['no_nucleus']),
'too_few_nuclei': len(excluded_reasons['too_few_nuclei']),
'too_many_nuclei': len(excluded_reasons['too_many_nuclei']),
'exclusion_rate': len(excluded_indices) / len(images) * 100
}
return filtered_images, filtered_labels, valid_indices, filtering_report
def visualize_nucleus_distribution(nucleus_counts, labels_list=None):
"""
Visualize the distribution of nucleus counts in the dataset.
Parameters:
-----------
nucleus_counts : np.ndarray
Array where index i contains count of images with i nuclei
labels_list : list of np.ndarray, optional
If provided, compute distribution from scratch
"""
if labels_list is not None:
nucleus_counts, _ = analyze_nucleus_distribution(labels_list)
fig, ax = plt.subplots(figsize=(5,3))
nuclei_indices = np.arange(len(nucleus_counts))
colors = ['red' if i != 1 else 'green' for i in nuclei_indices]
bars = ax.bar(nuclei_indices, nucleus_counts, color=colors, alpha=0.7, edgecolor='black')
ax.set_xlabel('Number of Nuclei per Image', fontsize=5, fontweight='bold')
ax.set_ylabel('Count of Images', fontsize=5, fontweight='bold')
ax.set_title('Distribution of Nucleus Counts in Dataset', fontsize=5, fontweight='bold')
ax.set_xticks(nuclei_indices)
ax.grid(axis='y', alpha=0.3)
# Add count labels on bars
for bar in bars:
height = bar.get_height()
if height > 0:
ax.text(bar.get_x() + bar.get_width()/2., height,
f'{int(height)}',
ha='center', va='bottom', fontweight='bold')
# Add legend
from matplotlib.patches import Patch
legend_elements = [
Patch(facecolor='green', alpha=0.7, edgecolor='black', label='Keep (1 nucleus)'),
Patch(facecolor='red', alpha=0.7, edgecolor='black', label='Exclude (not 1 nucleus)')
]
ax.legend(handles=legend_elements, fontsize=11, loc='upper right')
plt.tight_layout()
plt.show()
# Print statistics
print("\n" + "="*70)
print("NUCLEUS DISTRIBUTION STATISTICS")
print("="*70)
print(f"{'# Nuclei':<15} {'Count':<15} {'Percentage':<15}")
print("-"*70)
total = np.sum(nucleus_counts)
for i, count in enumerate(nucleus_counts):
percentage = count / total * 100 if total > 0 else 0
status = "[KEEP]" if i == 1 else "[EXCLUDE]"
print(f"{i:<15} {count:<15} {percentage:<15.2f}% {status}")
    print("="*70)
Pipeline Functions
def apply_stage4_pipeline(img, labels_true, params):
"""
Apply the complete 4-stage preprocessing pipeline to a single image.
Parameters:
-----------
img : np.ndarray
RGB image (H, W, 3)
labels_true : np.ndarray
Ground truth segmentation labels
params : dict
Pipeline parameters:
- nucleus_max, cytoplasm_min, cytoplasm_max: thresholds
- nl_means_strength: denoising parameter h
- unsharp_radius, unsharp_amount: edge enhancement
- blur_sigma: Gaussian blur standard deviation
- morph_disk_size: morphological opening kernel
- closing_disk_size: morphological closing kernel
Returns:
--------
segmented : np.ndarray
Final predicted segmentation
metrics : dict
Calculated metrics (Dice, IoU for each class)
"""
# STAGE 1: Grayscale Conversion
img_uint8 = img.astype(np.uint8)
img_gray = cv2.cvtColor(img_uint8, cv2.COLOR_RGB2GRAY).astype(np.float32) / 255.0
# STAGE 2: Non-Local Means Denoising
img_denoised = denoise_nl_means(
img_gray,
h=params['nl_means_strength'],
fast_mode=True,
patch_size=10,
patch_distance=10
)
# STAGE 3: Edge Enhancement and Blur
img_enhanced = unsharp_mask(
img_denoised,
radius=params['unsharp_radius'],
amount=params['unsharp_amount']
)
img_blurred = gaussian_filter(img_enhanced, sigma=params['blur_sigma'])
# Intensity-Based Segmentation
nucleus_mask = (img_blurred < params['nucleus_max']).astype(int)
cytoplasm_mask = np.logical_and(
img_blurred >= params['cytoplasm_min'],
img_blurred <= params['cytoplasm_max']
).astype(int)
# STAGE 4: Morphological Operations
nucleus_opened = binary_opening(nucleus_mask, disk(params['morph_disk_size'])).astype(int)
cytoplasm_opened = binary_opening(cytoplasm_mask, disk(params['morph_disk_size'])).astype(int)
nucleus_closed = binary_closing(nucleus_opened, disk(params['closing_disk_size'])).astype(int)
cytoplasm_closed = binary_closing(cytoplasm_opened, disk(params['closing_disk_size'])).astype(int)
# Combine into 3-class segmentation
segmented = np.zeros_like(img_blurred, dtype=int)
segmented[nucleus_closed == 1] = 2
segmented[cytoplasm_closed == 1] = 1
# Calculate metrics
metrics = calculate_all_metrics(segmented, labels_true, class_labels=[0, 1, 2])
return segmented, metrics
def apply_pipeline_to_dataset(images, labels, params, verbose=True):
"""
Apply the preprocessing pipeline to all images in the dataset.
Parameters:
-----------
images : list of np.ndarray
List of RGB images
labels : list of np.ndarray
List of ground truth segmentations
params : dict
Pipeline parameters
verbose : bool
Print progress updates
Returns:
--------
segmented_images : list of np.ndarray
Preprocessed segmentations for all images
all_metrics : list of dict
Metrics for each image
aggregate_metrics : dict
Mean and std of metrics across dataset
"""
segmented_images = []
all_metrics = []
total_images = len(images)
for idx, (img, label_true) in enumerate(zip(images, labels)):
if verbose and (idx + 1) % 10 == 0:
print(f"Processing image {idx + 1}/{total_images}...")
segmented, metrics = apply_stage4_pipeline(img, label_true, params)
segmented_images.append(segmented)
all_metrics.append(metrics)
# Compute aggregate metrics
aggregate_metrics = compute_aggregate_metrics(all_metrics)
if verbose:
print(f"Completed processing all {total_images} images.")
    return segmented_images, all_metrics, aggregate_metrics
6.5.2 Filter Data
from skopt import gp_minimize
from skopt.space import Real
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')
print("✓ All Bayesian Optimization libraries loaded")
# ============================================================================
# FILTER DATA: Keep only images with exactly 1 nucleus
# ============================================================================
print("\n" + "="*70)
print("FILTERING DATA: Removing images with 0 or >1 nuclei")
print("="*70)
filtered_images, filtered_labels, valid_indices, report = filter_images_by_nucleus_count(
images, labels, min_nuclei=1, max_nuclei=1
)
print(f"\nFiltering Report:")
print(f" Original images: {report['original_count']}")
print(f" Filtered images: {report['filtered_count']}")
print(f"  Excluded: {report['excluded_count']} ({report['exclusion_rate']:.1f}%)")
6.5.3 Train/Validation/Test Split
print("\n" + "="*70)
print("TRAIN/VALIDATION/TEST SPLIT (60/20/20)")
print("="*70)
# Split cleaned data
train_idx, temp_idx = train_test_split(
np.arange(len(filtered_images)),
test_size=0.4,
random_state=42
)
val_idx, test_idx = train_test_split(
temp_idx,
test_size=0.5,
random_state=42
)
train_images = np.array([filtered_images[i] for i in train_idx])
train_labels = np.array([filtered_labels[i] for i in train_idx])
val_images = np.array([filtered_images[i] for i in val_idx])
val_labels = np.array([filtered_labels[i] for i in val_idx])
test_images = np.array([filtered_images[i] for i in test_idx])
test_labels = np.array([filtered_labels[i] for i in test_idx])
print(f"Total cleaned images: {len(filtered_images)}")
print(f"Train set: {len(train_images)} ({len(train_images)/len(filtered_images)*100:.1f}%)")
print(f"Validation set: {len(val_images)} ({len(val_images)/len(filtered_images)*100:.1f}%)")
print(f"Test set: {len(test_images)} ({len(test_images)/len(filtered_images)*100:.1f}%)")
6.5.4 Bayesian Optimization Objective Function
# ============================================================================
# BAYESIAN OPTIMIZATION OBJECTIVE FUNCTION
# ============================================================================
def evaluate_pipeline_loss(params):
    """
    Objective function for Bayesian Optimization.
    Takes a list [nucleus_max, cytoplasm_max] and returns negative mean Dice.
    cytoplasm_min is automatically set to nucleus_max.
    Evaluates on the TRAINING set to find optimal parameters.
    """
    nucleus_max, cytoplasm_max = params
    cytoplasm_min = nucleus_max  # Automatically set
    # Apply pipeline with these parameters
    pipeline_params = {
        'nucleus_max': nucleus_max,
        'cytoplasm_min': cytoplasm_min,
        'cytoplasm_max': cytoplasm_max,
        'nl_means_strength': 1.0,
        'unsharp_radius': 2.0,
        'unsharp_amount': 1.0,
        'blur_sigma': 2.0,
        'morph_disk_size': 5,
        'closing_disk_size': 8
    }
    segmented, _, agg_metrics = apply_pipeline_to_dataset(
        train_images, train_labels, pipeline_params, verbose=False
    )
    # Loss is negative mean Dice (to minimize)
    loss = -agg_metrics['Dice_Mean_mean']
    return loss
6.5.5 Callback Function for Iteration Display
# ============================================================================
# CALLBACK FUNCTION TO DISPLAY ITERATION RESULTS
# ============================================================================
iteration_count = [0]  # Use list to allow modification in nested function
def callback(result):
    """Called after each iteration to display results."""
    iteration_count[0] += 1
    current_loss = result.func_vals[-1]
    best_loss = result.fun
    current_params = result.x_iters[-1]
    best_params = result.x
    print(f"Iter {iteration_count[0]}: nucleus_max={current_params[0]:.4f}, "
          f"cytoplasm_max={current_params[1]:.4f}, loss={current_loss:.4f}")
    print(f" → Best so far: nucleus_max={best_params[0]:.4f}, "
          f"cytoplasm_max={best_params[1]:.4f}, loss={best_loss:.4f}")
print("✓ Callback function defined")
6.5.6 Run Bayesian Optimization
print("\n" + "="*70)
print("STEP 1: BAYESIAN OPTIMIZATION OF PIPELINE PARAMETERS")
print("="*70)
# Define parameter space
space = [
    Real(0.2, 0.5, name='nucleus_max'),
    Real(0.5, 1.0, name='cytoplasm_max')
]
# Run Bayesian Optimization
print("\nRunning Bayesian Optimization...")
print("(This may take a few minutes)")
print("-" * 70)
opt_results = gp_minimize(
    func=evaluate_pipeline_loss,
    dimensions=space,
    n_calls=20,  # 5 initial + 15 BO iterations
    n_initial_points=5,
    acq_func='LCB',  # Lower Confidence Bound (minimizes loss)
    kappa=2.0,
    random_state=42,
    callback=callback,
    verbose=0
)
# Extract best parameters
best_nucleus_max = opt_results.x[0]
best_cytoplasm_max = opt_results.x[1]
best_cytoplasm_min = best_nucleus_max # Automatically set
best_loss = opt_results.fun
best_dice = -best_loss
print("\n" + "="*70)
print("OPTIMIZATION COMPLETE")
print("="*70)
print(f"\nBest Parameters Found:")
print(f" Nucleus Max: {best_nucleus_max:.6f}")
print(f" Cytoplasm Min: {best_cytoplasm_min:.6f} (= Nucleus Max)")
print(f" Cytoplasm Max: {best_cytoplasm_max:.6f}")
print(f" Training Loss: {best_loss:.6f}")
print(f" Training Dice: {best_dice:.6f}")
print(f"\nTotal Evaluations: {len(opt_results.func_vals)}")
6.5.7 Evaluate on All Sets
# ============================================================================
# EVALUATE OPTIMIZED PARAMETERS ON ALL SETS
# ============================================================================
print("\n" + "="*70)
print("STEP 2: EVALUATING OPTIMIZED PARAMETERS ON ALL SETS")
print("="*70)
best_pipeline_params = {
    'nucleus_max': best_nucleus_max,
    'cytoplasm_min': best_cytoplasm_min,
    'cytoplasm_max': best_cytoplasm_max,
    'nl_means_strength': 1.0,
    'unsharp_radius': 2.0,
    'unsharp_amount': 1.0,
    'blur_sigma': 2.0,
    'morph_disk_size': 5,
    'closing_disk_size': 8
}
print("\nApplying optimized parameters to training set...")
train_segmented, train_metrics, train_agg = apply_pipeline_to_dataset(
    train_images, train_labels, best_pipeline_params, verbose=False
)
print("Applying optimized parameters to validation set...")
val_segmented, val_metrics, val_agg = apply_pipeline_to_dataset(
    val_images, val_labels, best_pipeline_params, verbose=False
)
print("Applying optimized parameters to test set...")
test_segmented, test_metrics, test_agg = apply_pipeline_to_dataset(
    test_images, test_labels, best_pipeline_params, verbose=False
)
print("✓ Evaluation complete")
print("\n" + "="*70)
print("RESULTS WITH OPTIMIZED PARAMETERS")
print("="*70)
print("\nTRAINING SET:")
for class_name in ['Background', 'Cytoplasm', 'Nucleus']:
    dice_mean = train_agg[f'Dice_{class_name}_mean']
    dice_std = train_agg[f'Dice_{class_name}_std']
    print(f" Dice_{class_name}: {dice_mean:.4f} ± {dice_std:.4f}")
print(f" Dice_Mean: {train_agg['Dice_Mean_mean']:.4f}")
print("\nVALIDATION SET:")
for class_name in ['Background', 'Cytoplasm', 'Nucleus']:
    dice_mean = val_agg[f'Dice_{class_name}_mean']
    dice_std = val_agg[f'Dice_{class_name}_std']
    print(f" Dice_{class_name}: {dice_mean:.4f} ± {dice_std:.4f}")
print(f" Dice_Mean: {val_agg['Dice_Mean_mean']:.4f}")
print("\nTEST SET:")
for class_name in ['Background', 'Cytoplasm', 'Nucleus']:
    dice_mean = test_agg[f'Dice_{class_name}_mean']
    dice_std = test_agg[f'Dice_{class_name}_std']
    print(f" Dice_{class_name}: {dice_mean:.4f} ± {dice_std:.4f}")
print(f" Dice_Mean: {test_agg['Dice_Mean_mean']:.4f}")
6.6 Visualization and Evaluation
6.6.1 Convergence Plot
import matplotlib.pyplot as plt
print("\n" + "="*70)
print("STEP 3: VISUALIZATION AND RESULTS")
print("="*70)
# Create convergence plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Plot 1: Loss convergence
axes[0].plot(opt_results.func_vals, 'o-', linewidth=2, markersize=6, color='steelblue')
axes[0].axhline(opt_results.fun, color='red', linestyle='--', linewidth=2, label='Best loss')
axes[0].set_xlabel('Iteration', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Loss (Negative Dice)', fontsize=12, fontweight='bold')
axes[0].set_title('Bayesian Optimization Convergence', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3)
axes[0].legend()
# Plot 2: Parameter trajectory
nucleus_max_vals = [opt_results.x_iters[i][0] for i in range(len(opt_results.x_iters))]
cytoplasm_max_vals = [opt_results.x_iters[i][1] for i in range(len(opt_results.x_iters))]
axes[1].scatter(nucleus_max_vals, cytoplasm_max_vals, c=range(len(nucleus_max_vals)),
                cmap='viridis', s=100, alpha=0.7, edgecolors='black', linewidth=1)
axes[1].scatter(best_nucleus_max, best_cytoplasm_max, color='red', s=300, marker='*',
                edgecolors='black', linewidth=2, label='Best', zorder=5)
axes[1].set_xlabel('Nucleus Max', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Cytoplasm Max', fontsize=12, fontweight='bold')
axes[1].set_title('Parameter Search Trajectory', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3)
axes[1].legend()
cbar = plt.colorbar(axes[1].collections[0], ax=axes[1])
cbar.set_label('Iteration', fontweight='bold')
plt.tight_layout()
plt.savefig('images/chapter6/chapter6_optimization_convergence.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Convergence plot saved")
6.6.2 Metrics Comparison
# Create comprehensive metrics comparison
fig, axes = plt.subplots(3, 1, figsize=(12, 10))
datasets = ['Train', 'Validation', 'Test']
agg_metrics = [train_agg, val_agg, test_agg]
colors = ['#2E86AB', '#A23B72', '#F18F01']
# Plot 1: Per-class Dice
class_names = ['Background', 'Cytoplasm', 'Nucleus']
for i, (dataset_name, agg, color) in enumerate(zip(datasets, agg_metrics, colors)):
    dice_values = [agg[f'Dice_{cn}_mean'] for cn in class_names]
    dice_stds = [agg[f'Dice_{cn}_std'] for cn in class_names]
    axes[0].errorbar([j + i*0.25 for j in range(3)], dice_values, yerr=dice_stds,
                     label=dataset_name, marker='o', capsize=5, color=color, linewidth=2)
axes[0].set_ylabel('Dice Score', fontsize=11, fontweight='bold')
axes[0].set_title('Per-Class Dice Scores with Optimized Parameters', fontsize=12, fontweight='bold')
axes[0].set_xticks([0.25, 1.25, 2.25])
axes[0].set_xticklabels(class_names)
axes[0].legend()
axes[0].grid(True, alpha=0.3, axis='y')
axes[0].set_ylim([0, 1])
# Plot 2: Mean Dice comparison
mean_dices = [agg['Dice_Mean_mean'] for agg in agg_metrics]
mean_stds = [agg['Dice_Mean_std'] for agg in agg_metrics]
bars = axes[1].bar(datasets, mean_dices, yerr=mean_stds, capsize=10,
                   color=colors, edgecolor='black', linewidth=2, alpha=0.8)
axes[1].set_ylabel('Mean Dice Score', fontsize=11, fontweight='bold')
axes[1].set_title('Overall Performance Across Train/Val/Test Sets', fontsize=12, fontweight='bold')
axes[1].set_ylim([0, 1])
axes[1].grid(True, alpha=0.3, axis='y')
# Add value labels on bars
for bar, val in zip(bars, mean_dices):
    height = bar.get_height()
    axes[1].text(bar.get_x() + bar.get_width()/2., height,
                 f'{val:.4f}', ha='center', va='bottom', fontweight='bold')
# Plot 3: Generalization gap
train_mean = train_agg['Dice_Mean_mean']
val_mean = val_agg['Dice_Mean_mean']
test_mean = test_agg['Dice_Mean_mean']
gaps = [0, train_mean - val_mean, train_mean - test_mean]
gap_labels = ['Baseline', 'Train→Val', 'Train→Test']
axes[2].bar(gap_labels, gaps, color=['gray', '#FF6B6B', '#FF6B6B'], edgecolor='black', linewidth=2)
axes[2].set_ylabel('Dice Difference', fontsize=11, fontweight='bold')
axes[2].set_title('Generalization Gap (Overfitting Check)', fontsize=12, fontweight='bold')
axes[2].axhline(0, color='black', linestyle='-', linewidth=1)
axes[2].grid(True, alpha=0.3, axis='y')
# Add value labels
for i, (label, gap) in enumerate(zip(gap_labels, gaps)):
    axes[2].text(i, gap, f'{gap:.4f}', ha='center', va='bottom' if gap >= 0 else 'top', fontweight='bold')
plt.tight_layout()
plt.savefig('images/chapter6/chapter6_metrics_comparison.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Metrics comparison saved")
6.6.3 Summary Statistics
print("\n" + "="*70)
print("GENERALIZATION ANALYSIS")
print("="*70)
print("\nTraining Performance:")
print(f" Mean Dice: {train_agg['Dice_Mean_mean']:.4f} ± {train_agg['Dice_Mean_std']:.4f}")
print("\nValidation Performance:")
print(f" Mean Dice: {val_agg['Dice_Mean_mean']:.4f} ± {val_agg['Dice_Mean_std']:.4f}")
print(f" Drop from training: {(train_agg['Dice_Mean_mean'] - val_agg['Dice_Mean_mean']):.4f}")
print("\nTest Performance:")
print(f" Mean Dice: {test_agg['Dice_Mean_mean']:.4f} ± {test_agg['Dice_Mean_std']:.4f}")
print(f" Drop from training: {(train_agg['Dice_Mean_mean'] - test_agg['Dice_Mean_mean']):.4f}")
if abs(train_agg['Dice_Mean_mean'] - val_agg['Dice_Mean_mean']) < 0.05:
    print("\n✓ GOOD: Validation performance is close to training (minimal overfitting)")
else:
    print("\n⚠ WARNING: Large gap between training and validation (possible overfitting)")
if abs(val_agg['Dice_Mean_mean'] - test_agg['Dice_Mean_mean']) < 0.05:
    print("✓ GOOD: Test performance matches validation (parameters generalize)")
else:
    print("⚠ WARNING: Test performance differs from validation")
6.6.4 Testing the Optimized Parameters
Let’s now apply the optimized thresholds, together with the pre-parameters, to the entire dataset, just as we did in Chapter 4, and see how the histogram of Dice coefficients has improved.
print("\nFiltering dataset...")
filtered_images, filtered_labels, valid_indices, report = filter_images_by_nucleus_count(
    images,
    labels,
    min_nuclei=1,
    max_nuclei=1
)
# Example workflow
print("Applying pipeline to entire dataset...")
# Define parameters: optimized thresholds from the Bayesian Optimization above,
# plus pre-parameters tuned on sample images in Chapter 4
pipeline_params = {
    'nucleus_max': 0.417168,
    'cytoplasm_min': 0.417168,
    'cytoplasm_max': 0.915328,
    'nl_means_strength': 1,
    'unsharp_radius': 2.0,
    'unsharp_amount': 1.0,
    'blur_sigma': 2,
    'morph_disk_size': 5,
    'closing_disk_size': 8
}
# Apply pipeline. Returns segmented_images, all_metrics, aggregate_metrics
segmented_images, all_metrics, aggregate_metrics = apply_pipeline_to_dataset(
    filtered_images,
    filtered_labels,
    pipeline_params,
    verbose=True
)
print_aggregate_metrics(aggregate_metrics)
visualize_metric_distributions(all_metrics)
analyze_nc_ratios(filtered_labels, segmented_images)
6.7 Final Summary
print("\n" + "="*70)
print("FINAL SUMMARY")
print("="*70)
print(f"\n✓ OPTIMIZED PARAMETERS:")
print(f" Nucleus Max: {best_nucleus_max:.6f}")
print(f" Cytoplasm Min: {best_cytoplasm_min:.6f} (= Nucleus Max)")
print(f" Cytoplasm Max: {best_cytoplasm_max:.6f}")
print(f"\n✓ PERFORMANCE ACROSS SETS:")
print(f" Training Dice: {train_agg['Dice_Mean_mean']:.6f}")
print(f" Validation Dice: {val_agg['Dice_Mean_mean']:.6f}")
print(f" Test Dice: {test_agg['Dice_Mean_mean']:.6f}")
print(f"\n✓ GENERALIZATION:")
train_val_gap = abs(train_agg['Dice_Mean_mean'] - val_agg['Dice_Mean_mean'])
train_test_gap = abs(train_agg['Dice_Mean_mean'] - test_agg['Dice_Mean_mean'])
print(f" Train→Validation gap: {train_val_gap:.6f}")
print(f" Train→Test gap: {train_test_gap:.6f}")
if train_val_gap < 0.05 and train_test_gap < 0.05:
    print(f"\n✓ EXCELLENT: Model generalizes well across all sets!")
elif train_val_gap < 0.10 and train_test_gap < 0.10:
    print(f"\n✓ GOOD: Acceptable generalization")
else:
    print(f"\n⚠ WARNING: Potential overfitting detected")
print(f"\n✓ EFFICIENCY:")
print(f" Total Evaluations: {len(opt_results.func_vals)}")
print(f" Grid Equivalence: ~400 (20×20 grid)")
print(f" Speedup: {400/len(opt_results.func_vals):.1f}x")
print(f"\n✓ VISUALIZATIONS SAVED:")
print(f" - chapter6_optimization_convergence.png")
print(f" - chapter6_metrics_comparison.png")
print("\n" + "="*70)
print("Chapter 6 Complete!")
print("="*70)
6.8 Key Takeaways
6.8.1 Why Bayesian Optimization Wins
- Efficiency: 20 evaluations vs. ~400 for grid search (20x speedup)
- Principled: Uses Gaussian Processes for probabilistic modeling
- Intelligent: Balances exploration vs. exploitation via acquisition functions
- Practical: Works in minutes on standard hardware
- Scalable: Handles 2-5 parameters efficiently (grid search becomes prohibitive)
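The exploration-vs-exploitation balance can be made concrete with a small sketch of the Lower Confidence Bound rule, mu(x) − kappa·sigma(x), using a Gaussian Process surrogate fitted to a few made-up (threshold, loss) observations. All values here are illustrative, not outputs of the real pipeline:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Made-up (threshold, loss) pairs standing in for real pipeline evaluations
X_obs = np.array([[0.20], [0.30], [0.38], [0.50]])
y_obs = np.array([0.45, 0.30, 0.28, 0.40])

# Fit a Gaussian Process surrogate to the observed losses
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_obs, y_obs)

# LCB acquisition for minimization: mu(x) - kappa * sigma(x).
# Low mu favors exploitation; high sigma (uncertainty) favors exploration.
X_cand = np.linspace(0.2, 0.5, 100).reshape(-1, 1)
mu, sigma = gp.predict(X_cand, return_std=True)
kappa = 2.0
lcb = mu - kappa * sigma

# The next point to evaluate is where the LCB is lowest
next_x = X_cand[np.argmin(lcb)][0]
print(f"Next point to evaluate: {next_x:.3f}")
```

Larger kappa weights the sigma term more heavily, pushing the search toward unexplored regions; smaller kappa concentrates sampling near the current best estimate.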
6.8.2 The Optimization Framework
We used gp_minimize() from scikit-optimize with:
- Parameter space: nucleus_max ∈ [0.2, 0.5], cytoplasm_max ∈ [0.5, 1.0]
- Objective function: Negative mean Dice (loss to minimize)
- Acquisition function: Lower Confidence Bound (LCB) for intelligent sampling
- Initial points: 5 random samples + 15 Bayesian iterations
6.8.3 Validation & Generalization
The complete workflow:
1. Optimize on training set - find parameters that maximize training performance
2. Validate on validation set - verify parameters generalize to new data
3. Test on test set - final held-out evaluation
Small gaps between training→validation→test metrics indicate good generalization.
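As a sketch, the gap check can be wrapped in a small helper. The 0.05 tolerance mirrors the heuristic used in the summary code earlier; the Dice values passed in below are made up for illustration:

```python
def generalization_report(train_dice, val_dice, test_dice, tol=0.05):
    """Compute train->val and train->test Dice gaps and classify them.

    tol is a heuristic threshold (0.05, as used in the chapter's summary).
    """
    gaps = {
        'train_to_val': train_dice - val_dice,
        'train_to_test': train_dice - test_dice,
    }
    ok = all(abs(g) < tol for g in gaps.values())
    verdict = 'generalizes well' if ok else 'possible overfitting'
    return gaps, verdict

# Made-up per-set mean Dice values for illustration
gaps, verdict = generalization_report(0.91, 0.89, 0.88)
print(gaps, verdict)
```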
6.8.4 Next Steps
Use the optimized parameters in Chapter 7 to refine your segmentation model with proven threshold values.
6.9 Exercises
Exercise 6.1: Modify the acquisition function to ‘EI’ (Expected Improvement) instead of ‘LCB’. How does the search trajectory change?
Exercise 6.2: Expand the parameter bounds: nucleus_max ∈ [0.1, 0.6] and cytoplasm_max ∈ [0.4, 1.0]. Do you find better parameters?
Exercise 6.3: Create a 2D heatmap showing Dice as a function of both parameters (sample ~50 points). Compare with BO trajectory.
Exercise 6.4: Add a third parameter (e.g., nl_means_strength) and re-run optimization. How many additional evaluations are needed?
Exercise 6.5: Compare validation Dice with test Dice for all three classes. Is the generalization gap similar for all classes?
Exercise 6.6: Reduce n_calls to 10 total (3 initial + 7 BO). How does performance suffer compared to 20 evaluations?