7 Feature Engineering for Image Segmentation

7.1 Why Feature Maps Matter

In a bladder-cytology segmentation project, the task is to label each pixel as nucleus, cytoplasm, or background. Once the nucleus and cytoplasm are separated, one can estimate the nucleus-to-cytoplasm (N/C) ratio, a key morphologic criterion in urinary cytology for assessing high-grade urothelial carcinoma. Doing that well requires more than the raw RGB triple at each pixel: an isolated colour does not tell you whether you are looking at a nucleus boundary, textured chromatin inside a nucleus, or smooth cytoplasm. Those distinctions are properties of neighbourhoods, not of single pixels — which is exactly what a feature map captures: a new image, the same size as the input, where each value summarises the pattern present in a small patch around the corresponding input pixel.

The filters below are also the right conceptual bridge to convolutional neural networks. Most of them take a small neighborhood — a kernel sliding across the image — and produce a feature map emphasizing a particular signal: smoothness, boundaries, or texture. The mechanic is the same for every linear filter in this chapter: centre the kernel on a pixel, multiply each kernel weight by the image value directly beneath it, sum those nine products, and write the single resulting number into the output feature map at that pixel. Slide the kernel one position over and repeat. The output ends up the same shape as the input, but each value now reports how strongly the pattern encoded by the kernel is present at that location. Different kernels emphasise different patterns; CNNs generalize this idea by learning the best kernels from data. Studying handcrafted filters first makes that leap concrete.

A 3×3 kernel hovering over an image, computing a dot product at each pixel position.
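The multiply-and-sum mechanic described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the widget's actual implementation: the function name `apply_kernel` and the toy 4×4 image are assumptions made for the example, and the border is handled with zero-padding (discussed in Section 7.3).

```python
import numpy as np

def apply_kernel(image, kernel):
    """Slide `kernel` over `image`; zero-pad so the output keeps the input shape."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant", constant_values=0.0)
    out = np.zeros_like(image, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + kh, c:c + kw]   # patch centred on pixel (r, c)
            out[r, c] = np.sum(window * kernel)   # multiply weights, then sum
    return out

# Hypothetical toy image with values in [0, 1].
image = np.arange(16, dtype=float).reshape(4, 4) / 15.0

identity = np.zeros((3, 3))
identity[1, 1] = 1.0                      # copies the centre pixel only
mean_kernel = np.full((3, 3), 1.0 / 9.0)  # averages the 3x3 neighbourhood

same = apply_kernel(image, identity)
blurred = apply_kernel(image, mean_kernel)
print(np.allclose(same, image))
print(blurred.round(3))
```

The identity kernel returns the image unchanged, while the mean kernel replaces each pixel with the average of its neighbourhood; only the nine weights differ between the two runs.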

7.2 The Kernel Convolution Widget

Before exploring specific filters, build intuition for how any linear filter works. The widget shows a 3×3 kernel sliding over a small image of pixel values (0–1). Edit the kernel weights and watch the output feature map update live.

How to use: Click any pixel to see the dot-product breakdown. Edit the nine kernel weights to define any 3×3 filter. Use the preset buttons to load common filters — the active preset is highlighted blue. Click ▶ Play to animate the kernel scan row by row (the output feature map fills in live). Use Prev / Next to step manually, or ⏮ to reset.

The key insight: changing the kernel weights changes what the feature map emphasizes. CNNs learn these weights automatically from data.

7.3 Padding: What Happens at the Image Border

When you click the corner of the image in the widget above, the kernel’s 3×3 window cannot fit entirely inside the image — there are no pixels at row \(-1\) or column \(8\). To compute an output value at every input position, we have to invent values for the missing positions. The choice of how to invent them is called the padding mode.

Zero-padding — what the widget does. The faint grey halo of 0.00 cells around the 8×8 image is not decoration: it is the implicit border the widget uses whenever the kernel pokes off the image. Off-image positions contribute \(w \times 0 = 0\) to the dot product, and the readout panel marks those terms with a small pad subscript so you can see exactly which contributions came from the border. With a 3×3 kernel and a single ring of zero-padding, the output stays the same shape as the input (8×8). This is called “same” convolution.

Output size formula. For a 1-D image of length \(n\), kernel size \(k\), padding \(p\) (added to each side), and stride 1, the output length is \[\text{out} = n - k + 2p + 1.\]

Plug in \(n=8, k=3, p=1\) and you get \(\text{out}=8\) (the same convolution the widget does). Drop padding entirely (\(p=0\)) and the output shrinks: \(\text{out}=6\).
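The formula can be checked numerically. The sketch below is only an illustration: the helper name `output_length` is an assumption, and NumPy's 1-D `np.convolve` stands in for the widget's 2-D scan.

```python
import numpy as np

def output_length(n, k, p):
    """Output length for stride 1: n - k + 2p + 1."""
    return n - k + 2 * p + 1

signal = np.arange(8, dtype=float)   # a 1-D "image" of length n = 8
kernel = np.ones(3) / 3.0            # k = 3

# p = 0: only positions where the kernel fits entirely inside the signal.
valid_len = len(np.convolve(signal, kernel, mode="valid"))

# p = 1: add one zero on each side first, then keep the fully covered positions.
padded = np.pad(signal, 1, mode="constant", constant_values=0.0)
same_len = len(np.convolve(padded, kernel, mode="valid"))

print(output_length(8, 3, 0), valid_len)
print(output_length(8, 3, 1), same_len)
```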

Three common alternatives.

Valid (no padding). Only positions where the kernel fully fits inside the image are computed; the output shrinks (\(8\times 8 \to 6\times 6\) for a 3×3 kernel). The manual-convolution snippet you’ll see in chapter 8 uses this mode.
Reflect. Mirror the image across the border, so pixel(-1, c) becomes pixel(1, c). This avoids the artificial darkening that zero-padding imposes on a blur kernel near edges, because the reflected pixel has roughly the same intensity as its neighbour.
Replicate (edge). Copy the boundary pixel outward, so pixel(-1, c) becomes pixel(0, c). Cheaper than reflect and very common in image-processing libraries.
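The padding modes above map directly onto `np.pad`. The following sketch uses a hypothetical three-pixel row to show which value each mode invents at position \(-1\), and how that choice changes a 3-tap mean blur at the left border.

```python
import numpy as np

row = np.array([0.2, 0.5, 0.9])   # hypothetical pixel values

zero    = np.pad(row, 1, mode="constant", constant_values=0.0)  # pixel(-1) = 0
reflect = np.pad(row, 1, mode="reflect")                        # pixel(-1) = pixel(1)
edge    = np.pad(row, 1, mode="edge")                           # pixel(-1) = pixel(0)

for name, padded in [("zero", zero), ("reflect", reflect), ("edge", edge)]:
    border_mean = padded[0:3].mean()   # 3-tap mean blur centred on pixel 0
    print(f"{name:8s} padded={padded}  blur at border={border_mean:.3f}")
```

With zero-padding the blurred border value is pulled toward 0, which is the artificial darkening mentioned above; reflect and edge padding keep it closer to the neighbouring intensities.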

In PyTorch, nn.Conv2d(..., padding=1) with a 3×3 kernel applies exactly the single ring of zero-padding you saw here (zeros are PyTorch's default padding mode); that is why every convolution in chapter 8’s MinimalCNN keeps its feature maps at \(256 \times 256\).

7.4 Raw Intensity and Color-Channel Maps

Before you engineer any feature, notice that the image already hands you a few for free. Every pixel comes with one or more numbers attached to it, and each of those numbers is itself a feature map — a value defined at every \((x,y)\) location.

These are your base channels, the raw input to everything that follows in this chapter. If a grayscale image is enough to separate nucleus from cytoplasm from background, you may not need to engineer anything else. More often, you’ll use them as the starting material that the filters below transform into something more discriminative.

What you get for free:

Grayscale image: a single intensity map \(I(x,y)\).
Color image: three maps stacked together, \[I(x,y) = \big(R(x,y),\,G(x,y),\,B(x,y)\big)\] Each channel — red, green, blue — is its own feature map you can feed to a classifier.

What you can derive without filtering. A few simple recombinations of the channels are still “free” features in the sense that no neighborhood operation is involved — you’re just remixing the numbers at each pixel:

A grayscale projection, \[Y(x,y)=0.2126\,R(x,y)+0.7152\,G(x,y)+0.0722\,B(x,y),\] which collapses color into a single brightness map.
A conversion from RGB to HSV, giving you hue \(H(x,y)\), saturation \(S(x,y)\), and value \(V(x,y)\) as three new candidate maps. Stained nuclei often pop out in saturation even when they look similar to cytoplasm in brightness.
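As a rough illustration of these per-pixel recombinations, the sketch below computes the brightness projection \(Y\) and the HSV saturation for three pixels. The RGB triples are made-up stand-ins for a purple nucleus, pale-pink cytoplasm, and white background, not measurements from the dataset; `colorsys` from the Python standard library does the HSV conversion.

```python
import colorsys
import numpy as np

# Hypothetical RGB values in [0, 1]; chosen only to illustrate the idea.
pixels = {
    "nucleus":    (0.35, 0.10, 0.55),
    "cytoplasm":  (0.95, 0.80, 0.85),
    "background": (1.00, 1.00, 1.00),
}

weights = np.array([0.2126, 0.7152, 0.0722])   # coefficients of Y(x, y)

features = {}
for name, rgb in pixels.items():
    luma = float(np.dot(weights, rgb))          # grayscale projection
    h, s, v = colorsys.rgb_to_hsv(*rgb)         # hue, saturation, value
    features[name] = {"Y": luma, "S": s, "V": v}
    print(f"{name:10s} Y={luma:.3f}  S={s:.3f}  V={v:.3f}")
```

No neighbourhood is involved: every output depends only on the three numbers at that pixel. For these example colours, saturation orders the three classes cleanly, with the white background at exactly zero.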

Why this is the right place to start. For your nucleus / cytoplasm / background problem:

Nucleus often differs from cytoplasm by darkness or stain concentration — visible in \(I\) or \(V\).
Cytoplasm may separate more cleanly in one color channel than another.
Background is often relatively uniform in value or saturation.

So before reaching for a Sobel edge map or a Gabor filter, ask yourself: which channel already gives the cleanest visual separation between the three classes? That channel becomes your baseline feature, and every engineered feature from here on is judged by whether it adds something the raw channels missed.

Quiz: Raw Channels as Features

When we say a grayscale image’s intensity \(I(x,y)\) is itself a feature map, what does that imply about the role of raw channels in feature engineering?

They are the baseline features the image already provides — every engineered filter in this chapter is built from these raw channels and is justified only if it adds something the raw channels alone cannot capture

Raw channels are not features; only the output of a kernel convolution counts as a feature, so we always have to engineer at least one filter before doing classification

Raw channels are useful only for visualization; feature engineering ignores them and starts from edge maps and texture descriptors

Raw channels are interchangeable with engineered features — using R, G, B is mathematically equivalent to using Sobel and Gabor outputs, so either choice gives the same classifier

Quiz: HSV Saturation

A staining protocol leaves nuclei a deep purple while the cytoplasm is pale pink and the background is white. Why might converting to HSV and using the saturation map \(S(x,y)\) help separate these regions, even when the brightness alone does not?

Saturation measures how “colorful” each pixel is, regardless of how bright it is. Stained nuclei can have similar brightness to cytoplasm but much higher saturation, so \(S\) shows a sharp contrast where the brightness map shows little

Saturation increases the spatial resolution of the image, making fine structures inside the nucleus easier to see than in the original brightness map

HSV is a denoising transform — converting to saturation suppresses random pixel noise the way a Gaussian blur does

Saturation is computed by taking a Sobel gradient of the V channel, which is why it picks up boundaries the brightness map misses

7.5 Gaussian Blur and Mean Blur

These are the clearest examples of a kernel sliding across the image — exactly what the widget above demonstrates.

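Any linear filter with a kernel \(K\) of radius \(r\), including both blurs in this section, computes: \[F(x,y)=(K * I)(x,y)=\sum_{u=-r}^{r}\sum_{v=-r}^{r} K(u,v)\,I(x-u,y-v)\] The indices \(x-u,\,y-v\) flip the kernel relative to the multiply-and-sum description used by the widget; for symmetric kernels such as the mean and Gaussian, the flip makes no difference.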

For a \(3\times 3\) mean filter: \[K_{\text{mean}}=\frac{1}{9} \begin{bmatrix} 1&1&1\\ 1&1&1\\ 1&1&1 \end{bmatrix}\]

A Gaussian blur uses a kernel whose values follow a 2D normal distribution: \[G_\sigma(u,v)=\frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{u^2+v^2}{2\sigma^2}\right)\]
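To turn the continuous Gaussian into an actual kernel, one samples it at integer offsets and, in practice, rescales the samples so they sum to 1 (otherwise the blur would change overall brightness). The sketch below assumes that renormalisation step; the function name `gaussian_kernel` is an illustrative choice.

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """Sample G_sigma(u, v) on a (2r+1) x (2r+1) grid and normalise to sum 1."""
    offsets = np.arange(-radius, radius + 1)
    u, v = np.meshgrid(offsets, offsets, indexing="ij")
    g = np.exp(-(u**2 + v**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()   # renormalise: the truncated samples do not sum to 1

k_gauss = gaussian_kernel(radius=1, sigma=1.0)
k_mean = np.full((3, 3), 1.0 / 9.0)

print(np.round(k_gauss, 3))   # largest weight at the centre, smallest at corners
print(np.round(k_mean, 3))    # every weight equal
```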

Why this matters:

Reduces random pixel noise before stronger feature extraction.
Produces a coarse low-frequency map that separates broad cell regions from background.
Makes downstream edge maps more stable.

Caution

Too much blur weakens the boundaries you want to preserve. Apply conservatively, and always before edge detection.

7.5.1 On a Real Cell Image

Show code

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter

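# Load one cell image and average over axis 0 to get a single grayscale map
# (this assumes the array is stored channel-first, i.e. shape (C, H, W)).
img = np.load('imagedata/X/7.npy').mean(axis=0)
# Smooth with a Gaussian whose standard deviation is 2 pixels.
out = gaussian_filter(img, sigma=2)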

fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].imshow(img, cmap='gray'); ax[0].set_title('Original'); ax[0].axis('off')
ax[1].imshow(out, cmap='gray'); ax[1].set_title('Gaussian σ=2'); ax[1].axis('off')
plt.tight_layout()

Quiz: Why Blur First

The chapter recommends applying a Gaussian blur before computing edge or texture features. What is the primary reason?

Random pixel noise produces large local intensity changes that derivative filters like Sobel will misread as edges. Smoothing first averages out the noise so the gradient responds to real boundaries instead of speckle

Gaussian blur sharpens edges and increases contrast, making boundaries easier to detect

Gaussian blur converts a color image into a single grayscale channel, which is required by Sobel and Gabor

Gaussian blur removes the dependence on illumination, so the same edge will produce the same gradient magnitude regardless of overall brightness

Quiz: Gaussian vs Mean Kernel

Both the mean and Gaussian filters compute a weighted sum of pixels in a sliding window. What is the essential difference between their kernels?

The mean kernel weights every pixel equally (all \(1/9\) for a 3×3). The Gaussian kernel gives the most weight to the center pixel and progressively less to pixels farther from the center, following a 2D normal distribution

The mean kernel uses positive weights everywhere, while the Gaussian kernel uses positive weights at the center and negative weights at the edges

The mean kernel is applied once, while the Gaussian kernel is applied repeatedly (one pass per pixel) until the image converges

The mean kernel sorts the values before averaging them, while the Gaussian kernel multiplies them — this is the same distinction as median vs. mean

7.6 Median and Rank Filters

Unlike mean and Gaussian blur, a median filter does not multiply and sum — it sorts the values in a local window and picks the middle one. This single difference makes it remarkably effective at removing isolated noise spikes without blurring edges.

7.6.1 How it works — step by step

For every output pixel, the filter:

Places a 3×3 window centred on that pixel — collecting 9 values.
Sorts those 9 values from smallest to largest.
Returns the 5th value (rank 5 of 9) — the median.

\[F(x,y)=\operatorname{median}\{I(i,j):(i,j)\in W_{x,y}\}\]

Example — a salt spike (value 9) buried in the nucleus (value 1):

Step	Values
Window contents	1, 1, 1, 1, 9, 1, 1, 1, 1
Sorted	1, 1, 1, 1, 1, 1, 1, 1, 9
Output (rank 5)	1 ✓ spike removed

A mean filter would give \(\frac{8\times1+9}{9} \approx 1.9\) — a residual artifact. The median gives exactly 1, because the spike is pushed to the end of the sorted list and never reaches rank 5.

Example — a pepper spike (value 0) in the bright background (value 8):

Step	Values
Window contents	8, 8, 8, 8, 0, 8, 8, 8, 8
Sorted	0, 8, 8, 8, 8, 8, 8, 8, 8
Output (rank 5)	8 ✓ spike removed
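Both worked examples can be reproduced directly with NumPy, which makes the contrast with the mean filter explicit:

```python
import numpy as np

salt = np.array([1, 1, 1, 1, 9, 1, 1, 1, 1])     # salt spike in the dark nucleus
pepper = np.array([8, 8, 8, 8, 0, 8, 8, 8, 8])   # pepper spike in bright background

for name, window in [("salt", salt), ("pepper", pepper)]:
    ordered = np.sort(window)
    median = ordered[4]            # rank 5 of 9 (index 4 when counting from 0)
    mean = window.mean()
    print(f"{name:6s} sorted={ordered}  median={median}  mean={mean:.2f}")
```

In both cases the median returns the value of the surrounding region, while the mean is pulled toward the spike.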

7.6.2 Interactive Median/Rank Filter Explorer

The grid below uses values 0–9 (0 = black, 9 = white), matching the real staining convention: nucleus = 1 (dark), cytoplasm = 5 (medium), background = 8 (bright). Two noise spikes are planted: a salt spike at row 3, col 4 (value 9 in the dark nucleus — click it first) and a pepper spike at row 7, col 3 (value 0 in the bright background). Click any pixel to see its 3×3 window, the sorted values, and the median output. Click ▶ Play to animate the filter scan pixel by pixel (output fills in live), or use Prev / Next to step manually.

7.6.3 Rank filters — a generalisation

A rank filter returns the k-th ordered value rather than always the median:

\[F(x,y)=\operatorname{rank}_k\{I(i,j):(i,j)\in W_{x,y}\}\]

k	Name	Effect
1	Erosion (minimum)	Shrinks and darkens bright regions
5 of 9	Median	Removes spikes; preserves edges
9	Dilation (maximum)	Expands and brightens bright regions
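A rank filter is short to write once the 3×3 windows are collected. The sketch below is one possible implementation, not the explorer's own code: the name `rank_filter_3x3` is hypothetical, and the border is handled by replicating edge pixels, which is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rank_filter_3x3(image, k):
    """Return the k-th smallest value (k = 1..9) of each 3x3 window."""
    padded = np.pad(image, 1, mode="edge")              # replicate border pixels
    windows = sliding_window_view(padded, (3, 3))       # shape (H, W, 3, 3)
    ordered = np.sort(windows.reshape(*image.shape, 9), axis=-1)
    return ordered[..., k - 1]

# Dark nucleus (value 1) with a single salt spike (value 9) in the middle.
image = np.ones((5, 5), dtype=int)
image[2, 2] = 9

eroded = rank_filter_3x3(image, k=1)    # minimum
median = rank_filter_3x3(image, k=5)    # median
dilated = rank_filter_3x3(image, k=9)   # maximum

print(median)    # spike removed
print(dilated)   # spike grown into a 3x3 bright block
```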

Why this matters:

Excellent at suppressing impulse noise (salt-and-pepper artifacts).
More edge-preserving than mean blur — the output is always one of the actual neighborhood values, never a blend.
Cleans isolated specks in the background while keeping cell boundaries sharp.

Note

Unlike mean and Gaussian blur, the median filter cannot be written as a convolution \(K * I\) with a fixed kernel — it is inherently nonlinear. This is why it does not appear as a preset in the kernel widget above.
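A quick way to see the nonlinearity is to test the defining property of a linear operation, \(f(a+b)=f(a)+f(b)\), on two tiny hypothetical windows:

```python
import numpy as np

a = np.array([0.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 0.0])

median_of_sum = np.median(a + b)              # median of [1, 0, 1]
sum_of_medians = np.median(a) + np.median(b)  # 0 + 0

mean_of_sum = np.mean(a + b)
sum_of_means = np.mean(a) + np.mean(b)

print(median_of_sum, sum_of_medians)   # differ, so the median is not linear
print(mean_of_sum, sum_of_means)       # agree, so the mean is linear
```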

7.6.4 On a Real Cell Image

Show code

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import median_filter

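# Load one cell image and average over axis 0 to get a single grayscale map
# (this assumes the array is stored channel-first, i.e. shape (C, H, W)).
img = np.load('imagedata/X/7.npy').mean(axis=0)
# Replace each pixel with the median of its 5x5 neighbourhood.
out = median_filter(img, size=5)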

fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].imshow(img, cmap='gray'); ax[0].set_title('Original'); ax[0].axis('off')
ax[1].imshow(out, cmap='gray'); ax[1].set_title('Median 5×5'); ax[1].axis('off')
plt.tight_layout()

Quiz: Median Preserves Edges

Why does the median filter preserve edges so much better than mean or Gaussian blur?

The median is always one of the actual pixel values in the window — it never invents a new in-between value. At an edge the window contains values from two regions; the median picks one or the other (whichever side has more pixels in the window), keeping the boundary sharp

The median filter uses a smaller kernel than mean or Gaussian, so it averages over fewer pixels and naturally blurs less

The median filter detects edges first and then skips applying any operation to those pixels, leaving the boundary untouched

The median filter is linear, so it commutes with derivatives — applying it does not affect the gradient magnitude that defines the edge

Quiz: Maximum Rank Filter

A rank filter with \(k = 9\) on a 3×3 window returns the maximum value in the window. Applied to a grayscale image, what is the visual effect?

Bright regions expand: each pixel becomes the brightest value in its 3×3 neighborhood, so bright objects grow outward by one pixel and small dark specks inside them get filled in. This is morphological dilation of the bright regions

Bright regions shrink: taking the maximum erodes them by one pixel because the kernel only fires at the brightest pixel in its window

The image is replaced by its standard deviation: \(k=9\) measures the spread of the window rather than picking a single value

The image is left unchanged: \(k=9\) is the “do nothing” rank because picking the largest of nine values is the same as picking any of them when intensities are similar

7.7 Sobel / Scharr Gradient Magnitude

Sobel estimates local image derivatives using two fixed kernels:

\[K_x= \begin{bmatrix} -1&0&1\\ -2&0&2\\ -1&0&1 \end{bmatrix}, \qquad K_y= \begin{bmatrix} -1&-2&-1\\ 0&0&0\\ 1&2&1 \end{bmatrix}\]

Gradient components and magnitude: \[G_x = K_x * I,\quad G_y = K_y * I,\quad M(x,y)=\sqrt{G_x^2 + G_y^2}\]

Load Sobel X or Sobel Y in the widget to see how these kernels respond to edges in the example image.
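For a concrete check outside the widget, the sketch below applies both kernels to a synthetic vertical step edge. It uses the widget's multiply-and-sum convention (cross-correlation) and computes only positions where the kernel fits inside the image; true convolution would flip the sign of \(G_x\) and \(G_y\) but leave \(M\) unchanged. The helper name `correlate_valid` is an illustrative assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

KX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
KY = KX.T   # equals the K_y matrix given above

def correlate_valid(image, kernel):
    """Multiply-and-sum the kernel over every fully covered position."""
    windows = sliding_window_view(image, kernel.shape)   # (H-2, W-2, 3, 3)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Synthetic image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

gx = correlate_valid(image, KX)
gy = correlate_valid(image, KY)
mag = np.hypot(gx, gy)

print(gx[0])      # nonzero only where the window straddles the step
print(gy.max())   # no horizontal edges, so G_y is zero everywhere
```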

Why this matters:

Highlights nucleus boundaries and cell boundaries directly.
Bright responses in \(M\) align with visible contours in the image.
One of the most interpretable handcrafted maps for segmentation.

Caution

Derivative-based maps are noise-sensitive. Apply a Gaussian blur first.
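The effect of pre-blurring can be measured on a synthetic example. The sketch below builds a flat region with no edges at all, adds Gaussian pixel noise (the noise level and the seed are arbitrary assumptions), and compares the average Sobel magnitude with and without smoothing.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

rng = np.random.default_rng(0)
flat = np.full((64, 64), 0.5)                           # no real edges anywhere
noisy = flat + rng.normal(0.0, 0.05, size=flat.shape)   # add pixel noise

def gradient_magnitude(img):
    """Sobel gradient magnitude, as in the pre-render code of this section."""
    return np.hypot(sobel(img, axis=1), sobel(img, axis=0))

raw = gradient_magnitude(noisy).mean()
smoothed = gradient_magnitude(gaussian_filter(noisy, sigma=2)).mean()

print(f"mean |gradient| without blur: {raw:.4f}")
print(f"mean |gradient| with sigma=2: {smoothed:.4f}")
```

Every nonzero response here is a false edge caused by noise, so the drop after smoothing is exactly the stabilising effect the caution describes.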

7.7.1 Live Sobel on a Real Cell Image

Pick a cell from the dropdown, then slide between \(G_x\), \(G_y\), and the gradient magnitude \(|G|\). Increase the pre-blur σ to see how Gaussian smoothing tames noise before differentiation. Every combination has been pre-rendered at build time so the slider feels live.

Show pre-render code

import os
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import sobel, gaussian_filter

IMAGE_INDICES = [7, 20, 80, 170]
BASE = 'images/chapter7/sobel_filter'
SIGMAS = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]

for idx in IMAGE_INDICES:
    out_dir = f'{BASE}/img{idx}'
    os.makedirs(out_dir, exist_ok=True)
    img = np.load(f'imagedata/X/{idx}.npy').mean(axis=0)
    plt.imsave(f'{out_dir}/original.png', img, cmap='gray')
    for si, s in enumerate(SIGMAS):
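        # sigma = 0 means "no pre-blur"; otherwise smooth before differentiating.
        smoothed = img if s == 0 else gaussian_filter(img, sigma=s)
        gx = sobel(smoothed, axis=1)   # derivative along columns (x direction)
        gy = sobel(smoothed, axis=0)   # derivative along rows (y direction)
        mag = np.hypot(gx, gy)         # gradient magnitude sqrt(gx^2 + gy^2)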
        plt.imsave(f'{out_dir}/sobel_d0_s{si}.png', gx, cmap='gray')
        plt.imsave(f'{out_dir}/sobel_d1_s{si}.png', gy, cmap='gray')
        plt.imsave(f'{out_dir}/sobel_d2_s{si}.png', mag, cmap='gray')

Quiz: Sobel Gradient

The Sobel operator computes \(G_x\) and \(G_y\) with two separate kernels, then combines them into a magnitude map \(M = \sqrt{G_x^2 + G_y^2}\). What does a high value in this magnitude map indicate?

A location where intensity changes rapidly in at least one direction — an edge or boundary between two regions

A bright pixel — one whose absolute intensity is high relative to the image mean

A location of repeating texture at a specific spatial frequency, similar to what a Gabor filter detects

A pixel where the Gaussian-blurred image differs significantly from the original, indicating noise

Quiz: Sobel Noise Sensitivity

The chapter cautions that Sobel is “noise-sensitive” and recommends applying a Gaussian blur first. Why are derivative filters like Sobel especially vulnerable to noise?

A derivative measures the difference between neighbouring pixels. A single noisy pixel that jumps high or low creates a large local difference, so the magnitude \(M\) spikes there even when no real edge is present. Smoothing first averages the spike away before the derivative is taken

Sobel kernels have integer weights, so they cannot represent fractional intensity changes accurately. Pre-blurring converts the image to floating-point, which the kernels can then handle

Sobel only works on perfectly uniform regions. Any variation — including the natural texture of cytoplasm — confuses the operator, and Gaussian blur removes that texture entirely

Without pre-blurring the Sobel kernels overflow and saturate to 255, making the output uniform white. Gaussian blur first reduces the dynamic range so the result fits

7.8 Gabor Filters

A Gabor filter is a Gaussian-modulated sinusoid — it detects texture at a specific orientation and spatial frequency. Where Sobel detects any edge regardless of frequency, Gabor responds selectively to periodic patterns (e.g., chromatin texture in a nucleus).

7.8.1 From Formula to Matrix: Sampling a Continuous Function

A common point of confusion: the Gabor formula looks abstract, but it is just a recipe for filling in a kernel matrix. The same principle applies to the Gaussian kernel — both are continuous functions that you evaluate at discrete integer pixel offsets to produce the actual numbers in the sliding window.

For a 3×3 kernel, the offsets are \((a, b) \in \{-1, 0, +1\} \times \{-1, 0, +1\}\), where \(a\) is the row offset (negative = up, positive = down) and \(b\) is the column offset (negative = left, positive = right). Plug each pair into the formula to get the corresponding matrix entry:

\[K[a,\,b] = \underbrace{\exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)}_{\text{Gaussian envelope}} \cdot \underbrace{\cos\!\left(\frac{2\pi\, x'}{\lambda}\right)}_{\text{sinusoidal wave}}\]

\[x' = a\cos\theta + b\sin\theta, \qquad y' = -a\sin\theta + b\cos\theta\]

Two pieces working together:

The Gaussian envelope (\(\exp\) term) weights entries by distance from center — full weight at \((0,0)\), tapering toward the edges. This is exactly like a Gaussian blur kernel.
The sinusoidal wave (\(\cos\) term) creates alternating positive and negative bands across the kernel. \(\lambda\) sets the band width; \(\theta\) controls which direction the bands run.

Multiply the two and you get a kernel that responds to a specific texture frequency at a specific orientation.

7.8.2 What θ Does: Rotating the Stripe Pattern

The angle \(\theta\) rotates the coordinate system before the cosine is applied. At θ = 0°, \(x' = a\) (the row direction), so the cosine varies across rows → horizontal bands. At θ = 90°, \(x' = b\) (the column direction), so the cosine varies across columns → vertical bands.

Here are the two kernels computed from the formula with \(\lambda=2\), \(\sigma=1\), \(\gamma=1\):

\[K_{0°} \approx \begin{bmatrix} -0.37 & -0.61 & -0.37 \\ 0.61 & 1.00 & 0.61 \\ -0.37 & -0.61 & -0.37 \end{bmatrix} \qquad K_{90°} \approx \begin{bmatrix} -0.37 & 0.61 & -0.37 \\ -0.61 & 1.00 & -0.61 \\ -0.37 & 0.61 & -0.37 \end{bmatrix}\]

\(K_{0°}\) fires when the center row is bright and the rows above and below are dark: a horizontal band crossing the kernel. \(K_{90°}\) fires when the center column is bright and the columns left and right are dark: a vertical band. Notice that they are transposes of each other. With a circular envelope (\(\gamma=1\)), switching \(\theta\) from 0° to 90° simply swaps the roles of the row offset \(a\) and the column offset \(b\), which is the same as transposing the matrix.
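Sampling the formula at the nine integer offsets takes only a few lines. The sketch below follows this chapter's convention (\(a\) = row offset, \(b\) = column offset); the function name `gabor_kernel` is an illustrative assumption, and library implementations may use different angle conventions.

```python
import numpy as np

def gabor_kernel(size, lam, theta_deg, sigma, gamma=1.0):
    """Evaluate the Gabor formula at integer offsets (a = row, b = column)."""
    r = size // 2
    offsets = np.arange(-r, r + 1)
    a, b = np.meshgrid(offsets, offsets, indexing="ij")
    theta = np.deg2rad(theta_deg)
    x_rot = a * np.cos(theta) + b * np.sin(theta)
    y_rot = -a * np.sin(theta) + b * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + gamma**2 * y_rot**2) / (2 * sigma**2))
    wave = np.cos(2 * np.pi * x_rot / lam)
    return envelope * wave

k0 = gabor_kernel(3, lam=2, theta_deg=0, sigma=1)
k90 = gabor_kernel(3, lam=2, theta_deg=90, sigma=1)

print(np.round(k0, 2))
print(np.round(k90, 2))
```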

For diagonal orientations (45°, 135°), the stripe pattern tilts, but a 3×3 grid is too coarse to display the diagonal clearly. A 5×5 kernel makes the orientation far more legible — which is why the widget below uses 5×5.

7.8.3 The Other Three Parameters

Parameter	Concrete meaning	Smaller value	Larger value
\(\lambda\)	Stripe width (wavelength)	Tight stripes — detects fine texture	Wide stripes — detects coarse texture
\(\sigma\)	Gaussian bell width	Small window, few stripes visible	Large window, more stripes captured
\(\gamma\)	Aspect ratio of the Gaussian ellipse	Elongated along the stripe direction	More circular envelope

A useful rule of thumb: keep \(\sigma \approx \lambda / \pi\) so that roughly one full stripe cycle fits within the Gaussian envelope.

7.8.4 Interactive Gabor Explorer

Edit the 10×10 image (click any cell to change its value), then drag the sliders to see the 5×5 kernel and its output feature map update live.

The default image has horizontal stripes — try θ = 0° first, then rotate to 90° and watch the output respond differently. Red cells in the output mean a strong positive match; blue cells mean a strong negative response (the inverse of the target texture).

How to use: Drag θ from 0° to 90° and watch the kernel’s stripe pattern rotate — the output’s bright-red regions shift accordingly. Try λ = 2 for tight stripes vs λ = 6 for wide ones. Increase σ to widen the Gaussian envelope and let more rows contribute. Click any input cell to type a new value (0.0–1.0), then draw your own texture and observe how the output responds.

Load the Gabor (0°) preset in the kernel convolution widget at the top of the chapter to see this same 3×3 version applied to the fixed example image.

7.8.5 Gabor Filter Banks

In practice a filter bank is used: multiple Gabor filters at several orientations (0°, 45°, 90°, 135°) and scales. Each filter produces one feature map. Together they form a rich multi-channel texture descriptor.
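A minimal filter bank can be sketched as a loop over orientations. The example below is an illustration under stated assumptions: it reuses a hypothetical `gabor_kernel` helper that samples this chapter's formula, applies 5×5 kernels to a synthetic image of horizontal stripes with a 4-pixel period, and scores each orientation by its mean absolute response over positions where the kernel fits inside the image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size, lam, theta_deg, sigma, gamma=1.0):
    """Evaluate the Gabor formula at integer offsets (a = row, b = column)."""
    r = size // 2
    offsets = np.arange(-r, r + 1)
    a, b = np.meshgrid(offsets, offsets, indexing="ij")
    theta = np.deg2rad(theta_deg)
    x_rot = a * np.cos(theta) + b * np.sin(theta)
    y_rot = -a * np.sin(theta) + b * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + gamma**2 * y_rot**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x_rot / lam)

# Synthetic texture: intensity varies down the rows, giving horizontal stripes.
rows = np.arange(16)
image = np.tile(np.cos(2 * np.pi * rows / 4.0)[:, None], (1, 16))

windows = sliding_window_view(image, (5, 5))   # every fully covered 5x5 patch
scores = {}
for theta in (0, 45, 90, 135):
    kernel = gabor_kernel(5, lam=4, theta_deg=theta, sigma=2)
    response = np.einsum("ijkl,kl->ij", windows, kernel)   # one feature map
    scores[theta] = float(np.abs(response).mean())

best = max(scores, key=scores.get)
print(scores)
print("strongest orientation:", best)
```

Each entry of `scores` summarises one feature map; in a real pipeline the four maps themselves, not just their averages, would be stacked as channels for the classifier.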

Why this matters for nucleus / cytoplasm / background:

Nucleus has distinctive chromatin texture — fine-grained periodic patterns visible at specific orientations and frequencies.
Cytoplasm is smoother and less structured.
Background is largely uniform, responding weakly to all Gabor filters.
A Gabor filter bank can distinguish these regions where Sobel or blur alone cannot.

Note

Gabor filters are linear convolutions — they are fixed kernels applied via \(F = K_{\text{Gabor}} * I\). A CNN trained on texture data tends to learn filters that closely resemble Gabor kernels in its early layers.

7.8.6 Live Gabor on a Real Cell Image

Pick a cell from the dropdown, then drag the sliders to apply a Gabor filter with the chosen orientation θ, frequency, and σ. The right-hand pane updates instantly — every combination has been pre-rendered at build time so the slider feels live.

Show pre-render code

import os
import numpy as np
import matplotlib.pyplot as plt
from skimage.filters import gabor

IMAGE_INDICES = [7, 20, 80, 170]
BASE = 'images/chapter7/gabor_angles'

ANGLES = [i * 10 for i in range(18)]            # 0, 10, ..., 170
FREQS  = [0.10, 0.15, 0.20, 0.25, 0.30,
          0.35, 0.40, 0.45, 0.50]
SIGMAS = [1, 2, 3, 4, 5, 6, 7]

for idx in IMAGE_INDICES:
    out_dir = f'{BASE}/img{idx}'
    os.makedirs(out_dir, exist_ok=True)
    img = np.load(f'imagedata/X/{idx}.npy').mean(axis=0)
    plt.imsave(f'{out_dir}/original.png', img, cmap='gray')
    for ai, a in enumerate(ANGLES):
        for fi, f in enumerate(FREQS):
            for si, s in enumerate(SIGMAS):
                # gabor() returns the real and imaginary responses; keep the real part.
                real, _ = gabor(img, frequency=f, theta=np.deg2rad(a),
                                sigma_x=s, sigma_y=s)
                plt.imsave(f'{out_dir}/gabor_a{ai:02d}_f{fi}_s{si}.png',
                           real, cmap='gray')

Quiz: Gabor Filters

A Gabor filter bank uses four orientations: 0°, 45°, 90°, and 135°. Why use all four when segmenting urothelial cells?

Chromatin texture within nuclei varies in direction — a filter tuned to one orientation may miss texture running in another direction, so all four are needed to detect the nucleus reliably regardless of its orientation

Each orientation targets a different colour channel: 0° responds to red, 45° to green, 90° to blue, and 135° averages all three

More orientations increase the effective kernel size, giving the filter a larger receptive field to capture global structure across the whole cell

Using four orientations is equivalent to applying the filter four times with different random seeds, reducing variance through averaging

Quiz: Gabor Wavelength

In the Gabor formula, \(\lambda\) controls the wavelength of the cosine — the spacing of the bright/dark stripes in the kernel. Suppose chromatin granules in a nucleus appear as fine repeating dots roughly 3 pixels apart. Should you use a small \(\lambda\) or a large \(\lambda\) to detect them, and why?

A small \(\lambda\) (around 3 pixels) — the stripe spacing of the kernel must match the spatial frequency of the texture you want to detect. A small wavelength produces tightly spaced stripes that resonate with fine repetitive patterns; a large \(\lambda\) would only respond to coarse, slowly-varying texture

A large \(\lambda\) — the wider the kernel stripes, the more pixels they cover, so the filter has more “evidence” to detect any pattern, including fine ones

\(\lambda\) has no effect on what the filter detects — only \(\theta\) (orientation) matters. \(\lambda\) controls how fast the kernel runs at compute time

A small \(\lambda\) — but only because small wavelengths reduce the kernel’s overall magnitude, weakening the response so it does not overwhelm the classifier

7.9 Gray Level Co-Occurrence Matrix (GLCM)

The Gray Level Co-Occurrence Matrix (GLCM) is a fundamentally different kind of feature descriptor. It does not produce a feature map by convolution. Instead, it computes second-order statistics — how often specific pairs of intensity values appear together at a given spatial offset.

7.9.1 Building the GLCM

The offset \(\Delta\). The GLCM is always computed for a specific direction and distance, written as \(\Delta = (\Delta r, \Delta c)\): \(\Delta r\) is how many rows to step, \(\Delta c\) is how many columns. For example:

\(\Delta=(0,1)\) — compare each pixel to its immediate right-hand neighbor (horizontal)
\(\Delta=(1,0)\) — compare each pixel to the one directly below (vertical)
\(\Delta=(1,1)\) — compare each pixel to the one diagonally below-right (45°)

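Choosing different offsets reveals whether a texture has directional structure (anisotropy). In practice, the features from the four standard orientations (horizontal, vertical, and the two diagonals) are often averaged to make them less sensitive to rotation.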

What the matrix looks like. For an image with \(N_g\) discrete gray levels, the GLCM is always an \(N_g \times N_g\) matrix — its size depends on the number of gray levels, not the image dimensions. A 256×256 image with \(N_g = 8\) bins produces an 8×8 GLCM; the entire image is summarized into those 64 cells. Formally:

\[C_{\Delta}(i,\,j) = \#\bigl\{(r,c) : I(r,c)=i \text{ and } I(r+\Delta r,\,c+\Delta c)=j\bigr\}\]

Row \(i\), column \(j\) of \(C\) counts how many pixel pairs exist where the first pixel has intensity \(i\) and its neighbor (at offset \(\Delta\)) has intensity \(j\).

Gray levels must be discrete. Our urothelial cell images store intensities as continuous floats in \([0,\,1]\). GLCM requires integer bin labels, so the image is quantized first: the continuous range \([0,1]\) is divided into \(N_g\) equal bins and each pixel is assigned its bin index. Choosing \(N_g=8\) gives bins \(\{0,1,\dots,7\}\); choosing \(N_g=256\) preserves more detail but makes the GLCM much larger and sparser. In practice \(N_g \in \{8,\,16,\,32\}\) is a common choice for microscopy texture analysis. Note that skimage.feature.graycomatrix() does not quantize for you: it expects an integer image whose values already lie in \(\{0,\dots,N_g-1\}\), passed together with levels=N_g.
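The quantization step can be sketched as follows. The function name `quantize` and the sample values are assumptions for illustration; the only subtlety is that an intensity of exactly 1.0 must be clipped into the top bin.

```python
import numpy as np

def quantize(image, n_levels):
    """Map floats in [0, 1] to integer bin indices 0 .. n_levels - 1."""
    bins = np.floor(image * n_levels).astype(int)
    return np.clip(bins, 0, n_levels - 1)   # 1.0 would otherwise land in bin n_levels

img = np.array([[0.00, 0.12, 0.50],
                [0.74, 0.99, 1.00]])
q = quantize(img, n_levels=8)
print(q)
```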

A small example with \(N_g=4\) gray levels and offset \(\Delta=(0,1)\):

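\[I = \begin{bmatrix}0&1&2\\1&2&3\\2&3&1\end{bmatrix} \qquad\Rightarrow\qquad C_{(0,1)} = \begin{bmatrix}0&1&0&0\\0&0&2&0\\0&0&0&2\\0&1&0&0\end{bmatrix}\]

Reading the matrix (rows numbered from 0): each of the three rows of \(I\) contributes two horizontal pairs, so the entries sum to 6. \(C(0,1)=1\) because the pair \((0\!\to\!1)\) appears once (top-left pixel); \(C(1,2)=2\) because the pair \((1\!\to\!2)\) appears in row 0 and row 1; \(C(2,3)=2\) because the pair \((2\!\to\!3)\) appears twice (row 1 and row 2); and \(C(3,1)=1\) comes from the last pair in row 2.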
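The counting definition translates almost literally into NumPy. The sketch below tallies the pairs for the 3×3 example image; the function name `glcm` is hypothetical and, to stay short, it supports only non-negative offsets and counts ordered (non-symmetric) pairs.

```python
import numpy as np

def glcm(image, d_row, d_col, n_levels):
    """Count ordered pairs (I[r, c], I[r + d_row, c + d_col]) for offsets >= 0."""
    h, w = image.shape
    first = image[:h - d_row, :w - d_col]    # pixels that have a neighbour
    second = image[d_row:, d_col:]           # their neighbours at the offset
    counts = np.zeros((n_levels, n_levels), dtype=int)
    np.add.at(counts, (first.ravel(), second.ravel()), 1)   # tally each pair
    return counts

I = np.array([[0, 1, 2],
              [1, 2, 3],
              [2, 3, 1]])

C = glcm(I, d_row=0, d_col=1, n_levels=4)
print(C)
print("total pairs:", C.sum())   # 3 rows x 2 horizontal pairs each
```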

7.9.2 Haralick Features

The term comes from Robert M. Haralick, who in 1973 published “Textural Features for Image Classification” with K. Shanmugam and I. Dinstein (IEEE Transactions on Systems, Man, and Cybernetics). The paper introduced the GLCM framework and derived 14 scalar statistics from it, each capturing a different aspect of texture. The four most widely used are listed below; they are collectively called Haralick features in his honor.

The GLCM is rarely used directly as a matrix. Instead, these scalar summaries are extracted:

Feature	Formula	What it captures
Contrast	\(\sum_{i,j}(i-j)^2\,\tilde{C}(i,j)\)	Local intensity variation
Energy	\(\sum_{i,j}\tilde{C}(i,j)^2\)	Textural uniformity
Homogeneity	\(\sum_{i,j}\frac{\tilde{C}(i,j)}{1+\lvert i-j\rvert}\)	Diagonal dominance
Correlation	\(\sum_{i,j}\frac{(i-\mu_i)(j-\mu_j)\tilde{C}(i,j)}{\sigma_i\sigma_j}\)	Linear gray-level dependency

where \(\tilde{C}\) is the normalized GLCM (\(\sum_{i,j}\tilde{C}(i,j)=1\)).
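The four formulas can be evaluated directly from a count matrix. The sketch below is an illustration using the definitions in the table above; the two count matrices are hand-made examples (a uniform texture and a strict 2↔3 alternation), and the function name is hypothetical. Library implementations such as scikit-image's graycoprops define some properties slightly differently (for instance, its "energy" is the square root of the sum used here), so values may not match one-to-one.

```python
import numpy as np

def haralick_features(counts):
    """Contrast, energy, homogeneity, correlation from a GLCM of raw counts."""
    p = counts / counts.sum()                 # normalised GLCM
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    if sd_i > 0 and sd_j > 0:
        correlation = np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)
    else:
        correlation = float("nan")            # undefined for a single gray level
    return {
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1 + np.abs(i - j)))),
        "correlation": float(correlation),
    }

uniform = np.zeros((4, 4))
uniform[2, 2] = 12                    # every pair is (2, 2)

alternating = np.zeros((4, 4))
alternating[2, 3] = 6                 # pairs (2 -> 3)
alternating[3, 2] = 6                 # pairs (3 -> 2)

f_uniform = haralick_features(uniform)
f_alt = haralick_features(alternating)
print(f_uniform)
print(f_alt)
```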

7.9.3 Interactive GLCM Explorer

Click any cell in the input image to cycle through gray levels (0–3). Choose an offset direction, then watch the GLCM heatmap and Haralick features update live.

Try these patterns to build intuition:

Uniform: single gray level → GLCM has one entry on the diagonal → Energy = 1, Contrast = 0
Alternating: 2↔︎3 checkerboard → with a horizontal or vertical offset every count is off-diagonal → Contrast above the uniform case, Homogeneity lower (widen the jump, e.g. 0↔︎3, to push Contrast higher)
Gradient: left-to-right ramp → counts form a band near the super-diagonal → Correlation high

Key patterns to recognize:

Diagonal-heavy GLCM → uniform or slowly varying texture; neighboring pixels share similar intensities → high Energy, low Contrast
Off-diagonal counts → high-contrast texture; neighbors differ greatly → high Contrast, low Homogeneity
Narrow super-diagonal band → smooth gradient; each pixel is one step brighter than its neighbor → high Correlation
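The three patterns can also be checked numerically. The sketch below assumes 4×4 inputs with four gray levels and a horizontal offset; the helper `features` and the choice to report NaN for an undefined correlation are assumptions made for this example.

```python
import numpy as np

def features(img, n_levels=4):
    """Summaries of the GLCM for offset (0, 1), i.e. each pixel's right-hand neighbour."""
    C = np.zeros((n_levels, n_levels))
    np.add.at(C, (img[:, :-1].ravel(), img[:, 1:].ravel()), 1)
    P = C / C.sum()
    i, j = np.indices(P.shape)
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd = np.sqrt(((i - mu_i) ** 2 * P).sum() * ((j - mu_j) ** 2 * P).sum())
    cov = ((i - mu_i) * (j - mu_j) * P).sum()
    return {
        "contrast": float(((i - j) ** 2 * P).sum()),
        "energy": float((P ** 2).sum()),
        "homogeneity": float((P / (1 + np.abs(i - j))).sum()),
        # Correlation is 0/0 for a single-level patch; this sketch reports NaN there.
        "correlation": float(cov / sd) if sd > 0 else float("nan"),
    }

patterns = {
    "uniform": np.full((4, 4), 2),
    "checkerboard": 2 + np.indices((4, 4)).sum(axis=0) % 2,   # alternating 2 and 3
    "ramp": np.tile(np.arange(4), (4, 1)),                    # each row is 0, 1, 2, 3
}
results = {name: features(img) for name, img in patterns.items()}
for name, feats in results.items():
    print(name, {k: round(v, 3) for k, v in feats.items()})
```

Because neighbours differ by exactly one gray level in both the 2↔3 checkerboard and the ramp, the two share the same Contrast (1) and Homogeneity (0.5) here; Correlation (−1 versus +1) is what separates them. Contrast grows with the square of the jump, so a 0↔3 checkerboard would score 9.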

7.9.4 Using GLCM for Segmentation

A single GLCM describes the whole image. For pixel-level segmentation, compute the GLCM in a sliding window (e.g., 15×15 pixels) centered on each pixel. Each window produces one GLCM, yielding one Haralick feature value per pixel — a new scalar feature map.
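A minimal sketch of the sliding-window idea, assuming an already quantized image, an odd window size, a horizontal offset, and reflect padding so the output keeps the input's shape. The function names and the synthetic test image are illustrative; a real pipeline would more likely call a library GLCM routine per window.

```python
import numpy as np

def glcm_contrast(patch, n_levels):
    """Contrast of the offset-(0, 1) GLCM of one quantized patch."""
    C = np.zeros((n_levels, n_levels))
    np.add.at(C, (patch[:, :-1].ravel(), patch[:, 1:].ravel()), 1)
    P = C / C.sum()
    i, j = np.indices(P.shape)
    return ((i - j) ** 2 * P).sum()

def contrast_map(img_q, window, n_levels):
    """One Contrast value per pixel, from the window centred on that pixel."""
    half = window // 2
    padded = np.pad(img_q, half, mode="reflect")   # border handling is a choice, not a requirement
    out = np.zeros(img_q.shape)
    for r in range(img_q.shape[0]):
        for c in range(img_q.shape[1]):
            out[r, c] = glcm_contrast(padded[r:r + window, c:c + window], n_levels)
    return out

rng = np.random.default_rng(0)
img_q = np.zeros((24, 24), dtype=np.int64)           # flat "background"
img_q[6:18, 6:18] = rng.integers(0, 8, (12, 12))     # noisy "nucleus-like" block
fmap = contrast_map(img_q, window=5, n_levels=8)
print(fmap.shape)
print(fmap[1, 1], fmap[12, 12])
```

The map is zero over the flat region and positive inside the textured block, which is the separation a downstream classifier would exploit. The double loop is slow for real images; it is written this way only to mirror the description above.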

Why this matters for nucleus / cytoplasm / background:

Nucleus: high contrast (chromatin granules), low homogeneity, and typically low energy, because the co-occurrence counts are spread over many gray-level pairs.
Cytoplasm: lower contrast, more uniform, higher homogeneity.
Background: very uniform, very high energy, very high homogeneity.

These differences are what make GLCM features useful classical texture descriptors for separating the three classes in cell segmentation.

Note

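GLCM features capture relationships between pixel pairs, not just individual pixel values, which makes them second-order statistics. First-order statistics (the mean, variance, or histogram of a patch) depend only on how often each intensity occurs, not on where it occurs. Sobel and Gabor are neither: they are linear filters that take a weighted sum of the pixel values in a neighborhood. Filter responses and co-occurrence statistics are complementary.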

Note

Can a CNN replicate GLCM? Unlike Sobel or Gabor filters — which are linear convolutions and which CNNs do learn to approximate in their early layers — GLCM cannot be reproduced by a single convolutional layer. Convolution is a linear local operation; GLCM computes a joint probability distribution over all pixel-pair intensities within a region, which is a fundamentally non-linear, count-based statistic. No conv kernel sliding over the image can tally co-occurrence frequencies that way.

That said, a sufficiently deep network with a large receptive field can approximate GLCM-like information implicitly — it just won’t produce an interpretable \(N_g \times N_g\) matrix or named Haralick features. GLCM therefore retains a practical advantage wherever interpretability matters: each feature (Contrast, Energy, Homogeneity, Correlation) has a concrete geometric meaning tied to the texture of the image patch.
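One way to see the non-linearity is to write the GLCM as a sum of products of indicator functions, \(C(i,j)=\sum_p \mathbb{1}[I(p)=i]\,\mathbb{1}[I(p+\Delta)=j]\). The sketch below builds the matrix that way for the 3×3 example image; the one-hot construction is an illustration chosen for this note, not how libraries implement it.

```python
import numpy as np

I = np.array([[0, 1, 2],
              [1, 2, 3],
              [2, 3, 1]])
n_levels = 4

ref = I[:, :-1].ravel()   # reference pixels
nbr = I[:, 1:].ravel()    # their right-hand neighbours, offset (0, 1)

# One-hot encode each pixel: row k is the indicator vector of that pixel's gray level.
A = np.eye(n_levels, dtype=np.int64)[ref]
B = np.eye(n_levels, dtype=np.int64)[nbr]

# C[i, j] = sum_k 1[ref_k == i] * 1[nbr_k == j]: a sum of products of indicators.
C = A.T @ B
print(C)
```

Both ingredients, the indicator of a gray level and the product of two pixels' indicators, are non-linear in the pixel intensities, which is why a single linear kernel cannot produce these counts.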

7.9.5 Live GLCM on a Real Cell Image

Pick a cell from the dropdown, then sweep the Haralick feature, the offset direction, and the window size. For every pixel a small patch around it is taken, its GLCM is built at the chosen offset, and one Haralick scalar is recorded — yielding a feature map. All combinations are pre-rendered at build time.

Pre-render code:

import os
import numpy as np
import matplotlib.pyplot as plt
from skimage.feature import graycomatrix, graycoprops
from skimage.transform import resize

IMAGE_INDICES = [7, 20, 80, 170]
BASE = 'images/chapter7/glcm_widget'
WINDOW_SIZES = [11, 15, 21]
FEATURES = ['contrast', 'energy', 'homogeneity', 'correlation']
ANGLES = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
LEVELS = 32
TARGET = 128   # downsample source so sliding-window GLCM finishes in seconds

for idx in IMAGE_INDICES:
    out_dir = f'{BASE}/img{idx}'
    os.makedirs(out_dir, exist_ok=True)
    # Average over axis 0 (presumably the channel axis) to get a grayscale image.
    img_full = np.load(f'imagedata/X/{idx}.npy').mean(axis=0)
    img_f = resize(img_full, (TARGET, TARGET), preserve_range=True, anti_aliasing=True)
    plt.imsave(f'{out_dir}/original.png', img_f, cmap='gray')
    # Quantize [0, 1] floats to integer gray levels 0 .. LEVELS-1.
    img_q = np.clip(img_f * (LEVELS - 1), 0, LEVELS - 1).astype(np.uint8)
    Hsrc, Wsrc = img_q.shape
    for wi, W in enumerate(WINDOW_SIZES):
        # "Valid" sliding window: no padding, so each map shrinks by W - 1 pixels per axis.
        Hh, Ww = Hsrc - W + 1, Wsrc - W + 1
        fm = np.zeros((len(FEATURES), len(ANGLES), Hh, Ww), dtype=np.float32)
        for r in range(Hh):
            for c in range(Ww):
                # One GLCM per window, at distance 1, for all four angles at once.
                g = graycomatrix(img_q[r:r+W, c:c+W], [1], ANGLES,
                                 levels=LEVELS, symmetric=True, normed=True)
                for fi, feat in enumerate(FEATURES):
                    # graycoprops returns shape (n_distances, n_angles); row 0 is distance 1.
                    fm[fi, :, r, c] = graycoprops(g, feat)[0]
        # Save one PNG per (feature, angle, window size) combination.
        for fi in range(len(FEATURES)):
            for ai in range(len(ANGLES)):
                plt.imsave(f'{out_dir}/glcm_f{fi}_a{ai}_w{wi}.png',
                           fm[fi, ai], cmap='magma')

Quiz: GLCM Texture

A 15×15 sliding window centred on the nucleus yields a GLCM with high Contrast and low Homogeneity. A window on the background yields low Contrast and high Homogeneity. What does this tell you about the two regions?

The nucleus has large intensity jumps between neighbouring pixels (chromatin granules create rapid local variation), while the background is nearly uniform with neighbouring pixels sharing similar values

The nucleus is brighter on average than the background — GLCM Contrast measures the difference in mean intensity between the two regions

The nucleus window is noisier because more cells overlap there — Contrast increases with the number of objects present in a window

High Contrast means the Gaussian-blurred nucleus differs greatly from the raw image, indicating that denoising removed significant texture

Quiz: GLCM Homogeneity

The Haralick Homogeneity feature is defined as \(\sum_{i,j} \tilde{C}(i,j)/(1 + |i-j|)\). The denominator \((1 + |i-j|)\) down-weights off-diagonal entries. What kind of image patch will produce a high Homogeneity value?

A patch where neighbouring pixels almost always have similar intensity values — for example, a smooth, uniform region of background. Most pairs \((i, j)\) are on or near the diagonal \(i \approx j\) where \(|i-j|\) is small, so the contribution to the sum is large

A patch with strong, repetitive texture like nuclear chromatin — Homogeneity is highest where neighbouring pixels frequently have very different values

A bright patch — Homogeneity is essentially the average pixel intensity, normalized by \(\tilde{C}\)

A patch with high spatial frequency — Homogeneity rewards rapid local oscillations because off-diagonal counts dominate the GLCM

7.10 From Feature Engineering to CNNs

The filters in this chapter form a clean progression:

Raw channels — which intensity channel gives the best separation?
Mean / Gaussian blur — the sliding kernel in its simplest form; smooths noise.
Median filter — a nonlinear local statistic; impulse-noise resistant.
Sobel — derivative kernels; detects where intensity changes rapidly.
Gabor — Gaussian-modulated sinusoids; detects texture at specific orientations and frequencies.
GLCM — second-order statistics; captures relationships between pixel pairs.

At that point the conceptual leap to CNNs is short:

A CNN layer still slides learned kernels over the image — but the kernels are trained from data rather than hand-designed. Early CNN layers learn filters that closely resemble Gaussian, Sobel, and Gabor kernels. Deeper layers combine these to detect higher-level structures.
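To make the correspondence concrete, the sketch below runs the forward pass of a one-kernel "layer" in plain NumPy, with the weights set by hand to the horizontal Sobel kernel. The function name `conv2d_valid` and the toy step-edge image are assumptions for illustration; in a trained CNN the nine weights would be learned rather than fixed.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide kernel over img (no padding) and take the weighted sum at each position.

    Like most deep-learning frameworks, this does not flip the kernel
    (strictly speaking it is cross-correlation).
    """
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (img[r:r + kh, c:c + kw] * kernel).sum()
    return out

# Hand-designed weights: the horizontal-derivative Sobel kernel.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

img = np.zeros((6, 6))
img[:, 3:] = 1.0   # vertical step edge between columns 2 and 3

fmap = conv2d_valid(img, sobel_x)
activated = np.maximum(fmap, 0.0)   # ReLU, as a CNN layer would typically apply
print(fmap[0])
```

The response is 4 at the two positions whose window straddles the edge and 0 elsewhere. Replacing sobel_x with trainable parameters, and stacking many such kernels, gives a convolutional layer.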

This is exactly what Chapter 8 covers.

7.11 Domain Context: N/C Ratio and Segmentation

In urinary cytology, an elevated N/C ratio is a central criterion for assessing high-grade urothelial carcinoma. To estimate it computationally, the pipeline must separate nucleus from cytoplasm and both from background. These feature maps are tools for making those three regions more separable — not arbitrary filters.
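Once a label map exists, the ratio itself is a pixel count. The sketch below assumes a hypothetical label encoding (0 = background, 1 = cytoplasm, 2 = nucleus) and defines the ratio as nuclear area over whole-cell area; other conventions divide by cytoplasm area alone, so the definition should be fixed before comparing against any clinical threshold.

```python
import numpy as np

BACKGROUND, CYTOPLASM, NUCLEUS = 0, 1, 2   # hypothetical label encoding

def nc_ratio(labels):
    """Nuclear area divided by whole-cell area (nucleus + cytoplasm), in pixels."""
    nucleus = np.count_nonzero(labels == NUCLEUS)
    cytoplasm = np.count_nonzero(labels == CYTOPLASM)
    cell = nucleus + cytoplasm
    return nucleus / cell if cell else float("nan")

# Toy mask: a 6x6 cell containing a 3x3 nucleus.
labels = np.zeros((10, 10), dtype=np.uint8)
labels[2:8, 2:8] = CYTOPLASM
labels[4:7, 4:7] = NUCLEUS
ratio = nc_ratio(labels)
print(ratio)  # 9 nucleus pixels / 36 cell pixels = 0.25
```

Any mislabelled boundary pixel shifts both counts, which is why the quality of the segmentation directly limits the quality of the ratio.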

7.12 Coding Exercises

#| exercise-id: ch7_ex_1.1
# Exercise 7.1: Mean filter
# Apply a 3x3 mean filter using scipy.ndimage.uniform_filter(size=3).
# Display the original and filtered images side-by-side.

import numpy as np
from scipy.ndimage import uniform_filter
import matplotlib.pyplot as plt

np.random.seed(42)
image = np.random.rand(64, 64) * 0.15
image[20:44, 20:44] += 0.75
image = np.clip(image, 0, 1)

# Write your code below:

#| exercise-id: ch7_ex_1.2
# Exercise 7.2: Sobel gradient magnitude
# Compute Gx, Gy, and the gradient magnitude using scipy.ndimage.sobel.
# Display the three maps side-by-side.

import numpy as np
from scipy.ndimage import sobel
import matplotlib.pyplot as plt

np.random.seed(42)
image = np.random.rand(64, 64) * 0.05
image[20:44, 20:44] = 0.9

# Write your code below:

#| exercise-id: ch7_ex_1.3
# Exercise 7.3: Median vs mean filter on impulse noise
# Add salt-and-pepper noise, then compare uniform_filter vs median_filter.

import numpy as np
from scipy.ndimage import uniform_filter, median_filter
import matplotlib.pyplot as plt

np.random.seed(0)
image = np.zeros((64, 64))
image[16:48, 16:48] = 0.8
noise_mask = np.random.rand(64, 64) < 0.05
image[noise_mask] = 1.0

# Write your code below:

#| exercise-id: ch7_ex_1.4
# Exercise 7.4: Gabor filter bank
# Apply skimage.filters.gabor at orientations 0, 45, 90, 135 degrees (theta in radians).
# Use frequency=0.2. Display the real part of each response.

import numpy as np
from skimage.filters import gabor
import matplotlib.pyplot as plt

np.random.seed(1)
image = np.random.rand(64, 64) * 0.05
# Add a "nucleus" region with horizontal texture
for i in range(20, 44, 4):
    image[i, 20:44] = 0.85

# Write your code below:

#| exercise-id: ch7_ex_1.5
# Exercise 7.5: GLCM texture features
# Use skimage.feature.graycomatrix and graycoprops to compute
# contrast, energy, and homogeneity for two regions:
# (a) the bright square (nucleus-like), (b) the dark background.
# Compare the values.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

np.random.seed(42)
image = np.random.rand(64, 64) * 0.15
image[20:44, 20:44] = np.random.rand(24, 24) * 0.4 + 0.55
image = (image * 255).astype(np.uint8)

# Write your code below:
# Hint: graycomatrix expects an integer-valued image.
# Use distances=[1], angles=[0], levels=256, symmetric=True, normed=True
