Details can be found on the project page: https://inst.eecs.berkeley.edu/~cs180/fa24/hw/proj1/
Essentially, from an image with separate RGB channels, we first align the channels and then create the corresponding RGB image.
Using `skimage` and `numpy`, we can easily extract the image data, align the channels, and create the RGB image.
Each input image consists of 3 equally sized images stacked vertically, one per color channel.
Cathedral | Tobolsk | Monastery | Harvesters | Melons | Onion Church | Self Portrait | Three Generations | Train | Sculpture | Lady | Church | Emir | Icon |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Therefore, we can extract the 3 channels by splitting the image into 3 parts.
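As a minimal sketch of the split described above, the plate can be cut into three equal-height slices with plain `numpy` indexing. The function name `split_channels` is hypothetical, and the top-to-bottom order (blue, green, red) is an assumption that should be checked against the project page. A real plate would be loaded first (for example with `skimage.io.imread`); here a small synthetic array stands in for it.

```python
import numpy as np

def split_channels(plate: np.ndarray):
    """Split a vertically stacked plate into three equal-height channels.

    Assumption: the channels appear top to bottom as blue, green, red.
    """
    height = plate.shape[0] // 3  # integer division drops up to 2 leftover rows
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r

# Synthetic stand-in for a scanned plate: 9 rows, 4 columns
plate = np.arange(9 * 4, dtype=np.float64).reshape(9, 4)
b, g, r = split_channels(plate)
print(b.shape, g.shape, r.shape)
```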
The three channel images are not aligned with each other, so we need to align them. Before aligning, we crop the borders so that they do not interfere with the alignment; I crop 15% of the outer borders for convenience. The `skimage` library is then used to work with the cropped images.
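The border crop can be sketched as follows. The helper name `crop_border` is hypothetical, and I am assuming that 15% is removed from each of the four sides; if the 15% is meant as a total across both sides, the fraction would need to be halved.

```python
import numpy as np

def crop_border(channel: np.ndarray, fraction: float = 0.15) -> np.ndarray:
    """Remove `fraction` of the height and width from every side (assumed meaning of 15%)."""
    h, w = channel.shape[:2]
    dy = int(round(h * fraction))  # rows removed at top and at bottom
    dx = int(round(w * fraction))  # columns removed at left and at right
    return channel[dy:h - dy, dx:w - dx]

channel = np.zeros((100, 200))
cropped = crop_border(channel)
print(cropped.shape)
```

Only the cropped interior needs to be used for scoring; the displacement that is found can still be applied to the full, uncropped channel afterwards.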
To align them, we displace one channel left/right and up/down relative to another channel and score each displacement. The best alignment either minimizes the L2 norm of the difference between the channels or maximizes the normalized cross-correlation (NCC) between them. The size of the displacement window matters: too large a window makes the search slow, while too small a window can miss the best alignment. For small images, a window of 15 pixels is sufficient.
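The exhaustive search described above might look like the sketch below. All names (`ncc`, `l2`, `best_shift`) are hypothetical, and several details are assumptions rather than the method used here: the window is taken to mean displacements in [-15, 15] along each axis, NCC is computed on mean-subtracted images, and `np.roll` is used for shifting, which wraps pixels around the edges.

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized images (zero-mean variant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def l2(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences; minimizing it is equivalent to minimizing the L2 norm."""
    return float(np.sum((a - b) ** 2))

def best_shift(channel: np.ndarray, reference: np.ndarray, window: int = 15):
    """Return the (dy, dx) in [-window, window]^2 that maximizes NCC with the reference."""
    best_score, best = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll shifts circularly: rows by dy, columns by dx
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

# Synthetic check: displace a random image and recover the displacement
rng = np.random.default_rng(0)
reference = rng.random((40, 40))
channel = np.roll(reference, (-3, 2), axis=(0, 1))
print(best_shift(channel, reference, window=5))
```

The cost grows with the square of the window size, since every candidate displacement requires a full pass over the image. To score with L2 instead, the comparison would be flipped to keep the smallest `l2` value.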
Once aligned, we can simply stack the images to create the RGB image.
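Stacking the aligned channels can be sketched with `np.dstack`. The function `compose_rgb` and its parameters are hypothetical; I am assuming here that blue is the reference channel and that red and green each carry their own (dy, dx) displacement, which the text above does not specify.

```python
import numpy as np

def compose_rgb(r, g, b, r_shift=(0, 0), g_shift=(0, 0)) -> np.ndarray:
    """Shift red and green onto blue (assumed reference) and stack into an H x W x 3 image."""
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    # Channel order along the last axis is R, G, B
    return np.dstack([r_aligned, g_aligned, b])

r = np.arange(20, dtype=np.float64).reshape(4, 5)
g = np.full((4, 5), 2.0)
b = np.full((4, 5), 3.0)
rgb = compose_rgb(r, g, b, r_shift=(1, 0))
print(rgb.shape)
```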
After trying both the L2 norm and NCC, I found that NCC works better. The outputs for the small images are as follows:
Cathedral | Tobolsk | Monastery |
---|---|---|
For large images, we need a larger window for alignment. Searching a large window at full resolution would be computationally expensive, so we use an image pyramid instead: we first downsample the image and align the downsampled version, then move up to the next finer scale, carry the estimated displacement over, and align again around it. We repeat this process until we reach the original image size.
This is implemented recursively in the `align` function. This way, we reduce the computational cost of aligning large images, because each level only needs a small search window and the total number of displacements tested stays small.
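A recursive pyramid alignment along these lines could be sketched as below. This is not the `align` function itself: the names `downsample`, `search`, and `pyramid_align`, the factor of 2 between levels, the 2x2 block-mean downsampling, the refinement window of 4 pixels, and the 64-pixel stopping size are all assumptions chosen for illustration. Shifting again uses `np.roll`, which wraps around the edges.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumed downsampling scheme)."""
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2
    return img[:2 * h2, :2 * w2].reshape(h2, 2, w2, 2).mean(axis=(1, 3))

def search(channel, reference, center, window):
    """Exhaustively test displacements within `window` pixels of `center`."""
    best_score, best = -np.inf, center
    for dy in range(center[0] - window, center[0] + window + 1):
        for dx in range(center[1] - window, center[1] + window + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

def pyramid_align(channel, reference, coarse_window=15, refine_window=4, min_size=64):
    """Estimate the (dy, dx) aligning `channel` to `reference`, coarse to fine."""
    # Base case: the image is small enough for a full-window search
    if min(channel.shape) <= min_size:
        return search(channel, reference, (0, 0), coarse_window)
    # Recurse on the half-resolution images, then double the estimate
    coarse = pyramid_align(downsample(channel), downsample(reference),
                           coarse_window, refine_window, min_size)
    center = (2 * coarse[0], 2 * coarse[1])
    # Refine around the scaled-up estimate with a small window
    return search(channel, reference, center, refine_window)

rng = np.random.default_rng(0)
reference = rng.random((256, 256))
channel = np.roll(reference, (-20, 12), axis=(0, 1))
print(pyramid_align(channel, reference))
```

With these settings, a 256 x 256 image needs one 31 x 31 search at 64 x 64 resolution plus two 9 x 9 refinements, rather than a single search over a window wide enough to cover the full displacement at full resolution.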
This actually takes 24 minutes in total to run on a MacBook Pro (M1 Max, 64 GB RAM) for the 11 TIFF images, which is an average of roughly 2 minutes per image.
The outputs for the large images are as follows:
Harvesters | Melons | Onion Church | Self Portrait | Three Generations | Train | Sculpture | Lady | Church | Emir | Icon |
---|---|---|---|---|---|---|---|---|---|---|
The results show promise and the algorithm works well for both small and large images. The alignment is accurate and the RGB images are created successfully.