Image Processor


Filters & Warps

“Traditional” image processing techniques can be separated into two broad categories: filters and warps. Filters are image manipulations that alter pixel colors, whereas warps alter pixel positions. Filters, warps, their combinations, and machine learning are largely the difference between a modern phone's 12MP photo and a 12MP photo taken on a phone from 5-6 years ago.

I won't be going over machine learning techniques in this article, but if it isn't already evident from the bombardment of tech ads and news coverage, machine learning is at the cutting edge of image processing.

This article is dedicated to explaining a variety of artistic and practical image filters and warps, all of which I've implemented to better understand image processing. Both the filters and the warps are listed in order of ascending complexity and required knowledge, but none are so complex as to require a strong technical background coming in.

I've included pseudocode with all of my explanations, but you can also find all my C++ implementations on GitHub.

Representing Images

Before getting into the nitty-gritty of image transformations, you need to understand how images are represented.

Color

Images are represented as a grid of tiny squares called pixels (crazy, I know). For grayscale images each pixel has one "channel" value, which represents the brightness of the pixel. For color images each pixel has three channels: red, green, and blue (in that order). When combined, these channels form the color of each pixel you see on your screen.

All channels of an image share the same valid range of values, though that range can differ from image to image. Typically a bit depth of 8 is used, which means each channel lies in \([0, 2^8-1]=[0, 255]\). Now this may not sound like a lot, but as the color of a pixel is the combination of its channels, each pixel can take on \(256^3 = 16,777,216\) possible colors.

Representing the Pixel Grid

If you've ever worked with 2-D data before, then you know the standard representation uses an array of arrays. This method does work for images, but it is typically avoided for efficiency reasons.

The arrays we represent an image with must be dynamic, as we want the ability to change the image in any way we like. This poses a problem for the array-of-arrays approach: since we can no longer guarantee contiguous memory, our caching performance will suffer.

Generally we don't deal with matrices large enough for this to matter, but images can require an incredible number of elements to represent. For example, my phone takes pictures with a resolution of 4080 by 3072, which means I need \(4080 \cdot 3072 \cdot 3 = 37,601,280\) values (the factor of three is for the three color channels).

As such we use a single dynamic array to represent the pixel data, and then manually map the (row, column) positions to the correct indices. Luckily this is fairly easy to calculate: given a row and column position, the correct pixel index is \((width_{image} \cdot row + col) \cdot |channels|\). For a grayscale image this gives you the one brightness value; for color it gives the position of the red value, so to get the green and blue values you add 1 and 2 respectively to the computed index.

With all that being said, for simple transformations where per-pixel values are determined independently of all other pixels, we don't even need to muck about with the index calculation. Instead we can simply iterate over the image with a step equal to the number of color channels.
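To make this concrete, here's a minimal C++ sketch of the index calculation, assuming interleaved 8-bit channels stored in a std::vector<unsigned char> (the names here are illustrative placeholders, not the interface of my actual implementation):

  #include <cstddef>
  #include <vector>

  // Maps a (row, col) position to the index of that pixel's first channel.
  std::size_t pixel_index(std::size_t row, std::size_t col,
                          std::size_t img_width, std::size_t num_channels) {
      return (img_width * row + col) * num_channels;
  }

  // Example: read the green channel of a pixel in a 3-channel image
  // (red is at the computed index, green at +1, blue at +2).
  unsigned char green_at(const std::vector<unsigned char>& img_data,
                         std::size_t row, std::size_t col,
                         std::size_t img_width) {
      return img_data[pixel_index(row, col, img_width, 3) + 1];
  }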


Filters

Color Inversion

One of the simplest image filters, if not the simplest. To invert the colors, all you have to do is subtract each channel value from the max color value.

To understand why this works you can think about the number line: the opposite of -2 is 2, of -1 it's 1, and 0 goes to 0. This is exactly what we're doing for each channel, but now the range is [0, max_color]. So assuming 8-bit color, the opposite of 0 is 255, of 1 is 254, etc.

You can also think of the 1-dimensional inverse geometrically, as a reflection of the range over its midpoint.


  FUNCTION invert_colors()
    FOR i TO img_data_length STEP num_channels
        // invert the red (or sole grayscale) channel
        img_data[i] := max_color - img_data[i]

        IF num_channels = 3 THEN
            // invert the green and blue channels as well
            img_data[i + 1] := max_color - img_data[i + 1]
            img_data[i + 2] := max_color - img_data[i + 2]
        END IF
    END FOR

  END FUNCTION
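If you'd rather see real code, here's a minimal C++ sketch of the same loop, assuming interleaved 8-bit channels in a std::vector<unsigned char> (the names are placeholders rather than the interface of my GitHub implementation):

  #include <vector>

  // Invert every channel of an interleaved 8-bit image.
  void invert_colors(std::vector<unsigned char>& img_data) {
      // Every channel is inverted the same way, so the channel count doesn't matter here.
      for (unsigned char& channel : img_data) {
          channel = 255 - channel;
      }
  }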
          

Grayscale

Grayscaling does exactly what it sounds like: it takes all three color channels of a pixel and combines them into one gray channel. The simplest way to do this is to set the pixel to the average of its color channels.

The issue with this is that humans perceive the brightness of red, green, and blue differently. As such, a more accurate way to grayscale an image is a weighted average that takes this into account. There are two standards for the weights, REC 709 and REC 601: REC 709 is the modern standard used for displaying HD content, whereas REC 601 is for older SD content.
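For reference, the REC 709 weighted average is \(gray = 0.2126 \cdot R + 0.7152 \cdot G + 0.0722 \cdot B\), while the REC 601 version is \(gray = 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B\); the pseudocode below uses the REC 709 weights.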

*Note: if you don't wish to switch to a grayscale image format, you can simply set each color channel to the computed gray value.


  FUNCTION grayscale()
    new_img_data := ARRAY OF (img_width * img_height)

    // weights determined by REC 709 
    magic_weight_r := 0.2126
    magic_weight_g := 0.7152
    magic_weight_b := 0.0722

    // num_channels should be 3
    FOR i TO img_data_length STEP num_channels
      gray := img_data[i] * magic_weight_r + img_data[i + 1] * magic_weight_g + img_data[i + 2] * magic_weight_b

      // always divisible by num_channels, as STEP = num_channels
      new_img_data[i / num_channels] := gray
    END FOR
    img_data := new_img_data
    // the image data is now single-channel
    num_channels := 1

  END FUNCTION
          

Thresholding

Thresholding is the most basic form of image segmentation. The purpose of segmentation is to programmatically separate out different parts of an image. This is very useful for visual analysis, and for targeting specific areas in which to apply image filters and/or warps.

Thresholding works by taking some threshold value epsilon, setting all pixels greater than epsilon to the max color value, and setting the rest to 0.

It is also common to swap the black and white cases, as depending on the image a white subject on a black background may be preferable to a black subject on a white background.


  // assumes image is grayscale
  FUNCTION threshold(epsilon)

    FOR i TO img_data_length STEP num_channels
      IF img_data[i] > epsilon
        img_data[i] := max_color
      ELSE
        img_data[i] := 0
      END IF
    END FOR

  END FUNCTION
          

Tint

Tinting is just as it sounds: making an image appear as if it were tinted by a dye. This is mainly done for artistic purposes, and is a pretty straightforward process:

  1. Get the (r, g, b) values of the wanted tint color
  2. Normalize the tint channels, i.e. divide each channel by the max possible color value
  3. Multiply each pixel channel by their respective tint channel
  4. (optional) If the output image is dark, multiply each channel by some constant > 1 to brighten up the image. However, you must then also ensure no color value ends up greater than the max.

  FUNCTION tint(r, g, b, brighten:=1)

    // normalize channels
    r := r / max_color
    g := g / max_color
    b := b / max_color

    FOR i TO img_data_length STEP num_channels
      img_data[i] := img_data[i] * r * brighten
      img_data[i + 1] := img_data[i + 1] * g * brighten
      img_data[i + 2] := img_data[i + 2] * b * brighten

      // if any color channel not in [0, max_color], clips to fit
      clip_pixel(i)
    END FOR

  END FUNCTION
          

Chroma Shift

A chroma shift is done by taking the color channels of all pixels and shifting them left or right by some amount. This creates a “glitchy” effect as the color channels are now misaligned. This algorithm can also use thresholding to great effect, as it allows you to shift only selected parts of the image.

*Note: this algorithm should not be done in place, as that would cause you to calculate shifts based on previously shifted channels.


  FUNCTION chroma_shift(r_shift, g_shift, b_shift)
    // copy the image data so shifts aren't computed from already-shifted channels
    new_img_data := img_data

    r_shift := r_shift * num_channels
    g_shift := g_shift * num_channels
    b_shift := b_shift * num_channels

    FOR row TO img_height
      FOR col TO img_width
        pix_index := (img_width * row + col) * num_channels

        // conditionals prevent out-of-bounds indexing
        IF pix_index + r_shift >= 0 && pix_index + r_shift < img_data_length
          new_img_data[pix_index + r_shift] := img_data[pix_index]
        END IF

        IF pix_index + g_shift >= 0 && pix_index + g_shift < img_data_length
          new_img_data[pix_index + g_shift + 1] := img_data[pix_index + 1]
        END IF

        IF pix_index + b_shift >= 0 && pix_index + b_shift < img_data_length
          new_img_data[pix_index + b_shift + 2] := img_data[pix_index + 2]
        END IF
      END FOR
    END FOR

    img_data := new_img_data

  END FUNCTION
          

Salt & Pepper Noise

Salt and pepper noise is a specific type of noise found in grayscale images. It commonly occurs from data transmission errors and analog-to-digital conversion errors. Emulating this noise is pretty easy:

  1. Grayscale the image if not already
  2. Decide how much of the image you want to be noise
  3. Generate a random pixel index
  4. Set the random pixel black for pepper noise, or white for salt noise
  5. Repeat until you've generated the wanted amount of noise

  // assumes image is grayscale

  FUNCTION salt_noise(noise_density)
    num_salt := noise_density * img_width * img_height

    FOR i TO num_salt
      // assumes range is inclusive
      rand_row := uniform_rand_int(0, img_height - 1)
      rand_col := uniform_rand_int(0, img_width - 1)

      img_data[img_width * rand_row + rand_col] := max_color
    END FOR

  END FUNCTION

  FUNCTION pepper_noise(noise_density)
    num_pepper := noise_density * img_width * img_height

    FOR i TO num_pepper
      // assumes range is inclusive
      rand_row := uniform_rand_int(0, img_height - 1)
      rand_col := uniform_rand_int(0, img_width - 1)

      img_data[img_width * rand_row + rand_col] := 0
    END FOR

  END FUNCTION
          

Warps

Flips

Flips are conceptually quite simple, but have some annoying index computations.

The first thing you need to do, for both horizontal and vertical flips, is get the bounds you're iterating over correct. Given that these are symmetric operations, you only need to iterate over half the image, calculating each pixel's index and swapping it with its mirrored position in the chosen dimension.

Calculating the left index in the horizontal case and the top index in the vertical case is the same: you just use the standard calculation of \((width_{image} \cdot row + col) \cdot |channels|\).

The remaining two index calculations are a bit tricky to come up with, so I recommend performing the calculations on a matrix by hand to understand all the details.
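As a quick worked example, take a 3-channel image that is 4 pixels wide. The pixel at row 0, column 1 has left index \((4 \cdot 0 + 1) \cdot 3 = 3\); its horizontal mirror is column \(4 - 1 - 1 = 2\), giving a right index of \(((4 \cdot 0 + 4 - 1) - 1) \cdot 3 = 6\). Working through a couple of these by hand makes the formulas below much less mysterious.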


  FUNCTION horizontal_flip()
    FOR row TO img_height
      FOR col TO (img_width / 2)
        left_index := (img_width * row + col) * num_channels
        right_index := (( (img_width * row) + (img_width - 1) ) - col) * num_channels

        pixel_swap(left_index, right_index)
      END FOR
    END FOR

  END FUNCTION

  FUNCTION vertical_flip()
    FOR row TO (img_height / 2)
      FOR col TO img_width
        top_index := (img_width * row + col) * num_channels
        bottom_index := ( (img_height - row - 1) * img_width + col) * num_channels

        pixel_swap(top_index, bottom_index)
      END FOR
    END FOR

  END FUNCTION
          

Reflections

Of the image transformations I've implemented, reflections generate some of the most visually stunning images. And the nice thing is, if you've implemented the flip algorithms, you're basically already done. The only difference between flips and reflections is that you don't swap the computed indices; instead you simply overwrite one with the other.


  FUNCTION horizontal_reflection(direction)
    FOR row TO img_height
      FOR col TO (img_width / 2)

        IF direction = 'L'
          FOR i to num_channels
            left_index := (img_width * row + col) * num_channels + i
            right_index := (( (img_width * row) + (img_width - 1) ) - col) * num_channels + i

            img_data[right_index] := img_data[left_index]
          END FOR
        END IF

        IF direction = 'R'
          FOR i to num_channels
            left_index := (img_width * row + col) * num_channels + i
            right_index := (( (img_width * row) + (img_width - 1) ) - col) * num_channels + i

            img_data[left_index] := img_data[right_index]
          END FOR
        END IF
       
      END FOR
    END FOR

  END FUNCTION

  FUNCTION vertical_reflection(direction)
    FOR row TO (img_height / 2)
      FOR col TO img_width

        IF direction = 'T'
          FOR i to num_channels
            top_index := (img_width * row + col) * num_channels + i
            bottom_index := ( (img_height - row - 1) * img_width + col) * num_channels + i

            img_data[bottom_index] := img_data[top_index]
          END FOR
        END IF

        IF direction = 'B'
          FOR i to num_channels
            top_index := (img_width * row + col) * num_channels + i
            bottom_index := ( (img_height - row - 1) * img_width + col) * num_channels + i

            img_data[top_index] := img_data[bottom_index]
          END FOR
        END IF
       
      END FOR
    END FOR

  END FUNCTION
          

Rectangular Crop

Once again an image transformation whose main complexity stems from its index calculations.

There is more than one way to crop an image; this simple method only takes in the new origin point and the new image dimensions. All you need to do is iterate over the new dimensions and grab the pixel positions from the old image, offset by the new origin.

Once again to understand the calculation details, I'd recommend simply carrying out this operation on a matrix by hand.


  // assumes new origin is upper left corner of cropped image
  FUNCTION rect_crop(new_origin, new_width, new_height)
    new_img_data := ARRAY OF (new_width * new_height * num_channels)

    FOR row TO new_height
      FOR col to new_width
        old_index := ((new_origin[0] + row) * img_width + (new_origin[1] + col)) * num_channels
        new_index := (new_width * row + col) * num_channels

        new_img_data[new_index] := img_data[old_index]
        IF num_channels = 3
          new_img_data[new_index + 1] := img_data[old_index + 1]
          new_img_data[new_index + 2] := img_data[old_index + 2]
        END IF
      END FOR
    END FOR

    img_data := new_img_data
    img_width := new_width
    img_height := new_height

  END FUNCTION

          

Scaling

Image scaling is a classic example of something that is quite simple once you know the correct approach, but easy to get wrong otherwise.

The first step is obvious: compute the dimensions of the new image using our scale factors. Now let's take the logical next step: for each pixel in the unscaled image, map it to the new image by multiplying its indices by their respective scale factors.

Bam, done, all the pixels have been mapped from the old image to the new. And while this is true, if you think about what you've done, you'll realize most of the new image is empty.

The problem here is your intuition; if it's led you down this path, you've been led astray. While perhaps unintuitive, what you need to do is work backwards from the empty scaled image. More specifically, iterate over the empty scaled image and multiply each pixel's coordinates by the inverse of the respective scale factors. Then use the result to index into the original image, and set your new pixel to that pixel.

The only issue with this method is that you will get decimal index values; you can deal with this by simply rounding, or by taking the floor of the index. It should also be noted that this technique is called nearest neighbor interpolation, and there are other, more advanced methods which result in smoother scaling. However, as of yet I haven't had the time to implement them, so I won't be going over them until then.
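For example, if you double the width (a scale factor of 2), the pixel in column 7 of the new image maps back to column \(7 \cdot \frac{1}{2} = 3.5\) of the original, which rounds to column 4.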


  FUNCTION scale(width_scale_factor, height_scale_factor)
    new_width := img_width * width_scale_factor
    new_height := img_height * height_scale_factor

    inv_width_scale_factor := 1 / width_scale_factor
    inv_height_scale_factor := 1 / height_scale_factor

    new_img_data := ARRAY OF (new_width * new_height * num_channels)

    FOR row TO new_height
      FOR col TO new_width
        new_index := (new_width * row + col) * num_channels

        // nearest neighbor interpolation
        old_row_index := round(row * inv_height_scale_factor)
        old_col_index := round(col * inv_width_scale_factor)
        old_index := (img_width * old_row_index + old_col_index) * num_channels

        new_img_data[new_index] := img_data[old_index]
        IF num_channels = 3
          new_img_data[new_index + 1] := img_data[old_index + 1]
          new_img_data[new_index + 2] := img_data[old_index + 2]
        END IF
      END FOR
    END FOR

    img_data := new_img_data
    img_width := new_width
    img_height := new_height

  END FUNCTION
          

Additional Image Processing Resources

If you want a more formal understanding of image processing, the Gonzalez & Woods textbook is, to my understanding, the standard textbook for image processing; personally I've been going through it and it's been very good. There are also some great lectures on YouTube by Rich Radke, which cover a good portion of the book. With that being said, the content is not the most easily digestible, especially if you haven't taken college-level math courses.

The YouTube channel Computerphile has an image processing playlist that is quite good; the channel is also a great resource for a variety of CS-related topics. If you're more interested in artistic image processing, The Coding Train is a treasure trove of artistic image manipulations and animations.

Finally, the last resource I have to recommend is ChatGPT. If it generates code for you there's a good chance it won't work, and it may get small details wrong, but I've found it the best place to start when researching how to create a certain effect. Usually I use it as a jumping-off point to dig deeper into a topic, as it gives you enough background information to more easily search Google and understand more technical articles.