How recognising images works – when you’re an algorithm

There are many different types of image recognition. The main aim of such practices is to correctly match a new picture to an existing group of pictures. In some instances this is not dissimilar to how a clustering algorithm works, in that common features are identified and grouped together (Moosmann et al., 2008).

Although several methods of image recognition exist, this blog post gives an overview of the facial image recognition method patented by Tsai (2004), a method of Multiple Stage Face Image Identification. Before any recognition can begin, the picture has to be processed in the following three steps:

1. Inputting an original picture of a face

2. Trimming the picture so that it contains only a complete facial image

3. Decomposing the picture into N resolutions, each having M channels, where N ≥ 2 and M ≥ 2, so that the facial image is decomposed into N × M sub-images
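The steps above leave open how the N resolutions and M channels are actually produced, and the post does not say. The sketch below is a minimal Python illustration under stated assumptions, not Tsai's actual procedure: each lower resolution is made by halving the previous one with 2 × 2 averaging, and the channels are hypothetical simple filters (the image itself plus horizontal and vertical pixel differences). Practical systems often use wavelet or Gabor filter banks instead. The names decompose, downsample and CHANNELS are illustrative only.

```python
from typing import Callable, List

Image = List[List[float]]  # greyscale image stored as rows of pixel intensities


def downsample(img: Image) -> Image:
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)]
            for r in range(h)]


def identity(img: Image) -> Image:
    """Hypothetical channel: the image at this resolution, unchanged."""
    return [row[:] for row in img]


def horizontal_diff(img: Image) -> Image:
    """Hypothetical channel: differences between horizontally adjacent pixels."""
    return [[row[c + 1] - row[c] for c in range(len(row) - 1)] for row in img]


def vertical_diff(img: Image) -> Image:
    """Hypothetical channel: differences between vertically adjacent pixels."""
    return [[img[r + 1][c] - img[r][c] for c in range(len(img[0]))]
            for r in range(len(img) - 1)]


# Assumed channel filters; the post does not specify what the M channels are.
CHANNELS: List[Callable[[Image], Image]] = [identity, horizontal_diff, vertical_diff]


def decompose(img: Image, n: int, m: int) -> List[List[Image]]:
    """Return N x M sub-images as subs[resolution][channel]; resolution 0 is full size."""
    if n < 2 or m < 2:
        raise ValueError("the method requires N >= 2 and M >= 2")
    if m > len(CHANNELS):
        raise ValueError(f"this sketch only defines {len(CHANNELS)} channels")
    if not img or min(len(img), len(img[0])) < 2 ** n:
        raise ValueError("image too small for the requested number of resolutions")
    subs: List[List[Image]] = []
    level = img
    for _ in range(n):
        subs.append([f(level) for f in CHANNELS[:m]])  # M channels at this resolution
        level = downsample(level)                       # next, coarser resolution
    return subs
```

For an 8 × 8 picture, decompose(img, 3, 2) returns 3 × 2 = 6 sub-images whose base resolutions are 8 × 8, 4 × 4 and 2 × 2 pixels; the difference channels are one pixel smaller in one direction.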

Once the picture has been processed, the learning stage can commence. A front-on facial image with a neutral expression is used as the learning image. The sub-images resulting from the processing are input into an N × M self-organising map (SOM) neural network, which performs unsupervised classification learning. Unsupervised learning can be explained as follows: suppose there is a set of N observations (x1, x2, …, xN) of a random p-vector X with joint density Pr(X). (Here N counts observations; it is unrelated to the N resolutions above.) The aim is to infer the properties of this probability density directly, without the aid of a supervisor, or teacher, providing the answers (Hastie et al., 2009). This estimation can be done with a variety of statistical methods, and the appropriate choice depends on the number of observations N and the complexity of the data.
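To make unsupervised learning with a self-organising map more concrete, here is a minimal sketch of SOM training in Python. It is a generic Kohonen-style SOM, not the patented system: the map is a hypothetical 1-D row of units rather than a grid, and the initialisation, learning-rate and neighbourhood schedules are assumptions chosen for illustration. No labels are ever supplied. Each input selects a winning unit (the closest weight vector), and that unit and its lattice neighbours are pulled towards the input.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def winner(weights: List[Vector], x: Sequence[float]) -> int:
    """Index of the winning (best-matching) unit for input x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_som(data: List[Vector], n_units: int = 4, epochs: int = 50,
              lr0: float = 0.5) -> List[Vector]:
    """Train a 1-D SOM with a Gaussian neighbourhood; no labels are used."""
    if not data or n_units < 2:
        raise ValueError("need some data and at least two units")
    dim = len(data[0])
    lo = [min(x[k] for x in data) for k in range(dim)]
    hi = [max(x[k] for x in data) for k in range(dim)]
    # Assumed deterministic initialisation: units spread evenly along the
    # diagonal of the data's bounding box.
    weights = [[lo[k] + (hi[k] - lo[k]) * i / (n_units - 1) for k in range(dim)]
               for i in range(n_units)]
    radius0 = n_units / 2
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)   # neighbourhood shrinks over time
        for x in data:
            w = winner(weights, x)
            for i, wt in enumerate(weights):
                d = abs(i - w)                    # distance on the 1-D lattice
                h = math.exp(-(d * d) / (2 * radius * radius))
                for k in range(dim):
                    wt[k] += lr * h * (x[k] - wt[k])  # pull unit towards input
    return weights
```

Trained on two well-separated groups of points, such a map typically assigns different winning units to the two groups, which is the clustering behaviour mentioned at the start of this post.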

When the neural networks have completed a predetermined learning process, the sub-images of the learning image are input a second time into M neural networks to complete the learning process. Each neural network generates a winning unit corresponding to a correct identification. The learning process thus sets up a recognition decision process that determines the distances from the M winning units to the winning units of each learning image. The result is passed to a corresponding self-organising map neural network, which finds all possible candidates. If there is only one candidate, it automatically becomes the winner and the decision process is finished. If there is more than one candidate, the candidates are retained and the decision process is repeated at a higher level of resolution. A summary of the end-to-end learning process is shown in Figure 1.
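The candidate-pruning part of this decision process can be illustrated with a short sketch. It is a hypothetical reading of the description above, not the patent's exact rule. Each image is represented by its winning units (grid coordinates) for every resolution and channel, ordered from coarse to fine. The distance between two images at one resolution is taken to be the summed grid distance between corresponding winning units, and "all possible candidates" is assumed to mean every learning image within a fixed tolerance of the best distance. If one candidate remains it wins; otherwise the survivors move on to the next, higher resolution.

```python
import math
from typing import Dict, List, Optional, Tuple

Unit = Tuple[int, int]       # (row, col) of a winning unit on a SOM grid
Encoding = List[List[Unit]]  # [resolution][channel], ordered coarse -> fine


def level_distance(query: List[Unit], ref: List[Unit]) -> float:
    """Sum of grid distances between corresponding winning units at one resolution."""
    return sum(math.dist(q, r) for q, r in zip(query, ref))


def identify(query: Encoding, gallery: Dict[str, Encoding],
             tolerance: float = 1.0) -> Optional[str]:
    """Return the name of the matching learning image, or None if there is none."""
    if not gallery or not query:
        return None
    candidates = list(gallery)
    dists: Dict[str, float] = {}
    for level, query_units in enumerate(query):
        dists = {name: level_distance(query_units, gallery[name][level])
                 for name in candidates}
        best = min(dists.values())
        # Assumed rule: keep every candidate within `tolerance` of the best match.
        candidates = [name for name in candidates if dists[name] <= best + tolerance]
        if len(candidates) == 1:
            return candidates[0]  # a single candidate automatically wins
    # Still ambiguous at the finest resolution: fall back to the closest match.
    return min(candidates, key=dists.__getitem__)
```

If several candidates survive even the finest resolution, this sketch simply picks the closest one; that fallback is an assumption, not something the post describes.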

Once the learning process has been completed using the learning image, other images can be matched against it. The image to be identified first undergoes the picture processing stage outlined above. The resulting sub-images are then input into the neural networks trained during the unsupervised learning stage. These produce winning units, which are used to match the new image to the original.
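Producing winning units for a new image amounts to a nearest-neighbour lookup in each trained map. The sketch below assumes, hypothetically, that each of the N × M trained maps is stored as a 2-D grid of weight vectors and that each sub-image is flattened into a single vector of the same length; encode, grid_winner and flatten are illustrative names, not part of the patent. The output has a [resolution][channel] shape of (row, column) winning units that can be compared against those of the learning image.

```python
from typing import List, Tuple

Vector = List[float]
SubImage = List[List[float]]
Grid = List[List[Vector]]  # grid[row][col] -> weight vector of one SOM unit


def flatten(sub_image: SubImage) -> Vector:
    """Turn a 2-D sub-image into one vector, row by row."""
    return [v for row in sub_image for v in row]


def grid_winner(grid: Grid, x: Vector) -> Tuple[int, int]:
    """(row, col) of the unit whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            if len(w) != len(x):
                raise ValueError("sub-image and weight vector lengths differ")
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def encode(sub_images: List[List[SubImage]],
           maps: List[List[Grid]]) -> List[List[Tuple[int, int]]]:
    """sub_images[n][m] is fed to the trained map maps[n][m]."""
    return [[grid_winner(maps[n][m], flatten(sub_images[n][m]))
             for m in range(len(sub_images[n]))]
            for n in range(len(sub_images))]
```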

This is just an overview of one specific algorithm, but there are many more out there! Image recognition is really useful for identifying and categorising your products in public posts, which could be used for future promotion. If you have any questions about image recognition or have a related project, please get in touch with us.


Hastie, T., Tibshirani, R. and Friedman, J. (2009), ‘Unsupervised Learning’, in The Elements of Statistical Learning, 2nd edn, Springer, New York, NY, pp. 485–585.

Moosmann, F., Nowak, E. and Jurie, F. (2008), ‘Randomized clustering forests for image classification’, IEEE Transactions on Pattern Analysis and Machine Intelligence 30(9), 1632–1646.

Tsai, K.-C. (2004), ‘Method of multi-level facial image recognition and system using the same’. US Patent 6,697,504.
