9 Dec 2022

Benefits of AI-based text recognition in the reader pen

One of the main functions of a reader pen is to recognise printed text. Usually, this is achieved by using mathematical algorithms that interpret the text, a method that isn’t particularly flexible or reliable. Artificial intelligence (AI) makes optical character recognition more robust and more accurate.

Making OCR more robust with AI

By using artificial intelligence, optical character recognition (OCR) works more like the way you learn to decode text yourself. Usually you recognise the letters you see, but if you stumble across an unknown one, you ask somebody, who tells you what it is. The next time you see that letter, you know it. In the same way, AI-based OCR focuses on how a character looks and the context it appears in, which makes it less sensitive to minor deviations. An algorithm-based OCR, by contrast, works by trying to figure out font similarities.

For example, an algorithm-based OCR sometimes cannot handle text with distorted letters, while an AI-based one will see past the distortion and correctly identify the letter. If the text is poorly printed, for instance because a scratch mark cuts through a letter or the print quality is low, there will be a break in one of the letter’s lines. When the algorithm analyses a letter with such a gap pixel by pixel, the line suddenly goes from black pixels to white pixels. The algorithm interprets this as the letter’s end and thinks a new character has begun when the pixels turn black again. So, if you use an algorithm, you must try to describe mathematically what counts as a typical distance between two letters and what is merely a gap inside a single letter.
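
The gap problem described above can be sketched in a few lines of Python. This is a simplified illustration, not the code of any particular OCR engine: it assumes the scanned line is a small grid of 0s and 1s (1 = black pixel), and the function names segment_columns and merge_small_gaps as well as the min_gap threshold are made up for this example.

```python
def segment_columns(bitmap):
    """Split a binary bitmap (rows of 0/1, 1 = black) into character spans.

    A span is a run of adjacent columns that each contain at least one
    black pixel. Returns (start, end) column pairs, end exclusive.
    """
    width = len(bitmap[0])
    has_ink = [any(row[x] for row in bitmap) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # black after white: a span begins
        elif not ink and start is not None:
            spans.append((start, x))     # white after black: the span ends
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def merge_small_gaps(spans, min_gap):
    """Join spans separated by fewer than min_gap blank columns.

    min_gap is the hand-tuned "typical distance between two letters";
    it is an assumption of this sketch, not a value from the article.
    """
    merged = []
    for start, end in spans:
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Two glyphs; a scratch has wiped out column 3 of the first one.
scan = [
    [1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1],
]
naive = segment_columns(scan)          # [(0, 3), (4, 6), (9, 11)]: three "letters"
fixed = merge_small_gaps(naive, 2)     # [(0, 6), (9, 11)]: the scratch is bridged
too_eager = merge_small_gaps(naive, 4) # [(0, 11)]: both glyphs fused into one
```

The last two lines show why the threshold is hard to get right: set it too low and a scratch splits a letter, set it too high and neighbouring letters are merged.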

Recognising more fonts

Similarly, suppose you have underlined text and a letter that touches the underline, such as a lowercase “p” or “g”. The algorithm-based OCR could assume that the line is part of the character because the two are connected, and therefore fail to identify the letter. An AI-based solution, given proper training, sees past that: it realises that the line has nothing to do with the letter and interprets the character correctly. Overall, an AI OCR sees the letter in a larger context and becomes more robust.

For the programmer, the AI-based OCR is more flexible and easier to adjust than an algorithm-based one. An algorithm-based OCR programmed for fonts like regular Arial or Times New Roman doesn’t recognise the characters if you try to scan a comic book with an unusual font, or text set in German Fraktur. To scan such a font successfully you must rewrite the algorithm, which is a huge task. With an AI-based OCR, you can teach the engine by scanning text in the new font and then telling the engine what it means. If you feed it enough text and label the input data, it will eventually learn that font. You train it, rather than mathematically figuring out what a font looks like.
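
To make the idea of teaching instead of rewriting concrete, here is a deliberately tiny sketch. It is not a neural network and not how a real reader pen works: it uses a nearest-centroid classifier as a toy stand-in for a trained model, and the class CentroidRecognizer, its methods teach and recognise, and the 3x3 glyphs are all invented for this example.

```python
class CentroidRecognizer:
    """Toy learner: stores the average image per label, picks the closest."""

    def __init__(self):
        self._sums = {}    # label -> per-pixel sum of all taught samples
        self._counts = {}  # label -> number of taught samples

    def teach(self, pixels, label):
        """Add one labelled sample (a flat list of 0/1 pixel values)."""
        sums = self._sums.setdefault(label, [0.0] * len(pixels))
        for i, value in enumerate(pixels):
            sums[i] += value
        self._counts[label] = self._counts.get(label, 0) + 1

    def recognise(self, pixels):
        """Return the label whose average image is closest, or None."""
        best_label, best_dist = None, float("inf")
        for label, sums in self._sums.items():
            n = self._counts[label]
            dist = sum((p - s / n) ** 2 for p, s in zip(pixels, sums))
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label


# 3x3 glyphs, flattened row by row (hypothetical data).
PLAIN_I = [0, 1, 0,  0, 1, 0,  0, 1, 0]
PLAIN_T = [1, 1, 1,  0, 1, 0,  0, 1, 0]
SERIF_I = [1, 1, 1,  0, 1, 0,  1, 1, 1]   # an "I" from an unfamiliar font

ocr = CentroidRecognizer()
ocr.teach(PLAIN_I, "I")
ocr.teach(PLAIN_T, "T")
before = ocr.recognise(SERIF_I)   # "T": the new font is misread

ocr.teach(SERIF_I, "I")           # scan the new font and say what it means
after = ocr.recognise(SERIF_I)    # "I": learned without changing any code
```

The recognition code is identical before and after; only the labelled examples changed. A real engine needs far more samples and a much more capable model, but the workflow is the one described above.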

AI-based OCR can create a larger context based on various input combinations. You also get some built-in intelligence that handles deviations better than algorithms can. It’s trained to the point where it can interpret more than just what you trained it on. The methodology is the same as when you teach an AI to recognise a dog. Ultimately, it can identify a dog regardless of breed, because there are enough similarities to conclude that a poodle is a dog even if the AI has never seen one.