WO2022100970A1 - Computer-implemented methods, apparatus, computer programs, non-transitory computer-readable storage mediums for performing optical character recognition on an image in a data file - Google Patents

Computer-implemented methods, apparatus, computer programs, non-transitory computer-readable storage mediums for performing optical character recognition on an image in a data file

Info

Publication number
WO2022100970A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
machine
data file
learning algorithm
encoded text
Prior art date
Application number
PCT/EP2021/078894
Other languages
French (fr)
Inventor
Muhannad ALOMARI
Arevinth VIGNARAJASARMA
Stephen Moore
Mohammad TANWEER
Original Assignee
Rolls-Royce Plc
Priority date
Filing date
Publication date
Application filed by Rolls-Royce Plc
Publication of WO2022100970A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/36 Matching; Classification
    • G06V30/387 Matching; Classification using human interaction, e.g. selection of the best displayed recognition candidate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2178 Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

A computer-implemented method of performing optical character recognition on an image in a data file, the method comprising: processing the image in the data file using a first machine learning algorithm to generate machine-encoded text; processing the image in the data file using a second machine learning algorithm to generate machine-encoded text, the second machine learning algorithm being different to the first machine learning algorithm; determining a similarity of the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm; controlling a request for user input to identify the correct machine-encoded text when the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm have at least one difference; determining which of the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm is correct using the received user input; and controlling output of the machine-encoded text determined to be correct.

Description

TITLE
Computer-implemented methods, apparatus, computer programs, non-transitory computer-readable storage mediums for performing optical character recognition on an image in a data file
TECHNOLOGICAL FIELD
The present disclosure concerns computer-implemented methods, apparatus, computer programs, non-transitory computer-readable storage mediums for performing optical character recognition on an image in a data file.
BACKGROUND
Organisations and individuals have historically stored paper documentation (for example, contracts, manuals and invoices) in physical archives. Accessing and processing the information in such paper documentation usually presents a challenge to the organisation or individual. For example, some information in the paper documentation may be stored in tables and may need to be manually entered into a computer to enable the information to be processed. Such manual extraction of information may be time consuming and therefore costly and/or impractical for the organisation or individual.
Some organisations and individuals may additionally or alternatively store documentation digitally in the Portable Document Format (PDF) where the information is stored as raster images. This may present similar challenges to paper documentation because the information in the PDF files may need to be manually read and entered into a computer to enable the information to be processed.
BRIEF SUMMARY
According to a first aspect there is provided a computer-implemented method of performing optical character recognition on an image in a data file, the method comprising: processing the image in the data file using a first machine learning algorithm to generate machine-encoded text; processing the image in the data file using a second machine learning algorithm to generate machine-encoded text, the second machine learning algorithm being different to the first machine learning algorithm; determining a similarity of the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm; controlling a request for user input to identify the correct machine-encoded text when the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm have at least one difference; determining which of the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm is correct using the received user input; and controlling output of the machine-encoded text determined to be correct.
The computer-implemented method may further comprise: controlling output of the machine-encoded text generated by the first machine learning algorithm when it is determined that the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm are the same.
The computer-implemented method may further comprise: determining which of the first machine learning algorithm and the second machine learning algorithm generated incorrect machine-encoded text; and providing the machine-encoded text determined to be correct to enable training of the machine learning algorithm that output the incorrect machine-encoded text.
The computer-implemented method may further comprise: rotating a first image to generate a second data file comprising a second image; identifying a table image in the second image of the second data file and storing the identified table image in a third data file, the identified table image comprising an image of text and at least one line; determining whether the table image in the third data file has a missing line; inserting a line in the table image of the third data file where a line is determined to be missing to generate a fourth data file; and determining a cell structure of the table image in the fourth data file using the lines of the table, the cell structure comprising a plurality of cells, and the image comprising an image of text in at least one cell of the cell structure.
Rotating the first image may comprise: determining a first angle using a third machine learning algorithm and the first image in the first data file; and rotating the first image using the determined first angle to generate the second data file.
Rotating the first image may comprise: determining a first angle using a third machine learning algorithm and the first image in the first data file; rotating the first image using the determined first angle to generate a further data file comprising a rotated image; determining a second angle using the rotated image and an algorithm, the second angle being smaller in magnitude than the first angle; and rotating the rotated image using the determined second angle to generate the second data file comprising the second image.
Determining whether the table in the third data file has a missing line may comprise: separating the lines of the table image and the image of text of the table image; identifying an empty row and/or an empty column in the image of text; determining if the empty row and/or the empty column have a width greater than a threshold value; determining that the table image has a missing line where the determined width is greater than the threshold value.
According to a second aspect there is provided an apparatus for determining a structure of a first image in a first data file, the apparatus comprising: a processor; a memory storing a computer program that, when executed by the processor, causes performance of the computer-implemented method as described in any of the preceding paragraphs of the brief summary.
According to a third aspect there is provided a system comprising: an apparatus as described in the preceding paragraph; a first device configured to generate the image and to transmit the image to the apparatus; and a second device configured to receive the machine-encoded text determined to be correct from the apparatus.
According to a fourth aspect there is provided a computer program that, when executed by a processor, causes performance of the computer-implemented method as described in any of the preceding paragraphs of the brief summary.
According to a fifth aspect there is provided a non-transitory computer-readable storage medium comprising computer-readable instructions that, when executed by a processor, cause performance of the computer-implemented method as described in any of the preceding paragraphs of the brief summary.
The skilled person will appreciate that except where mutually exclusive, a feature described in relation to any one of the above aspects may be applied mutatis mutandis to any other aspect. Furthermore, except where mutually exclusive any feature described herein may be applied to any aspect and/or combined with any other feature described herein.
BRIEF DESCRIPTION
Embodiments will now be described by way of example only, with reference to the Figures, in which:
Fig. 1 illustrates a schematic diagram of a system according to various examples;
Fig. 2 illustrates a flow diagram of a computer-implemented method of determining a structure of a first image in a first data file according to an example;
Fig. 3 illustrates a flow diagram of a computer-implemented method of performing optical character recognition on an image in a data file according to an example;
Fig. 4 illustrates a flow diagram of a computer-implemented method of determining a structure of a first image in a first data file and performing optical character recognition according to an example;
Fig. 5 illustrates a first image in a first data file according to an example;
Fig. 6 illustrates a rotated image in a further data file according to an example;
Fig. 7 illustrates a second image in a second data file according to an example;
Fig. 8 illustrates lines of a table image in the second image according to an example;
Fig. 9 illustrates the identified lines of the table image in the second data file according to an example;
Fig. 10 illustrates the table image in a third data file according to an example;
Figs. 11A and 11B illustrate the separated lines of the table image and the image of text of the table image, respectively, according to an example;
Fig. 12 illustrates identified empty rows and empty columns in the image of text according to an example;
Fig. 13 illustrates the table image in the fourth data file having an inserted line, according to an example; and
Fig. 14 illustrates a cell structure of the table image in the fourth data file according to an example.
DETAILED DESCRIPTION
In the following description, the terms ‘connected’ and ‘coupled’ mean operationally connected and coupled. It should be appreciated that there may be any number of intervening components between the mentioned features, including no intervening components.
Fig. 1 illustrates a schematic diagram of a system 10 according to various examples. The system 10 includes an apparatus 12, a first device 14 and a second device 16. In summary, the first device 14 is configured to scan physical media (such as paper) to generate a data file comprising an image. The apparatus 12 may be configured to receive the data file from the first device 14 and determine the structure of the image in the data file. Additionally, or alternatively, the apparatus 12 may be configured to receive the data file from the first device 14 and perform optical character recognition on the image in the data file. The second device 16 is configured to receive the output from the apparatus 12 (for example, machine-encoded text) and display the output to a user.
The apparatus 12 includes a processor 18, a memory 20 and a transceiver 22. In some examples, the apparatus 12 may be a module. As used herein, the wording ‘module’ refers to a device or apparatus where one or more features are included at a later time and, possibly, by another manufacturer or by an end user. For example, where the apparatus 12 is a module, the apparatus 12 may only include the processor 18 and the memory 20, and the remaining features (such as the transceiver 22) may be added by another manufacturer, or by an end user.
The processor 18 may include at least one microprocessor and may comprise a single core processor, may comprise multiple processor cores (such as a quad core processor), or may comprise a plurality of processors (at least one of which may comprise multiple processor cores). Additionally, the one or more cores of the processor 18 may be multi-threaded.
The memory 20 may be any suitable non-transitory computer readable storage medium, data storage device or devices, and may comprise a hard disk drive (HDD) and/or a solid-state drive (SSD). The memory 20 may be permanent nonremovable memory, or may be removable memory (such as a universal serial bus (USB) flash drive or a secure digital card). The memory 20 may include: local memory employed during actual execution of the computer program; bulk storage; and cache memories which provide temporary storage of at least some computer readable or computer usable program code to reduce the number of times code may be retrieved from bulk storage during execution of the code.
The memory 20 stores one or more computer programs 24 comprising computer readable instructions that, when executed by the processor 18, causes performance of the methods described herein, and as illustrated in Figs. 2, 3 and 4. The computer program 24 may be software or firmware, or may be a combination of software and firmware.
The computer program 24 may be stored on a non-transitory computer readable storage medium 26. The computer program 24 may be transferred from the non-transitory computer readable storage medium 26 to the memory 20. The non-transitory computer readable storage medium 26 may be, for example, a Universal Serial Bus (USB) flash drive, a secure digital (SD) card, or an optical disc (such as a compact disc (CD), a digital versatile disc (DVD) or a Blu-ray disc). In some examples, the computer program 24 may be transferred to the memory 20 via a signal 28, such as a wireless signal or a wired signal.
The transceiver 22 is coupled to the processor 18 and is configured to enable the apparatus 12 to communicate data (wirelessly and/or via a wired connection) with other apparatus and devices (such as the first device 14 and the second device 16).
The first device 14 may be a dedicated image scanning device such as a handheld scanner, a flatbed scanner, or a drum scanner. Alternatively, the first device 14 may be a multi-purpose device (such as a printer) that includes an image scanning device. The first device 14 includes a controller 30, a user input device 32, a display 34, a transceiver 36, and an image scanner 38.
The controller 30 may comprise any suitable circuitry to cause performance of the methods described herein in relation to the operation of the first device 14. The controller 30 may comprise: a processor and memory arrangement as described above in relation to the processor 18 and the memory 20; control circuitry; and/or processor circuitry; and/or at least one application specific integrated circuit (ASIC); and/or at least one field programmable gate array (FPGA); and/or single or multi-processor architectures; and/or sequential/parallel architectures; and/or at least one programmable logic controller (PLC); and/or at least one microprocessor; and/or at least one microcontroller; and/or a central processing unit (CPU); and/or a graphics processing unit (GPU), to perform the described methods.
The user input device 32 may comprise any suitable device for enabling a user to at least partially control the first device 14. For example, the user input device 32 may comprise one or more of: a keyboard; a keypad; a touchpad; a touchscreen display; and a computer mouse. The controller 30 is configured to receive signals from the user input device 32.
The display 34 may be a liquid crystal display (LCD), a light emitting diode (LED) display, an active matrix organic light emitting diode (AMOLED) display, or a thin film transistor (TFT) display, or a cathode ray tube (CRT) display. The controller 30 is configured to provide a signal to the display 34 to cause the display 34 to convey information to the user of the first device 14.
The transceiver 36 is coupled to the controller 30 and is configured to enable the first device 14 to communicate data (wirelessly and/or via a wired connection) with other apparatus and devices (such as the apparatus 12 and the second device 16).
The image scanner 38 is configured to optically scan physical media and generate a digital image. The controller 30 is configured to control the operation of the image scanner 38 and to receive the digital image from the image scanner 38 upon completion of a scan. The image scanner 38 may use a charge-coupled device (CCD) or a contact image sensor (CIS) as the image sensor. Alternatively, the image scanner 38 may be a drum scanner and use a photomultiplier tube (PMT) as the image sensor.
The second device 16 may be any personal computer or computing client, and may be for example a desktop personal computer, a laptop personal computer, a tablet personal computer or a smart phone. The second device 16 includes a controller 40, a user input device 42, a display 44, and a transceiver 46. The controller 40 may comprise any suitable circuitry to cause performance of the methods described herein in relation to the operation of the second device 16. The controller 40 may comprise: a processor and memory arrangement as described above in relation to the processor 18 and the memory 20; control circuitry; and/or processor circuitry; and/or at least one application specific integrated circuit (ASIC); and/or at least one field programmable gate array (FPGA); and/or single or multi-processor architectures; and/or sequential/parallel architectures; and/or at least one programmable logic controller (PLC); and/or at least one microprocessor; and/or at least one microcontroller; and/or a central processing unit (CPU); and/or a graphics processing unit (GPU), to perform the described methods.
The user input device 42 may comprise any suitable device for enabling a user to at least partially control the second device 16. For example, the user input device 42 may comprise one or more of: a keyboard; a keypad; a touchpad; a touchscreen display; and a computer mouse. The controller 40 is configured to receive signals from the user input device 42.
The display 44 may be a liquid crystal display (LCD), a light emitting diode (LED) display, an active matrix organic light emitting diode (AMOLED) display, or a thin film transistor (TFT) display, or a cathode ray tube (CRT) display. The controller 40 is configured to provide a signal to the display 44 to cause the display 44 to convey information to the user of the second device 16.
The transceiver 46 is coupled to the controller 40 and is configured to enable the second device 16 to communicate data (wirelessly and/or via a wired connection) with other apparatus and devices (such as the apparatus 12 and the first device 14).
In some examples, the apparatus 12, the first device 14 and the second device 16 are separate devices and may be positioned at the same or different geographic locations. For example, the first device 14 may be an image scanner located in a first city/county/state/country, the second device 16 may be a personal computer also located in the first city/county/state/country, and the apparatus 12 may be a server located in a second city/county/state/country, different to the first city/county/state/country.
In other examples, the apparatus 12, the first device 14 and/or the second device 16 may be integrated and positioned at the same geographic location. For example, the first device 14 may be an image scanner and the apparatus 12 may be integrated into the first device 14 (that is, the processor 18, the memory 20 and the transceiver 22 are part of the image scanner 14). By way of another example, the second device 16 may be a personal computer and the apparatus 12 may be integrated into the second device 16 (that is, the processor 18, the memory 20 and the transceiver 22 are part of the personal computer 16). In a further example, the apparatus 12, the first device 14 and the second device 16 may be integrated into a single device (such as a personal computer) where the image scanner 38 is a peripheral device and is controlled by the processor 18.
Fig. 2 illustrates a flow diagram of a computer-implemented method of determining a structure of a first image in a first data file according to an example.
At block 48, the computer-implemented method may include receiving a first image in a first data file 49. For example, the controller 30 of the first device 14 may control the image scanner 38 to scan physical media to generate a digital image, and then control the transceiver 36 to transmit the digital image to the apparatus 12. The transceiver 22 of the apparatus 12 receives the digital image and stores the digital image in the memory 20 as the first data file 49.
The physical media has a mixture of content printed on at least one surface of the physical media. The content includes text and one or more tables. The text may be in any language, in any font type, and in any font size. The one or more tables are arrangements of text in rows and columns. The rows and columns are delineated by vertical lines and horizontal lines which form a plurality of cells (a cell structure).
The digital image scanned by the image scanner 38 may be a bitmap of the content printed on the physical media. Consequently, the text and one or more tables printed on the physical media are mapped to a bit array to form the digital image in the first data file 49 (such digital images are also known as raster images). The first data file 49 may have an uncompressed file format (such as the BMP file format), or may have a compressed file format (such as the JPEG or PNG file formats). In some examples, the first data file 49 may be in the Portable Document Format (PDF) where the text and one or more tables are encoded as raster images.
It should be appreciated that the user of the first device 14 may load the physical media into the image scanner 38 in any orientation. For example, the user of the first device 14 may load the physical media into the image scanner 38 with an orientation that causes the image to appear upside down (that is, rotated through one hundred and eighty degrees) when the image is viewed on the display 34 of the first device 14 or on the display 44 of the second device 16. Furthermore, it should be appreciated that the operation of the image scanner 38 may cause a change in orientation of the physical media before it is scanned by the image scanner 38. For example, where the image scanner 38 is a flatbed scanner, the closing of the lid to the scanner may cause some rotation of the physical media. By way of another example, where the image scanner 38 is a drum scanner, slippage of the physical media on the drum may cause rotation of the physical media.
By way of another example, the first data file 49 may already be stored in the memory 20 and the processor 18 may perform block 48 by reading the first data file 49. For example, the memory 20 may store a Portable Document Format (PDF) file comprising the first image and the processor 18 may perform block 48 by reading the Portable Document Format file in the memory 20.
At block 50, the computer-implemented method includes rotating the first image to generate a second data file comprising a second image. In some examples, the processor 18 may use an algorithm to determine the orientation error of the first image, and then rotate the first image using the determined orientation error to generate a second data file 51 comprising a second image. The processor 18 may then store the second data file 51 in the memory 20, either by over-writing the first data file 49, or by storing the second data file 51 as a separate file to the first data file 49.
The processor 18 may use any suitable algorithm to determine the orientation error of the first image in the first data file 49. In a first example, the processor 18 may use a machine learning algorithm to classify the image orientation (that is, the orientation error) of the first image. The machine learning algorithm may classify the image orientation into one of several angles (for example, zero degrees, ninety degrees, one hundred and eighty degrees and two hundred and seventy degrees). In a second example, the processor 18 may identify lines in the first image and then use trigonometry to determine the orientation error by comparing the identified lines against horizontal and vertical frames of reference. In a third example, the processor 18 may threshold the first image and calculate the sum of squares of the image histogram over several angles to determine the orientation error.
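As an illustration of the third example above, the following minimal Python sketch scores a horizontal projection profile (a row histogram) over several candidate angles; the angle range, step size, function name and use of OpenCV are assumptions made for illustration and are not part of the disclosed method.
    import cv2
    import numpy as np

    def estimate_skew_angle(binary_image, max_angle=10.0, step=0.5):
        """Return the rotation (degrees) that best aligns the rows of text.

        binary_image: 2-D array with text pixels bright (non-zero) on a dark
        background, e.g. produced by thresholding the first image.
        """
        height, width = binary_image.shape
        centre = (width / 2, height / 2)
        best_angle, best_score = 0.0, -1.0
        for angle in np.arange(-max_angle, max_angle + step, step):
            matrix = cv2.getRotationMatrix2D(centre, float(angle), 1.0)
            rotated = cv2.warpAffine(binary_image, matrix, (width, height))
            profile = rotated.sum(axis=1).astype(np.float64)  # row histogram
            score = float((profile ** 2).sum())  # peaky profile => aligned text rows
            if score > best_score:
                best_angle, best_score = float(angle), score
        return best_angle
The candidate rotation that maximises the sum of squares is the one at which text rows collapse into sharp histogram peaks, which is the intuition behind the third example.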
In other examples, the processor 18 may not determine an orientation error, but may instead rotate the first image by a predetermined angle. For example, a user may operate the user input device 32 of the first device 14 to enter a rotation angle, and the controller 30 may control the transceiver 36 to transmit the rotation angle to the apparatus 12 to enable the performance of block 50. In another example, the apparatus 12 may be pre-configured and store the predetermined angle in the memory 20 prior to performing the computer-implemented method illustrated in Fig. 2.
At block 52, the computer-implemented method includes identifying a table image in the second image of the second data file 51 and storing the identified table image in a third data file 53. For example, the processor 18 may identify horizontal and vertical lines in the second image in the second data file 51 to determine the presence of a table image. The processor 18 may then apply contour detection on the identified horizontal and vertical lines to determine the location of the table image within the second image. The processor 18 may then extract the table image and store the table image in the third data file 53 in the memory 20.
At block 54, the computer-implemented method may include determining whether the table image in the third data file 53 has a missing line. For example, the processor 18 may determine whether the table image in the third data file 53 has a missing line using any suitable algorithm.
At block 56, the computer-implemented method may include inserting a line in the table image of the third data file 53 where a line is determined to be missing to generate a fourth data file 57. For example, where the processor 18 determines there is a missing line at block 54, the processor 18 may insert a vertical line and/or a horizontal line into the table image to generate the fourth data file 57. The processor 18 may store the fourth data file 57 in the memory 20, either by overwriting the third data file 53, or by storing the fourth data file in the memory 20 as a separate file to the third data file 53.
At block 58, the computer-implemented method includes determining a cell structure of the table image in the fourth data file 57 using the lines of the table image. For example, the processor 18 may apply contour analysis on the table image of the fourth data file 57 to identify a plurality of cells of the table image and thereby determine the cell structure of the table image.
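A minimal sketch of one way block 58 could be implemented, using connected-component analysis on the table's line mask; OpenCV, the size threshold and the assumption that the lines are drawn in white on black are illustrative choices rather than requirements of the method.
    import cv2

    def determine_cell_structure(line_mask, min_size=10):
        """line_mask: binary image of the table's lines (white lines on black).

        Returns cell bounding boxes (x, y, w, h) in rough reading order.
        """
        inverted = cv2.bitwise_not(line_mask)  # cell interiors become white blobs
        count, _, stats, _ = cv2.connectedComponentsWithStats(inverted)
        cells = []
        for label in range(1, count):  # label 0 is the background, i.e. the lines
            x, y, w, h, _area = stats[label]
            # Drop tiny noise regions; the region outside the outer border, if
            # present, can additionally be filtered out by its large area.
            if w > min_size and h > min_size:
                cells.append((int(x), int(y), int(w), int(h)))
        return sorted(cells, key=lambda box: (box[1], box[0]))  # top-to-bottom, left-to-right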
At block 60, the computer-implemented method may include performing optical character recognition on at least one cell to generate machine-encoded text. In some examples, the processor 18 may use a ‘matrix matching’ algorithm (also known as ‘pattern matching’, ‘pattern recognition’ or ‘image correlation’) to perform optical character recognition on one or more cells of the determined cell structure to generate machine-encoded text. In such examples, the ‘matrix matching’ algorithm compares a cell image to a stored glyph on a pixel-by-pixel basis to determine the characters in the cell.
In other examples, the processor 18 may use a ‘feature extraction’ algorithm to perform optical character recognition on one or more cells of the determined cell structure to generate machine-encoded text. In these examples, the feature extraction algorithm decomposes glyphs in a cell into features and then compares the features with vector-like representations of characters to determine the characters in the cell.
In further examples, the processor 18 may use a machine learning algorithm (such as a neural network) to perform optical character recognition on one or more cells of the determined cell structure to generate machine-encoded text. The machine learning algorithm may be trained to recognize single characters or whole lines of text.
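As a concrete but purely illustrative example of block 60, the sketch below runs an off-the-shelf OCR engine over a single cell image; Tesseract (via the pytesseract package) and the single-line page-segmentation mode are assumptions standing in for whichever matrix-matching, feature-extraction or machine learning algorithm is chosen.
    import cv2
    import pytesseract

    def ocr_cell(cell_image):
        """Return machine-encoded text for one cell image (a BGR NumPy array)."""
        gray = cv2.cvtColor(cell_image, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        # "--psm 7" treats the cell as a single line of text; this is an
        # assumption that suits short table entries.
        return pytesseract.image_to_string(binary, config="--psm 7").strip()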
The computer-implemented method may utilise a user input (for example, a key press from the user input device 32 or from the user input device 42) to provide context (or label) for the encoded text. The manually encoded user context (or label) may be used to normalise/link different machine-encoded text. The contextualising of text may make it easier for computers to process and link information from different input images.
The computer-implemented method may also include transmitting the machine-encoded text generated at block 60 to the second device 16. For example, the processor 18 may control the transceiver 22 to transmit the machine-encoded text to the second device 16. The machine-encoded text may be received by the transceiver 46 of the second device 16 and the controller 40 may control the display 44 to display the received machine-encoded text. For example, received machine-encoded text generated from a single cell of the table image may be displayed in a single cell of a spreadsheet application program.
In some examples, the processor 18 may perform blocks 48, 50, 52, 54, 56, 58 and 60 automatically and without user intervention. In other examples, one or more of blocks 48, 50, 52, 54, 56, 58 and 60 may require user input (for example, a key press from the user input device 32 or from the user input device 42) to cause the processor 18 to perform the block.
The computer-implemented method illustrated in Fig. 2 may provide several advantages. The determination of the structure of the first image in the first data file may enable optical character recognition to be performed on scanned images of tables and the machine-encoded text to be generated and stored/presented on a cell by cell basis. Additionally, the computer-implemented method may enable an organisation or individual to store their physical archive in a digital format where the data may be relatively easily searched and extracted. The storage in a digital format may also improve the data security of the information on the physical media because the information may be stored in a variety of different digital formats at any number of physical locations.
Fig. 3 illustrates a flow diagram of a computer-implemented method of performing optical character recognition on an image in a data file.
At block 62, the computer-implemented method includes processing an image in a data file using a first machine learning algorithm to generate machine-encoded text. For example, the processor 18 may process a text image in a cell of the structure determined at block 58 (and which may be stored in the fourth data file 57) using a first machine learning algorithm to generate machine-encoded text. The first machine learning algorithm may be any suitable machine learning algorithm and may be, for example, a first deep learning model (such as a convolutional neural network). In other examples, the processor 18 may process a text image in any other data file (whether generated or not by the first device 14 or the second device 16).
The first machine learning algorithm may be stored in the memory 20 of the apparatus 12 and executed by the processor 18, or may be stored in a memory of a remote apparatus and executed by a processor of that remote apparatus. Where the first machine learning algorithm is stored at a remote apparatus, block 62 additionally comprises transmitting the data file to the remote apparatus using the transceiver 22, processing the data file at that remote apparatus to generate machine-encoded text, and then receiving the generated machine-encoded text using the transceiver 22.
At block 64, the computer-implemented method includes processing the image in the data file using a second machine learning algorithm to generate machine-encoded text. For example, the processor 18 may process the same text image as mentioned above with reference to block 62 using the second machine learning algorithm to generate machine-encoded text. The second machine learning algorithm may be any suitable machine learning algorithm and may be, for example, a second deep learning model (such as a convolutional neural network).
The second machine learning algorithm may be stored in the memory 20 of the apparatus 12 and executed by the processor 18 as described above, or may be stored in a memory of a remote apparatus and executed by a processor of that remote apparatus. Where the second machine learning algorithm is stored at a remote apparatus, block 64 additionally comprises transmitting the data file to the remote apparatus using the transceiver 22, processing the data file at that remote apparatus to generate machine-encoded text, and then receiving the generated machine-encoded text using the transceiver 22.
In some examples, block 62 and block 64 may be performed sequentially. For example, block 62 may be performed prior to block 64, or block 64 may be performed prior to block 62. Alternatively, block 62 and block 64 may be performed in parallel (that is, they may overlap in time at least partially).
It should be appreciated that in some examples, the computer-implemented method may include processing the image in the data file using more than two machine learning algorithms to generate machine-encoded text. The machine learning algorithms may be stored in the memory 20, or may be stored at one or more other apparatus, or may be stored at a combination of the memory 20 and other apparatus.
At block 66, the computer-implemented method includes determining a similarity of the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm. For example, the processor 18 may use any suitable content similarity detection algorithm to determine the similarity of the machine-encoded text generated by the first machine learning algorithm and the second machine learning algorithm. Examples of content similarity detection algorithms include string matching, fingerprinting and bag of words analysis. Where machine-encoded text is generated by more than two machine learning algorithms, block 66 may include determining a similarity of the machine-encoded text generated by some or all of the machine learning algorithms.
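A minimal sketch of block 66, using exact comparison backed by difflib's sequence ratio as a simple stand-in for the string matching, fingerprinting or bag-of-words techniques mentioned above; the function name and the similarity threshold are assumptions.
    from difflib import SequenceMatcher

    def texts_agree(text_a, text_b, threshold=1.0):
        """Return True when the two machine-encoded texts are considered the same."""
        a, b = text_a.strip(), text_b.strip()
        if a == b:
            return True  # identical outputs: block 68 applies
        similarity = SequenceMatcher(None, a, b).ratio()  # value between 0.0 and 1.0
        return similarity >= threshold  # 1.0 demands an exact match after stripping
Lowering the threshold would treat near-identical outputs as matching, trading fewer user prompts against the risk of accepting an incorrect reading.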
When it is determined at block 66 that the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm are the same, the computer-implemented method moves to block 68 and includes controlling output of the machine-encoded text generated by the first machine learning algorithm. For example, the processor 18 may control the output of the machine-encoded text generated by the first machine learning algorithm by controlling the transceiver 22 to transmit the machine-encoded text to the second device 16. The machine-encoded text may then be received by the transceiver 46 of the second device 16 and the controller 40 may control the display 44 to display the received machine-encoded text. For example, received machine-encoded text generated from a single cell of the table image may be displayed in a single cell of a spreadsheet application program.
When it is determined that the machine-encoded text generated by the machine learning algorithms has at least one difference, the computer-implemented method moves to block 70 and includes controlling a request for user input to identify the correct machine-encoded text.
For example, the processor 18 may control the transceiver 22 to send a signal to the second device 16. The signal includes the machine-encoded text generated by the first machine learning algorithm, and the machine-encoded text generated by the second machine learning algorithm. Upon receiving the signal, the controller 40 controls the display 44 to display the received machine-encoded text and a message that asks the user to select the correct machine-encoded text. The user may operate the user input device 42 to select the correct machine-encoded text and the controller 40 subsequently receives a signal from the user input device 42 that identifies the correct machine-encoded text. The controller 40 then controls the transceiver 46 to send a signal to the apparatus 12 that identifies the correct machine-encoded text.
In other examples, the processor 18 may control the transceiver 22 to send the signal to the first device 14. Upon receiving the signal, the controller 30 controls the display 34 to display the received machine-encoded text and a message that asks the user to select the correct machine-encoded text. The user may operate the user input device 32 to select the correct machine-encoded text and the controller 30 subsequently receives a signal from the user input device 32 that identifies the correct machine-encoded text. In these examples, the user may compare the displayed machine-encoded text with the content printed on the physical media to determine the correct machine-encoded text. The controller 30 subsequently controls the transceiver 36 to send a signal to the apparatus 12 that identifies the correct machine-encoded text.
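The overall decision flow of blocks 62 to 74 could be sketched as below, with the two OCR engines and the user prompt passed in as plain callables; the function names and the idea of returning the disagreeing engine for later training are illustrative assumptions, since in the patent these steps are distributed across the apparatus 12 and the first or second device.
    def recognise_with_verification(cell_image, engine_one, engine_two, ask_user):
        """Return (correct_text, engine_to_retrain_or_None) for one cell image."""
        text_one = engine_one(cell_image)   # block 62
        text_two = engine_two(cell_image)   # block 64
        if text_one == text_two:            # block 66: the outputs are the same
            return text_one, None           # block 68: output without user input
        # Block 70: request user input to identify the correct machine-encoded text.
        chosen = ask_user([text_one, text_two])
        # Blocks 72 and 74: the non-chosen output identifies the algorithm to retrain.
        wrong_engine = engine_two if chosen == text_one else engine_one
        return chosen, wrong_engine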
At block 72, the method includes determining which of the machine-encoded text generated by (at least) the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm is correct using the received user input. For example, the processor 18 may receive the signal that identifies the correct machine-encoded text from the first device 14 or the second device 16 and then use that signal to determine the correct machine-encoded text (using a similarity algorithm for example).
At block 74, the computer-implemented method includes controlling output of the machine-encoded text determined to be correct. For example, the processor 18 may control the output of the machine-encoded text determined to be correct by controlling the transceiver 22 to transmit the correct machine-encoded text to the second device 16. The machine-encoded text may then be received by the transceiver 46 of the second device 16 and the controller 40 may control the display 44 to display the received machine-encoded text. For example, received machine-encoded text generated from a single cell of the table image may be displayed in a single cell of a spreadsheet application program.
In examples where a user operates the user input device 42 of the second device 16 to identify the correct machine-encoded text, the controller 40 may perform block 74 by using the correct machine-encoded text identified in the signal to control the display 44 to display the correct machine-encoded text.
The processor 18 may also perform block 74 by providing the machine-encoded text determined to be correct to enable training of the machine learning algorithm that generated the incorrect machine-encoded text. For example, where the processor 18 determines that the second machine learning algorithm (which in this example is stored and executed by another apparatus) generated the incorrect machine-encoded text, the processor 18 may control the transceiver 22 to transmit a signal comprising the correct machine-encoded text to the other apparatus to enable training of the second machine learning algorithm.
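One simple way of providing the correct text for training, sketched below, is to append each corrected sample to a file that is later used to fine-tune the erring model; the CSV layout and function name are assumptions, since the patent only requires that the correct machine-encoded text is made available to the algorithm that generated the incorrect text.
    import csv

    def record_correction(samples_path, cell_image_path, correct_text):
        """Append one (image, correct text) training pair to a CSV file."""
        with open(samples_path, "a", newline="", encoding="utf-8") as samples:
            csv.writer(samples).writerow([cell_image_path, correct_text])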
In some examples, the processor 18 may perform blocks 62, 64, 66, 68, 70, 72 and 74 automatically and without user intervention. In other examples, one or more of blocks 62, 64, 66, 68, 70, 72 and 74 may require user input (for example, a key press from the user input device 32 or from the user input device 42) to cause the processor 18 to perform the block.
The computer-implemented method illustrated in Fig. 3 may provide several advantages. The use of two or more machine learning algorithms, the subsequent determination of similarity, and request for user input where the generated machine-encoded text is not similar may result in the output of more accurate machine-encoded text relative to performing optical character recognition using a technique such as matrix matching or feature extraction. Additionally, the provision of the correct machine-encoded text to the machine learning algorithm that generated the incorrect machine-encoded text may advantageously enable training of that machine learning algorithm.
Fig. 4 illustrates a flow diagram of a computer-implemented method of determining a structure of the first image in the first data file 49, and performing optical character recognition. The computer-implemented method illustrated in Fig. 4 is similar to the computer-implemented methods illustrated in Figs. 2 and 3, and where the blocks are similar/the same, the same reference numerals are used. Where the blocks are similar/the same, the implementation of those blocks may not be described in detail, and may be understood from the implementations described above with reference to Fig. 2 and Fig. 3.
At block 48, the computer-implemented method includes receiving a first image in a first data file 49. For example, the processor 18 may receive the first image 76 illustrated in Fig. 5. The first image 76 comprises an image of text 78 and an image of a table 80. The image of the table 80 includes a plurality of lines 82 and images of text 84. The first image 76 has an anti-clockwise orientation error of approximately ninety-five degrees.
In this example, block 50 (rotating the first image to generate a second data file comprising a second image) includes four blocks: block 86, block 88, block 90 and block 92.
At block 86, the computer-implemented method includes determining a first angle using a first machine learning algorithm and the first image 76 in the first data file 49. For example, the processor 18 may use a convolutional neural network (CNN) deep learning model stored in the memory 20 to determine that the first angle of the first image 76 is ninety degrees. The convolutional neural network may have, for example, four classes of orientation (zero, ninety, one hundred and eighty, two hundred and seventy).
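A minimal sketch of the kind of four-class orientation classifier described at block 86, assuming PyTorch; the layer sizes, input resolution handling and class ordering are illustrative, and a production model would be trained on rotated page images.
    import torch
    import torch.nn as nn

    ORIENTATION_CLASSES = (0, 90, 180, 270)  # degrees, matching the four classes above

    class OrientationCNN(nn.Module):
        """Small convolutional classifier mapping a page image to a coarse angle."""

        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d((8, 8)),
            )
            self.classifier = nn.Linear(32 * 8 * 8, len(ORIENTATION_CLASSES))

        def forward(self, page):
            return self.classifier(torch.flatten(self.features(page), 1))

    def predict_first_angle(model, page_tensor):
        """page_tensor: a (1, H, W) grayscale tensor; returns the first angle in degrees."""
        model.eval()
        with torch.no_grad():
            logits = model(page_tensor.unsqueeze(0))  # add a batch dimension
        return ORIENTATION_CLASSES[int(logits.argmax(dim=1))]
The predicted class gives the coarse first angle; the finer second angle can then be estimated with a line-based or histogram-based method such as the skew-estimation sketch given earlier.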
At block 88, the computer-implemented method includes rotating the first image 76 using the determined first angle to generate a further data file comprising a rotated image. For example, the processor 18 may rotate the first image 76 through ninety degrees clockwise to generate the rotated image 94 illustrated in Fig. 6.
At block 90, the computer-implemented method includes determining a second angle using the rotated image 94 in the further data file and an algorithm. For example, the processor 18 may determine that the second angle is five degrees using the lines 82 of the image of the table 80 and trigonometry as described above. By way of another example, the processor 18 may determine that the second angle is five degrees by thresholding the first image and calculating the sum of squares of the image histogram over several angles.
At block 92, the computer-implemented method includes rotating the rotated image using the determined second angle to generate a second data file comprising a second image. For example, the processor 18 may rotate the rotated image 94 by five degrees clockwise to generate the second image 96 illustrated in Fig. 7 (and stored in the second data file 51).
At block 52, the computer-implemented method includes identifying a table image in the second image of the second data file and storing the identified table image in a third data file. For example, the processor 18 may identify the horizontal and vertical lines 82 in the second image 96 in the second data file (as illustrated in Fig. 8) to determine the presence of the table image 80. The processor 18 may then apply contour detection on the identified horizontal and vertical lines 82 (as illustrated in Fig. 9) to determine the location of the table image 80 within the second image 96. The processor 18 may then extract the table image 80 (as illustrated in Fig. 10) and store the table image 80 in the third data file 53 in the memory 20.
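A minimal sketch of how block 52 could be realised with OpenCV morphology and contour detection, in line with the example above; the kernel lengths, Otsu thresholding and minimum region size are illustrative assumptions.
    import cv2

    def extract_table_images(page_gray, min_side=100):
        """Return cropped table images found in a grayscale page image."""
        # Binarise with ink as white on black so morphology keeps the strokes.
        _, binary = cv2.threshold(page_gray, 0, 255,
                                  cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
        # Openings with long, thin kernels keep only horizontal / vertical lines.
        h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
        v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
        horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
        vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)
        lines = cv2.bitwise_or(horizontal, vertical)
        # Contours around the combined line mask locate the table within the page.
        contours, _ = cv2.findContours(lines, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        tables = []
        for contour in contours:
            x, y, w, h = cv2.boundingRect(contour)
            if w >= min_side and h >= min_side:  # ignore stray marks
                tables.append(page_gray[y:y + h, x:x + w])
        return tables
Each returned crop plays the role of the table image stored in the third data file 53 in the example above.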
In this example, block 54 (determining whether the table image in the third data file has a missing line) includes four blocks: block 98, block 100, block 102 and block 104. As with the computer-implemented method illustrated in Fig. 2 and described above, it should be appreciated that block 54 is optional and hence, blocks 98, 100, 102 and 104 are also optional.
At block 98, the computer-implemented method may include separating the lines of the table image and the image of text of the table image. For example, the processor 18 may separate the lines 82 of the table image 80 and the image of text 84 of the table image 80 as illustrated in Figs. 11A and 11B. The processor 18 may then overwrite the third data file with the separated lines 82 and image of text 84 as the table image.
At block 100, the computer-implemented method may include identifying an empty row and/or an empty column in the image of text 84. For example, the processor 18 may scan the image of text 84 illustrated in Fig. 11B for any empty rows and columns to generate potential locations for where a line may be added. As illustrated in Fig. 12, the black rows 106 represent candidates for rows, and the grey columns 108 represent candidates for columns. It should be appreciated that the processor 18 may identify a row and/or a column as empty where the row and/or column are not completely empty. For example, the processor 18 may determine whether the percentage of white pixels in a row and/or a column is above a threshold percentage (ninety-five percent for example) and where the percentage of white pixels is above the threshold, the row and/or column is identified as empty.
At block 102, the computer-implemented method may include determining if the empty row and/or the empty column have a width greater than a threshold value. For example, the processor 18 may determine if one or more of the candidate rows 106 have a width greater than a threshold width. The processor 18 may additionally or alternatively determine if one or more of the candidate columns 108 have a width greater than a threshold width.
At block 104, the computer-implemented method may include determining that the table image has a missing line where the determined width is greater than the threshold value. For example, the processor 18 may determine that the table image 80 of the third data file 53 has a missing line where a candidate row 106 has a width greater than a threshold width stored in the memory 20. The processor 18 may also determine that the table image 80 of the third data file 53 has a missing line where a candidate column 108 has a width greater than a threshold width stored in the memory 20.
In some examples, block 104 may also include determining whether a candidate row 106 and/or candidate column 108 intersects with a line 82 illustrated in Fig. 11A.
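A minimal sketch of blocks 98 to 104: after the lines are removed, rows and columns whose proportion of white pixels meets the ninety-five percent example above are treated as empty, and an empty band wider than a threshold marks a missing line; the band-width threshold, the brightness cut-off and the function name are assumptions.
    import numpy as np

    def find_missing_lines(text_only_gray, white_ratio=0.95, min_band_width=25):
        """text_only_gray: grayscale table image with the lines removed (as in Fig. 11B).

        Returns (row_positions, column_positions) at which a line should be inserted.
        """
        white = text_only_gray > 200                     # bright pixels count as white
        empty_rows = white.mean(axis=1) >= white_ratio   # candidate rows (block 100)
        empty_cols = white.mean(axis=0) >= white_ratio   # candidate columns (block 100)

        def wide_band_centres(empty_flags):
            centres, start = [], None
            flags = list(empty_flags) + [False]          # sentinel closes a trailing band
            for index, is_empty in enumerate(flags):
                if is_empty and start is None:
                    start = index
                elif not is_empty and start is not None:
                    if index - start >= min_band_width:  # blocks 102-104: band is wide enough
                        centres.append((start + index) // 2)
                    start = None
            return centres

        return wide_band_centres(empty_rows), wide_band_centres(empty_cols)
A horizontal or vertical line can then be drawn through each returned centre to generate the repaired table image of block 56.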
At block 56, the computer-implemented method may include inserting a line in the table image of the third data file where a line is determined to be missing to generate a fourth data file 57. For example, the processor 18 may insert a line into the table image 80 of the third data file 53 (illustrated in Fig. 10) where it is determined that a width of a candidate row 106 or a candidate column 108 exceeds a threshold width to generate the table image 110 illustrated in Fig. 13.
In some examples, block 56 may include inserting lines into the separated image of text to reconstruct the table. For example, the processor 18 may insert a line into the image of text 84 illustrated in Fig. 11B where it is determined that a candidate row 106 and/or candidate column 108 intersects with a line 82 illustrated in Fig. 11A. Additionally, the processor 18 may insert a line into the image of text 84 illustrated in Fig. 11B where it is determined that a width of a candidate row 106 or a candidate column 108 exceeds a threshold width to generate the table image 110 illustrated in Fig. 13. The processor 18 does not insert a line into the image of text 84 illustrated in Fig. 11B where a candidate row or a candidate column does not intersect a line 82 illustrated in Fig. 11A, or does not have a width exceeding the threshold width.
At block 58, the computer-implemented method includes determining a cell structure of the table image in the fourth data file using the lines of the table image. For example, the processor 18 may apply contour analysis on the table image 110 of the fourth data file 57 (illustrated in Fig. 13) to identify a plurality of cells 112 of the table image 110 and thereby determine the cell structure of the table image (as illustrated in Fig. 14 where each cell 112 is in a different shade of grey and is delineated by black lines).
At block 114, the computer-implemented method may include storing a cell of the plurality of cells of the table in a fifth data file. For example, the processor 18 may store one or more cells of the plurality of cells 112 of the table image 110 in a fifth data file 116 in the memory 20. It should be appreciated that block 114 may be repeated for all cells and consequently, the computer-implemented method may include storing a plurality of data files for some or all of the cells of the table image 110.
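Blocks 58 and 114 may be illustrated together by a contour-analysis sketch that finds cell bounding boxes in the reconstructed table image and saves each cell as its own image file. The OpenCV flags, the size filter and the file-naming scheme are assumptions made for illustration, and the OpenCV 4.x return signature of findContours is assumed.

```python
# Illustrative sketch only (blocks 58 and 114): contour analysis on the
# reconstructed grid to find cell bounding boxes, then crop and save each cell
# as its own image file. OpenCV 4.x is assumed for the findContours signature;
# the size filter and the file names are also assumptions.
import cv2

def extract_and_save_cells(table_image, out_prefix="cell"):
    """table_image: greyscale table image with its grid lines drawn in."""
    # Invert so that the dark grid lines become foreground for contour detection.
    _, binary = cv2.threshold(table_image, 128, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cell_files = []
    for i, contour in enumerate(contours):
        x, y, w, h = cv2.boundingRect(contour)
        if w > 20 and h > 10:                    # ignore tiny artefacts
            cell = table_image[y:y + h, x:x + w]
            file_name = f"{out_prefix}_{i}.png"  # one data file per cell
            cv2.imwrite(file_name, cell)
            cell_files.append(file_name)
    return cell_files
```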
The computer-implemented method may then move to block 60, or alternatively, may move to blocks 62 and 64. At block 60, the computer-implemented method includes performing optical character recognition on the cell in the fifth data file to generate machine-encoded text. For example, the processor 18 may use a ‘matrix matching’ algorithm, a ‘feature extraction’ algorithm, or a machine learning algorithm to perform optical character recognition on the cell in the fifth data file 116 to generate machine-encoded text.
Block 60 may also include transmitting the machine-encoded text generated at block 60 to the second device 16. For example, the processor 18 may control the transceiver 22 to transmit the machine-encoded text to the second device 16. The machine-encoded text may be received by the transceiver 46 of the second device 16 and the controller 40 may control the display 44 to display the received machine-encoded text. For example, received machine-encoded text generated from the cell in the fifth data file 116 may be displayed in a single cell of a spreadsheet application program on the display 44.
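As an illustration of block 60, optical character recognition on a saved cell image might be performed with an off-the-shelf engine such as pytesseract. The engine, the page-segmentation mode and the function name are assumptions; the disclosure only requires that some form of optical character recognition is applied to the cell.

```python
# Illustrative sketch only (block 60): one way to run optical character
# recognition on a saved cell image. pytesseract is used purely as an example
# OCR engine; the disclosure does not name a particular library.
import cv2
import pytesseract

def ocr_cell(cell_file: str) -> str:
    cell_image = cv2.imread(cell_file, cv2.IMREAD_GRAYSCALE)
    # --psm 7 treats the image as a single line of text, which suits table cells.
    return pytesseract.image_to_string(cell_image, config="--psm 7").strip()
```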
At block 62, the computer-implemented method includes processing a cell of the determined cell structure using a second machine learning algorithm to generate machine-encoded text. For example, the processor 18 may process the text image of the cell in the fifth data file 116 using a second machine learning algorithm to generate machine-encoded text.
At block 64, the computer-implemented method includes processing the cell of the determined cell structure using a third machine learning algorithm to generate machine-encoded text. For example, the processor 18 may process the text image of the cell in the fifth data file 116 using a third machine learning algorithm to generate machine-encoded text.
As described above with reference to Fig. 3, block 62 and block 64 may be performed sequentially. For example, block 62 may be performed prior to block 64, or block 64 may be performed prior to block 62. Alternatively, block 62 and block 64 may be performed in parallel (that is, they may overlap in time at least partially). At block 66, the computer-implemented method includes determining a similarity of the machine-encoded text generated by the second machine learning algorithm and the machine-encoded text generated by the third machine learning algorithm.
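Blocks 62, 64 and 66 may be sketched as running two placeholder models over the same cell image and comparing their outputs. Plain string equality after whitespace normalisation stands in for the similarity determination here and is an assumption rather than a required measure.

```python
# Illustrative sketch only (blocks 62, 64 and 66): run two placeholder OCR
# models over the same cell and compare their outputs. String equality after
# whitespace normalisation stands in for the similarity determination.
def texts_match(text_a: str, text_b: str) -> bool:
    normalise = lambda s: " ".join(s.split())
    return normalise(text_a) == normalise(text_b)

def reconcile(cell_image, second_model, third_model):
    text_2 = second_model(cell_image)   # block 62
    text_3 = third_model(cell_image)    # block 64
    if texts_match(text_2, text_3):     # block 66
        return text_2, None             # agreement: output text_2 (block 68)
    return None, (text_2, text_3)       # disagreement: request user input (block 70)
```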
Where it is determined that the machine-encoded text generated by the second machine learning algorithm and the machine-encoded text generated by the third machine learning algorithm are the same, the computer-implemented method moves to block 68 and includes controlling output of the machine-encoded text generated by the second machine learning algorithm.
Where it is determined that the machine-encoded text generated by the second machine learning algorithm and the machine-encoded text generated by the third machine learning algorithm have at least one difference, the computer-implemented method moves to block 70 and includes controlling a request for user input to identify the correct machine-encoded text.
At block 72, the computer-implemented method includes determining which of the machine-encoded text generated by the second machine learning algorithm and the machine-encoded text generated by the third machine learning algorithm is correct using the received user input.
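Blocks 70 and 72 may be illustrated by a minimal console prompt standing in for the request for user input and the selection of the correct text. In the described system the request would be presented on a display and answered via a user input device; the console prompt, the prompt text and the option of typing a correction directly are assumptions.

```python
# Illustrative sketch only (blocks 70 and 72): a console prompt standing in
# for the request for user input and the selection of the correct text.
def ask_user_for_correct_text(text_2: str, text_3: str) -> str:
    print(f"1: {text_2}")
    print(f"2: {text_3}")
    choice = input("Which reading is correct? (1/2, or type the correct text): ").strip()
    if choice == "1":
        return text_2
    if choice == "2":
        return text_3
    return choice   # the user typed the correct text directly
```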
The computer-implemented method then moves to block 74 and block 118. Block 74 and block 118 may be performed sequentially (that is, block 74 may be performed prior to block 118, or block 118 may be performed prior to block 74). Alternatively, block 74 and block 118 may be performed in parallel (that is, the performance of blocks 74 and 118 may overlap in time at least partially).
At block 74, the computer-implemented method includes controlling output of the machine-encoded text determined to be correct.
At block 118, the computer-implemented method includes determining which of the second machine learning algorithm and the third machine learning algorithm generated incorrect machine-encoded text. For example, the processor 18 may determine a similarity of the machine-encoded text determined to be correct with the machine-encoded text generated by the second machine learning algorithm and the machine-encoded text generated by the third machine learning algorithm. Where the machine-encoded text generated by a machine learning algorithm has at least one difference with the machine-encoded text determined to be correct, the processor 18 determines that that machine learning algorithm generated incorrect machine-encoded text.
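Block 118 may be illustrated by comparing each model's output with the text confirmed as correct and flagging every model whose output differs. The dictionary keyed by model name is an assumption for illustration.

```python
# Illustrative sketch only (block 118): flag every model whose output differs
# from the machine-encoded text confirmed as correct.
def find_incorrect_models(correct_text: str, outputs: dict) -> list:
    """outputs maps a model name to the machine-encoded text it generated."""
    return [name for name, text in outputs.items() if text != correct_text]
```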
It should be appreciated that where more than two machine learning algorithms are used to generate machine-encoded text (at blocks 62 and 64), more than one machine learning algorithm may be determined to have generated incorrect machine-encoded text.
At block 120, the computer-implemented method includes providing the machine-encoded text determined to be correct to enable training of the machine learning algorithm that generated incorrect machine-encoded text. For example, where the machine learning algorithm that generated the incorrect machine-encoded text is hosted by another apparatus, the processor 18 may control the transceiver 22 to transmit the machine-encoded text determined to be correct to that other apparatus, to enable that other apparatus to train the incorrect machine learning algorithm.
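Block 120 may be sketched as packaging the cell image and the confirmed text into a labelled training example for whichever model was incorrect. The JSON payload, the base64 encoding and the field names are assumptions; the disclosure only requires that the correct machine-encoded text is provided to enable training.

```python
# Illustrative sketch only (block 120): package the cell image and the
# confirmed text as a labelled training example for the model that was wrong.
# The JSON payload, base64 encoding and field names are assumptions.
import base64
import json

def build_training_example(cell_file: str, correct_text: str, model_name: str) -> str:
    with open(cell_file, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "model": model_name,       # the model that generated incorrect text
        "label": correct_text,     # the machine-encoded text confirmed as correct
        "image": image_b64,        # the cell image the label refers to
    })
```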
It should be appreciated that where two or more machine learning algorithms are determined to have generated incorrect machine-encoded text (at block 118), block 120 may include providing the machine-encoded text determined to be correct to enable training of the two or more incorrect machine learning algorithms.
In some examples, the processor 18 may perform blocks 48, 86, 88, 90, 92, 52, 98, 100, 102, 104, 56, 58, 114, 60, 62, 64, 66, 68, 70, 72, 74, 118 and 120 automatically and without user intervention. In other examples, one or more of blocks 48, 86, 88, 90, 92, 52, 98, 100, 102, 104, 56, 58, 114, 60, 62, 64, 66, 68, 70, 72, 74, 118 and 120 may require user input (for example, a key press from the user input device 32 or from the user input device 42) to cause the processor 18 to perform the block.
It will be understood that the invention is not limited to the embodiments described above, and various modifications and improvements can be made without departing from the concepts described herein. For example, the different embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. By way of another example, blocks 50 and 52 illustrated in Figs. 2 and 4 may be performed in reverse order. In other words, the computer-implemented methods illustrated in Fig. 2 and in Fig. 4 may first identify a table image, store the identified table image as a second data file, and then perform a rotation on the identified table image and store the rotated table image as a third data file. Except where mutually exclusive, any of the features may be employed separately or in combination with any other features, and the disclosure extends to and includes all combinations and sub-combinations of one or more features described herein.

Claims

WE CLAIM:
1. A computer-implemented method of performing optical character recognition on an image in a data file, the method comprising:
processing the image in the data file using a first machine learning algorithm to generate machine-encoded text;
processing the image in the data file using a second machine learning algorithm to generate machine-encoded text, the second machine learning algorithm being different to the first machine learning algorithm;
determining a similarity of the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm;
controlling a request for user input to identify the correct machine-encoded text when the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm have at least one difference;
determining which of the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm is correct using the received user input; and
controlling output of the machine-encoded text determined to be correct.
2. A computer-implemented method as claimed in claim 1, further comprising:
controlling output of the machine-encoded text generated by the first machine learning algorithm when it is determined that the machine-encoded text generated by the first machine learning algorithm and the machine-encoded text generated by the second machine learning algorithm are the same.
3. A computer-implemented method as claimed in claim 1 or 2, further comprising:
determining which of the first machine learning algorithm and the second machine learning algorithm generated incorrect machine-encoded text;
providing the machine-encoded text determined to be correct to enable training of the machine learning algorithm that output the incorrect machine-encoded text.
4. A computer-implemented method as claimed in any of the preceding claims, further comprising:
rotating a first image to generate a second data file comprising a second image;
identifying a table image in the second image of the second data file and storing the identified table image in a third data file, the identified table image comprising an image of text and at least one line;
determining whether the table image in the third data file has a missing line;
inserting a line in the table image of the third data file where a line is determined to be missing to generate a fourth data file; and
determining a cell structure of the table image in the fourth data file using the lines of the table, the cell structure comprising a plurality of cells, and the image comprising an image of text in at least one cell of the cell structure.
5. A computer-implemented method as claimed in claim 4, wherein rotating the first image comprises:
determining a first angle using a third machine learning algorithm and the first image in the first data file; and
rotating the first image using the determined first angle to generate the second data file.
6. A computer-implemented method as claimed in claim 4, wherein rotating the first image comprises:
determining a first angle using a third machine learning algorithm and the first image in the first data file;
rotating the first image using the determined first angle to generate a further data file comprising a rotated image;
determining a second angle using the rotated image and an algorithm, the second angle being smaller in magnitude than the first angle; and
rotating the rotated image using the determined second angle to generate the second data file comprising the second image.
7. A computer-implemented method as claimed in any of the preceding claims, wherein determining whether the table in the third data file has a missing line comprises:
separating the lines of the table image and the image of text of the table image;
identifying an empty row and/or an empty column in the image of text;
determining if the empty row and/or the empty column have a width greater than a threshold value;
determining that the table image has a missing line where the determined width is greater than the threshold value.
8. An apparatus for determining a structure of a first image in a first data file, the apparatus comprising:
a processor;
a memory storing a computer program that, when executed by the processor, causes performance of the computer-implemented method as claimed in any of the preceding claims.
9. A system comprising:
an apparatus as claimed in claim 8;
a first device configured to generate the image and to transmit the image to the apparatus; and
a second device configured to receive the machine-encoded text determined to be correct from the apparatus.
10. A computer program that, when executed by a processor, causes performance of the computer-implemented method as claimed in any of claims 1 to 7.
11. A non-transitory computer readable storage medium comprising computer readable instructions that, when executed by a processor, cause performance of the computer-implemented method as claimed in any of claims 1 to 7.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2017772.1 2020-11-11
GBGB2017772.1A GB202017772D0 (en) 2020-11-11 2020-11-11 Computer-implemented methods, apparatus, computer programs, non-transitory computer-readable storage mediums for performing optical character recognition

Publications (1)

Publication Number Publication Date
WO2022100970A1 true WO2022100970A1 (en) 2022-05-19

Family

ID=74046309

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/078894 WO2022100970A1 (en) 2020-11-11 2021-10-19 Computer-implemented methods, apparatus, computer programs, non-transitory computer-readable storage mediums for performing optical character recognition on an image in a data file

Country Status (2)

Country Link
GB (1) GB202017772D0 (en)
WO (1) WO2022100970A1 (en)

Also Published As

Publication number Publication date
GB202017772D0 (en) 2020-12-23


Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21794812; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21794812; Country of ref document: EP; Kind code of ref document: A1)