AU2017225022A1 - Method, system and apparatus for processing image data - Google Patents
- Publication number: AU2017225022A1
- Authority: AU (Australia)
- Prior art keywords
- nodes
- image data
- representation
- target
- graph
- Legal status: Abandoned
- Classification: Image Analysis (AREA)
Abstract
A method of processing image data. Target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation are received. A graph is determined for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation, each of the nodes being marked, in the memory, as at least one of a junction, extremum and endpoint. One or more of the nodes are selected from each of the determined graphs based on a type of the nodes. A similarity measure is determined for the target image data and the reference image data by comparing the selected nodes of the target representation graph with corresponding nodes of the reference representation graph based on a distance, a respective type and a degree of the corresponding nodes, and comparing one or more of the selected nodes of the target representation graph with auxiliary nodes of the reference representation graph. At least one of the target image data and the reference image data are processed based on the determined similarity measure.
Description
METHOD, SYSTEM AND APPARATUS FOR PROCESSING IMAGE DATA
TECHNICAL FIELD
The present invention relates to image comparison methods and, in particular, to methods for comparing images of rendered text in order to determine differences. The present invention also relates to a method and apparatus for processing image data, and to a computer program product including a computer readable medium having recorded thereon a computer program for processing image data.
BACKGROUND
Comparing two or more images, in order to determine a similarity measure or to determine image areas in which the images differ, is a field with many and varied approaches within image processing. One application of image comparison techniques is to compare images of text that has been typeset or otherwise rendered by a computer system, for display, print or other uses.
One arrangement for performing image comparison compares corresponding pixel locations, and outputs image pixels that are different under this comparison operation.
However, in comparing images of text, variations such as boldness, thickness of strokes, and the relative arrangement and spacing of strokes are typically considered acceptable changes and are not output as differences. Only differences such as changes in the placement of glyph components are typically output.
Other applications perform text image comparisons in order to determine whether text produced using a first font is an acceptable replacement for text produced using a second font. Similarity may be measured using an image comparison method in order to justify such a classification.
Optical character recognition (OCR) is used for converting images of text into machine-encoded text strings, for example, in ASCII or Unicode. In OCR methods, it is desirable for the method of determining the machine-encoded text strings to be robust to minor variations in appearance of the text in an image being processed.
In one method, a similarity measure for text images is determined by performing OCR on each input text image, and then comparing the resulting machine-encoded text strings. However, because OCR is required to be resilient to minor variations in appearance, differences in appearance that are not acceptable to ignore for a particular application may nevertheless be ignored. During the OCR process the variations may be ignored and a most likely machine-encoding may be determined regardless of variations in the exact form and arrangement of the glyph strokes in the text image.
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements that perform a comparison of text images by extracting features of the text images, and the connectedness of those features, to form a feature graph for each input text image. The features are categorised as being representative or auxiliary, based on the arrangement and relationship of the features to other features in a feature graph. A similarity measure may be determined firstly by comparing representative features, and subsequently by comparing unmatched representative features to auxiliary features.
According to one aspect of the present disclosure, there is provided a method of processing image data, the method comprising:
receiving target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation;
determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation, each of the nodes being marked, in the memory, as at least one of a junction, extremum and endpoint;
selecting one or more of the nodes from each of the determined graphs based on a type of the nodes;
determining a similarity measure for the target image data and the reference image data by comparing the selected nodes of the target representation graph with corresponding nodes of the reference representation graph based on a distance, a respective type and a degree of the corresponding nodes, and comparing one or more of the selected nodes of the target representation graph with auxiliary nodes of the reference representation graph; and
processing at least one of the target image data and the reference image data based on the determined similarity measure.
According to another aspect of the present disclosure, there is provided an apparatus for processing image data, the apparatus comprising:
means for receiving target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation;
means for determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation, each of the nodes being marked, in the memory, as at least one of a junction, extremum and endpoint;
means for selecting one or more of the nodes from each of the determined graphs based on a type of the nodes;
means for determining a similarity measure for the target image data and the reference image data by comparing the selected nodes of the target representation graph with corresponding nodes of the reference representation graph based on a distance, a respective type and a degree of the corresponding nodes, and comparing one or more of the selected nodes of the target representation graph with auxiliary nodes of the reference representation graph; and means for processing at least one of the target image data and the reference image data based on the determined similarity measure.
According to still another aspect of the present disclosure, there is provided a system for processing image data, the system comprising:
a memory for storing data and a computer program; and a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
receiving target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation;
determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation, each of the nodes being marked, in the memory, as at least one of a junction, extremum and endpoint;
selecting one or more of the nodes from each of the determined graphs based on a type of the nodes;
determining a similarity measure for the target image data and the reference image data by comparing the selected nodes of the target representation graph with corresponding nodes of the reference representation graph based on a distance, a
respective type and a degree of the corresponding nodes, and comparing one or more of the selected nodes of the target representation graph with auxiliary nodes of the reference representation graph; and processing at least one of the target image data and the reference image data based on the determined similarity measure.
According to still another aspect of the present disclosure, there is provided a non-transitory computer readable medium having stored on the medium a computer program for processing image data, the program comprising:
code for receiving target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation;
code for determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation, each of the nodes being marked, in the memory, as at least one of a junction, extremum and endpoint;
code for selecting one or more of the nodes from each of the determined graphs based on a type of the nodes;
code for determining a similarity measure for the target image data and the reference image data by comparing the selected nodes of the target representation graph with corresponding nodes of the reference representation graph based on a distance, a respective type and a degree of corresponding nodes, and comparing one or more of the selected nodes of the target representation graph with auxiliary nodes of the reference representation graph; and code for processing at least one of the target image data and the reference image data based on the determined similarity measure.
According to still another aspect of the present disclosure, there is provided a method of processing image data, the method comprising:
receiving target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation;
determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation, wherein at least one node is marked, in the memory, as an extremum;
determining a similarity measure for the target image data and the reference image data by comparing nodes of the target representation graph with corresponding nodes of the
reference representation graph based on a distance, a respective type and a degree of the corresponding nodes, wherein an unmatched extremum of the target representation graph is further compared with auxiliary extrema of the reference representation graph to determine a similarity measure for the target image data and the reference image data; and processing at least one of the target image data and the reference image data based on the determined similarity measure.
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
One or more embodiments of the invention will now be described with reference to the following drawings, in which:
Fig. 1 is a schematic flow diagram illustrating a method of detecting differences between a reference text image and target text image;
Fig. 2A shows two similar example text images, each being an example of a word in Thai language text;
Fig. 2B shows two similar example text images, each being a word in Arabic writing and font;
Fig. 2C shows example text images, each comprising the letter ‘g’;
Fig. 2D shows another set of example text images;
Fig. 3A shows an example text image skeleton;
Fig. 3B shows an example feature graph representing extracted features of the skeleton of Fig. 3A;
Fig. 4A shows a region within a skeletonised text image and corresponding feature graph;
Fig. 4B shows another region within a skeletonised text image and corresponding feature graph;
Fig. 5A shows an example of a glyph stroke, a corresponding skeletonised glyph stroke and a corresponding extracted feature graph;
Fig. 5B shows another example glyph stroke, a corresponding skeletonised glyph stroke, and a corresponding extracted feature graph;
Fig. 6 is a schematic flow diagram illustrating a method of comparing components of reference and target text images;
Fig. 7 is a schematic flow diagram illustrating a method of comparing components of a candidate pair;
Fig. 8 is a schematic flow diagram illustrating a method of comparing feature nodes;
Fig. 9A shows an example node and neighbouring nodes in a feature graph of a first text image;
Fig. 9B shows an example node and neighbouring nodes in a feature graph of a second text image;
Fig. 9C shows a further example node and neighbouring nodes of a second text image;
Fig. 9D shows yet a further example node and neighbouring nodes of a second text image;
Fig. 10A shows a reference text image and a target text image;
Fig. 10B shows feature graphs corresponding to each of the images of Fig. 10A;
Fig. 11A shows an example reference text image, and a corresponding feature graph;
Fig. 11B shows an example target text image and corresponding feature graph; and
Figs. 12A and 12B form a schematic block diagram of a general purpose computer system upon which arrangements described can be practiced.
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps 5 and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Figs. 2A, 2B, 2C and 2D illustrate various renderings of text that may be processed as reference or target images in accordance with the methods described below. In Fig. 2A, two similar example text images 201 and 202 are shown, each being an example of a word in Thai language text. Text images 201 and 202 differ in that they use fonts of different weight. The font in text image 202 is bolder than that of text image 201. However, in an asemic sense, the text images 201 and 202 may be considered to be completely identical or matching. That is, the text images 201 and 202 show the same features in the same topological arrangement, and the same general geometrical arrangement. Further, the differences between the text images 201 and 202 do not represent changes that may introduce a possibility that the marks that form the glyphs depicted in the text images 201 and 202 represent a different meaning (or lack of meaning, or a nonsensical arrangement) within the writing system (i.e., written Thai language) utilised for the text in the image.
Fig. 2B shows two similar example text images 203 and 204, each being a word in
Arabic writing and font. The text images 203 and 204 differ in that the images 203 and 204 use different fonts. There are visible differences in the relative sizes of the curved glyph elements of the images 203 and 204, thickness profiles of the various glyph strokes, precise position of diacritics and other supplementary marks and so forth. Again, in an asemic sense, the text renderings of the images 203 and 204 may be considered matching, as such variation does not introduce the possibility of changes in meaning.
Fig. 2D shows another set of example text images 209, 210, 211 and 212. Text image 209 is a word of text in Hindi language and font, while text images 210, 211 and 212 vary the appearance of that text in various ways. In comparison to text image 209, text image
210 is missing several elements above the right-side section of the main horizontal bar feature of the text. In an asemic sense, without even being aware of the rules of how Hindi text is
constructed, it can be asserted that the absence of the missing elements of image 210 does introduce the possibility that text image 210 has a meaningful difference to text image 209.
Similarly, in text image 211, the positions of glyph elements are different such that parts of glyph strokes that are distinct (separated by white space) in image 209 are colliding in image
211. Again, in an asemic sense, the differing topological arrangement of glyph strokes in the images 209 and 211 introduces the possibility that a difference that affects the meaning of the text image (or causes the text image to become nonsensical) has been introduced. In text image 212, the position of an isolated element has been shifted relative to other glyph features, versus the relative arrangement of the glyph features in text image 209. Although a strict topological change in general glyph form has not been introduced, a change in the relative spatial arrangement of glyph features between text images 209 and 212 has been introduced. Therefore, in an asemic sense, the possibility is introduced that the meaning of the text in the images 210, 211 and 212 (according to the rules for how Hindi text should be laid out) has been changed compared to the text of image 209.
Fig. 2C shows example text images 205, 206, 207 and 208, each comprising the letter ‘g’ depicted in different fonts. Within the examples of Fig. 2C, the exact form and arrangement of the glyph strokes that comprise each letter ‘g’ differ significantly. For example, the vertical stroke on the right side of the ‘g’ extends above the main circular loop in images 205 and 207, but is flush with the top of the loop in image 206. The font utilised in text image 208 has a letter ‘g’ that looks very different, with little topological similarity to the other example images 205, 206 and 207 of Fig. 2C. In an asemic sense (that is, without applying knowledge of the meaning of elements of the writing system itself), such differences may indeed introduce the possibility that a difference in meaning is introduced.
Fig. 1 is a flowchart showing steps of a method 100 of detecting differences between a reference text image and target text image. The method 100 performs asemic text comparison to detect the differences between a reference text image and target text image.
Figs. 12A and 12B depict a general-purpose computer system 1200, upon which various arrangements of the method 100 and other methods described below can be practiced.
As seen in Fig. 12A, the computer system 1200 includes: a computer module 1201;
input devices such as a keyboard 1202, a mouse pointer device 1203, a scanner 1226, a camera 1227, and a microphone 1280; and output devices including a printer 1215, a display
device 1214 and loudspeakers 1217. An external Modulator-Demodulator (Modem) transceiver device 1216 may be used by the computer module 1201 for communicating to and from a communications network 1220 via a connection 1221. The communications network 1220 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 1221 is a telephone line, the modem 1216 may be a traditional “dial-up” modem. Alternatively, where the connection 1221 is a high capacity (e.g., cable) connection, the modem 1216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 1220.
The computer module 1201 typically includes at least one processor unit 1205, and a memory unit 1206. For example, the memory unit 1206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 1201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 1207 that couples to the video display 1214, loudspeakers 1217 and microphone 1280; an I/O interface 1213 that couples to the keyboard 1202, mouse 1203, scanner 1226, camera 1227 and optionally a joystick or other human interface device (not illustrated); and an interface 1208 for the external modem 1216 and printer 1215. In some implementations, the modem 1216 may be incorporated within the computer module 1201, for example within the interface 1208. The computer module 1201 also has a local network interface 1211, which permits coupling of the computer system 1200 via a connection 1223 to a local-area communications network 1222, known as a Local Area Network (LAN). As illustrated in Fig. 12A, the local communications network 1222 may also couple to the wide network 1220 via a connection 1224, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 1211 may comprise an Ethernet circuit card, a Bluetooth® wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 1211.
The I/O interfaces 1208 and 1213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 1209 are provided and typically include a hard disk drive (HDD) 1210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 1212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM,
portable, external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 1200.
The components 1205 to 1213 of the computer module 1201 typically communicate via an interconnected bus 1204 and in a manner that results in a conventional mode of operation of the computer system 1200 known to those in the relevant art. For example, the processor 1205 is coupled to the system bus 1204 using a connection 1218. Likewise, the memory 1206 and optical disk drive 1212 are coupled to the system bus 1204 by connections 1219. Examples of computers on which the described arrangements can be practised include IBM-PC’s and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems.
The described methods may be implemented using the computer system 1200 wherein the processes of Figs. 1, 6, 7 and 8 to be described, may be implemented as one or more software application programs 1233 executable within the computer system 1200. In particular, the steps of the described methods are effected by instructions 1231 (see Fig. 12B) in the software 1233 that are carried out within the computer system 1200. The software instructions 1231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules performs the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software 1233 is typically stored in the HDD 1210 or the memory 1206. The software is loaded into the computer system 1200 from the computer readable medium, and then executed by the computer system 1200. Thus, for example, the software 1233 may be stored on an optically readable disk storage medium (e.g., CDROM) 1225 that is read by the optical disk drive 1212. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 1200 preferably effects an advantageous apparatus for implementing the described methods.
In some instances, the application programs 1233 may be supplied to the user encoded on one or more CD-ROMs 1225 and read via the corresponding drive 1212, or alternatively may be read by the user from the networks 1220 or 1222. Still further, the software can also be loaded into the computer system 1200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded
instructions and/or data to the computer system 1200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 1201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 1201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 1233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 1214. Through manipulation of typically the keyboard 1202 and the mouse 1203, a user of the computer system 1200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 1217 and user voice commands input via the microphone 1280.
Fig. 12B is a detailed schematic block diagram of the processor 1205 and a “memory” 1234. The memory 1234 represents a logical aggregation of all the memory modules (including the HDD 1209 and semiconductor memory 1206) that can be accessed by the computer module 1201 in Fig. 12A.
When the computer module 1201 is initially powered up, a power-on self-test (POST) program 1250 executes. The POST program 1250 is typically stored in a ROM 1249 of the semiconductor memory 1206 of Fig. 12A. A hardware device such as the ROM 1249 storing software is sometimes referred to as firmware. The POST program 1250 examines hardware within the computer module 1201 to ensure proper functioning and typically checks the processor 1205, the memory 1234 (1209, 1206), and a basic input-output systems software (BIOS) module 1251, also typically stored in the ROM 1249, for correct operation. Once the POST program 1250 has run successfully, the BIOS 1251 activates the hard disk drive 1210 of Fig. 12A. Activation of the hard disk drive 1210 causes a bootstrap loader program 1252 that is
resident on the hard disk drive 1210 to execute via the processor 1205. This loads an operating system 1253 into the RAM memory 1206, upon which the operating system 1253 commences operation. The operating system 1253 is a system level application, executable by the processor 1205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 1253 manages the memory 1234 (1209, 1206) to ensure that each process or application running on the computer module 1201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 1200 of Fig. 12A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 1234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 1200 and how such is used.
As shown in Fig. 12B, the processor 1205 includes a number of functional modules including a control unit 1239, an arithmetic logic unit (ALU) 1240, and a local or internal memory 1248, sometimes called a cache memory. The cache memory 1248 typically includes a number of storage registers 1244 - 1246 in a register section. One or more internal busses 1241 functionally interconnect these functional modules. The processor 1205 typically also has one or more interfaces 1242 for communicating with external devices via the system bus 1204, using a connection 1218. The memory 1234 is coupled to the bus 1204 using a connection 1219.
The application program 1233 includes a sequence of instructions 1231 that may include conditional branch and loop instructions. The program 1233 may also include data 1232 which is used in execution of the program 1233. The instructions 1231 and the data 1232 are stored in memory locations 1228, 1229, 1230 and 1235, 1236, 1237, respectively. Depending upon the relative size of the instructions 1231 and the memory locations 1228-1230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 1230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 1228 and 1229.
In general, the processor 1205 is given a set of instructions which are executed therein.
The processor 1205 waits for a subsequent input, to which the processor 1205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 1202, 1203, data received from an external source across one of the networks 1220, 1222, data retrieved from one of the storage devices 1206, 1209 or data retrieved from a storage medium 1225 inserted into the corresponding reader 1212, all depicted in Fig. 12A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 1234.
The disclosed arrangements use input variables 1254, which are stored in the memory 1234 in corresponding memory locations 1255, 1256, 1257. The disclosed arrangements produce output variables 1261, which are stored in the memory 1234 in corresponding memory locations 1262, 1263, 1264. Intermediate variables 1258 may be stored in memory locations 1259, 1260, 1266 and 1267.
Referring to the processor 1205 of Fig. 12B, the registers 1244, 1245, 1246, the arithmetic logic unit (ALU) 1240, and the control unit 1239 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 1233. Each fetch, decode, and execute cycle comprises:
a fetch operation, which fetches or reads an instruction 1231 from a memory location 1228, 1229, 1230;
a decode operation in which the control unit 1239 determines which instruction has been fetched; and an execute operation in which the control unit 1239 and/or the ALU 1240 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 1239 stores or writes a value to a memory location 1232.
Each step or sub-process in the processes of Figs. 1, 6, 7 and 8 is associated with one or more segments of the program 1233 and is performed by the register section 1244, 1245, 1246, the ALU 1240, and the control unit 1239 in the processor 1205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 1233.
The described methods may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions of described methods. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
The method 100 begins at receiving step 101, in which reference and target images for the text to be compared are received, under execution of the processor 1205. The received images may be received as bitmaps and be stored as image files in a file system (e.g., implemented by the software application program 1233), encoded in a lossy or lossless image encoding format such as JPEG, PNG or others. Alternatively, the reference and target images may be received in the form of bitmap data in the memory 1206. In the disclosed arrangements, the images are assumed to be 1-bit-per-pixel (bi-level) (i.e., each pixel is either on or off, without multiple colour channels or grey levels). Other arrangements are possible in which images of higher bit depth are received at step 101, and are subject to a conversion process to bi-level format before continuing.
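As a concrete illustration of step 101, the following is a minimal sketch of receiving images and converting them to bi-level form, assuming the Pillow and NumPy libraries; the file names and fixed threshold are illustrative assumptions, not part of the described arrangements.

```python
# Minimal sketch of step 101, assuming the Pillow and NumPy libraries.
# File names and the threshold are illustrative only; the conversion
# assumes dark glyphs on a light background.
import numpy as np
from PIL import Image

def load_bilevel(path: str, threshold: int = 128) -> np.ndarray:
    """Load an image and reduce it to a boolean (on/off) pixel array."""
    grey = Image.open(path).convert("L")  # collapse colour channels
    return np.asarray(grey) < threshold   # True where glyph ink lies

reference = load_bilevel("reference.png")
target = load_bilevel("target.png")
```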
The method 100 then proceeds to skeletonising step 102, where the received reference and target images are skeletonised under execution of the processor 1205. In the step 102, skeletonised images are produced in which the strokes of glyphs within the images are reduced in thickness until all glyph strokes are one pixel in width. The skeletonised images may be stored in the memory 1206. The skeletonising process executed at step 102 may be carried out according to any suitable method. At step 102, the reference and target images are subjected to an iterated erosion operation in which strokes are repeatedly thinned by removing pixels that lay on the boundary of glyph strokes, until all such strokes are a single pixel in thickness. In some arrangements, further processing steps of the method 100 are carried out using the skeletonised versions of the reference and target images. For example, in some arrangements, the original stroke width (in pixels) corresponding to each point along skeletonised glyph strokes is recorded in order to aid later processing steps. The original stroke width may be
ascertained by counting the number of erosion operations that occur for each image location during the skeletonisation process.
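A sketch of skeletonising step 102 is given below. It substitutes scikit-image's skeletonize for the hand-rolled iterated erosion described above, and uses a Euclidean distance transform in place of the per-location erosion count as a stroke-width estimate; both library choices are assumptions for illustration.

```python
# Sketch of skeletonising step 102, assuming scikit-image and SciPy.
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def skeletonise(bilevel: np.ndarray):
    """Return a one-pixel-wide skeleton and an approximate original
    stroke width (in pixels) at each skeleton point."""
    skeleton = skeletonize(bilevel)
    # The distance to the nearest background pixel approximates half
    # the original stroke width at each glyph location.
    half_width = distance_transform_edt(bilevel)
    stroke_width = np.where(skeleton, 2.0 * half_width, 0.0)
    return skeleton, stroke_width
```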
The method 100 then proceeds to constructing step 103. At step 103, feature graphs are constructed for the skeletonised reference and target images, under execution of the processor 1205. The constructed feature graphs may be stored in the memory 1206. In one arrangement, features extracted at step 103 include: local minima and maxima (“extrema”) in horizontal or vertical location of glyph strokes; endpoints of glyph strokes; and junctions of glyph strokes. The extracted features are used to construct a feature graph, in which nodes of the graph correspond to extracted features, and connections between nodes exist where glyph strokes form a path linking neighbouring features. For example, Fig. 3B shows a feature graph 302 generated for an example text image skeleton 301. In some arrangements, the nodes in the feature graph constructed at step 103 are augmented with data including the co-ordinates of the extracted feature in the original or skeletonised images. One suitable method for detecting image locations corresponding to features is to convolve the text image using filters pre-selected to highlight the required features (e.g., filters for detecting “T”-shaped junctions with various orientations).
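The following sketch illustrates the neighbour-counting part of feature extraction for step 103, identifying endpoint and junction pixels of the skeleton; detection of extrema (local minima and maxima along strokes) and the convolution-filter approach mentioned above are omitted for brevity.

```python
# Sketch of endpoint/junction detection for step 103, assuming SciPy.
import numpy as np
from scipy.ndimage import convolve

def classify_skeleton_pixels(skeleton: np.ndarray):
    """Return boolean maps of endpoint and junction pixels of a skeleton."""
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])
    # Count 8-connected skeleton neighbours of every skeleton pixel.
    neighbours = convolve(skeleton.astype(int), kernel, mode="constant")
    endpoints = skeleton & (neighbours == 1)   # one incoming stroke
    junctions = skeleton & (neighbours >= 3)   # three or more strokes meet
    return endpoints, junctions
```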
In some arrangements, step 103 of constructing feature graphs additionally includes a step in which closely situated features are combined, which will be described in more detail below with reference to Figs. 4A and 4B.
At decomposing step 104, the feature graphs constructed at step 103 are then decomposed into connected components for further processing. Each component produced at step 104 corresponds to a disjoint region of the feature graph.
The method 100 then proceeds to comparison step 105, where the components in the feature graph for the target image, as generated in step 104, are compared to components in the feature graph for the reference image. A method 600 of comparing components of the reference and target text images, as executed at step 105, will be described in detail below with reference to Fig. 6. In step 105, some components from the reference and target image feature graphs may be determined as matching, with the remaining components determined as unmatching.
Finally, at emitting step 106, the method 100 identifies the unmatching components determined in step 105 and emits the unmatching components as identified differences between
the reference and target images. In some arrangements, the components are associated with image areas within the original reference and target images as determined in step 104, and the differences are thus reported as image locations (such as bounding boxes).
Fig. 3A shows an example skeletonised glyph 301, and Fig. 3B shows a corresponding example feature graph 302 as may be produced in step 103 of the text image comparison method 100. During the feature extraction process for the example skeletonised glyph 301, pixels at locations 302 and 303 are identified as forming a local minimum or a local maximum with respect to horizontal position in the skeletonised text image. Similarly, pixels at locations 304, 305 and 306 each form a local minimum or local maximum with respect to vertical position in the skeletonised text image. A local minimum or local maximum occurs at locations where pixels have a highest or lowest position (x or y coordinate) amongst pixels along a skeletonised glyph stroke in a local region. In the example of Figs. 3A and 3B, pixels at a location 307 are identified as forming a junction, and pixels at a location 308 are identified as forming an endpoint.
The corresponding feature graph 302 comprises feature nodes 309 to 315, and connections between nodes that are reachable by a chain of neighbouring pixels in the skeletonised text image. Each node in the feature graph 302 is denoted by the type of extracted feature that the node represents. For example, the feature nodes 309, 310, 311, 312 and 313 are denoted as being extremum (minimum or maximum) features, feature node 314 is denoted as being a junction feature, and feature node 315 is denoted as being an endpoint feature. In some arrangements, nodes in the feature graph 302 may be augmented with additional information, including: position (x and y coordinates) in the skeletonised text image of an extracted feature; original stroke thickness at the point in the original text image corresponding to the extracted feature; and valency (number of incoming glyph strokes) of junctions. By virtue of being a graph, each feature node intrinsically is associated with neighbouring feature nodes (i.e., the feature nodes directly connected to the feature node).
The feature graph may be stored in memory 1206 as a data structure comprising a set of node records in memory 1206, and a set of edge records in memory 1206, each referencing or containing a memory pointer to the two node records connected by an edge in the feature graph 302. Each node record may additionally be augmented by values stored in memory 1206 such as the position in the skeletonised text image of that feature; original stroke thickness at that point in the original text image corresponding to the extracted feature; and valency of junctions.
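A minimal sketch of such a data structure follows, using Python dataclasses in place of raw node and edge records; the field and type names are illustrative assumptions, not taken from the description.

```python
# Illustrative feature-graph data structure; field names are assumptions.
from dataclasses import dataclass, field
from enum import Enum

class FeatureType(Enum):
    ENDPOINT = "endpoint"
    JUNCTION = "junction"
    EXTREMUM = "extremum"

@dataclass
class FeatureNode:
    kind: FeatureType
    x: int                     # position in the skeletonised image
    y: int
    stroke_width: float = 0.0  # original stroke thickness at this point
    strong: bool = True        # representative (strong) vs auxiliary (weak)

@dataclass
class FeatureGraph:
    nodes: list = field(default_factory=list)  # FeatureNode records
    edges: list = field(default_factory=list)  # (node_a, node_b) pairs

    def neighbours(self, node):
        """Nodes directly connected to the given node."""
        return [b if a is node else a
                for a, b in self.edges if node in (a, b)]
```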
When processing the skeletonised text image to form the feature graph, isolated pixels corresponding to small image features (such as dot-like diacritic marks) in the original text image may be reduced to single endpoint feature nodes with no connections to neighbouring features.
The feature graph constructed at step 103 is stored as a data structure in memory 1206 for processing by further steps of the method 100.
During step 103 for constructing the feature graphs for the reference and target text images, nearby features in certain configurations may be merged. The features may be merged so that artefacts that sometimes arise from the skeletonisation step 102 do not introduce topological differences between text images where no asemic difference is represented in the reference and target text images. Figs. 4A and 4B comprise two examples in which junction merging occurs.
Fig. 4A shows a first example of junction merging. Fig. 4A shows an example of a region 401 within a skeletonised text image, with two nearby junction features, and partial glyph strokes leading to other glyph features. As seen in Fig. 4A, the corresponding feature graph 402 for the region 401 is shown. Feature nodes 407 and 408 of the feature graph 402 represent junctions of the skeletonised glyph 401. In the example of Fig. 4A, the locations of the junction features 407 and 408 are found to be separated by a distance smaller than the stroke width of the glyph multiplied by a scaling factor. In the example arrangement, if the node separation distance is less than half the stroke width, for example, then node merging occurs. In some arrangements, where the skeletonisation process associates a stroke width with a pixel, stroke or region within the skeletonised glyph, the stored stroke width value is referenced when determining whether nearby glyph features should be merged.
As also seen in Fig. 4A, a simplified feature graph 403 shows the result of merging junction features 407 and 408 to form a new junction feature 409. In some arrangements, the merged junction feature 409 has a position equivalent to the centroid or mean of the original nearby junctions. In modified feature graph 403, nodes that were previously connected to the feature nodes that became merged will then become connected to the feature node representing the merged feature. The valency (i.e., number of neighbouring features) of the merged feature is four in the example of Fig. 4A, whereas the original junctions each had a valency of three.
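The following sketch, building on the FeatureGraph structure above, illustrates the junction-merging rule: two junction nodes separated by less than a fraction of the local stroke width are replaced by a single node at their centroid, with edges re-wired accordingly. The 0.5 scaling factor echoes the half-stroke-width example above and would be configurable in practice.

```python
# Sketch of junction merging (Fig. 4A); the 0.5 factor is illustrative.
import math

def maybe_merge_junctions(graph, a, b, scale: float = 0.5) -> bool:
    """Merge junction nodes a and b of graph if they are close enough."""
    separation = math.hypot(a.x - b.x, a.y - b.y)
    if separation >= scale * max(a.stroke_width, b.stroke_width):
        return False
    # Place the merged junction at the centroid of the originals.
    a.x, a.y = (a.x + b.x) // 2, (a.y + b.y) // 2
    # Re-wire b's edges onto a, dropping the a-b edge itself.
    graph.edges = [((a if u is b else u), (a if v is b else v))
                   for u, v in graph.edges
                   if not (u in (a, b) and v in (a, b))]
    graph.nodes.remove(b)
    return True
```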
In some arrangements, glyph merging for complex clusters of feature nodes is performed iteratively by considering additional nearby nodes sequentially.
Feature nodes of a feature graph may be further processed in order to categorise the feature nodes as being strong features, or weak features. In one arrangement, feature node matching when comparing feature graphs for the reference and target text images occurs by first determining a mapping of only strong features between the feature graphs, and then, failing that, including weak features in one feature graph as potential matches for strong features in the other feature graph.
Figs. 5A and 5B show two examples of arrangements for determining strong and weak features in a feature graph as constructed in step 103 of the method 100. A first example is shown in Fig. 5A, which comprises an example glyph stroke 501, a corresponding skeletonised glyph stroke 502, and a corresponding extracted feature graph 503. A second example is shown in Fig. 5B, which comprises an example glyph stroke 511, a corresponding skeletonised glyph stroke 512, and a corresponding extracted feature graph 513. The example glyph strokes 501 and 511 are similar, vertical glyph strokes.
In Fig. 5A, the result of skeletonising the original glyph stroke 501 yields a skeleton 502 with endpoint features 504 and 505, and an x-extremum feature 506 (being a local minimum).
In the corresponding feature graph 503, feature nodes 507 and 508 represent endpoints, and feature node 509 represents the extremum. In one arrangement, a distance measurement 510 is considered for each pair of nodes joined by a connection in the feature graph. In cases where an extremum feature is connected to another feature, the distance between the extremum feature and the other feature is compared to the stroke thickness for the glyph stroke or for that location within the text image. Next, if the separation distance 510 is smaller than the stroke thickness (or, in some arrangements, the stroke thickness multiplied by some scaling factor), then each extremum feature in the feature pair under consideration is designated as a weak feature. In some arrangements, the vertical or horizontal components of the distance 510 may be used instead. The designation may be stored as supplementary data about the feature node, as recorded in the data structure for the feature graph, and stored in memory 1206.
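Continuing the sketches above, the separation-distance test of Fig. 5A might be implemented as follows; the scaling factor is left as a parameter since the description allows the stroke thickness to be scaled.

```python
# Sketch of the separation-distance weak-feature test (Fig. 5A).
import math

def mark_weak_by_separation(graph, scale: float = 1.0) -> None:
    """Demote extrema that sit within a stroke width of a connected node."""
    for a, b in graph.edges:
        separation = math.hypot(a.x - b.x, a.y - b.y)
        for node in (a, b):
            if (node.kind is FeatureType.EXTREMUM
                    and separation < scale * node.stroke_width):
                node.strong = False
```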
In Fig. 5B, the result of skeletonising the original glyph stroke 511 yields a skeleton 512 with endpoint features 514 and 515, and an x-extremum feature 516 (being a local maximum).
In the corresponding feature graph 513, feature nodes 517 and 518 represent endpoints, and feature node 519 represents the extremum. In one arrangement, a collinearity measurement is
determined by considering the relative positions of extremum nodes and their two connected nodes. In the example of Fig. 5B, extremum feature node 519 is considered in conjunction with the nodes that node 519 connects to, being endpoint feature nodes 517 and 518. A distance 521 from the considered node 519 to a line 520 joining the other nodes is determined, in order to measure a degree of collinearity. The distance 521 is compared to a stroke thickness (or thickness multiplied by a scaling factor) for the glyph stroke, or for a location within the text image. Next, if the distance 521 is smaller than the stroke thickness (or, in some arrangements, the stroke thickness multiplied by some scaling factor), then the extremum feature under consideration is designated as a weak feature, and similarly stored as supplementary data as for the example of Fig. 5A.
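The collinearity test of Fig. 5B can be sketched as a perpendicular point-to-line distance, again building on the earlier FeatureGraph structure; the formula used is the standard distance from a point to the line through two other points.

```python
# Sketch of the collinearity weak-feature test (Fig. 5B).
import math

def mark_weak_by_collinearity(graph, scale: float = 1.0) -> None:
    for node in graph.nodes:
        if node.kind is not FeatureType.EXTREMUM:
            continue
        nbrs = graph.neighbours(node)
        if len(nbrs) != 2:
            continue
        ax, ay, bx, by = nbrs[0].x, nbrs[0].y, nbrs[1].x, nbrs[1].y
        length = math.hypot(bx - ax, by - ay)
        if length == 0:
            continue
        # Perpendicular distance from the extremum to the line joining
        # its two neighbours (distance 521 to line 520 in Fig. 5B).
        distance = abs((bx - ax) * (ay - node.y)
                       - (ax - node.x) * (by - ay)) / length
        if distance < scale * node.stroke_width:
            node.strong = False
```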
In other arrangements, the geometrical arrangement of features (using feature locations of the skeletonised glyphs), and the topological arrangement of features (using connections between pairs of feature nodes in the feature graph) are considered, in order to designate certain features as being weak or strong. In some arrangements, feature nodes not explicitly identified as being weak features may be designated as being strong features.
In some arrangements, nodes of the feature graph may be designated as being representative nodes of the feature graph, with other feature nodes being deemed to be auxiliary nodes of the feature graph. In the arrangements described above, strong feature nodes correspond to the representative nodes, and weak feature nodes correspond to the auxiliary nodes.
Further examples may be constructed, beginning with vertical glyph strokes similar to 501 and 511, that yield skeletonised glyph strokes comprising only two endpoint features and no extrema features at all. In other examples, the skeletonised glyph stroke may have multiple extrema features, which are designated as weak features by means of the aforementioned arrangements.
The method 600 of comparing components of the reference and target text images, as executed at step 105, will now be described with reference to Fig. 6. The method 600 may be implemented as instructions of one or more of the software code modules of the application program 1233 stored in the hard disk drive 1210 and being executed on processor 1205.
The method 600 takes as input a set of components for the reference text image, each component being a connected set of nodes from the feature graph of the reference text image,
and a set of components for the target text image, each component being a connected set of nodes from the feature graph of the target text image. After completing the steps of method 600, components in one set that remain not matched with any components in the other set are indicated as regions of difference between the reference and target text images.
The method 600 begins at initialisation step 601, in which the components of the reference and target text images are initialised as lists of candidate components: a first list corresponding to the components of the reference text image, and a second list corresponding to the components of the target text image. The lists of candidate components may be stored in the memory 1206. The method 600 then iterates pairwise over the candidate components, which are potential candidates for matching.
The method 600 then proceeds to selecting step 602, where a pair of components (one from each candidate list) is selected as the current candidate pair, under execution of the processor 1205. In some arrangements, the candidate pair is selected by considering the spatial arrangement or relative positions of components within their respective text images, the number of features, or the spatial size or extent of the components, thereby prioritising candidate pairs that are more likely to result in a match.
In one arrangement, the relative layout and position of components may be analysed in order to categorise components as likely being one of: belonging to “mainline” text; being a diacritical mark positioned above the mainline text; or being a diacritical mark positioned below the mainline text. Then, when selecting component pairs for comparison in step 602, mainline components are selected in combination with other mainline components with a higher priority than non-mainline components, and diacritical marks are selected in combination with other diacritical marks with higher priority than components that are not diacritical marks. The arrangement where relative layout and position of components are analysed in order to categorise components may also be combined with approaches in which the spatial size of components, or their number of features, is also used to rank components as likely matches, with component comparison performed on the most likely matches first.
At comparing step 603, the components of the current candidate pair are compared to determine if the components match. A method 700 of comparing the components of the current candidate pair, as executed at step 603, will be described in reference to Fig. 7 below. At decision step 604, if the components of the current candidate pair are matched, then the method 600 flows to removing step 605. At step 605, the components are removed from the list of
candidate components (and therefore shall not be subject to further consideration in determining a set of unmatching components according to the steps of method 600). If instead at step 604, the components of the current candidate pair are determined not to match, then the method 600 flows to decision step 607.
At step 607, if there remains another component that may be selected as one of the candidates in the current candidate pair to form a potential match, then the method 600 returns to step 602 in order to consider a different potential match. Otherwise, the remaining component in the current candidate pair is determined to be exhausted, and the method 600 flows to step 608 at which the exhausted component is indicated as a region of difference between the reference and target text images. After completing steps 605 or 608, the method 600 flows to iteration step 606 at which if there remain any further candidate component pairs to consider, then the method 600 returns for further iteration at step 602. Otherwise, the method 600 proceeds to step 609, at which all the remaining components in the first and second candidate component lists are indicated as differences between the reference and target text images.
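A condensed sketch of the candidate-pair loop of method 600 follows, with compare_components standing in for the comparison of method 700; the prioritised candidate selection of step 602 is simplified to plain iteration order, so this illustrates the control flow rather than the full arrangement.

```python
# Condensed sketch of the component-matching loop of method 600.
def compare_component_sets(reference_components, target_components,
                           compare_components):
    """Pair up components greedily; return those left unmatched."""
    differences = []
    tgt_remaining = list(target_components)
    for ref in reference_components:
        match = next((tgt for tgt in tgt_remaining
                      if compare_components(ref, tgt)), None)
        if match is not None:
            tgt_remaining.remove(match)  # steps 604-605: matched pair leaves
        else:
            differences.append(ref)      # step 608: exhausted component
    differences.extend(tgt_remaining)    # step 609: remaining components
    return differences
```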
As shown in Fig. 6, the iterative loop of method 600 iterates over possible combinations of remaining unmatched components from the reference and target text images. In other approaches, a maximally-matching mapping from components in the reference set to components in the target set may be found, without necessarily removing matches from consideration. In further arrangements, at step 602, a combination of two or more components (e.g., the sum or union of those components) from the reference or target image may be considered as a single component, and compared to a candidate component (or union) in the other text image. A match may be recorded, but denoted as being indicative of a gap or other minor topological change existing between the matched components. Such differences are emitted from the method 600 for reporting or further processing.
In comparison step 603 of method 600, a single component of the reference text image is compared to a single component of the target text image. A component in the reference image matches a component in the target image if the features in those components match. In one arrangement, the two components (one from the reference text image, and one from the target text image) are deemed to match if each of the strong features in the feature graph of each component matches a strong or weak feature in the feature graph of the other component. A method 700 of comparing the components of the current candidate pair to determine if the
components match, as executed at step 603, will be described in detail below with reference to Fig. 7. The method 700 may be implemented as instructions of one or more of the software code modules of the application program 1233 stored in the hard disk drive 1210 and being executed on processor 1205.
The method 700 begins at determining step 701, in which it is determined if there exists a one-to-one mapping of all strong feature nodes in a feature graph of a first component to all strong feature nodes in a feature graph of a second component. In making the determination at step 701, a node may only be mapped to another node if the nodes themselves match, the conditions for which shall be described in arrangements below. A method 800 of comparing feature nodes, as executed at step 701, will be described in detail below with reference to Fig. 8.
At decision step 702, the mapping of matching nodes is examined, to determine if there remain any unmatched strong features in either the first or second component. If there are unmatched strong features in either the first or second component, then the method 700 continues to determining step 703 in which unmatched strong features in each feature graph are matched to weak features in the other feature graph. In this second stage of matching of the method 700, a mapping of nodes is sought in which strong features in one component may be mapped to weak features in the other component. However, there is no requirement that all weak features in either component are included in the mapping of the features. At decision step
704, if there remain any unmatched strong features in the feature graph of either component, then the method 700 continues to indicating step 705, at which it is indicated that the components under consideration do not match, and the method 700 ends. If, at either of decision steps 702 or 704 it is determined that a complete mapping without any unmatched strong features exists, then the method 700 continues instead to indicating step 706, at which it is indicated that the components under consideration match, and the method 700 ends.
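Viewed as code, the two-stage test of method 700 checks that every strong feature of each component is covered, first by strong features of the other component (step 701) and then, as a fallback, by weak features (step 703). The sketch below is a greedy illustration under assumed data structures (each component exposes strong_nodes and weak_nodes lists), with nodes_match standing in for the method 800 described later; it is not the literal claimed procedure.

```python
def components_match(comp_a, comp_b, nodes_match):
    """Greedy sketch of the two-stage matching of method 700."""
    def strong_features_covered(source, other):
        strong = list(other.strong_nodes)
        weak = list(other.weak_nodes)
        for node in source.strong_nodes:
            # Step 701: seek a one-to-one match among strong features first.
            partner = next((s for s in strong if nodes_match(node, s)), None)
            if partner is not None:
                strong.remove(partner)
                continue
            # Step 703: fall back to the other component's weak features.
            partner = next((w for w in weak if nodes_match(node, w)), None)
            if partner is None:
                return False                 # step 705: components do not match
            weak.remove(partner)
        return True

    # Step 706 is reached when both directions leave no unmatched strong feature.
    return (strong_features_covered(comp_a, comp_b)
            and strong_features_covered(comp_b, comp_a))
```

A production implementation would use a proper bipartite matching for the one-to-one mapping of step 701 rather than this first-fit greedy pass, which can fail on inputs where a valid mapping exists.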
In some arrangements, the method 700 determines a similarity measure with a discrete or continuous range of possible values. The method 700 has been described in the above arrangement as resulting in a binary “matching” or “non-matching” determination. However, in other arrangements, values such as “60%” matching may be used instead. In such arrangements where values like “60%” matching are used, the method 600 instead determines a mapping of components from the reference and target text images in which the similarity measure is
maximised; or in which the mapping incorporates pairs of components for which the similarity measure is above a desired threshold.
Figs. 10A and 10B show an example of component matching. In Fig. 10A, a reference text image 1001 and a target text image 1002 are shown. The text images 1001 and 1002 are similar glyph strokes, in a font that differs slightly. Fig. 10B shows the extracted feature graphs 1003 and 1004 for the example text images 1001 and 1002, with feature graph 1003 corresponding to text image 1001 and feature graph 1004 corresponding to text image 1002.
The feature graphs 1003 and 1004 include a number of endpoint feature nodes 1005, 1008,
1009 and 1012, and extremum feature nodes 1006, 1007, 1010 and 1011. In particular, the extremum node 1010 in the feature graph for the target text image 1002 has been indicated as being a weak feature due to proximity of the node 1010 to extremum node 1011. According to the steps of method 700, in the example of Figs. 10A and 10B, a mapping of matching feature nodes may be found with node 1005 matching node 1009, node 1007 matching node 1011, and node 1008 matching node 1012. However, decision step 702 will find an unmatched strong feature node 1006. At step 703, the weak feature 1010 is considered, and found as a match for strong feature node 1006. Therefore, at decision step 704 there are no unmatched strong features and the method 700 proceeds to indicate the components 1001 and 1002 as matching.
Arrangements for determining a match between a feature node in first component and a feature node in a second component, as occurs at step 701 of method 700 described above, shall now be described with reference to Figs. 9A, 9B, 9C and 9D. In some arrangements, a first feature node matches a second feature node if the type of the first feature node is the same as the type of the second feature node, both being one of the types junction, extremum or endpoint. In some arrangements, in comparing a first feature node and a second feature node that are both junctions, the feature nodes match if the degree of the first and second feature nodes is the same.
That is, the number of other feature nodes that the first junction is connected to in the feature graph of the first component is required to be the same as the number of other feature nodes that the second junction is connected to in the feature graph of the second component. Endpoint features have one connection to other nodes of the feature graph, unless the endpoint features are isolated “dot” features in the skeletonisation of the text image, and hence are also isolated components. Similarly, extremum feature nodes always have two connections to other nodes of the feature graph, and hence, in some arrangements, comparison of the degree of feature nodes is only performed for junction features.
Fig. 8 illustrates an example method 800 of comparing feature nodes. The method 800 compares feature nodes to determine if a feature node from a reference text image feature graph matches a feature node from a target text image feature graph, as at step 701 of the method 700. The method 800 may be implemented as one or more of the software code modules of the application program 1233 stored in the hard disk drive 1210 and being executed on processor 1205.
The method 800 compares the given feature nodes at decision steps 801, 802 and 803 in which aspects of the feature nodes are compared. If the nodes are determined to be of the same type at decision step 801, then the method 800 proceeds to step 802. Otherwise, the method
800 proceeds to indicating step 805 where the method 800 indicates that the feature nodes do not match.
If the nodes are determined to be of the same degree at decision step 802, then the method 800 proceeds to step 803. Otherwise, the method 800 proceeds to step 805.
If the nodes are determined to have connections in similar relative angles at decision step 803, then the method 800 proceeds to indicating step 804. Otherwise, the method 800 proceeds to step 805. Step 803 will be described in further detail below.
At step 804, the method 800 indicates a match. Steps 801 and 802 require that the nodes are the same type and degree, as described above.
In further arrangements, the geometrical layout or position of feature nodes may be utilised in determining if a first feature node matches a second feature node. The spatial coordinates of feature nodes may be recorded during the feature graph extraction step 103. In one arrangement, step 803, in determining a match between a first feature node and a second feature node, compares the relative angles (the angle, or direction, formed by the placement of the features relative to one another) to each of the connected nodes of the first and second feature nodes. For a match to occur in arrangements where the geometrical layout or position of feature nodes is utilised in determining if a first feature node matches a second feature node, the relative angles are required to match within the bounds of a chosen tolerance.
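The three tests of steps 801 to 803 can be expressed as a single predicate. The sketch below assumes a simple node record with a type, spatial coordinates and the coordinates of its connected nodes; the 30-degree tolerance and the sorted-angle pairing are illustrative assumptions (a robust implementation would align the angle lists under rotation and wraparound rather than sorting).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FeatureNode:
    type: str                              # "junction", "extremum" or "endpoint"
    x: float
    y: float
    neighbours: List[Tuple[float, float]]  # coordinates of connected feature nodes

def nodes_match(a: FeatureNode, b: FeatureNode,
                tolerance: float = math.radians(30)) -> bool:
    """Sketch of the feature-node comparison of method 800."""
    if a.type != b.type:                           # step 801: same type
        return False
    if len(a.neighbours) != len(b.neighbours):     # step 802: same degree
        return False
    # Step 803: connections leave each node at similar relative angles.
    def angles(node: FeatureNode) -> List[float]:
        return sorted(math.atan2(ny - node.y, nx - node.x)
                      for nx, ny in node.neighbours)
    for ang_a, ang_b in zip(angles(a), angles(b)):
        diff = abs(ang_a - ang_b) % (2 * math.pi)
        if min(diff, 2 * math.pi - diff) > tolerance:
            return False
    return True                                    # step 804: nodes match
```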
Figs. 9A to 9D illustrate a number of examples of matching and mismatching feature nodes. Fig. 9A shows an example component 920 of a first text image. Figs. 9B, 9C and 9D
show three views of an example component 921, 922 or 923 which is to be compared to component 920 in the example of Figs. 9A to 9D.
The example component 920 of Fig. 9A shows a junction node 901 which is the subject of the comparison step as performed in step 701, and is to be compared against feature node 906 from any of Figs. 9B, 9C or 9D. Junction node 901 has a degree of three, having connections to nodes 902, 903 and 904. The shaded area 905 represents locations in which the relative angle is within an accepted tolerance from the junction node 901 to connected nodes 902, 903 and 904. The shaded area 905, for the purposes of understanding the example of Fig. 9A, is considered to extend any arbitrary distance from the node 901.
The second example component, as shown in Fig. 9B, Fig. 9C or Fig. 9D, also has a junction feature node 906 to be compared against feature node 901, and connections to nodes 907, 908 and 909. In each view 921, 922 or 923 of the second example component, the nodes are overlaid upon the same shaded area 905 of the first example component 920.
In the example shown in Fig. 9B, all the connections to other feature nodes from 906 lie within the shaded region 905. That is, the relative angle to connected nodes, for the comparison of nodes 901 and 906, remains within a specified tolerance. In the example of Fig. 9B, it is determined that nodes 901 and 906 match, and therefore may form part of a determination of matching strong features in step 701 of method 700.
In Fig. 9C, the node 908 does not lie within the shaded region 905, and therefore node
901 does not match node 906 in the example of Fig. 9C.
For the example shown in Fig. 9D, there is an additional connection, being to node 910, meaning that the junction node 906 has a different degree (four) to that of node 901 (three), and therefore there is not a match between the nodes 906 and 901 in the example of Fig. 9D.
Step 106 of method 100 shall now be described in further detail, with reference to Figs. 11A and 11B. Fig. 11A shows an example reference text image 1101, and a corresponding feature graph 1103. Fig. 11B shows an example target text image 1102 and corresponding feature graph 1104. The reference text image 1101 contains several components 1110, 1111 and 1112, plus several diacritic marks such as 1116 that shall not be considered as matching or non-matching components for the example of Figs. 11A and 11B. The target text image 1102 contains a similar set of components 1113, 1114 and 1115, and has differences from the
reference text image 1101. For example, in an asemically-significant difference, the position of diacritic mark 1116 is different, partially colliding with component 1113. There are non-asemically-significant differences, being variations in the exact shape of various glyphs in the text image components, and being represented as differences in the feature graphs 1103 and
1104. For example, the precise shape of the central component 1111/1114 has a difference that results in a different feature graph being extracted. The feature graph 1103 for the reference text image has an extra node 1106 not present in feature graph 1104. Similarly, node 1105 does not have a corresponding node in the feature graph 1104 of the target text image.
In the example of Figs. 11A and 11B, components 1110 and 1113 do not match one another (nor does either match any other component), as during the component matching method 700, the feature graphs for the components 1110 and 1113 do not match, because extremum feature node 1117 does not match junction feature node 1118. Therefore, in step 106, the text image areas 1107 and 1108 surrounding mismatching components 1110 and 1113 are indicated as being a region of difference between the reference and target text images.
Also in the example of Figs. 11A and 11B, in comparing components 1111 and 1114, the components 1111 and 1114 are found to be matching. The components 1111 and 1114 are matching as the extra node 1106 in the feature graph of component 1111 is a weak feature node (according to arrangements described in reference to Fig. 5). It is still possible to set up a mapping of matching strong nodes without the need to additionally include weak nodes (step 703). Hence, the region of components 1111 and 1114 is not emitted as a region of difference during the comparison of the reference and target text images of the example of Figs. 11A and 11B.
In some arrangements, in reporting regions of difference, an area such as area 1109 that is localised to surround only the non-matching feature nodes of unmatched components may be reported instead of highlighting the entire unmatched component. In arrangements where method 700 determines a similarity measure between text images, the reported differences may be visually denoted or communicated by using highlight region colour or highlight region border to represent the similarity measure between components. Also, in some such arrangements, components whose best similarity measure to other components falls below a threshold may be indicated amongst the visually reported differences. Components whose best similarity measure to other components falls above the threshold are not indicated.
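Under the similarity-measure variant, the thresholded reporting just described reduces to filtering components by their best score. A minimal sketch, assuming similarity returns a value in [0, 1]; the 0.6 threshold is an illustrative assumption, not a value from the description.

```python
def components_to_highlight(ref_components, tgt_components, similarity,
                            threshold=0.6):
    """Report components whose best similarity to any counterpart is below threshold.

    The returned score can drive the highlight colour or border weight used
    to denote how close the nearest match was.
    """
    reported = []
    for ref in ref_components:
        best = max((similarity(ref, tgt) for tgt in tgt_components), default=0.0)
        if best < threshold:
            reported.append((ref, best))
    return reported
```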
In the methods described above, feature nodes are categorised as strong or weak, node matching takes place prioritising finding matches between strong feature nodes, and remaining unmatched strong feature nodes are matched to weak feature nodes. The methods described above present several advantages over previous methods. Requiring all nodes to match, regardless of strong/weak categorisation, causes mismatching in cases where features exist only in weak form and only in one text image (and not the other). If weak features were instead removed from consideration entirely, then there would be a risk that mismatching would occur when the other text image has those features in a stronger form. The methods described above allow matching to occur when the text images differ by small amounts that are not topologically significant. The described methods are not unnecessarily sensitive to differences that are not topologically significant.
The methods described above further allow the text image comparison method 100 to detect differences between text images that are topologically equivalent (i.e., in terms of the mere connectivity of endpoint and junction feature nodes). By additionally considering extremum features for the comparison step, the methods described above then require approximate geometric similarity in addition to topological equivalence. However, the described methods only require equivalence of extremum features when the features are strong features - thereby allowing some minor variation (or even the absence of an extremum feature in one of the text images under comparison) when a feature is of minor significance.
The described methods have useful applications where renderings of text are to be compared, without requiring that the comparison involves parsing, recognising or understanding the depicted text. In one application, during development of a hardware or software product including rendered text on a display, changes in the appearance of rendered text may be easily and conveniently detected. Further, the changes in the appearance of rendered text may be detected without the need to manually inspect the prior and current appearance of potentially hundreds or thousands of text assets displayed within the user interface of the product. Changes in text appearance may be introduced by the introduction of a new or different version of a displayable font, by code changes handling text shaping of complex glyphs, or by changes to other text layout and rendering code.
In another application, the described methods may be utilised for checking for changes in the rendered appearance of mathematical formulae and other typographical constructs with text-like appearance, in which the topological and geometrical arrangement of elements may be
permitted to vary slightly. However, larger shifts or modifications to appearance introduce an asemic difference.
The described methods have further applications in testing the implementation of software libraries for performing complex text shaping, text layout and text rendering, by flagging the occurrence, during testing, of code changes that significantly alter the outputted rendered text, without being overly sensitive to minor differences as may occur during iterative development of those libraries.
In another application, the described methods may be utilised for performing automatic font substitution for typographic documents (or other sources). The automatic font substitution for typographic documents may be performed by determining a similarity measurement for text strings provided in the document, with alternate renderings in a substituted font, in order to select an alternate font with overall closest similarity for text content appearing in the document.
In a further application, the described methods may be utilised as a tool for the analysis of text in unknown or undeciphered alphabets, languages or palimpsests. The described methods may be utilised to create a dictionary of unique symbols that appear in a given source (allowing for minor variation, which may be deemed as being not asemically significant), along with similarity measurements between symbols in that dictionary. Such a dictionary may be a useful aid for an expert working to decipher the meaning of the unknown text.
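As a sketch of this dictionary-building application, symbols can be clustered greedily under the similarity measure; the 0.8 threshold below is an illustrative assumption for what counts as variation that is not asemically significant.

```python
def build_symbol_dictionary(symbols, similarity, threshold=0.8):
    """Collect unique symbols from a source, merging near-duplicates.

    symbols is an iterable of component feature graphs extracted from the
    source text; similarity(a, b) returns a score in [0, 1].
    """
    dictionary = []                       # one representative symbol per cluster
    for symbol in symbols:
        if all(similarity(symbol, entry) < threshold for entry in dictionary):
            dictionary.append(symbol)     # new symbol not yet in the dictionary
    return dictionary
```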
Industrial Applicability
The arrangements described are applicable to the computer and data processing industries and particularly for the image processing industry.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly varied meanings.
CLAIMS:
1. A method of processing image data, the method comprising:
receiving target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation;
determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation, each of the nodes being marked, in the memory, as at least one of a junction, extremum and endpoint;
selecting one or more of the nodes from each of the determined graphs based on a type of the nodes;
determining a similarity measure for the target image data and the reference image data by comparing the selected nodes of the target representation graph with corresponding nodes of the reference representation graph based on a distance, a respective type and a degree of the corresponding nodes, and comparing one or more of the selected nodes of the target representation graph with auxiliary nodes of the reference representation graph; and
processing at least one of the target image data and the reference image data based on the determined similarity measure.
2. The method according to claim 1, wherein unmatched selected nodes of the target representation graph are compared with auxiliary nodes of the reference representation graph.
3. The method according to claim 1, further comprising determining a data structure to determine the degree of nodes.
4. The method according to claim 1, wherein the nodes are selected based on a distance to other nodes and a collinearity measure.
5. The method according to claim 4, wherein the distance and the collinearity measure are determined based on spatial coordinates stored in the memory for the nodes.
6. The method according to claim 1, further comprising:
selecting a first strong node in the target representation graph;
selecting a second strong node in the reference representation graph corresponding to the first strong node based on a type and a degree of the first strong node, the type being at least one of junction and endpoint;
determining neighbouring strong nodes for the first strong node in the target graph;
determining neighbouring strong nodes for the second strong node in the reference graph; and
determining similarity of corresponding identified neighbouring strong nodes from the target graph and the reference graph based on type of the nodes, degree of the nodes and spatial distance.
7. The method according to claim 1, further comprising displaying at least the target image data and highlighting locations of unmatched selected nodes.
8. The method according to claim 1, wherein the nodes are selected by marking each node as either weak or strong and selecting the strong nodes.
9. The method according to claim 6, wherein geometric correspondence is used to guide comparison of strong nodes.
10. The method according to claim 1, wherein geometric proximity and topological connectedness are used to compare a strong node of the target representation graph and a weak node of the reference representation graph.
11. An apparatus for processing image data, the apparatus comprising:
means for receiving target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation;
means for determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation, each of the nodes being marked, in the memory, as at least one of a junction, extremum and endpoint;
means for selecting one or more of the nodes from each of the determined graphs based on a type of the nodes;
means for determining a similarity measure for the target image data and the reference image data by comparing the selected nodes of the target representation graph with
corresponding nodes of the reference representation graph based on a distance, a respective type and a degree of the corresponding nodes, and comparing one or more of the selected nodes of the target representation graph with auxiliary nodes of the reference representation graph; and
means for processing at least one of the target image data and the reference image data
based on the determined similarity measure.
12. A system for processing image data, the system comprising:
a memory for storing data and a computer program;
a processor coupled to the memory for executing the computer program, the computer program having instructions for:
receiving target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation;
determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation,
each of the nodes being marked, in the memory, as at least one of a junction, extremum and endpoint;
selecting one or more of the nodes from each of the determined graphs based on a type of the nodes;
determining a similarity measure for the target image data and the reference image data by comparing the selected nodes of the target representation graph with corresponding nodes of the reference representation graph based on a distance, a respective type and a degree of the corresponding nodes, and comparing one or more of the selected nodes of the target representation graph with auxiliary nodes of the reference representation graph; and
processing at least one of the target image data and the reference image data based on the determined similarity measure.
13. A non-transitory computer readable medium having stored on the medium a computer program for processing image data, the program comprising:
code for receiving target image data associated with a skeletonised target representation and reference image data associated with a skeletonised reference representation;
code for determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked
in a memory based on edges in the corresponding skeletonised representation, each of the nodes being marked, in the memory, as at least one of a junction, extremum and endpoint;
code for selecting one or more of the nodes from each of the determined graphs based on a type of the nodes;
code for determining a similarity measure for the target image data and the reference image data by comparing the selected nodes of the target representation graph with corresponding nodes of the reference representation graph based on a distance, a respective type and a degree of the corresponding nodes, and comparing one or more of the selected nodes of the target representation graph with auxiliary nodes of the reference representation graph; and
code for processing at least one of the target image data and the reference image data based on the determined similarity measure.
14. A method of processing image data, the method comprising:
receiving target image data associated with a skeletonised target representation and
reference image data associated with a skeletonised reference representation;
determining a graph for each of the skeletonised target representation and the skeletonised reference representation, each of the graphs comprising a plurality of nodes linked in a memory based on edges in the corresponding skeletonised representation, wherein at least one node is marked, in the memory, as an extremum;
determining a similarity measure for the target image data and the reference image data by comparing nodes of the target representation graph with corresponding nodes of the reference representation graph based on a distance, a respective type and a degree of the corresponding nodes, wherein an unmatched extremum of the target representation graph is further compared with auxiliary extrema of the reference representation graph to determine a
similarity measure for the target image data and the reference image data; and
processing at least one of the target image data and the reference image data based on the determined similarity measure.
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
Spruson & Ferguson
[Fig. 1: the method 100, showing steps 101 to 106]
[Fig. 2A: example text image, showing elements 201, 202, 203 and 205]
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2017225022A | 2017-09-05 | 2017-09-05 | Method, system and apparatus for processing image data |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| AU2017225022A1 | 2019-03-21 |

Family

ID=65728736

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| AU2017225022A (Abandoned) | AU2017225022A1 (en) | 2017-09-05 | 2017-09-05 |

Country Status (1)

| Country | Link |
|---|---|
| AU | AU2017225022A1 (en) |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111275049A (en) * | 2020-01-19 | 2020-06-12 | 佛山市国方识别科技有限公司 | Method and device for acquiring character image skeleton feature descriptors |
| CN111275049B (en) * | 2020-01-19 | 2023-07-21 | 佛山市国方识别科技有限公司 | Method and device for acquiring text image skeleton feature descriptors |
| CN115375685A (en) * | 2022-10-25 | 2022-11-22 | 临沂天元混凝土工程有限公司 | Method for detecting sand particle size abnormity in concrete raw material |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | MK4 | Application lapsed section 142(2)(d) - no continuation fee paid for the application | |