CN113537201A - Multi-dimensional hybrid OCR recognition method, device, equipment and storage medium

Info

Publication number: CN113537201A
Application number: CN202111084304.7A
Authority: CN
Inventors: 马百泉
Original assignee: Jiangxi Vaneducation Technology Inc
Current assignee: Jiangxi Vaneducation Technology Inc
Priority date: 2021-09-16
Filing date: 2021-09-16
Publication date: 2021-10-22

Abstract

The invention provides a multi-dimensional hybrid OCR recognition method, device, equipment and storage medium. The method comprises the following steps: setting an image to a preset size; distinguishing the formula, chart and/or text areas of the image according to a preset neural network model, and obtaining the position coordinates of each formula, chart and/or text area; calling a different OCR model for each of the formula, chart and/or text areas to obtain recognition results; and outputting the identification information of the image according to the position coordinates and the recognition results. This scheme enables direct OCR recognition of images that mix formulas, charts and text, improves recognition accuracy and robustness, and addresses the difficulty in the prior art of recognizing such mixed images simply and accurately.

Description

Multi-dimensional hybrid OCR recognition method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the field of image recognition, in particular to a multi-dimensional hybrid OCR recognition method, device, equipment and storage medium.

Background

OCR (Optical Character Recognition) is a technology that converts printed images obtained by optical scanning, camera capture and similar means into character information that a computer can process. In recent years OCR has developed rapidly, and character recognition now achieves high accuracy. However, an image that mixes formulas, charts and text cannot be fed directly to OCR: the formula, text and chart areas must first be separated and then recognized individually. Separating these areas manually is time-consuming and labor-intensive, while traditional methods that recognize the mixed image directly suffer from low accuracy and poor robustness and place high demands on image quality.

Disclosure of Invention

In view of this, embodiments of the present invention provide a multi-dimensional hybrid OCR recognition method, apparatus, device and storage medium, so as to implement OCR recognition of images in which formulas, charts and/or text are mixed.

In a first aspect, an embodiment of the present invention provides a multi-dimensional hybrid OCR recognition method, including:

setting a target image as an image to be identified with a preset size;

distinguishing a formula, a chart and/or a character area of an image to be recognized according to a preset neural network model, and respectively obtaining position coordinates of the formula, the chart and/or the character area;

respectively calling different OCR models for recognition according to a formula, a diagram and/or a character area to obtain a recognition result;

and outputting the identification information of the image to be identified according to the position coordinates and the identification result.

Preferably, distinguishing a formula, a diagram and/or a character area of the image to be recognized according to a preset neural network model, and respectively obtaining position coordinates of the formula, the diagram and/or the character area, includes:

inputting an image to be recognized into a preset neural network model to obtain a first feature vector of the image to be recognized;

inputting the first feature vector into a regional candidate network to obtain position coordinates of one or more candidate frames;

and extracting second feature vectors corresponding to the one or more candidate frames, and inputting the second feature vectors into a category identification network to obtain categories of the one or more candidate frames, wherein the categories comprise formulas, graphs or characters.

Further, distinguishing a formula, a diagram and/or a character area of the image to be recognized according to a preset neural network model, and respectively obtaining position coordinates of the formula, the diagram and/or the character area, further comprising:

the position coordinates are optimally adjusted to obtain accurate position coordinates of the formula, the chart and/or the text area.

Preferably, different OCR models are respectively invoked for recognition according to the formula, the diagram and/or the text area to obtain recognition results, including:

calling a first OCR model to recognize the formula area so as to obtain a formula recognition result;

calling a second OCR model to identify a chart area so as to obtain a chart identification result;

calling a third OCR model to identify a character area so as to obtain a character identification result;

Specifically, and preferably: the third OCR model, which recognizes the character area, may adopt a differentiable binarization network together with an end-to-end scene text recognition network architecture; the first OCR model, which recognizes the formula area, may adopt an architecture combining a convolutional neural network, an attention mechanism, a sequence encoder and a sequence decoder; and the second OCR model, which recognizes the chart area, may first invoke a deep neural network to analyze the chart area and deconstruct the chart structure, and then invoke the third OCR model to recognize the text within it.

In a second aspect, an embodiment of the present invention provides a multi-dimensional hybrid OCR recognition apparatus, including:

the first processing module is used for setting the target image as an image to be identified with a preset size;

the second processing module is used for distinguishing a formula, a chart and/or a character area of the image to be recognized according to the preset neural network model and respectively acquiring position coordinates of the formula, the chart and/or the character area;

the third processing module is used for respectively calling different OCR models for recognition according to formulas, graphs and/or character areas so as to obtain recognition results;

and the information output module is used for outputting the identification information of the image to be identified according to the position coordinate and the identification result.

Preferably, the second processing module comprises:

the fourth processing module is used for inputting the image to be recognized to the preset neural network model so as to obtain a first feature vector of the image to be recognized;

the fifth processing module is used for inputting the first feature vector into the regional candidate network so as to obtain the position coordinates of one or more candidate frames;

and the sixth processing module extracts second feature vectors corresponding to the one or more candidate frames and inputs the second feature vectors to the category identification network to obtain categories of the one or more candidate frames, wherein the categories comprise formulas, graphs or characters.

Further, the second processing module further comprises:

and the seventh processing module is used for optimizing and adjusting the position coordinates to acquire accurate position coordinates of the formula, the chart and/or the character area.

Preferably, the third processing module comprises:

the first recognition module is used for calling a first OCR model to recognize the formula area so as to obtain a formula recognition result;

the second recognition module is used for calling a second OCR model to recognize the chart area so as to obtain a chart recognition result;

the third recognition module is used for calling a third OCR model to recognize the character area so as to obtain a character recognition result.

In a third aspect, an embodiment of the present invention provides an electronic device, including:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the multi-dimensional hybrid OCR recognition method according to any embodiment of the invention.

In a fourth aspect, the embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the multi-dimensional hybrid OCR recognition method according to any embodiment of the present invention.

According to the scheme, the formula, the chart and/or the character area are respectively obtained by utilizing the preset neural network model, and different OCR models are respectively called for recognition according to the formula, the chart and/or the character area, so that the direct OCR recognition of the formula, the chart and the character mixed image is realized, the recognition accuracy is improved, the robustness is high, and the problem that the simple, convenient and accurate recognition of the formula, the chart and the character mixed image is difficult to perform in the prior art is solved.

Drawings

FIG. 1 is a flow chart of a multi-dimensional hybrid OCR recognition method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a multi-dimensional hybrid OCR recognition method according to a second embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a multi-dimensional hybrid OCR recognition apparatus according to a third embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.

Furthermore, the terms "first," "second," and the like may be used herein to describe various directions, actions, steps, elements, or the like, but these directions, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another. For example, a first processing module may be referred to as a second processing module, and similarly, a second processing module may be referred to as a first processing module, without departing from the scope of the present application. The first processing module and the second processing module are both processing modules, but they are not the same processing module. The terms "first", "second", etc. are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Example one

Fig. 1 is a flowchart of a multi-dimensional hybrid OCR recognition method according to an embodiment of the present invention, which is applicable to a multi-dimensional hybrid OCR recognition scenario, and the method can be executed by a multi-dimensional hybrid OCR recognition apparatus, and the apparatus can be implemented in a software and/or hardware manner, and can be integrated on a device.

As shown in fig. 1, a multi-dimensional hybrid OCR recognition method provided by an embodiment of the present invention includes:

S110, setting the target image as an image to be recognized with a preset size.

S120, distinguishing the formula, chart and/or character areas of the image to be recognized according to a preset neural network model, and respectively obtaining the position coordinates of the formula, chart and/or character areas.

S130, respectively calling different OCR models for recognition according to the formula, chart and/or character areas to obtain recognition results.

S140, outputting the identification information of the image to be recognized according to the position coordinates and the recognition results.
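Steps S110 to S140 can be sketched as a small pipeline. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the names run_pipeline, Region and Recognized and the fake_* helpers are hypothetical, and the detector and recognizers are stand-ins for trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Region:
    category: str  # "formula", "chart" or "text"
    box: Box


@dataclass
class Recognized:
    category: str
    box: Box
    content: str


def run_pipeline(
    image,
    resize: Callable,
    detect: Callable[[object], List[Region]],
    recognizers: Dict[str, Callable[[object, Box], str]],
) -> List[Recognized]:
    # S110: bring the target image to the preset size.
    resized = resize(image)
    # S120: locate and classify the formula / chart / text areas.
    regions = detect(resized)
    # S130: call the OCR model that matches each area's category.
    results = [
        Recognized(r.category, r.box, recognizers[r.category](resized, r.box))
        for r in regions
    ]
    # S140: return the results together with their position coordinates.
    return results


# Hypothetical stand-ins so the sketch runs without any trained model.
def fake_resize(image):
    return image


def fake_detect(image) -> List[Region]:
    return [Region("text", (0, 0, 100, 20)), Region("formula", (0, 30, 100, 60))]


fake_recognizers = {
    "formula": lambda img, box: "E = mc^2",
    "chart": lambda img, box: "<chart>",
    "text": lambda img, box: "Energy and mass:",
}

output = run_pipeline("page.png", fake_resize, fake_detect, fake_recognizers)
```

Passing the detector and the recognizers in as callables keeps the four steps visible and lets each model be replaced independently.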

In this embodiment, recognition of an image that mixes formulas, charts and text begins by setting the target image as an image to be recognized with a preset size. Preferably, the target image is uniformly scaled to a fixed size so that the image size is normalized, for example to 800 × 600 pixels; this allows the model to handle inputs of different sizes and simplifies the subsequent recognition operations. Preferably, before it is set to the preset size, the target image is obtained by scanning, photographing or taking a screenshot, and it may contain formula, chart and/or text areas. Because traditional OCR character recognition cannot accurately recognize formulas and charts, different OCR models must be used for the different types of area, so the formula, chart and/or text areas of the image to be recognized must first be separated. In this embodiment the preset neural network model is Faster R-CNN, a type of convolutional neural network (CNN). Inputting the image to be recognized into Faster R-CNN distinguishes its formula, chart and/or text areas according to the model's formula, chart and text categories and at the same time yields the position coordinates of each area, which makes it easy to position the subsequent recognition results in the output. No manual area segmentation is required, which saves time and labor.

In this embodiment, a different OCR model is then called for each of the formula, chart and/or text areas, so that each type of area is recognized by a model suited to it and the overall recognition result is obtained. Finally, the identification information of the image to be recognized is output according to the position coordinates and the recognition results.
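The size normalization described above can be illustrated with a dependency-free sketch. It assumes a grayscale image stored as a list of rows and uses nearest-neighbour sampling; the embodiment does not specify the interpolation method, and the names resize_nearest and box_to_original are hypothetical. The second function shows how position coordinates found on the 800 × 600 image could be mapped back to the original image.

```python
from typing import List, Tuple

PRESET_WIDTH, PRESET_HEIGHT = 800, 600  # example preset size from the description


def resize_nearest(pixels: List[List[int]], width: int, height: int) -> List[List[int]]:
    """Nearest-neighbour resize of a row-major grayscale image."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def box_to_original(
    box: Tuple[float, float, float, float],
    original_size: Tuple[int, int],
    preset_size: Tuple[int, int] = (PRESET_WIDTH, PRESET_HEIGHT),
) -> Tuple[float, float, float, float]:
    """Map (x_min, y_min, x_max, y_max) from the resized image to the original."""
    sx = original_size[0] / preset_size[0]  # horizontal scale factor
    sy = original_size[1] / preset_size[1]  # vertical scale factor
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


tiny = [[0, 255], [255, 0]]  # 2 x 2 test image
resized = resize_nearest(tiny, PRESET_WIDTH, PRESET_HEIGHT)
mapped = box_to_original((400, 300, 800, 600), original_size=(1600, 1200))
```

Scaling both axes independently does not preserve the aspect ratio; a real system might pad the image instead, which is a design choice the description leaves open.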

According to the technical scheme of this embodiment, the preset neural network model (Faster R-CNN) is used to obtain the formula, chart and/or text areas, and a different OCR model is called for each type of area. This enables direct OCR recognition of images that mix formulas, charts and text without manual area division, which saves time and labor, improves recognition accuracy and robustness, and addresses the difficulty in the prior art of recognizing such mixed images simply and accurately.

Example two

Fig. 2 is a flowchart of a multi-dimensional hybrid OCR recognition method according to a second embodiment of the present invention, which is applicable to a scenario of multi-dimensional hybrid OCR recognition, and the method is a further refinement of the first embodiment, and the method can be executed by a multi-dimensional hybrid OCR recognition apparatus, and the apparatus can be implemented in a software and/or hardware manner, and can be integrated on a device.

As shown in fig. 2, a multi-dimensional hybrid OCR recognition method provided by the second embodiment of the present invention includes:

S210, setting a target image as an image to be recognized with a preset size;

S220, inputting the image to be recognized into the preset neural network model to obtain a first feature vector of the image to be recognized;

S230, inputting the first feature vector into a region candidate network to obtain position coordinates of one or more candidate frames;

S240, extracting second feature vectors corresponding to the one or more candidate frames, and inputting the second feature vectors into a category identification network to obtain categories of the one or more candidate frames, wherein the categories comprise formulas, charts or characters;

S250, optimizing and adjusting the position coordinates to obtain accurate position coordinates of the formula, chart and/or character areas;

S260, calling a first OCR model to recognize the formula area so as to obtain a formula recognition result;

S270, calling a second OCR model to recognize the chart area so as to obtain a chart recognition result;

S280, calling a third OCR model to recognize the character area so as to obtain a character recognition result;

S290, outputting the identification information of the image to be recognized according to the position coordinates and the recognition results.

In this embodiment, recognition of an image that mixes formulas, charts and text begins by setting the target image as an image to be recognized with a preset size. Preferably, the target image is uniformly scaled to a fixed size so that the image size is normalized, for example to 800 × 600 pixels; this allows the model to handle inputs of different sizes and simplifies the subsequent recognition operations. Preferably, before it is set to the preset size, the target image is obtained by scanning, photographing or taking a screenshot, and it may contain formula, chart and/or text areas. Because traditional OCR character recognition cannot accurately recognize formulas and charts, different OCR models must be used for the different types of area, so the formula, chart and/or text areas of the image to be recognized must first be separated.

In this embodiment, the formula, chart and/or text areas of the image to be recognized are separated as follows. First, the image is input into the preset neural network model, Faster R-CNN, to obtain a first feature vector (v1, v2, …, vn) of the image. Next, the first feature vector (v1, v2, …, vn) is input into a region candidate network (Region Proposal Network) to obtain the position coordinates of one or more candidate frames; these position coordinates make it easy to position the subsequent recognition results in the output. Then, the second feature vectors corresponding to the one or more candidate frames are extracted and input into a category identification network (ROI Pooling) to obtain the category of each candidate frame, where the categories comprise formula, chart or text. In this way the formula, chart and/or text areas of the image to be recognized are separated automatically, with no manual area segmentation, which saves time and labor.
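To make the ROI Pooling step more concrete: in standard Faster R-CNN, ROI pooling converts the feature-map cells covered by each candidate frame into a fixed-size grid, so that candidate frames of any size can be passed to the same classifier. The sketch below is a simplified, pure-Python version operating on a single-channel feature map; the function name roi_max_pool and the 2 × 2 output size are assumptions chosen for illustration, and the classifier that follows the pooling is not shown.

```python
from typing import List, Tuple


def roi_max_pool(
    feature_map: List[List[float]],
    box: Tuple[int, int, int, int],
    out_h: int = 2,
    out_w: int = 2,
) -> List[List[float]]:
    """Max-pool the feature-map cells inside box into an out_h x out_w grid.

    box is (x_min, y_min, x_max, y_max) in feature-map cells, with the
    maximum coordinates exclusive and at least one cell in each direction.
    """
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Row range of bin i; every bin keeps at least one row.
        r0 = y0 + i * h // out_h
        r1 = max(y0 + (i + 1) * h // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            # Column range of bin j; every bin keeps at least one column.
            c0 = x0 + j * w // out_w
            c1 = max(x0 + (j + 1) * w // out_w, c0 + 1)
            row.append(
                max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
        pooled.append(row)
    return pooled


fm = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
whole = roi_max_pool(fm, (0, 0, 4, 4))   # candidate frame covering the whole map
corner = roi_max_pool(fm, (0, 0, 2, 2))  # small candidate frame in the top-left corner
```

Both candidate frames yield a 2 × 2 result even though they cover different numbers of cells, which is the property the classification stage relies on.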

Preferably, this embodiment further optimizes and adjusts the position coordinates to obtain accurate position coordinates of the formula, chart and/or text areas, which improves the accuracy with which the different types of area are distinguished. In this embodiment the optimization of the position coordinates may be performed by the category identification network: a fixed anchor point is first set for each candidate frame, and the category identification network then runs a bounding-box regression algorithm based on these anchor points to obtain the optimized position coordinates.
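The embodiment does not spell out the bounding-box regression formula. The sketch below uses the parameterization commonly associated with Faster R-CNN, in which the network predicts four offsets (dx, dy, dw, dh) relative to an anchor: the center is shifted in proportion to the anchor size and the width and height are scaled exponentially. The function names apply_deltas and to_corners and the numeric values are illustrative assumptions.

```python
import math
from typing import Tuple

CenterBox = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def apply_deltas(anchor: CenterBox, deltas: Tuple[float, float, float, float]) -> CenterBox:
    """Refine an anchor box with predicted regression offsets."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = deltas
    return (
        ax + dx * aw,       # shift the center by a fraction of the anchor width
        ay + dy * ah,       # shift the center by a fraction of the anchor height
        aw * math.exp(dw),  # scale the width
        ah * math.exp(dh),  # scale the height
    )


def to_corners(box: CenterBox) -> Tuple[float, float, float, float]:
    """Convert (center_x, center_y, width, height) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchor = (100.0, 50.0, 40.0, 20.0)
refined = apply_deltas(anchor, (0.1, -0.5, 0.0, math.log(2)))
corners = to_corners(refined)
```

Here the anchor is moved 4 pixels to the right and 10 pixels up, keeps its width, and doubles in height.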

Further, as a preferred implementation, different OCR models are used for the different types of area: the first OCR model is called to recognize the formula area according to the accurate position coordinates of the formula area, to obtain a formula recognition result; the second OCR model is called to recognize the chart area according to the accurate position coordinates of the chart area, to obtain a chart recognition result; and the third OCR model is called to recognize the character area according to the accurate position coordinates of the character area, to obtain a character recognition result. Specifically, several mainstream models exist for character recognition, and the third OCR model, which recognizes the character area, may adopt a differentiable binarization network together with an end-to-end scene text recognition network architecture; the first OCR model, which recognizes the formula area, may adopt an architecture combining a convolutional neural network, an attention mechanism, a sequence encoder and a sequence decoder; and the second OCR model, which recognizes the chart area, may first invoke a deep neural network to analyze the chart area and deconstruct the chart structure, and then invoke the third OCR model to recognize the text within it.
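The relationship between the three OCR models, in particular the chart model delegating to the text model, can be sketched as follows. All functions here are hypothetical placeholders: the page is simulated as a dictionary from boxes to the text they contain, and analyze_chart_structure stands in for the deep neural network that deconstructs the chart.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)
Page = Dict[Box, str]            # simulated image: box -> text inside that box


def recognize_text(page: Page, box: Box) -> str:
    # Placeholder for the third OCR model (character area).
    return page.get(box, "")


def recognize_formula(page: Page, box: Box) -> str:
    # Placeholder for the first OCR model (formula area); wraps the result
    # in dollar signs to mark it as a formula.
    return "$" + page.get(box, "") + "$"


def analyze_chart_structure(page: Page, box: Box) -> List[Box]:
    # Placeholder for the chart-structure network: returns the text-bearing
    # cells that lie inside the chart area.
    x0, y0, x1, y1 = box
    return [
        b for b in page
        if b != box and x0 <= b[0] and y0 <= b[1] and b[2] <= x1 and b[3] <= y1
    ]


def recognize_chart(page: Page, box: Box) -> List[str]:
    # Second OCR model: deconstruct the chart, then reuse the text model.
    cells = sorted(analyze_chart_structure(page, box))
    return [recognize_text(page, cell) for cell in cells]


OCR_MODELS: Dict[str, Callable] = {
    "formula": recognize_formula,
    "chart": recognize_chart,
    "text": recognize_text,
}


def recognize_region(page: Page, category: str, box: Box):
    return OCR_MODELS[category](page, box)


page = {
    (0, 0, 50, 10): "Title",
    (0, 20, 50, 30): "x^2",
    (0, 40, 100, 100): "",      # the chart area itself
    (5, 45, 45, 55): "Year",    # cells inside the chart
    (55, 45, 95, 55): "Sales",
}
chart_cells = recognize_region(page, "chart", (0, 40, 100, 100))
formula = recognize_region(page, "formula", (0, 20, 50, 30))
```

A lookup table keyed by category keeps the dispatch in one place, so a further category would only need one more entry.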

Finally, the identification information of the image is output according to the position coordinates and the recognition results. In this embodiment, the recognition results are placed at their corresponding positions according to the position coordinates, so that the identification information is output in the original layout of the target image.
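One simple way to approximate original-layout output is to order the recognition results by their position coordinates, top to bottom and then left to right. The sketch below is an assumption about how this could be done, not the method of the embodiment; the function name reading_order and the line_tolerance parameter (how many pixels two boxes may differ vertically and still count as the same row) are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def reading_order(results: List[Tuple[Box, str]], line_tolerance: int = 10) -> List[str]:
    """Group results into rows by y_min and join each row from left to right."""
    ordered = sorted(results, key=lambda item: (item[0][1], item[0][0]))
    rows = []  # each entry: [y_min of the row's first box, list of (box, content)]
    for box, content in ordered:
        if rows and abs(box[1] - rows[-1][0]) <= line_tolerance:
            rows[-1][1].append((box, content))      # same row as the previous box
        else:
            rows.append([box[1], [(box, content)]])  # start a new row
    return [
        " ".join(content for _, content in sorted(items, key=lambda item: item[0][0]))
        for _, items in rows
    ]


results = [
    ((120, 52, 200, 70), "E = mc^2"),
    ((0, 0, 200, 20), "Mass-energy equivalence"),
    ((0, 50, 110, 70), "Einstein showed that"),
]
lines = reading_order(results)
```

In this example the formula and the sentence fragment start within 2 pixels of each other vertically, so they are joined into one output line.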

According to the technical scheme of this embodiment, the preset neural network model Faster R-CNN is used to obtain the formula, chart and/or text areas, and a different OCR model is called for each type of area. This enables direct OCR recognition of images that mix formulas, charts and text without manual area division, which saves time and labor and improves recognition accuracy and robustness. The identification information is output in the original layout of the target image, which enhances the user experience, and the difficulty in the prior art of recognizing such mixed images simply and accurately is addressed.

Example three

Fig. 3 is a schematic structural diagram of a multi-dimensional hybrid OCR recognition apparatus according to a third embodiment of the present invention, where the present embodiment is applicable to a scene of multi-dimensional hybrid OCR recognition, and the apparatus may be implemented in a software and/or hardware manner and may be integrated on a device.

As shown in fig. 3, the multi-dimensional hybrid OCR recognition apparatus provided in this embodiment may include: a first processing module 10, a second processing module 20, a third processing module 30 and an information output module 40.

The first processing module 10 is configured to set a target image as an image to be recognized with a preset size;

the second processing module 20 is configured to distinguish a formula, a diagram and/or a text region of the image to be recognized according to a preset neural network model, and obtain position coordinates of the formula, the diagram and/or the text region respectively;

the third processing module 30 is configured to respectively call different OCR models to perform recognition according to the formula, the chart and/or the text area, so as to obtain recognition results;

the information output module 40 is configured to output the identification information of the image to be identified according to the position coordinate and the identification result.

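Preferably, the second processing module 20 comprises: a fourth processing module, a fifth processing module and a sixth processing module.

The fourth processing module is used for inputting the image to be recognized into the preset neural network model to obtain a first feature vector of the image to be recognized;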

the fifth processing module is used for inputting the first feature vector into a regional candidate network to obtain the position coordinates of one or more candidate frames;

the sixth processing module extracts second feature vectors corresponding to the one or more candidate frames, and inputs the second feature vectors to a category identification network to obtain categories of the one or more candidate frames, where the categories include formulas, charts, or characters.

Further, the second processing module 20 further includes: and the seventh processing module is used for optimizing and adjusting the position coordinates to acquire accurate position coordinates of the formula, the chart and/or the character area.

Preferably, the third processing module 30 comprises: a first recognition module, a second recognition module and a third recognition module.

The first recognition module is used for calling a first OCR model to recognize the formula area so as to obtain a formula recognition result;

the second recognition module is used for calling a second OCR model to recognize the chart area so as to obtain a chart recognition result;

and the third recognition module is used for calling a third OCR model to recognize the character area so as to obtain a character recognition result.

The multi-dimensional hybrid OCR recognition device provided by the embodiment of the invention can execute the multi-dimensional hybrid OCR recognition method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. Reference may be made to the description of any method embodiment of the invention not specifically described in this embodiment.

Example four

Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary electronic device 412 suitable for use in implementing embodiments of the present invention. The electronic device 412 shown in fig. 4 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present invention.

As shown in fig. 4, the electronic device 412 is in the form of a general purpose device. The components of the electronic device 412 may include, but are not limited to: one or more processors 416, a storage device 428, and a bus 418 that couples the various system components including the storage device 428 and the processors 416.

Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Electronic device 412 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 412 and includes both volatile and nonvolatile media, removable and non-removable media.

Storage device 428 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 430 and/or cache 432. The electronic device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 434 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM) or other optical media may be provided. In these cases, each drive may be connected to bus 418 by one or more data media interfaces. Storage device 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 440 having a set (at least one) of program modules 442 may be stored, for instance, in storage 428, such program modules 442 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. The program modules 442 generally perform the functions and/or methodologies of the described embodiments of the invention.

The electronic device 412 may also communicate with one or more external devices 414 (e.g., a keyboard, a pointing device, a display 424, etc.), with one or more devices that enable a user to interact with the electronic device 412, and/or with any devices (e.g., a network card, a modem, etc.) that enable the electronic device 412 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interfaces 422. The electronic device 412 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 420. As shown in FIG. 4, the network adapter 420 communicates with the other modules of the electronic device 412 over the bus 418. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 412, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (Redundant Arrays of Independent Disks) systems, tape drives, and data backup storage systems.

The processor 416 executes programs stored in the storage device 428 to perform various functional applications and data processing, such as implementing a multi-dimensional hybrid OCR recognition method provided by any embodiment of the present invention, which may include:

setting the image to a preset size;

distinguishing formulas, diagrams and/or character areas of the image according to a preset neural network model (such as Faster R-CNN), and respectively acquiring position coordinates of the formulas, the diagrams and/or the character areas;

respectively calling different OCR models for recognition according to the formula, the chart and/or the character area to obtain a recognition result;

and outputting the identification information of the image according to the position coordinate and the identification result.

In this way, the embodiment of the invention performs OCR recognition directly on an image that mixes formulas, charts and characters. It improves recognition accuracy and robustness, and solves the prior-art problem that such mixed images are difficult to recognize simply, conveniently and accurately.
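The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `Region`, `resize_nearest`, `crop`, `recognize_mixed_image`, `demo_detector` and `demo_models` are hypothetical, the detector and the three OCR models are placeholders standing in for trained networks, and ordering the output top-to-bottom and left-to-right is only one assumed way of using the position coordinates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), lower-right corner exclusive


@dataclass
class Region:
    category: str  # "formula", "chart" or "text"
    box: Box       # position coordinates in the resized image


def resize_nearest(image: Image, width: int, height: int) -> Image:
    """Step 1: set the image to the preset size (nearest-neighbour sampling)."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def crop(image: Image, box: Box) -> Image:
    """Cut the area described by the position coordinates out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def recognize_mixed_image(
    image: Image,
    preset_size: Tuple[int, int],
    detect_regions: Callable[[Image], List[Region]],
    ocr_models: Dict[str, Callable[[Image], str]],
) -> List[dict]:
    width, height = preset_size
    resized = resize_nearest(image, width, height)       # step 1
    regions = detect_regions(resized)                    # step 2
    results = []
    for region in regions:
        model = ocr_models[region.category]              # step 3: pick model by category
        results.append({
            "category": region.category,
            "box": region.box,
            "result": model(crop(resized, region.box)),
        })
    # Step 4 (assumed ordering): top-to-bottom, then left-to-right.
    results.sort(key=lambda r: (r["box"][1], r["box"][0]))
    return results


def demo_detector(image: Image) -> List[Region]:
    # Placeholder for the preset neural network model; returns fixed regions.
    return [Region("text", (0, 2, 4, 4)), Region("formula", (0, 0, 4, 2))]


# Placeholder recognizers: each only reports the size of the area it received.
demo_models = {
    "formula": lambda patch: "<formula %dx%d>" % (len(patch[0]), len(patch)),
    "chart": lambda patch: "<chart %dx%d>" % (len(patch[0]), len(patch)),
    "text": lambda patch: "<text %dx%d>" % (len(patch[0]), len(patch)),
}
```

Calling `recognize_mixed_image(image, (4, 4), demo_detector, demo_models)` returns one entry per detected area, each carrying its category, its position coordinates and the output of the model selected for that category. Passing the detector and the models in as arguments keeps the dispatch logic independent of any particular network.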

EXAMPLE five

An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a multi-dimensional hybrid OCR recognition method according to any embodiment of the present invention, where the method may include:

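setting the image to a preset size;

distinguishing formulas, diagrams and/or character areas of the image according to a preset neural network model, and respectively acquiring position coordinates of the formulas, the diagrams and/or the character areas;

respectively calling different OCR models for recognition according to the formula, the chart and/or the character area to obtain a recognition result;

and outputting the identification information of the image according to the position coordinate and the identification result.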

The computer-readable storage medium of embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or terminal. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from its spirit; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A multi-dimensional hybrid OCR recognition method, comprising:

setting a target image as an image to be identified with a preset size;

distinguishing a formula, a diagram and/or a character area of the image to be recognized according to a preset neural network model, and respectively obtaining position coordinates of the formula, the diagram and/or the character area, which comprises:

inputting the image to be recognized into the preset neural network model to obtain a first feature vector of the image to be recognized; inputting the first feature vector into a regional candidate network to obtain position coordinates of one or more candidate frames; extracting second feature vectors corresponding to the one or more candidate frames, and inputting the second feature vectors into a category identification network to obtain categories of the one or more candidate frames, wherein the categories comprise formulas, graphs or characters;

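respectively calling different OCR models for recognition according to the formula, the chart and/or the character area to obtain a recognition result;

and outputting the identification information of the image to be identified according to the position coordinate and the identification result.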

2. A multi-dimensional hybrid OCR recognition method according to claim 1, wherein the distinguishing of the formula, the graph and/or the text area of the image to be recognized according to a preset neural network model and the obtaining of the position coordinates of the formula, the graph and/or the text area respectively further comprises:

and optimizing and adjusting the position coordinates to obtain accurate position coordinates of the formula, the chart and/or the text area.

3. A multi-dimensional hybrid OCR recognition method according to claim 1, wherein said respectively invoking different OCR models according to said formula, chart and/or text area to perform recognition to obtain recognition results comprises:

calling a first OCR model to identify the formula area so as to obtain a formula identification result, wherein the first OCR model adopts a convolutional neural network plus an attention mechanism plus a sequence encoder plus a sequence decoder framework;

calling a second OCR model to identify the chart area so as to obtain a chart identification result, wherein the second OCR model firstly calls a deep neural network to analyze and deconstruct the chart structure of the chart area, and then calls a third OCR model to identify characters in the chart area;

and calling a third OCR model to identify the character area so as to obtain a character identification result, wherein the third OCR model adopts a differentiable binarization network and an end-to-end scene character identification network architecture.

4. A multi-dimensional hybrid OCR recognition apparatus, comprising:

the first processing module is used for setting a target image as an image to be identified with a preset size;

the second processing module is used for distinguishing a formula, a chart and/or a character area of the image to be recognized according to a preset neural network model, and respectively acquiring position coordinates of the formula, the chart and/or the character area, and the second processing module comprises:

the fourth processing module is used for inputting the image to be recognized to the preset neural network model so as to obtain a first feature vector of the image to be recognized; a fifth processing module, configured to input the first feature vector to a regional candidate network to obtain position coordinates of one or more candidate frames; a sixth processing module, configured to extract a second feature vector corresponding to the one or more candidate frames, and input the second feature vector to a category identification network to obtain categories of the one or more candidate frames, where the categories include formulas, diagrams, or characters;

the third processing module is used for respectively calling different OCR models for recognition according to the formula, the chart and/or the character area so as to obtain a recognition result;

5. A multi-dimensional hybrid OCR recognition apparatus as defined in claim 4, wherein the second processing module further comprises:

6. A multi-dimensional hybrid OCR recognition apparatus as defined in claim 4, wherein the third processing module comprises:

the first recognition module is used for calling a first OCR model to recognize the formula area so as to obtain a formula recognition result, wherein the first OCR model adopts a convolutional neural network plus an attention mechanism plus a sequence encoder plus a sequence decoder architecture;

the second recognition module is used for calling a second OCR model to recognize the chart area so as to obtain a chart recognition result, wherein the second OCR model firstly calls a deep neural network to analyze and deconstruct the chart structure of the chart area and then calls a third OCR model to recognize characters in the chart area;

and the third recognition module is used for calling a third OCR model to recognize the character area so as to obtain a character recognition result, wherein the third OCR model adopts a differentiable binarization network and an end-to-end scene character recognition network architecture.

7. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the multi-dimensional hybrid OCR recognition method of any of claims 1-3.

8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a multi-dimensional hybrid OCR recognition method according to any one of claims 1-3.