CN110796137A - Method and device for identifying an image

Method and device for identifying an image

Info

Publication number
CN110796137A
CN110796137A (Application CN201910958274.4A)
Authority
CN
China
Prior art keywords
neural network
convolutional neural
image
formula
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910958274.4A
Other languages
Chinese (zh)
Inventor
易显维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN201910958274.4A
Publication of CN110796137A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for identifying an image, and relates to the technical field of computers. One embodiment of the method comprises: labeling each image sample in a training data set with a formula label and a character label respectively; training a convolutional neural network with the formula-labeled image samples and with the character-labeled image samples respectively to obtain a first convolutional neural network and a second convolutional neural network; and locating a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network respectively. This embodiment can solve the technical problem that the user needs to manually frame the formula area.

Description

Method and device for identifying an image
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a method and an apparatus for recognizing an image.
Background
In the prior art, when text recognition is performed on an image, the user first needs to draw a frame around the position of the formula in the image, as shown in Fig. 1, and only then are the text information of the formula area and of the character area recognized.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
each time text in an image is recognized, the user must first frame the formula area, which adds manual operations and results in low recognition efficiency.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for recognizing an image, so as to solve the technical problem that the user needs to manually frame the formula area.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of recognizing an image, including:
respectively marking a formula label and a character label on each image sample in the training data set;
respectively training a convolutional neural network by adopting an image sample marked with a formula label and an image sample marked with a character label to obtain a first convolutional neural network and a second convolutional neural network;
and respectively positioning a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network.
Optionally, the formula label includes the position information of the formula text in the image sample, and the character label includes the position information of the character text in the image sample.
Optionally, training a convolutional neural network by respectively using the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network, including:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
Optionally, the locating of a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network respectively includes:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
Optionally, after the formula area and the character area in the image to be recognized are respectively located, the method further includes:
and respectively identifying text information from a formula area and a character area of the image to be identified based on optical character identification.
In addition, according to another aspect of an embodiment of the present invention, there is provided an apparatus for recognizing an image, including:
the marking module is used for marking a formula label and a character label on each image sample in the training data set respectively;
the training module is used for training the convolutional neural network by respectively adopting the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network;
and the positioning module is used for respectively positioning a formula area and a character area in the image to be identified based on the first convolutional neural network and the second convolutional neural network.
Optionally, the formula label includes the position information of the formula text in the image sample, and the character label includes the position information of the character text in the image sample.
Optionally, the training module is further configured to:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
Optionally, the positioning module is further configured to:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
Optionally, the positioning module is further configured to:
after a formula area and a character area in an image to be recognized are respectively positioned, text information is recognized from the formula area and the character area of the image to be recognized respectively based on optical character recognition.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
According to another aspect of the embodiments of the present invention, there is also provided a computer readable medium, on which a computer program is stored, which when executed by a processor implements the method of any of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: by training one convolutional neural network with the formula-labeled image samples and another with the character-labeled image samples, the formula area and the character area in the image to be recognized can be located based on the first convolutional neural network and the second convolutional neural network respectively, which solves the prior-art problem that the user must manually frame the formula area. Because the two convolutional neural networks locate the formula area and the character area independently, the user neither needs to frame these areas in the image nor needs to judge whether a formula area or a character area is present at all. In addition, the embodiment of the invention not only locates the formula area and the character area but also distinguishes between them, thereby reducing user operations and improving text recognition efficiency.
Further effects of the above non-conventional alternatives are described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a diagram illustrating the framing of formula areas in an image according to the prior art;
FIG. 2 is a schematic main flow diagram of a method of identifying an image according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an image sample according to an embodiment of the invention;
FIG. 4 is a schematic illustration of labeling an image sample according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of saving a label to an XML file according to an embodiment of the present invention;
FIG. 6 is a schematic main flow diagram of a method of recognizing an image according to one reference embodiment of the present invention;
FIG. 7 is a schematic diagram of the main blocks of an apparatus for recognizing an image according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 2 is a schematic diagram of a main flow of a method of recognizing an image according to an embodiment of the present invention. As an embodiment of the present invention, as shown in fig. 2, the method of recognizing an image may include:
step 201, respectively labeling a formula label and a character label for each image sample in the training data set.
In this step, a certain number (for example, 10,000, 20,000, or 30,000) of image samples containing characters and/or formulas are first prepared as a training data set, as shown in Fig. 3. Then each image sample is labeled with a formula label and a character label respectively. Optionally, the formula label includes the position information of the formula text in the image sample, and the character label includes the position information of the character text in the image sample.
For example, taking the labeling of the formula label on the image sample shown in Fig. 3, the labelImg software may be used: a formula area is framed on the image sample, as shown in Fig. 4, which yields the formula label (i.e., the position information of the framed formula area in the image sample), and the label is saved to an XML file, as shown in Fig. 5. Similarly, the character area is then framed on the same image sample, which yields the character label (i.e., the position information of the framed character area), and it is likewise saved to an XML file. A minimal sketch of reading these labels back is given below.
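labelImg saves each annotation as a Pascal VOC XML file. The following is a minimal sketch, using only Python's standard library, of reading such a file back into (label, box) pairs; the path annotations/sample_0001.xml and the class names formula and text are illustrative assumptions, not taken from the patent.

```python
import xml.etree.ElementTree as ET

def load_voc_labels(xml_path):
    """Parse one labelImg (Pascal VOC) annotation file into (name, box) pairs."""
    root = ET.parse(xml_path).getroot()
    labels = []
    for obj in root.iter("object"):
        name = obj.find("name").text  # e.g. "formula" or "text" (assumed class names)
        box = obj.find("bndbox")
        coords = tuple(int(box.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        labels.append((name, coords))
    return labels

# Hypothetical usage: one XML annotation file per image sample in the training set.
for name, (xmin, ymin, xmax, ymax) in load_voc_labels("annotations/sample_0001.xml"):
    print(f"{name}: ({xmin}, {ymin}) to ({xmax}, {ymax})")
```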
Step 202, training a convolutional neural network by respectively adopting the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network.
In this step, the convolutional neural network is trained with the training data set from step 201. The convolutional neural network may be a CTPN (Connectionist Text Proposal Network); the CTPN model greatly simplifies the detection pipeline while improving the effect, speed, and robustness of text detection. The CTPN model mainly comprises three parts: a convolutional layer, a Bi-LSTM layer, and a fully connected layer.
Optionally, step 202 comprises: training a convolutional neural network by taking the image samples as input and the formula labels as output to obtain the first convolutional neural network; and training a convolutional neural network by taking the image samples as input and the character labels as output to obtain the second convolutional neural network. In the embodiment of the present invention, the image samples labeled in step 201 are used to train two CTPN models, which identify the formula area and the character area in an image, respectively. It should be noted that the two CTPN models may or may not be trained simultaneously; the embodiment of the present invention is not limited in this respect. A minimal training sketch is given below.
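For illustration only: a full CTPN (VGG backbone, Bi-LSTM, anchor regression) is too long to reproduce here, so the sketch below uses torchvision's Faster R-CNN as a stand-in detector to show the two-network scheme, namely two independent single-class detectors, one trained only on formula boxes and one only on character boxes. The data loaders are assumed to yield (image, target) pairs built from the XML labels above; this is a sketch under those assumptions (torchvision >= 0.13), not the patent's CTPN implementation.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def make_single_class_detector():
    """A stand-in detector: background + one class (formula OR character text)."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
    return model

def train_one_epoch(model, loader, optimizer, device):
    """Standard detection loop: torchvision detectors return a loss dict in train mode."""
    model.train()
    for images, targets in loader:  # targets: [{"boxes": Tensor[N, 4], "labels": Tensor[N]}]
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        losses = model(images, targets)
        loss = sum(losses.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
formula_net = make_single_class_detector().to(device)  # first network (formula labels)
text_net = make_single_class_detector().to(device)     # second network (character labels)

# formula_loader / text_loader are assumed DataLoaders over the same image samples,
# each carrying only one kind of label, e.g.:
# opt = torch.optim.SGD(formula_net.parameters(), lr=0.005, momentum=0.9)
# train_one_epoch(formula_net, formula_loader, opt, device)
```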
Step 203, locating a formula area and a character area in the image to be recognized respectively, based on the first convolutional neural network and the second convolutional neural network.
Since two convolutional neural networks were obtained by training in step 202, one of them can be used to locate the formula area and the other to locate the character area. Optionally, step 203 comprises: inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network respectively, and outputting the position information of the formula area and the position information of the character area, so that both areas are located in the image to be recognized; see the inference sketch below.
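Continuing the stand-in sketch above (formula_net, text_net, and device come from the training sketch; page.png is a hypothetical input path), step 203 amounts to running the image to be recognized through both networks and keeping the high-confidence boxes from each:

```python
from PIL import Image
import torch
from torchvision.transforms.functional import to_tensor

@torch.no_grad()
def locate(model, image_tensor, score_thresh=0.5):
    """Return the boxes one single-class detector predicts for an image."""
    model.eval()
    (pred,) = model([image_tensor])  # eval mode returns per-image prediction dicts
    keep = pred["scores"] >= score_thresh
    return pred["boxes"][keep].tolist()

img = to_tensor(Image.open("page.png").convert("RGB")).to(device)
formula_boxes = locate(formula_net, img)    # position information of formula areas
character_boxes = locate(text_net, img)     # position information of character areas
```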
According to the various embodiments above, it can be seen that by training one convolutional neural network with the formula-labeled image samples and another with the character-labeled image samples, the formula area and the character area in the image to be recognized can be located based on the first convolutional neural network and the second convolutional neural network respectively, which solves the prior-art problem that the user must manually frame the formula area. Because the two convolutional neural networks locate the two areas independently, the user neither needs to frame them in the image nor needs to judge whether a formula area or a character area is present. In addition, the embodiment of the invention not only locates the formula area and the character area but also distinguishes between them, thereby reducing user operations and improving text recognition efficiency.
Fig. 6 is a schematic diagram of a main flow of a method of recognizing an image according to one reference embodiment of the present invention.
Step 601, respectively marking a formula label and a character label for each image sample in the training data set.
In this step, a certain number of image samples containing characters and/or formulas are first prepared as a training data set. Then each image sample is labeled with a formula label and a character label respectively. Optionally, the formula label is the position information of the formula text in the image sample, and the character label is the position information of the character text in the image sample.
Step 602, training a convolutional neural network by respectively using the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network.
Specifically, a convolutional neural network is trained with the image samples as input and the formula labels as output to obtain the first convolutional neural network; and a convolutional neural network is trained with the image samples as input and the character labels as output to obtain the second convolutional neural network. Optionally, the convolutional neural network may be a CTPN model, so that two CTPN models are obtained by training: one for identifying the formula area in an image and the other for identifying the character area.
Step 603, inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, respectively, and outputting the position information of the formula area and the position information of the character area, thereby respectively positioning the formula area and the character area in the image to be recognized.
Step 604, recognizing text information from the formula area and the character area of the image to be recognized respectively, based on Optical Character Recognition (OCR).
In this embodiment, OCR extracts the text information from the formula area and the character area respectively, and the characters of both the plain text and the formula are output directly. A minimal sketch of this step is given below.
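As a sketch of this step (pytesseract is an assumed OCR backend here; the patent does not name a specific OCR engine, and a dedicated formula recognizer would likely do better on the formula areas), each located box is cropped out and passed to OCR:

```python
from PIL import Image
import pytesseract  # assumed OCR backend, not specified by the patent

def read_regions(image_path, boxes, lang="chi_sim+eng"):
    """Crop each located area out of the page and run OCR on it."""
    page = Image.open(image_path).convert("RGB")
    texts = []
    for xmin, ymin, xmax, ymax in boxes:
        crop = page.crop((int(xmin), int(ymin), int(xmax), int(ymax)))
        texts.append(pytesseract.image_to_string(crop, lang=lang).strip())
    return texts

# formula_boxes / character_boxes come from the localization sketch above.
formula_texts = read_regions("page.png", formula_boxes)
character_texts = read_regions("page.png", character_boxes)
```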
Therefore, with the method for identifying an image provided by the embodiment of the present invention, the image to be recognized can be fed in directly: the user does not need to frame the formula area, the formula area and the character area in the image are identified automatically, the characters of the text and the formula are output respectively, and image recognition efficiency is improved significantly.
In addition, the detailed implementation of the method for recognizing an image in this reference embodiment has already been described above, so the description is not repeated here.
Fig. 7 is a schematic diagram of main blocks of an apparatus for recognizing an image 700 according to an embodiment of the present invention, as shown in fig. 7, including a labeling module 701, a training module 702, and a positioning module 703. The marking module 701 is used for marking a formula label and a character label on each image sample in the training data set respectively; the training module 702 is configured to train a convolutional neural network by using the image sample labeled with the formula label and the image sample labeled with the text label, respectively, to obtain a first convolutional neural network and a second convolutional neural network; the positioning module 703 is configured to separately position a formula area and a text area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network.
Optionally, the formula label includes the position information of the formula text in the image sample, and the character label includes the position information of the character text in the image sample.
Optionally, the training module 702 is further configured to:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
Optionally, the positioning module 703 is further configured to:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
Optionally, the positioning module 703 is further configured to:
after a formula area and a character area in an image to be recognized are respectively positioned, text information is recognized from the formula area and the character area of the image to be recognized respectively based on optical character recognition.
According to the various embodiments above, it can be seen that by training one convolutional neural network with the formula-labeled image samples and another with the character-labeled image samples, the formula area and the character area in the image to be recognized can be located based on the first convolutional neural network and the second convolutional neural network respectively, which solves the prior-art problem that the user must manually frame the formula area. Because the two convolutional neural networks locate the two areas independently, the user neither needs to frame them in the image nor needs to judge whether a formula area or a character area is present. In addition, the embodiment of the invention not only locates the formula area and the character area but also distinguishes between them, thereby reducing user operations and improving text recognition efficiency.
It should be noted that, in the implementation of the apparatus for recognizing an image according to the present invention, the above-mentioned method for recognizing an image has been described in detail, and therefore, the repeated description is omitted here.
Fig. 8 illustrates an exemplary system architecture 800 of a method of recognizing an image or an apparatus for recognizing an image to which an embodiment of the present invention may be applied.
As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. The terminal devices 801, 802, 803 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 805 may be a server that provides various services, such as a back-office management server (for example only) that supports shopping-like websites browsed by users using the terminal devices 801, 802, 803. The background management server may analyze and otherwise process the received data such as the item information query request, and feed back a processing result (for example, target push information, item information — just an example) to the terminal device.
It should be noted that the method for recognizing an image provided by the embodiment of the present invention is generally performed by the server 805, and accordingly, the apparatus for recognizing an image is generally disposed in the server 805. The method for recognizing the image provided by the embodiment of the present invention may also be executed by the terminal devices 801, 802, and 803, and accordingly, the apparatus for recognizing the image may be disposed in the terminal devices 801, 802, and 803.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which comprises a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by the Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer programs according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a labeling module, a training module, and a positioning module, where the names of the modules do not in some cases constitute a limitation on the module itself.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: respectively marking a formula label and a character label on each image sample in the training data set; respectively training a convolutional neural network by adopting an image sample marked with a formula label and an image sample marked with a character label to obtain a first convolutional neural network and a second convolutional neural network; and respectively positioning a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network.
According to the technical solution of the embodiment of the present invention, a first convolutional neural network and a second convolutional neural network are obtained by training a convolutional neural network with the formula-labeled image samples and with the character-labeled image samples respectively, and the formula area and the character area in the image to be recognized are then located based on these two networks, which solves the prior-art problem that the user must manually frame the formula area. Because the two convolutional neural networks locate the two areas independently, the user neither needs to frame them in the image nor needs to judge whether a formula area or a character area is present. In addition, the embodiment of the invention not only locates the formula area and the character area but also distinguishes between them, thereby reducing user operations and improving text recognition efficiency.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method of recognizing an image, comprising:
respectively marking a formula label and a character label on each image sample in the training data set;
respectively training a convolutional neural network by adopting an image sample marked with a formula label and an image sample marked with a character label to obtain a first convolutional neural network and a second convolutional neural network;
and respectively positioning a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network.
2. The method of claim 1, wherein the formula label comprises position information of the formula text in the image sample, and the character label comprises position information of the character text in the image sample.
3. The method of claim 2, wherein training the convolutional neural network with the formula-labeled image samples and the character-labeled image samples respectively to obtain a first convolutional neural network and a second convolutional neural network comprises:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
4. The method of claim 2, wherein locating a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network respectively comprises:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
5. The method of claim 1, wherein after the formula area and the character area in the image to be recognized are respectively located, the method further comprises:
and respectively identifying text information from a formula area and a character area of the image to be identified based on optical character identification.
6. An apparatus for recognizing an image, comprising:
the marking module is used for marking a formula label and a character label on each image sample in the training data set respectively;
the training module is used for training the convolutional neural network by respectively adopting the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network;
and the positioning module is used for respectively positioning a formula area and a character area in the image to be identified based on the first convolutional neural network and the second convolutional neural network.
7. The apparatus of claim 6, wherein the formula label comprises position information of the formula text in the image sample, and the character label comprises position information of the character text in the image sample.
8. The apparatus of claim 7, wherein the training module is further configured to:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
9. The apparatus of claim 7, wherein the positioning module is further configured to:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
10. The apparatus of claim 6, wherein the positioning module is further configured to:
after a formula area and a character area in an image to be recognized are respectively positioned, text information is recognized from the formula area and the character area of the image to be recognized respectively based on optical character recognition.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201910958274.4A 2019-10-10 2019-10-10 Method and device for identifying image Pending CN110796137A (en)

Priority Applications (1)

Application Number: CN201910958274.4A, Priority Date: 2019-10-10, Filing Date: 2019-10-10, Title: Method and device for identifying an image (published as CN110796137A)

Applications Claiming Priority (1)

Application Number: CN201910958274.4A, Priority Date: 2019-10-10, Filing Date: 2019-10-10, Title: Method and device for identifying an image (published as CN110796137A)

Publications (1)

Publication Number: CN110796137A, Publication Date: 2020-02-14

Family

ID=69438887

Family Applications (1)

Application Number: CN201910958274.4A, Title: Method and device for identifying an image, Priority Date: 2019-10-10, Filing Date: 2019-10-10, Status: Pending

Country Status (1)

Country Link
CN (1) CN110796137A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020126905A1 (en) * 2001-03-07 2002-09-12 Kabushiki Kaisha Toshiba Mathematical expression recognizing device, mathematical expression recognizing method, character recognizing device and character recognizing method
CN107886082A (en) * 2017-11-24 2018-04-06 腾讯科技(深圳)有限公司 Mathematical formulae detection method, device, computer equipment and storage medium in image
CN109389061A (en) * 2018-09-26 2019-02-26 苏州友教习亦教育科技有限公司 Paper recognition methods and system
CN109886093A (en) * 2019-01-08 2019-06-14 深圳禾思众成科技有限公司 A kind of formula detection method, equipment and computer readable storage medium
CN109753962A (en) * 2019-01-13 2019-05-14 南京邮电大学盐城大数据研究院有限公司 Text filed processing method in natural scene image based on hybrid network
CN110210581A (en) * 2019-04-28 2019-09-06 平安科技(深圳)有限公司 A kind of handwritten text recognition methods and device, electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
户其修: "Research and Implementation of a Common Formula Recognition System Based on an Open-Source OCR Framework", China Master's Theses Full-text Database, Information Science and Technology Series *
魏琦: "Detection of Mathematical Formulas in Printed Documents Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626588A (en) * 2020-05-09 2021-11-09 北京金山数字娱乐科技有限公司 Convolutional neural network training method and device and article classification method and device

Similar Documents

Publication Publication Date Title
US10635735B2 (en) Method and apparatus for displaying information
CN109325213B (en) Method and device for labeling data
US11055373B2 (en) Method and apparatus for generating information
CN108108342B (en) Structured text generation method, search method and device
CN108628830B (en) Semantic recognition method and device
CN109359194B (en) Method and apparatus for predicting information categories
CN108280200B (en) Method and device for pushing information
US20200322570A1 (en) Method and apparatus for aligning paragraph and video
CN109446442B (en) Method and apparatus for processing information
CN111104479A (en) Data labeling method and device
US9588952B2 (en) Collaboratively reconstituting tables
CN113377653B (en) Method and device for generating test cases
CN109413056B (en) Method and apparatus for processing information
CN109582854B (en) Method and apparatus for generating information
CN109753644B (en) Rich text editing method and device, mobile terminal and storage medium
CN110910178A (en) Method and device for generating advertisement
CN111160410A (en) Object detection method and device
CN110705271B (en) System and method for providing natural language processing service
CN107329981B (en) Page detection method and device
CN110852057A (en) Method and device for calculating text similarity
CN109710634B (en) Method and device for generating information
CN112528610A (en) Data labeling method and device, electronic equipment and storage medium
CN110796137A (en) Method and device for identifying image
CN111400581A (en) System, method and apparatus for annotating samples
CN111368693A (en) Identification method and device for identity card information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220926

Address after: 25 Financial Street, Xicheng District, Beijing 100033

Applicant after: CHINA CONSTRUCTION BANK Corp.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Applicant before: CHINA CONSTRUCTION BANK Corp.

Applicant before: Jianxin Financial Science and Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200214