CN110796137A - Method and device for identifying an image

Method and device for identifying an image

Info

Publication number
CN110796137A
CN110796137A (Application CN201910958274.4A)
Authority
CN
China
Prior art keywords
neural network
convolutional neural
image
formula
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910958274.4A
Other languages
Chinese (zh)
Inventor
易显维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN201910958274.4A
Publication of CN110796137A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for identifying an image, and relates to the technical field of computers. One embodiment of the method comprises: labeling each image sample in a training data set with a formula label and a character label respectively; training a convolutional neural network with the formula-labeled image samples and with the character-labeled image samples respectively to obtain a first convolutional neural network and a second convolutional neural network; and locating a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network respectively. This embodiment can solve the technical problem that the user needs to manually frame the formula area.

Description

Method and device for identifying an image
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a method and an apparatus for recognizing an image.
Background
In the prior art, when text recognition is performed on an image, the user first needs to draw a frame around the position of the formula in the image, as shown in Fig. 1, and only then are the text information of the formula area and of the character area recognized.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
each time text in an image is recognized, the user must first frame the formula area, which adds manual operations and results in low recognition efficiency.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for recognizing an image, so as to solve the technical problem that the user needs to manually frame the formula area.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of recognizing an image, including:
respectively marking a formula label and a character label on each image sample in the training data set;
respectively training a convolutional neural network by adopting an image sample marked with a formula label and an image sample marked with a character label to obtain a first convolutional neural network and a second convolutional neural network;
and respectively positioning a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network.
Optionally, the formula label includes the position information of the formula text in the image sample, and the character label includes the position information of the character text in the image sample.
Optionally, training a convolutional neural network by respectively using the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network, including:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
Optionally, the locating of a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network respectively includes:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
Optionally, after the formula area and the character area in the image to be recognized are respectively located, the method further includes:
and respectively identifying text information from a formula area and a character area of the image to be identified based on optical character identification.
In addition, according to another aspect of an embodiment of the present invention, there is provided an apparatus for recognizing an image, including:
the marking module is used for marking a formula label and a character label on each image sample in the training data set respectively;
the training module is used for training the convolutional neural network by respectively adopting the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network;
and the positioning module is used for respectively positioning a formula area and a character area in the image to be identified based on the first convolutional neural network and the second convolutional neural network.
Optionally, the formula label includes the position information of the formula text in the image sample, and the character label includes the position information of the character text in the image sample.
Optionally, the training module is further configured to:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
Optionally, the positioning module is further configured to:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
Optionally, the positioning module is further configured to:
after a formula area and a character area in an image to be recognized are respectively positioned, text information is recognized from the formula area and the character area of the image to be recognized respectively based on optical character recognition.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
According to another aspect of the embodiments of the present invention, there is also provided a computer readable medium, on which a computer program is stored, which when executed by a processor implements the method of any of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: by training one convolutional neural network with the formula-labeled image samples and another with the character-labeled image samples, the formula area and the character area in the image to be recognized can be located based on the first convolutional neural network and the second convolutional neural network respectively, which solves the prior-art problem that the user must manually frame the formula area. Because the two convolutional neural networks locate the formula area and the character area independently, the user neither needs to frame these areas in the image nor needs to judge whether a formula area or a character area is present at all. In addition, the embodiment of the invention not only locates the formula area and the character area but also distinguishes between them, thereby reducing user operations and improving text recognition efficiency.
Further effects of the above non-conventional alternatives are described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a diagram illustrating the framing of formula areas in an image according to the prior art;
FIG. 2 is a schematic main flow diagram of a method of identifying an image according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an image sample according to an embodiment of the invention;
FIG. 4 is a schematic illustration of labeling an image sample according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of saving a label to an XML file according to an embodiment of the present invention;
FIG. 6 is a schematic main flow diagram of a method of recognizing an image according to one reference embodiment of the present invention;
FIG. 7 is a schematic diagram of the main blocks of an apparatus for recognizing an image according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 2 is a schematic diagram of a main flow of a method of recognizing an image according to an embodiment of the present invention. As an embodiment of the present invention, as shown in fig. 2, the method of recognizing an image may include:
step 201, respectively labeling a formula label and a character label for each image sample in the training data set.
In this step, a certain number (for example, 10,000, 20,000, or 30,000) of image samples containing characters and/or formulas are first prepared as a training data set, as shown in Fig. 3. Then each image sample is labeled with a formula label and a character label respectively. Optionally, the formula label includes the position information of the formula text in the image sample, and the character label includes the position information of the character text in the image sample.
For example, taking the labeling of the formula label on the image sample shown in Fig. 3, the labelImg software may be used: a formula area is framed on the image sample, as shown in Fig. 4, which yields the formula label (i.e., the position information of the framed formula area in the image sample), and the label is saved to an XML file, as shown in Fig. 5. Similarly, the character area is then framed on the same image sample, which yields the character label (i.e., the position information of the framed character area), and it is likewise saved to an XML file. A minimal sketch of reading these labels back is given below.
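labelImg saves each annotation as a Pascal VOC XML file. The following is a minimal sketch, using only Python's standard library, of reading such a file back into (label, box) pairs; the path annotations/sample_0001.xml and the class names formula and text are illustrative assumptions, not taken from the patent.

```python
import xml.etree.ElementTree as ET

def load_voc_labels(xml_path):
    """Parse one labelImg (Pascal VOC) annotation file into (name, box) pairs."""
    root = ET.parse(xml_path).getroot()
    labels = []
    for obj in root.iter("object"):
        name = obj.find("name").text  # e.g. "formula" or "text" (assumed class names)
        box = obj.find("bndbox")
        coords = tuple(int(box.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        labels.append((name, coords))
    return labels

# Hypothetical usage: one XML annotation file per image sample in the training set.
for name, (xmin, ymin, xmax, ymax) in load_voc_labels("annotations/sample_0001.xml"):
    print(f"{name}: ({xmin}, {ymin}) to ({xmax}, {ymax})")
```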
Step 202, training a convolutional neural network by respectively adopting the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network.
In this step, the convolutional neural network is trained with the training data set from step 201. The convolutional neural network may be a CTPN (Connectionist Text Proposal Network); the CTPN model greatly simplifies the detection pipeline while improving the effect, speed, and robustness of text detection. The CTPN model mainly comprises three parts: a convolutional layer, a Bi-LSTM layer, and a fully connected layer.
Optionally, step 202 comprises: training a convolutional neural network by taking the image samples as input and the formula labels as output to obtain the first convolutional neural network; and training a convolutional neural network by taking the image samples as input and the character labels as output to obtain the second convolutional neural network. In the embodiment of the present invention, the image samples labeled in step 201 are used to train two CTPN models, which identify the formula area and the character area in an image, respectively. It should be noted that the two CTPN models may or may not be trained simultaneously; the embodiment of the present invention is not limited in this respect. A minimal training sketch is given below.
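For illustration only: a full CTPN (VGG backbone, Bi-LSTM, anchor regression) is too long to reproduce here, so the sketch below uses torchvision's Faster R-CNN as a stand-in detector to show the two-network scheme, namely two independent single-class detectors, one trained only on formula boxes and one only on character boxes. The data loaders are assumed to yield (image, target) pairs built from the XML labels above; this is a sketch under those assumptions (torchvision >= 0.13), not the patent's CTPN implementation.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def make_single_class_detector():
    """A stand-in detector: background + one class (formula OR character text)."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
    return model

def train_one_epoch(model, loader, optimizer, device):
    """Standard detection loop: torchvision detectors return a loss dict in train mode."""
    model.train()
    for images, targets in loader:  # targets: [{"boxes": Tensor[N, 4], "labels": Tensor[N]}]
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        losses = model(images, targets)
        loss = sum(losses.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
formula_net = make_single_class_detector().to(device)  # first network (formula labels)
text_net = make_single_class_detector().to(device)     # second network (character labels)

# formula_loader / text_loader are assumed DataLoaders over the same image samples,
# each carrying only one kind of label, e.g.:
# opt = torch.optim.SGD(formula_net.parameters(), lr=0.005, momentum=0.9)
# train_one_epoch(formula_net, formula_loader, opt, device)
```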
Step 203, locating a formula area and a character area in the image to be recognized respectively, based on the first convolutional neural network and the second convolutional neural network.
Since two convolutional neural networks were obtained by training in step 202, one of them can be used to locate the formula area and the other to locate the character area. Optionally, step 203 comprises: inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network respectively, and outputting the position information of the formula area and the position information of the character area, so that both areas are located in the image to be recognized; see the inference sketch below.
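Continuing the stand-in sketch above (formula_net, text_net, and device come from the training sketch; page.png is a hypothetical input path), step 203 amounts to running the image to be recognized through both networks and keeping the high-confidence boxes from each:

```python
from PIL import Image
import torch
from torchvision.transforms.functional import to_tensor

@torch.no_grad()
def locate(model, image_tensor, score_thresh=0.5):
    """Return the boxes one single-class detector predicts for an image."""
    model.eval()
    (pred,) = model([image_tensor])  # eval mode returns per-image prediction dicts
    keep = pred["scores"] >= score_thresh
    return pred["boxes"][keep].tolist()

img = to_tensor(Image.open("page.png").convert("RGB")).to(device)
formula_boxes = locate(formula_net, img)    # position information of formula areas
character_boxes = locate(text_net, img)     # position information of character areas
```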
According to the various embodiments above, it can be seen that by training one convolutional neural network with the formula-labeled image samples and another with the character-labeled image samples, the formula area and the character area in the image to be recognized can be located based on the first convolutional neural network and the second convolutional neural network respectively, which solves the prior-art problem that the user must manually frame the formula area. Because the two convolutional neural networks locate the two areas independently, the user neither needs to frame them in the image nor needs to judge whether a formula area or a character area is present. In addition, the embodiment of the invention not only locates the formula area and the character area but also distinguishes between them, thereby reducing user operations and improving text recognition efficiency.
Fig. 6 is a schematic diagram of a main flow of a method of recognizing an image according to one reference embodiment of the present invention.
Step 601, respectively marking a formula label and a character label for each image sample in the training data set.
In this step, a certain number of image samples containing characters and/or formulas are first prepared as a training data set. Then each image sample is labeled with a formula label and a character label respectively. Optionally, the formula label is the position information of the formula text in the image sample, and the character label is the position information of the character text in the image sample.
Step 602, training a convolutional neural network by respectively using the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network.
Specifically, a convolutional neural network is trained with the image samples as input and the formula labels as output to obtain the first convolutional neural network; and a convolutional neural network is trained with the image samples as input and the character labels as output to obtain the second convolutional neural network. Optionally, the convolutional neural network may be a CTPN model, so that two CTPN models are obtained by training: one for identifying the formula area in an image and the other for identifying the character area.
Step 603, inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, respectively, and outputting the position information of the formula area and the position information of the character area, thereby respectively positioning the formula area and the character area in the image to be recognized.
Step 604, recognizing text information from the formula area and the character area of the image to be recognized respectively, based on Optical Character Recognition (OCR).
In this embodiment, OCR extracts the text information from the formula area and the character area respectively, and the characters of both the plain text and the formula are output directly. A minimal sketch of this step is given below.
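As a sketch of this step (pytesseract is an assumed OCR backend here; the patent does not name a specific OCR engine, and a dedicated formula recognizer would likely do better on the formula areas), each located box is cropped out and passed to OCR:

```python
from PIL import Image
import pytesseract  # assumed OCR backend, not specified by the patent

def read_regions(image_path, boxes, lang="chi_sim+eng"):
    """Crop each located area out of the page and run OCR on it."""
    page = Image.open(image_path).convert("RGB")
    texts = []
    for xmin, ymin, xmax, ymax in boxes:
        crop = page.crop((int(xmin), int(ymin), int(xmax), int(ymax)))
        texts.append(pytesseract.image_to_string(crop, lang=lang).strip())
    return texts

# formula_boxes / character_boxes come from the localization sketch above.
formula_texts = read_regions("page.png", formula_boxes)
character_texts = read_regions("page.png", character_boxes)
```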
Therefore, with the method for identifying an image provided by the embodiment of the present invention, the image to be recognized can be fed in directly: the user does not need to frame the formula area, the formula area and the character area in the image are identified automatically, the characters of the text and the formula are output respectively, and image recognition efficiency is improved significantly.
In addition, the detailed implementation of the method for recognizing an image in this reference embodiment has already been described above, so the description is not repeated here.
Fig. 7 is a schematic diagram of main blocks of an apparatus for recognizing an image 700 according to an embodiment of the present invention, as shown in fig. 7, including a labeling module 701, a training module 702, and a positioning module 703. The marking module 701 is used for marking a formula label and a character label on each image sample in the training data set respectively; the training module 702 is configured to train a convolutional neural network by using the image sample labeled with the formula label and the image sample labeled with the text label, respectively, to obtain a first convolutional neural network and a second convolutional neural network; the positioning module 703 is configured to separately position a formula area and a text area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network.
Optionally, the formula label includes the position information of the formula text in the image sample, and the character label includes the position information of the character text in the image sample.
Optionally, the training module 702 is further configured to:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
Optionally, the positioning module 703 is further configured to:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
Optionally, the positioning module 703 is further configured to:
after a formula area and a character area in an image to be recognized are respectively positioned, text information is recognized from the formula area and the character area of the image to be recognized respectively based on optical character recognition.
According to the various embodiments above, it can be seen that by training one convolutional neural network with the formula-labeled image samples and another with the character-labeled image samples, the formula area and the character area in the image to be recognized can be located based on the first convolutional neural network and the second convolutional neural network respectively, which solves the prior-art problem that the user must manually frame the formula area. Because the two convolutional neural networks locate the two areas independently, the user neither needs to frame them in the image nor needs to judge whether a formula area or a character area is present. In addition, the embodiment of the invention not only locates the formula area and the character area but also distinguishes between them, thereby reducing user operations and improving text recognition efficiency.
It should be noted that, in the implementation of the apparatus for recognizing an image according to the present invention, the above-mentioned method for recognizing an image has been described in detail, and therefore, the repeated description is omitted here.
Fig. 8 illustrates an exemplary system architecture 800 of a method of recognizing an image or an apparatus for recognizing an image to which an embodiment of the present invention may be applied.
As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. The terminal devices 801, 802, 803 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 805 may be a server that provides various services, such as a back-office management server (for example only) that supports shopping-like websites browsed by users using the terminal devices 801, 802, 803. The background management server may analyze and otherwise process the received data such as the item information query request, and feed back a processing result (for example, target push information, item information — just an example) to the terminal device.
It should be noted that the method for recognizing an image provided by the embodiment of the present invention is generally performed by the server 805, and accordingly, the apparatus for recognizing an image is generally disposed in the server 805. The method for recognizing the image provided by the embodiment of the present invention may also be executed by the terminal devices 801, 802, and 803, and accordingly, the apparatus for recognizing the image may be disposed in the terminal devices 801, 802, and 803.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which comprises a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by the Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer programs according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a labeling module, a training module, and a positioning module, where the names of the modules do not in some cases constitute a limitation on the module itself.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: respectively marking a formula label and a character label on each image sample in the training data set; respectively training a convolutional neural network by adopting an image sample marked with a formula label and an image sample marked with a character label to obtain a first convolutional neural network and a second convolutional neural network; and respectively positioning a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network.
According to the technical solution of the embodiment of the present invention, a first convolutional neural network and a second convolutional neural network are obtained by training a convolutional neural network with the formula-labeled image samples and with the character-labeled image samples respectively, and the formula area and the character area in the image to be recognized are then located based on these two networks, which solves the prior-art problem that the user must manually frame the formula area. Because the two convolutional neural networks locate the two areas independently, the user neither needs to frame them in the image nor needs to judge whether a formula area or a character area is present. In addition, the embodiment of the invention not only locates the formula area and the character area but also distinguishes between them, thereby reducing user operations and improving text recognition efficiency.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method of recognizing an image, comprising:
respectively marking a formula label and a character label on each image sample in the training data set;
respectively training a convolutional neural network by adopting an image sample marked with a formula label and an image sample marked with a character label to obtain a first convolutional neural network and a second convolutional neural network;
and respectively positioning a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network.
2. The method of claim 1, wherein the formula label comprises position information of the formula text in the image sample, and the character label comprises position information of the character text in the image sample.
3. The method of claim 2, wherein training the convolutional neural network with the formula-labeled image samples and the character-labeled image samples respectively to obtain a first convolutional neural network and a second convolutional neural network comprises:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
4. The method of claim 2, wherein locating a formula area and a character area in the image to be recognized based on the first convolutional neural network and the second convolutional neural network respectively comprises:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
5. The method of claim 1, wherein after the formula area and the character area in the image to be recognized are respectively located, the method further comprises:
and respectively identifying text information from a formula area and a character area of the image to be identified based on optical character identification.
6. An apparatus for recognizing an image, comprising:
the marking module is used for marking a formula label and a character label on each image sample in the training data set respectively;
the training module is used for training the convolutional neural network by respectively adopting the image sample marked with the formula label and the image sample marked with the character label to obtain a first convolutional neural network and a second convolutional neural network;
and the positioning module is used for respectively positioning a formula area and a character area in the image to be identified based on the first convolutional neural network and the second convolutional neural network.
7. The apparatus of claim 6, wherein the formula label comprises position information of the formula text in the image sample, and the character label comprises position information of the character text in the image sample.
8. The apparatus of claim 7, wherein the training module is further configured to:
training a convolutional neural network by taking an image sample as input and a formula label as output to obtain a first convolutional neural network;
and training the convolutional neural network by taking the image sample as input and the character label as output to obtain a second convolutional neural network.
9. The apparatus of claim 7, wherein the positioning module is further configured to:
and respectively inputting the image to be recognized into the first convolutional neural network and the second convolutional neural network, and outputting the position information of the formula area and the position information of the character area, so that the formula area and the character area are respectively positioned in the image to be recognized.
10. The apparatus of claim 6, wherein the positioning module is further configured to:
after a formula area and a character area in an image to be recognized are respectively positioned, text information is recognized from the formula area and the character area of the image to be recognized respectively based on optical character recognition.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201910958274.4A 2019-10-10 2019-10-10 Method and device for identifying image Pending CN110796137A (en)

Priority Applications (1)

Application Number: CN201910958274.4A, Priority Date: 2019-10-10, Filing Date: 2019-10-10, Title: Method and device for identifying an image (published as CN110796137A)

Applications Claiming Priority (1)

Application Number: CN201910958274.4A, Priority Date: 2019-10-10, Filing Date: 2019-10-10, Title: Method and device for identifying an image (published as CN110796137A)

Publications (1)

Publication Number: CN110796137A, Publication Date: 2020-02-14

Family

ID=69438887

Family Applications (1)

Application Number: CN201910958274.4A, Title: Method and device for identifying an image, Priority Date: 2019-10-10, Filing Date: 2019-10-10, Status: Pending

Country Status (1)

Country Link
CN (1) CN110796137A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020126905A1 (en) * 2001-03-07 2002-09-12 Kabushiki Kaisha Toshiba Mathematical expression recognizing device, mathematical expression recognizing method, character recognizing device and character recognizing method
CN107886082A (en) * 2017-11-24 2018-04-06 腾讯科技(深圳)有限公司 Mathematical formulae detection method, device, computer equipment and storage medium in image
CN109389061A (en) * 2018-09-26 2019-02-26 苏州友教习亦教育科技有限公司 Paper recognition methods and system
CN109886093A (en) * 2019-01-08 2019-06-14 深圳禾思众成科技有限公司 A kind of formula detection method, equipment and computer readable storage medium
CN109753962A (en) * 2019-01-13 2019-05-14 南京邮电大学盐城大数据研究院有限公司 Text filed processing method in natural scene image based on hybrid network
CN110210581A (en) * 2019-04-28 2019-09-06 平安科技(深圳)有限公司 A kind of handwritten text recognition methods and device, electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
户其修: "Research and Implementation of a Common Formula Recognition System Based on an Open-Source OCR Framework", China Master's Theses Full-text Database, Information Science and Technology Series *
魏琦: "Detection of Mathematical Formulas in Printed Documents Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626588A (en) * 2020-05-09 2021-11-09 北京金山数字娱乐科技有限公司 Convolutional neural network training method and device and article classification method and device

Similar Documents

Publication Publication Date Title
US10635735B2 (en) Method and apparatus for displaying information
CN109325213B (en) Method and device for labeling data
US11055373B2 (en) Method and apparatus for generating information
CN108108342B (en) Structured text generation method, search method and device
CN108628830B (en) Semantic recognition method and device
CN109359194B (en) Method and apparatus for predicting information categories
CN108280200B (en) Method and device for pushing information
US20200322570A1 (en) Method and apparatus for aligning paragraph and video
CN109446442B (en) Method and apparatus for processing information
CN111104479A (en) Data labeling method and device
US9588952B2 (en) Collaboratively reconstituting tables
CN113377653B (en) Method and device for generating test cases
CN109413056B (en) Method and apparatus for processing information
CN109582854B (en) Method and apparatus for generating information
CN109753644B (en) Rich text editing method and device, mobile terminal and storage medium
CN110910178A (en) Method and device for generating advertisement
CN111160410A (en) Object detection method and device
CN110705271B (en) System and method for providing natural language processing service
CN107329981B (en) Page detection method and device
CN110852057A (en) Method and device for calculating text similarity
CN109710634B (en) Method and device for generating information
CN112528610A (en) Data labeling method and device, electronic equipment and storage medium
CN110796137A (en) Method and device for identifying image
CN111400581A (en) System, method and apparatus for annotating samples
CN111368693A (en) Identification method and device for identity card information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220926

Address after: 25 Financial Street, Xicheng District, Beijing 100033

Applicant after: CHINA CONSTRUCTION BANK Corp.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Applicant before: CHINA CONSTRUCTION BANK Corp.

Applicant before: Jianxin Financial Science and Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200214