WO2021147221A1 - Text recognition method and apparatus, and electronic device and storage medium - Google Patents
- Publication number
- WO2021147221A1 (international application PCT/CN2020/093605)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- text
- recognition
- recognition result
- image
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- This application relates to the field of artificial intelligence, and in particular to a text recognition method and apparatus, an electronic device, and a computer-readable storage medium.
- The basic pipeline of existing general-purpose OCR first detects the regions of the picture that contain text and draws a bounding rectangle around each region, then applies a basic two-dimensional rotation correction to each rectangle, and finally feeds the cropped blocks into a recognition module to obtain all the text content of the picture.
- Although this pipeline can correct the inclination of a target within the two-dimensional image plane, the inventor realized that in real image recognition scenarios the recognition target is often not coplanar with the original picture. In such cases, the recognition result can deviate far from the correct result.
- A text recognition method includes:
- Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
- Recognition step: performing text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, the first target recognition result including multiple target text boxes and the first target text information corresponding to each of the target text boxes;
- Analysis step: generating pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result, inputting the pictures to be verified and the target text boxes into a preset analysis model, and identifying an abnormal text box in the first target recognition result according to the model output;
- Update step: sending the abnormal text box to a preset terminal, receiving second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on that second target text information to generate a second target recognition result;
- First feedback step: feeding back the second target recognition result to the user.
- A text recognition device includes:
- a receiving module, configured to receive a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
- a recognition module, configured to perform text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, the first target recognition result including multiple target text boxes and the first target text information corresponding to each target text box;
- an analysis module, configured to generate pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result, input the pictures to be verified and the target text boxes into a preset analysis model, and identify an abnormal text box in the first target recognition result according to the model output;
- an update module, configured to send the abnormal text box to a preset terminal, receive second target text information of the abnormal text box fed back by the preset terminal, and update the first target recognition result based on that second target text information to generate a second target recognition result;
- a first feedback module, configured to feed back the second target recognition result to the user.
- An electronic device includes a memory and a processor. The memory stores a text recognition program that can run on the processor, and when the program is executed by the processor, the following steps are implemented:
- Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
- Recognition step: performing text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, the first target recognition result including multiple target text boxes and the first target text information corresponding to each of the target text boxes;
- Analysis step: generating pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result, inputting the pictures to be verified and the target text boxes into a preset analysis model, and identifying an abnormal text box in the first target recognition result according to the model output;
- Update step: sending the abnormal text box to a preset terminal, receiving second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on that second target text information to generate a second target recognition result;
- First feedback step: feeding back the second target recognition result to the user.
- A computer-readable storage medium stores a text recognition program, and when the program is executed by a processor, the following steps are implemented:
- Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
- Recognition step: performing text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, the first target recognition result including multiple target text boxes and the first target text information corresponding to each of the target text boxes;
- Analysis step: generating pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result, inputting the pictures to be verified and the target text boxes into a preset analysis model, and identifying an abnormal text box in the first target recognition result according to the model output;
- Update step: sending the abnormal text box to a preset terminal, receiving second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on that second target text information to generate a second target recognition result;
- First feedback step: feeding back the second target recognition result to the user.
- With the above text recognition method, device, electronic equipment, and computer-readable storage medium, after an instruction carrying the image to be recognized is received from the user, text recognition is performed on the image to obtain a first target recognition result. Pictures to be verified are then generated from the first target recognition result, the similarity between each picture to be verified and its corresponding target text box is calculated, abnormal text boxes in the first target recognition result are identified from the similarity and handled as exceptions, the first target recognition result is updated with the exception handling result to obtain a second target recognition result, and the second target recognition result is fed back to the user.
- This improves the accuracy of the output recognition results and the user experience. In addition, random perspective transformations are applied to the image to be recognized, and from the recognition results of the transformed variants, the text information with the highest accuracy is taken as the first target text information of the target text box, which improves text recognition accuracy. Finally, distortion correction is applied to the image to be recognized before recognition, which lays the foundation for accurate text recognition.
- Figure 1 is a flowchart of a preferred embodiment of the text recognition method of this application.
- FIG. 2 is a schematic diagram of a preferred embodiment of the electronic device of this application.
- FIG. 3 is a schematic diagram of modules of a preferred embodiment of the text recognition device of this application.
- This application provides a text recognition method.
- the method can be executed by a device, and the device can be implemented by software and/or hardware.
- FIG. 1 is a flowchart of a preferred embodiment of the text recognition method of this application.
- In this embodiment, the text recognition method includes steps S1 to S5.
- Step S1: receive a text recognition instruction sent by a user, where the text recognition instruction includes an image to be recognized.
- In the following, the embodiments of the present application are described with the electronic device as the execution subject.
- The user selects the image to be recognized through an app on the client and sends a text recognition instruction based on the selected image.
- After receiving the instruction issued by the client, the electronic device performs a text recognition operation on the image to be recognized carried in the instruction.
- Step S2: perform text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized.
- The first target recognition result includes multiple target text boxes and the first target text information corresponding to each of them.
- A pre-trained OCR recognition model performs OCR on the image to be recognized, and the recognition result output by the model is taken as the first target recognition result.
- Step S3: generate pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result, input the pictures to be verified and the target text boxes into a preset analysis model, and identify an abnormal text box in the first target recognition result according to the model output.
- Generating the pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result includes:
- Taking a target text box P as an example: read the length and width of P and use them to determine a solid-color background picture P1 of random color (a light color is best, for example, white); obtain the first target text information PT corresponding to P and format it in the Song typeface (Song Ti) to produce PT1; then center PT1 on P1 to obtain a picture to be verified P2, with black characters on a white background, corresponding to P.
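The construction of P2 can be sketched without an imaging library by computing the values a renderer would need: canvas size from the box, a random light background, black text, and a centered origin. This is a minimal sketch under stated assumptions: `VerificationPictureSpec`, `build_verification_picture`, and the text-measuring callable `measure` are hypothetical names, the light-color floor of 200 is an assumed value, and "SimSun" is assumed as the font name for the Song typeface.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

RGB = Tuple[int, int, int]


@dataclass
class VerificationPictureSpec:
    """Hypothetical description of picture P2; a drawing library would render it."""
    width: int                    # taken from target text box P
    height: int
    background_rgb: RGB           # solid color of background picture P1
    text: str                     # first target text information PT
    font_family: str              # Song typeface (font name assumed)
    text_rgb: RGB                 # black characters
    text_origin: Tuple[int, int]  # top-left corner that centers the text on P1


def random_light_color(rng: random.Random, floor: int = 200) -> RGB:
    # Every channel >= floor keeps the background light so black text stays legible.
    return (rng.randint(floor, 255), rng.randint(floor, 255), rng.randint(floor, 255))


def centered_origin(box_w: int, box_h: int, text_w: int, text_h: int) -> Tuple[int, int]:
    # Top-left corner that centers a text block of size (text_w, text_h) on the canvas.
    return ((box_w - text_w) // 2, (box_h - text_h) // 2)


def build_verification_picture(
    box_w: int,
    box_h: int,
    text: str,
    measure: Callable[[str], Tuple[int, int]],  # hypothetical: text -> (width, height)
    rng: Optional[random.Random] = None,
) -> VerificationPictureSpec:
    rng = rng or random.Random()
    text_w, text_h = measure(text)
    return VerificationPictureSpec(
        width=box_w,
        height=box_h,
        background_rgb=random_light_color(rng),
        text=text,
        font_family="SimSun",
        text_rgb=(0, 0, 0),
        text_origin=centered_origin(box_w, box_h, text_w, text_h),
    )


spec = build_verification_picture(200, 40, "发票号码", lambda t: (20 * len(t), 20), random.Random(0))
print(spec.text_origin)  # (60, 10)
```

With a rough 20-pixel-per-character estimate, a four-character field in a 200 x 40 box starts at (60, 10). If the measured text were wider than the box, the origin would go negative, so a real renderer would likely shrink the font first.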
- Placing the first target text information corresponding to the target text box in the solid-color background picture in a preset format further includes:
- Each target text box is compared with its corresponding picture to be verified, and analyzing the consistency between the two determines the abnormal text boxes in the first target recognition result.
- The preset analysis model is a convolutional neural network, preferably ResNet-50.
- The preset analysis model extracts features from the target text box and from its corresponding picture to be verified.
- A convolutional neural network is pre-trained for feature extraction. The trained network extracts the features of a text box and of its picture to be verified, and the similarity between the two is calculated to determine whether the two images have the same content; exception handling is then performed on the text boxes judged to be inconsistent.
- The analysis model includes a batch input layer, a feature extraction layer, an L2 normalization layer, and a loss function.
- The loss function includes, but is not limited to, Softmax loss, Center loss, or Triplet loss. Different loss functions place different requirements on the training data; a model trained with Triplet loss, for example, uses anchor, positive, and negative samples:
- The anchor sample is a crop of the field region taken from the original image.
- The positive sample is an image generated from the field content.
- The negative sample is an image generated after replacing the field content.
- When negative samples are generated, the characters of each replaced field can be substituted following the order of the Chinese character table.
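One reading of "replaced according to the order of the Chinese character table" is that a character is swapped for the character that follows it in the table. The sketch below assumes that reading and produces one negative text per character position. `negative_texts` is a hypothetical name and `char_table` is a hypothetical stand-in for the ordered character table; the document does not say how many characters are replaced per negative.

```python
from typing import List


def negative_texts(field: str, char_table: str) -> List[str]:
    """Return one negative variant of `field` per character position.

    Each variant replaces a single character with the character that follows it
    in `char_table` (wrapping around). Characters missing from the table are
    replaced with the table's first character, so every variant differs from
    `field`. `char_table` is a hypothetical ordered character table.
    """
    if len(char_table) < 2 or len(set(char_table)) != len(char_table):
        raise ValueError("char_table needs at least two distinct characters")
    position_of = {ch: i for i, ch in enumerate(char_table)}
    variants = []
    for i, ch in enumerate(field):
        if ch in position_of:
            new_ch = char_table[(position_of[ch] + 1) % len(char_table)]
        else:
            new_ch = char_table[0]
        variants.append(field[:i] + new_ch + field[i + 1:])
    return variants


print(negative_texts("甲乙", "甲乙丙"))  # ['乙乙', '甲丙']
```

Each variant would then be rendered in the same way as a positive sample, so anchor, positive, and negative differ only in content.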
- Each type of sample contains n pictures generated with different sizes, angles, and colors, along with n copies of the cropped field image.
- An ROC curve is used to determine the threshold that maximizes the accuracy of the model.
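Choosing the threshold from the ROC curve can be sketched by sweeping every observed similarity score as a candidate threshold, recording the (false positive rate, true positive rate) point, and keeping the threshold with the highest accuracy. This is a sketch under assumptions: `best_threshold` is a hypothetical name, labels are 1 for same-content pairs and 0 otherwise, and a pair is predicted "same" when its score is at least the threshold.

```python
from typing import List, Sequence, Tuple


def best_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Sweep each distinct score as a threshold (predict 'same' when score >= t).

    Returns the threshold with the highest accuracy, that accuracy, and the ROC
    points (false positive rate, true positive rate) in sweep order.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_acc = scores[0], -1.0
    roc = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tn = neg - fp
        roc.append((fp / neg if neg else 0.0, tp / pos if pos else 0.0))
        acc = (tp + tn) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc, roc


t, acc, roc = best_threshold([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
print(t, acc)  # 0.8 1.0
```

The chosen value then serves as the similarity threshold that separates normal from abnormal text boxes.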
- The desired effect is that, in the feature space, the features of images with the same content move closer together, while the features of images with different content move farther apart.
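This objective is what a triplet loss on L2-normalized embeddings encodes: it penalizes an anchor that is not closer to its positive than to its negative by at least a margin. The sketch uses the standard formulation on plain Python lists. The function names and the margin of 0.2 are assumptions; in the model described above, the embeddings would come from the feature extraction and L2 normalization layers.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    # Project the feature vector onto the unit sphere (the L2 normalization layer).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value
) -> float:
    """max(d(a, p) - d(a, n) + margin, 0) on L2-normalized embeddings."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)


print(triplet_loss([1.0, 0.0], [2.0, 0.0], [0.0, 3.0]))  # 0.0: already well separated
```

The loss is zero once same-content features are closer than different-content features by the margin, so training effort concentrates on the triplets that still violate it.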
- Inputting the pictures to be verified and the target text boxes into the preset analysis model, and identifying an abnormal text box in the first target recognition result according to the model output, includes:
- The preset similarity algorithm includes, but is not limited to, either the Euclidean distance algorithm or the cosine similarity algorithm.
- The features extracted from a target text box better reflect the original appearance of that region in the image to be recognized, while the features extracted from its picture to be verified better reflect the first target text information.
- Target text boxes whose similarity is greater than or equal to the similarity threshold are regarded as normal text boxes, and those whose similarity is below the threshold are regarded as abnormal text boxes.
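The threshold rule can be sketched with cosine similarity, one of the two algorithms named above. The feature vectors are assumed to be already extracted by the analysis model (for example, ResNet-50 embeddings). `cosine_similarity`, `find_abnormal_boxes`, and the box-id dictionaries are hypothetical names for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero feature as dissimilar
    return dot / (norm_u * norm_v)


def find_abnormal_boxes(
    box_features: Dict[str, Sequence[float]],      # box id -> features of the crop
    picture_features: Dict[str, Sequence[float]],  # box id -> features of its picture to be verified
    threshold: float,
) -> List[str]:
    """Return ids of boxes whose similarity is below `threshold`.

    A similarity equal to the threshold counts as normal, as in the text.
    """
    abnormal = []
    for box_id, f_box in box_features.items():
        if cosine_similarity(f_box, picture_features[box_id]) < threshold:
            abnormal.append(box_id)
    return abnormal


print(find_abnormal_boxes({"b1": [1.0, 0.0], "b2": [1.0, 0.0]},
                          {"b1": [2.0, 0.0], "b2": [0.0, 1.0]}, 0.9))  # ['b2']
```

With Euclidean distance instead, the comparison flips: a box would be abnormal when the distance exceeds a threshold, because a larger distance means lower similarity.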
- Step S4: send the abnormal text box to a preset terminal, receive the second target text information of the abnormal text box fed back by the preset terminal, and update the first target recognition result based on that second target text information to generate a second target recognition result.
- After the abnormal text boxes in the first target recognition result are determined, they need to be processed.
- The preset terminal is a terminal used by crowdsourcing personnel. The abnormal text box is sent to them, they manually identify the corresponding second target text information, and that information is returned to the electronic device.
- Based on the received second target text information, the electronic device updates the first target text information of the abnormal text box in the first target recognition result to obtain the second target recognition result.
- Step S5: feed back the second target recognition result to the user.
- The second target recognition result is displayed to the user through the client.
- In another embodiment, the text recognition method includes steps S1 to S3 and step S6.
- Step S6: when there is no abnormal text box in the first target recognition result, feed back the first target recognition result to the user.
- In this case, the first target recognition result is used directly as the final recognition result and displayed to the user through the client.
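Steps S1 to S6 can be tied together in a small orchestration function. This is a sketch, not the application's implementation: the recognizer, the abnormality check, the crowdsourcing terminal, and the client display are all passed in as hypothetical callables, and a recognition result is assumed to be a dictionary from box id to text.

```python
from typing import Callable, Dict, List

Result = Dict[str, str]  # hypothetical: target text box id -> recognized text


def run_text_recognition(
    image: object,                                         # S1: image carried by the instruction
    recognize: Callable[[object], Result],                 # S2: OCR -> first target recognition result
    find_abnormal: Callable[[object, Result], List[str]],  # S3: ids of abnormal text boxes
    ask_terminal: Callable[[str], str],                    # S4: crowdsourced text for one box
    feed_back: Callable[[Result], None],                   # S5 / S6: display on the client
) -> Result:
    first_result = recognize(image)
    abnormal_ids = find_abnormal(image, first_result)
    if not abnormal_ids:
        final = first_result  # S6: no abnormal box, return the first result as-is
    else:
        final = dict(first_result)  # S4: copy, then overwrite only the abnormal boxes
        for box_id in abnormal_ids:
            final[box_id] = ask_terminal(box_id)
    feed_back(final)
    return final


shown: List[Result] = []
out = run_text_recognition(
    "invoice.png",
    recognize=lambda im: {"b1": "发票", "b2": "金颔"},
    find_abnormal=lambda im, r: ["b2"],
    ask_terminal=lambda box_id: "金额",
    feed_back=shown.append,
)
print(out)  # {'b1': '发票', 'b2': '金额'}
```

Copying the first result before updating keeps the original OCR output intact, which can be useful if the corrections are later logged or audited.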
- Performing text recognition on the image to be recognized based on preset recognition rules to obtain the first target recognition result of the image to be recognized includes:
- Multiple random perspective transformations are applied to each first text box to obtain multiple corresponding second text boxes. For example, applying 5 random perspective transformations to a first text box yields 5 corresponding second text boxes.
- The second text boxes include the first text box itself.
- The OCR recognition model identifies the first text information and the first confidence of each second text box, and the second text box with the highest first confidence is selected as the target text box corresponding to the first text box.
- The first target text information of each first text box is determined from the first text information of its target text box, and the first target text information of all first text boxes is aggregated into the first target recognition result.
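Jittering the four corners of a text box independently and then rectifying each jittered quadrilateral is one common way to realize a random perspective transformation. The sketch below generates the original box plus n jittered variants and keeps the reading with the highest confidence. `random_perspective_variants`, `best_reading`, the 5% jitter bound, and the `ocr` callable (which would warp a quadrilateral to a rectangle and recognize it) are all hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Quad = List[Point]


def random_perspective_variants(
    box: Sequence[Point], n: int = 5, max_shift: float = 0.05,
    rng: Optional[random.Random] = None,
) -> List[Quad]:
    """Return the original box plus n variants whose corners are each shifted by
    up to max_shift of the box width/height (an assumed jitter bound)."""
    rng = rng or random.Random()
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    variants: List[Quad] = [list(box)]  # the second text boxes include the first text box
    for _ in range(n):
        variants.append([(x + rng.uniform(-max_shift, max_shift) * w,
                          y + rng.uniform(-max_shift, max_shift) * h) for x, y in box])
    return variants


def best_reading(
    box: Sequence[Point],
    ocr: Callable[[Quad], Tuple[str, float]],  # hypothetical: quad -> (text, confidence)
    n: int = 5,
    rng: Optional[random.Random] = None,
) -> Tuple[str, float, Quad]:
    readings = [(ocr(q), q) for q in random_perspective_variants(box, n, rng=rng)]
    (text, conf), quad = max(readings, key=lambda r: r[0][1])  # highest confidence wins
    return text, conf, quad
```

Because the untouched box is always among the candidates, the selected reading is never less confident than plain OCR on the original crop.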
- The method further includes:
- Performing distortion correction on the image to be recognized based on a preset distortion correction rule to obtain a distortion-corrected image to be recognized.
- Performing distortion correction on the image to be recognized based on a preset distortion correction rule to obtain the distortion-corrected image includes:
- The coordinates of each pixel corner on the undistorted image are obtained by applying distortion correction to the pixel corners of the original, distorted image to be recognized. The pixel corners can be the corners of the distorted image to be recognized; if that image is a quadrilateral, they are its four vertices.
- Solving the perspective transformation matrix requires the coordinates of at least four pairs of corresponding points, so at least four pixel corners must be obtained on the distorted image to be recognized.
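The four-point requirement comes from the eight unknowns of a perspective matrix whose bottom-right entry is fixed to 1: each point correspondence contributes two linear equations, so four pairs give a square 8 x 8 system. The sketch solves it with Gaussian elimination in pure Python; the function names are hypothetical. In practice, OpenCV's `cv2.getPerspectiveTransform(src, dst)` computes the same 3 x 3 matrix from four point pairs.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting on the augmented matrix [a | b].
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def perspective_matrix(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H (H[2][2] = 1) mapping four src corners onto four dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u]); b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v]); b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_matrix(m: List[List[float]], x: float, y: float) -> Point:
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)
```

For example, passing the four distorted corners of a document as `src` and the corners of an upright rectangle as `dst` yields the matrix that would rectify the whole image.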
- Distortion correction yields the corrected image to be recognized, on which the subsequent recognition, verification, update, and feedback operations are performed.
- Because the distortion-correction mapping of pixel corners onto the undistorted image is not one-to-one, the coordinates computed on the undistorted image for a pixel corner of the original distorted image may not be unique. A neighborhood search is therefore used to find the optimal coordinates of the pixel corner on the undistorted image.
- Calculating the coordinates of the pixel corners on the undistorted image includes:
- For each pixel in the neighborhood, the distance between its corresponding coordinates on the original distorted image to be recognized and the pixel corner is calculated, and the coordinates with the shortest distance are taken as the coordinates of the pixel corner on the undistorted image.
- The neighborhood radius can be set flexibly according to the degree of distortion of the original image to be recognized.
- When the distortion is small, the neighborhood radius can be set smaller, so fewer neighborhood pixels need to be traversed and the amount of calculation is reduced.
- When the distortion is large, the neighborhood radius can be set larger so that the optimal pixel can be found.
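The neighborhood search can be sketched as follows. The document does not specify the distortion model, so this sketch assumes a hypothetical `distort` callable that maps undistorted pixel coordinates to distorted ones, and treats the radius as the half-width of a square window around an initial integer estimate. `refine_corner` is a hypothetical name.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def refine_corner(
    corner_distorted: Point,                     # pixel corner on the original distorted image
    initial_estimate: Tuple[int, int],           # integer guess on the undistorted image
    distort: Callable[[int, int], Point],        # hypothetical: undistorted (u, v) -> distorted (x, y)
    radius: int,                                 # half-width of the square search window
) -> Tuple[Tuple[int, int], float]:
    """Return the undistorted pixel whose distorted position lies closest to the corner."""
    cx, cy = corner_distorted
    u0, v0 = initial_estimate
    best, best_d = initial_estimate, math.inf
    for du in range(-radius, radius + 1):
        for dv in range(-radius, radius + 1):
            u, v = u0 + du, v0 + dv
            x, y = distort(u, v)
            d = math.hypot(x - cx, y - cy)
            if d < best_d:
                best, best_d = (u, v), d
    return best, best_d
```

The loop visits (2r + 1)^2 pixels, which is why the text suggests a small radius for mild distortion (less work) and a larger one for strong distortion (so the optimum falls inside the window).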
- In summary, text recognition is performed on the image to be recognized to obtain a first target recognition result; pictures to be verified are generated from the first target recognition result; the similarity between each picture to be verified and its corresponding target text box is calculated; abnormal text boxes in the first target recognition result are identified from the similarity and handled as exceptions; the first target recognition result is updated with the exception handling result to obtain the second target recognition result; and the second target recognition result is fed back to the user.
- FIG. 2 is a schematic diagram of a preferred embodiment of the electronic device of this application.
- The electronic device 1 may be a terminal device with data processing capability, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer.
- The server may be a rack server, a blade server, a tower server, or a cabinet server.
- The electronic device 1 includes a memory 11, a processor 12, and a network interface 13.
- The memory 11 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, or an optical disc.
- In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as its hard disk.
- In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
- The memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1.
- The memory 11 can be used not only to store application software installed in the electronic device 1 and various data, such as the text recognition program 10, but also to temporarily store data that has been or will be output.
- In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is used to run the program code stored in the memory 11 or to process data, for example, to execute the text recognition program 10.
- The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices, for example, a client (not shown in the figure).
- Components 11 to 13 of the electronic device 1 communicate with each other via a communication bus.
- FIG. 2 only shows the electronic device 1 with components 11 to 13. Those skilled in the art will understand that the structure shown in FIG. 2 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
- The electronic device 1 may further include a user interface.
- The user interface may include a display and an input unit such as a keyboard.
- Optionally, the user interface may also include a standard wired interface and a wireless interface.
- The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like.
- The display may also be called a display screen or display unit, and is used to display information processed in the electronic device 1 and a visual user interface.
- The memory 11, as a computer storage medium, stores the program code of the text recognition program 10, and when the processor 12 executes this program code, the following steps are implemented:
- Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized.
- The user selects the image to be recognized through an app on the client and sends a text recognition instruction based on the selected image.
- After receiving the instruction sent by the client, the electronic device 1 performs a text recognition operation on the image to be recognized carried in the instruction.
- Recognition step: performing text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, the first target recognition result including multiple target text boxes and the first target text information corresponding to each of them.
- A pre-trained OCR recognition model performs OCR on the image to be recognized, and the recognition result output by the model is taken as the first target recognition result.
- Analysis step: generating pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result, inputting the pictures to be verified and the target text boxes into a preset analysis model, and identifying an abnormal text box in the first target recognition result according to the model output.
- Generating the pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result includes:
- Taking a target text box P as an example: read the length and width of P and use them to determine a solid-color background picture P1 of random color (a light color is best, for example, white); obtain the first target text information PT corresponding to P and format it in the Song typeface (Song Ti) to produce PT1; then center PT1 on P1 to obtain a picture to be verified P2, with black characters on a white background, corresponding to P.
- Placing the first target text information corresponding to the target text box in the solid-color background picture in a preset format further includes:
- Each target text box is compared with its corresponding picture to be verified, and analyzing the consistency between the two determines the abnormal text boxes in the first target recognition result.
- The preset analysis model is a convolutional neural network, preferably ResNet-50.
- The preset analysis model extracts features from the target text box and from its corresponding picture to be verified.
- A convolutional neural network is pre-trained for feature extraction. The trained network extracts the features of a text box and of its picture to be verified, and the similarity between the two is calculated to determine whether the two images have the same content; exception handling is then performed on the text boxes judged to be inconsistent.
- The analysis model includes a batch input layer, a feature extraction layer, an L2 normalization layer, and a loss function.
- The loss function includes, but is not limited to, Softmax loss, Center loss, or Triplet loss. Different loss functions place different requirements on the training data; a model trained with Triplet loss, for example, uses anchor, positive, and negative samples:
- The anchor sample is a crop of the field region taken from the original image.
- The positive sample is an image generated from the field content.
- The negative sample is an image generated after replacing the field content.
- When negative samples are generated, the characters of each replaced field can be substituted following the order of the Chinese character table.
- Each type of sample contains n pictures generated with different sizes, angles, and colors, along with n copies of the cropped field image.
- An ROC curve is used to determine the threshold that maximizes the accuracy of the model.
- The desired effect is that, in the feature space, the features of images with the same content move closer together, while the features of images with different content move farther apart.
- Inputting the pictures to be verified and the target text boxes into the preset analysis model, and identifying an abnormal text box in the first target recognition result according to the model output, includes:
- The preset similarity algorithm includes, but is not limited to, either the Euclidean distance algorithm or the cosine similarity algorithm.
- The features extracted from a target text box better reflect the original appearance of that region in the image to be recognized, while the features extracted from its picture to be verified better reflect the first target text information.
- Target text boxes whose similarity is greater than or equal to the similarity threshold are regarded as normal text boxes, and those whose similarity is below the threshold are regarded as abnormal text boxes.
- Update step: sending the abnormal text box to a preset terminal, receiving the second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on that second target text information to generate a second target recognition result.
- the abnormal text box After determining the abnormal text box in the first target recognition result, the abnormal text box needs to be processed.
- the aforementioned preset terminal is a terminal used by crowdsourced personnel.
- the abnormal text box is sent to the crowdsourcing personnel, the second target text information corresponding to the abnormal text box is manually recognized, and the second target text information corresponding to the abnormal text box is returned to the electronic device 1.
- the electronic device 1 updates the first target text information corresponding to the abnormal text box in the first target recognition result based on the received second target text information corresponding to the abnormal text box to obtain the second target recognition result.
- Feedback step: feed back the second target recognition result to the user.
- the second target recognition result is displayed to the user through the client.
- when there is no abnormal text box in the first recognition result, the first recognition result is fed back to the user.
- the first target recognition result is directly used as the final recognition result, and the obtained final recognition result is displayed to the user through the client.
- performing text recognition on the image to be recognized based on preset recognition rules to obtain the first target recognition result of the image to be recognized includes:
- multiple random perspective transformations are performed on each first text box to obtain multiple corresponding second text boxes. For example, perform 5 random perspective transformations on each first text box to obtain 5 second text boxes corresponding to one first text box.
- the second text box includes the first text box.
- the OCR recognition model is used to identify the first text information and the first confidence corresponding to each of the five second text boxes, and the second text box with the highest first confidence is selected as the target text box corresponding to the first text box.
- the first target text information of the first text box is determined according to the first text information of the target text box, and the first target text information of each first text box is summarized to obtain the first target recognition result.
- the image to be recognized may be captured by the user in real time, and when the user uses the camera to capture the image to be recognized, the image may be distorted due to the characteristics of the camera itself. Therefore, in order to further improve the accuracy of recognition, in other embodiments, when the processor 12 executes the text recognition program 10, before the recognition step, the following steps are further implemented:
- Distortion correction is performed on the image to be recognized based on a preset distortion correction rule to obtain the image to be recognized after distortion correction.
- performing distortion correction on the image to be recognized based on a preset distortion correction rule to obtain the distortion-corrected image to be recognized includes:
- the coordinates of each pixel corner on the undistorted image are obtained by applying distortion correction to the pixel corners of the original, distorted image to be recognized; the pixel corners can be the corner points of the distorted image to be recognized, for example, the four vertices of the quadrilateral when the image to be recognized is a quadrilateral.
- solving the perspective transformation matrix requires the coordinates of at least four corresponding pixel points, so the coordinates of at least four pixel corners need to be obtained from the distorted image to be recognized.
- the image to be recognized can be subjected to distortion correction to obtain the image to be recognized after the distortion correction, and then the subsequent recognition, verification, update and feedback operations are performed.
- because distortion correction does not map pixel corners to the undistorted image one-to-one, the coordinates calculated on the undistorted image for a pixel corner of the original distorted image may not be unique, so the optimal coordinates of each pixel corner on the undistorted image need to be found.
- calculating the coordinates of the pixel corners on the undistorted image includes:
- the distance between each neighborhood pixel and the pixel corner can be calculated from the coordinates of each neighborhood pixel on the original distorted image to be recognized, and the coordinates corresponding to the shortest distance are then taken as the coordinates of the pixel corner on the undistorted image.
- the neighborhood radius can be flexibly set according to the degree of distortion of the original distorted image to be recognized.
- when the degree of distortion is small, the neighborhood radius can be set smaller, so fewer neighborhood pixels need to be traversed and the amount of calculation is reduced.
- when the degree of distortion is large, the neighborhood radius can be set larger so that the optimal pixel can be found.
- This application also proposes a text recognition device.
- FIG. 3 is a schematic diagram of the modules of a preferred embodiment of the text recognition apparatus of this application.
- according to the functions it implements, the text recognition apparatus 2 in this embodiment may include modules 210 to 250.
- the module can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of the electronic device and can complete fixed functions, and are stored in the memory of the electronic device.
- the functions of each module/unit are as follows:
- the receiving module 210 is configured to receive a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
- the recognition module 220 is configured to perform text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized.
- the first target recognition result includes a plurality of target text boxes and the first target text information corresponding to the multiple target text boxes;
- the analysis module 230 is configured to generate pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result, input the pictures to be verified and the target text boxes into a preset analysis model, and identify an abnormal text box from the first target recognition result according to the model output;
- the update module 240 is configured to send the abnormal text box to a preset terminal, receive the second target text information of the abnormal text box fed back by the preset terminal, and update the first target recognition result based on the second target text information of the abnormal text box to generate a second target recognition result;
- the feedback module 250 is configured to feed back the second target recognition result to the user.
- the functions or operation steps implemented by modules 210-250 are similar to those described above and are not repeated here.
- the embodiment of the present application also proposes a computer-readable storage medium.
- the computer-readable storage medium may be non-volatile or volatile.
- the computer-readable storage medium includes a text recognition program 10, which implements any step of the text recognition method when the text recognition program 10 is executed by a processor.
- the specific implementation of the computer-readable storage medium of the present application is substantially the same as the foregoing method embodiment, and will not be repeated here.
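The distortion-correction bullets above say that a perspective transformation matrix needs at least four corresponding points, but they do not say how the matrix is solved. Below is a minimal sketch that assumes the standard direct solution: the bottom-right matrix entry is fixed to 1, and the eight remaining unknowns are solved from four point pairs. The function names are hypothetical and do not come from the application.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def perspective_matrix(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Return the 3x3 matrix H (with H[2][2] = 1) mapping each src corner to its dst corner."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four point correspondences are required")
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31*x + h32*y + 1) = h11*x + h12*y + h13
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v * (h31*x + h32*y + 1) = h21*x + h22*y + h23
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_matrix(h: Matrix, p: Point) -> Point:
    """Map a point through H, dividing by the homogeneous coordinate."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

With exactly four pairs the system is square, which is why four corners are the minimum. OpenCV's `cv2.getPerspectiveTransform` computes the same kind of matrix from four point pairs.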
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Character Discrimination (AREA)
Abstract
The present application relates to artificial intelligence, and in particular to the field of image processing. Disclosed is a text recognition method. The method comprises: after receiving an instruction that is sent by a user and carries an image to be recognized, performing text recognition on the image to be recognized to obtain a first target recognition result; generating pictures to be verified according to the first target recognition result; calculating the similarity between each picture to be verified and its corresponding target text box; identifying abnormal text boxes in the first target recognition result according to the similarity and performing exception handling on them; updating the first target recognition result on the basis of the exception-handling result to obtain a second target recognition result; and feeding back the second target recognition result to the user. Further disclosed are a text recognition apparatus, an electronic device and a computer storage medium. The present application can improve the accuracy of text recognition.
Description
This application claims priority to Chinese patent application No. 202010073495.6, entitled "Text recognition method, device and storage medium", filed with the China Patent Office on January 22, 2020, the entire content of which is incorporated herein by reference.
This application relates to the field of artificial intelligence, and in particular to a text recognition method and apparatus, an electronic device, and a computer-readable storage medium.
Today, dedicated OCR recognition already has a set of mature algorithms that handle target document detection, field detection, and field recognition, respectively. This process is end-to-end, and the results are output directly to the user.
The basic process of existing general-purpose OCR recognition is as follows. The regions of the picture that contain text are detected first, and the circumscribed rectangle of each region is drawn. A basic two-dimensional rotation correction is then applied to each rectangle, and the cropped blocks are fed into the recognition module to obtain all of the text content of the picture. Although this process can correct the tilt of a target within a two-dimensional plane, the inventor realized that in actual image recognition scenarios the recognition target is often not coplanar with the original picture. In such cases, the image recognition result can differ greatly from the correct result.
Therefore, there is an urgent need for a method that can accurately recognize text in pictures.
Based on this, and in response to the above technical problems, it is necessary to provide a text recognition method and apparatus, an electronic device, and a computer-readable storage medium that can improve the accuracy of text recognition.
A text recognition method, comprising:
Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
Recognition step: performing text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, where the first target recognition result includes multiple target text boxes and first target text information corresponding to the multiple target text boxes;
Analysis step: generating, based on the first target recognition result, pictures to be verified corresponding to the multiple target text boxes, inputting the pictures to be verified and the target text boxes into a preset analysis model, and identifying an abnormal text box from the first target recognition result according to the model output;
Update step: sending the abnormal text box to a preset terminal, receiving second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on the second target text information of the abnormal text box to generate a second target recognition result;
First feedback step: feeding back the second target recognition result to the user.
A text recognition apparatus, comprising:
a receiving module, configured to receive a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
a recognition module, configured to perform text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, where the first target recognition result includes multiple target text boxes and first target text information corresponding to the multiple target text boxes;
an analysis module, configured to generate, based on the first target recognition result, pictures to be verified corresponding to the multiple target text boxes, input the pictures to be verified and the target text boxes into a preset analysis model, and identify an abnormal text box from the first target recognition result according to the model output;
an update module, configured to send the abnormal text box to a preset terminal, receive second target text information of the abnormal text box fed back by the preset terminal, and update the first target recognition result based on the second target text information of the abnormal text box to generate a second target recognition result;
a first feedback module, configured to feed back the second target recognition result to the user.
An electronic device, comprising a memory and a processor, where the memory stores a text recognition program executable on the processor, and the text recognition program, when executed by the processor, implements the following steps:
Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
Recognition step: performing text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, where the first target recognition result includes multiple target text boxes and first target text information corresponding to the multiple target text boxes;
Analysis step: generating, based on the first target recognition result, pictures to be verified corresponding to the multiple target text boxes, inputting the pictures to be verified and the target text boxes into a preset analysis model, and identifying an abnormal text box from the first target recognition result according to the model output;
Update step: sending the abnormal text box to a preset terminal, receiving second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on the second target text information of the abnormal text box to generate a second target recognition result;
First feedback step: feeding back the second target recognition result to the user.
A computer-readable storage medium, including a text recognition program that, when executed by a processor, implements the following steps:
Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
Recognition step: performing text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, where the first target recognition result includes multiple target text boxes and first target text information corresponding to the multiple target text boxes;
Analysis step: generating, based on the first target recognition result, pictures to be verified corresponding to the multiple target text boxes, inputting the pictures to be verified and the target text boxes into a preset analysis model, and identifying an abnormal text box from the first target recognition result according to the model output;
Update step: sending the abnormal text box to a preset terminal, receiving second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on the second target text information of the abnormal text box to generate a second target recognition result;
First feedback step: feeding back the second target recognition result to the user.
With the above text recognition method, apparatus, electronic device, and computer-readable storage medium, the processing runs as follows. After an instruction carrying an image to be recognized is received from a user, text recognition is performed on the image to obtain a first target recognition result. Pictures to be verified are generated from the first target recognition result, and the similarity between each picture to be verified and its corresponding target text box is calculated. Abnormal text boxes in the first target recognition result are identified based on the similarity and handled as exceptions. The first target recognition result is then updated based on the exception-handling result to obtain a second target recognition result, which is fed back to the user. This design has three benefits:
- Adding a verification mechanism after the general OCR recognition process improves the accuracy of the output recognition results and the user experience.
- Random perspective transformations are applied to the picture to be recognized, and the text information with the highest accuracy among the recognition results of the multiple transformations is selected as the first target text information of the target text box. This improves the accuracy of text recognition.
- Distortion correction is performed on the image to be recognized before recognition, which lays the foundation for accurate text recognition.
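The random-perspective idea in the summary above can be sketched as follows. This is a hedged illustration only, with three assumptions. First, `recognize` is a hypothetical stand-in for the OCR recognition model that returns a `(text, confidence)` pair for a quadrilateral region. Second, the corner-jitter scheme and the `max_shift` value are chosen for illustration. Third, the sketch does not enforce that each transformed box contains the original box.

```python
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Quad = List[Point]
Recognizer = Callable[[Quad], Tuple[str, float]]  # hypothetical OCR call


def random_perspective_quad(quad: Quad, max_shift: float, rng: random.Random) -> Quad:
    # Moving the four corners independently defines a random projective
    # (perspective) warp of the box once the region is rectified.
    return [(x + rng.uniform(-max_shift, max_shift),
             y + rng.uniform(-max_shift, max_shift)) for x, y in quad]


def best_of_random_perspectives(quad: Quad, recognize: Recognizer, n: int = 5,
                                max_shift: float = 3.0,
                                seed: Optional[int] = None) -> Tuple[Quad, str, float]:
    """Recognize n randomly warped copies of a box and keep the most confident reading."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    best: Optional[Tuple[Quad, str, float]] = None
    for _ in range(n):
        candidate = random_perspective_quad(quad, max_shift, rng)
        text, confidence = recognize(candidate)
        if best is None or confidence > best[2]:
            best = (candidate, text, confidence)
    return best
```

The value 5 for `n` mirrors the five transformations used as an example in the Definitions list. The returned quadrilateral plays the role of the target text box, and its text becomes the first target text information.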
FIG. 1 is a flowchart of a preferred embodiment of the text recognition method of this application;
FIG. 2 is a schematic diagram of a preferred embodiment of the electronic device of this application;
FIG. 3 is a schematic diagram of the modules of a preferred embodiment of the text recognition apparatus of this application.
It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
This application provides a text recognition method. The method can be executed by an apparatus, and the apparatus can be implemented in software and/or hardware.
FIG. 1 is a flowchart of a preferred embodiment of the text recognition method of this application.
In a preferred embodiment of the text recognition method of this application, the text recognition method includes only steps S1 to S5.
Step S1: receive a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized.
In the following, the embodiments of this application are described with an electronic device as the execution subject.
The user selects the image to be recognized through an app on the client and issues a text recognition instruction based on the selected image. After receiving the instruction from the client, the electronic device performs a text recognition operation on the image to be recognized that the instruction carries.
Step S2: perform text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, where the first target recognition result includes multiple target text boxes and first target text information corresponding to the multiple target text boxes.
For example, a pre-trained OCR recognition model performs OCR on the image to be recognized, and the recognition result output by the model is used as the first target recognition result.
Step S3: generate, based on the first target recognition result, pictures to be verified corresponding to the multiple target text boxes, input the pictures to be verified and the target text boxes into a preset analysis model, and identify an abnormal text box from the first target recognition result according to the model output.
To improve the accuracy of text recognition, the first target recognition result is verified after OCR on the picture to be recognized produces it. This verification requires identifying abnormal text boxes in the first target recognition result. In this embodiment, generating the pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result includes:
a1. Read the length and width of one target text box in the first recognition result, and create a new background picture with the same length and width as the target text box; and
a2. Obtain the first target text information corresponding to the target text box in the first recognition result, and place it in the background picture in a preset format to generate the picture to be verified corresponding to the target text box.
For example, take a target text box P. Read the length and width of P and use them to create a solid-color background picture P1 of a random color; a light color works best, for example, white. Then obtain the first target text information PT corresponding to P and convert PT to the Song typeface (SimSun) to obtain PT1. Center PT1 on the solid-color background picture P1 to obtain P2, a picture to be verified corresponding to P, with black text on a white background.
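A minimal layout sketch for steps a1/a2 and the P/P1/P2 example is shown below. It assumes that a separate renderer (for example, Pillow) would later draw the text, so it only computes the canvas size, the colors, and the centered text origin. The glyph-width estimate `char_aspect` and the helper names are hypothetical and not taken from the application.

```python
from dataclasses import dataclass
from typing import Tuple

RGB = Tuple[int, int, int]


@dataclass(frozen=True)
class VerificationSpec:
    width: int               # same width as the target text box (step a1)
    height: int              # same height as the target text box (step a1)
    background: RGB
    text_color: RGB
    font_name: str
    font_size: int
    origin: Tuple[int, int]  # top-left corner of the centered text block
    text: str


def make_verification_spec(box_w: int, box_h: int, text: str,
                           font_name: str = "SimSun",
                           background: RGB = (255, 255, 255),
                           text_color: RGB = (0, 0, 0),
                           char_aspect: float = 1.0) -> VerificationSpec:
    if box_w <= 0 or box_h <= 0 or not text:
        raise ValueError("need a non-empty text and a positive box size")
    # Assumption: each glyph is about char_aspect * font_size wide
    # (CJK glyphs are roughly square), so the text fills the box width at most.
    font_size = max(1, min(box_h, int(box_w / (len(text) * char_aspect))))
    text_w = int(len(text) * font_size * char_aspect)
    origin = ((box_w - text_w) // 2, (box_h - font_size) // 2)  # center the text
    return VerificationSpec(box_w, box_h, background, text_color,
                            font_name, font_size, origin, text)
```

A real implementation would measure the rendered text instead of estimating it. Keeping layout separate from drawing, however, makes the "same size as the box, text centered" rule easy to check.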
In other embodiments, placing the first target text information corresponding to the target text box in the solid-color background picture in a preset format further includes:
b1. Apply a random format adjustment to the first target text information corresponding to the target text box; and
b2. Place the randomly formatted first target text information in the background picture.
For example, the font color, font size, typeface, and angle can be modified. These random adjustments increase the randomness of the generated pictures to be verified and lay the foundation for accurate subsequent verification.
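One way to implement the random format adjustment of step b1 is to sample each rendering parameter from a fixed interval. This also matches the interval-based sampling described later for center-loss training data. The interval bounds, the font list, and the dark-color cap below are assumptions chosen for illustration.

```python
import random
from typing import Dict, Sequence, Tuple


def sample_text_format(rng: random.Random,
                       size_range: Tuple[int, int] = (18, 36),
                       angle_range: Tuple[float, float] = (-8.0, 8.0),
                       max_channel: int = 96,
                       fonts: Sequence[str] = ("SimSun", "SimHei", "KaiTi")) -> Dict[str, object]:
    """Draw one random format (size, angle, color, typeface) for a picture to be verified."""
    low, high = size_range
    return {
        "font_size": rng.randint(low, high),        # inclusive bounds
        "angle_deg": rng.uniform(*angle_range),     # small rotation only
        # Keep every RGB channel dark so the text stays legible on a light background.
        "color": tuple(rng.randint(0, max_channel) for _ in range(3)),
        "font": rng.choice(list(fonts)),
    }
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing verification results across experiments.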
In this embodiment, a corresponding picture to be verified is generated for each target text box, and each target text box is then compared with its picture to be verified. A consistency analysis of the two identifies the abnormal text boxes in the first target recognition result.
In this embodiment, the preset analysis model is a convolutional neural network, preferably ResNet-50 (resnet50), and it extracts features from the target text box and its corresponding picture to be verified. A convolutional neural network for feature extraction is trained in advance. The trained network extracts the features of the text box and of the picture to be verified, and the similarity between these features is calculated to determine whether the contents of the two pictures are consistent. Text boxes judged to be inconsistent are handled as exceptions.
The analysis model includes a batch input layer, a feature extraction layer, an L2 normalization layer, and a loss function. The loss function includes, but is not limited to, any one of Softmax loss, Center loss, or Triplet loss. Different loss functions place different requirements on the training data.
Take the triplet loss function as an example. In this embodiment, anchor samples, positive samples, and negative samples need to be selected within each batch to suit the triplet loss function. In the OCR business, the anchor sample is the field crop from the original image, and the positive sample is an image rendered from the field content. The negative sample is an image rendered after replacing the field content. When negative samples are selected, each replaced field can be substituted following the order of the Chinese character table.
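The negative-sample rule, which replaces field characters in character-table order, might be sketched as below. Two details are assumptions, because the application does not specify them: only a single character position is replaced, and the toy table stands in for a real Chinese character table.

```python
from typing import List


def negative_fields(field: str, table: str, count: int, index: int = 0) -> List[str]:
    """Return `count` variants of `field` whose character at `index` is replaced
    by the next characters in `table` order (wrapping around), never by itself."""
    if not 0 <= index < len(field):
        raise IndexError("index outside the field")
    order = {ch: i for i, ch in enumerate(table)}
    start = order.get(field[index], -1)  # -1: character not in the table
    available = len(table) - (1 if start >= 0 else 0)
    if count > available:
        raise ValueError("table too small for the requested number of negatives")
    variants = []
    for step in range(1, count + 1):
        replacement = table[(start + step) % len(table)]
        variants.append(field[:index] + replacement + field[index + 1:])
    return variants
```

Each variant is then rendered as an image to serve as a negative sample that differs from the true field content.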
In other embodiments, to suit the center loss function, each class of samples needs n images rendered with different sizes, angles, and colors, together with n copies of the field crop. When the sample images are generated, a font-size interval, an angle interval, and a color interval are set, and transformation parameters are drawn at random from these intervals to generate the images.
During training, the ROC curve is used to calculate the threshold that maximizes model accuracy. The intended effect of training is that, in the feature space, features of images with the same content move closer together, while features of images with different content move farther apart.
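The threshold that maximizes accuracy can be found by treating every score on the ROC curve as a candidate cut-off and computing the accuracy, (TP + TN) / N, at each one. This sketch assumes that higher scores mean "same content" and that the labeled pairs come from a validation set. The helper name is hypothetical.

```python
from typing import Sequence, Tuple


def best_accuracy_threshold(scores: Sequence[float],
                            labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy); a pair is predicted 'same content' when score >= threshold.

    labels: 1 = same content (normal box), 0 = different content (abnormal box).
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    # Each distinct score is an operating point on the ROC curve;
    # +inf predicts 'different content' for every pair.
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:  # keep the lowest threshold among ties
            best_t, best_acc = t, acc
    return best_t, best_acc
```

The selected value can then serve as the preset similarity threshold used in step c3.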
In this embodiment, inputting the picture to be verified and the target text box into the preset analysis model and identifying an abnormal text box from the first target recognition result according to the model output includes:
c1. Determine the feature vectors of the target text box and of the picture to be verified from the model output;
c2. Calculate the similarity between the feature vectors of the target text box and the picture to be verified using a preset similarity algorithm; and
c3. When the similarity is less than a preset similarity threshold, determine that the target text box is an abnormal text box.
For example, the preset similarity algorithm includes, but is not limited to, either the Euclidean distance algorithm or the cosine similarity algorithm.
It can be understood that features extracted from the target text box better reflect the original features of that region of the image to be recognized. Features extracted from the corresponding picture to be verified better reflect the features of the first target text information. The similarity between the two therefore indicates how consistent they are. The higher the similarity, the more likely the two are consistent, that is, the more accurate the recognition result. Conversely, the lower the similarity, the less likely they are consistent, that is, the less accurate the recognition result. A similarity threshold is therefore set. Target text boxes whose similarity is greater than or equal to the threshold are treated as normal text boxes, and those whose similarity is below the threshold are treated as abnormal text boxes.
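Steps c1 to c3 can be sketched with plain feature vectors standing in for the model output. The box identifiers are hypothetical. Converting the Euclidean distance to a similarity via 1 / (1 + d) is also an assumption, made so that both measures follow the same "below threshold means abnormal" rule.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        raise ValueError("zero-length feature vector")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def euclidean_similarity(a: Vector, b: Vector) -> float:
    # Assumption: map a distance d in [0, inf) to a similarity in (0, 1].
    return 1.0 / (1.0 + math.dist(a, b))


def find_abnormal_boxes(features: Dict[str, Tuple[Vector, Vector]], threshold: float,
                        similarity: Callable[[Vector, Vector], float] = cosine_similarity
                        ) -> List[str]:
    """c1-c3: features maps box id -> (box feature, picture-to-verify feature).
    A box is abnormal when its similarity is below the threshold."""
    return [box_id for box_id, (box_vec, verify_vec) in features.items()
            if similarity(box_vec, verify_vec) < threshold]
```

The threshold must be chosen for the specific similarity function, because cosine values and converted Euclidean values live on different scales.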
Step S4: send the abnormal text box to a preset terminal, receive the second target text information of the abnormal text box fed back by the preset terminal, and update the first target recognition result based on that second target text information to generate a second target recognition result.
After the abnormal text boxes in the first target recognition result are determined, they need to be handled. In this embodiment, the preset terminal is a terminal used by crowdsourcing workers. The abnormal text box is sent to the crowdsourcing workers, who manually identify the second target text information of the abnormal text box and return it to the electronic device. Based on the received second target text information, the electronic device updates the first target text information corresponding to that abnormal text box in the first target recognition result, obtaining the second target recognition result.
Step S5: feed the second target recognition result back to the user.
After the second target recognition result of the image to be recognized is determined, it is displayed to the user through the client.
In other embodiments, the text recognition method includes steps S1 to S3 and step S6.
Step S6: when there is no abnormal text box in the first target recognition result, feed the first target recognition result back to the user.
When it is determined that there is no abnormal text box in the first target recognition result, the first target recognition result is used directly as the final recognition result and is displayed to the user through the client.
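Steps S4 to S6 amount to a small merge operation. The sketch below assumes a hypothetical `TextBox` record and a hypothetical `ask_crowd` callable standing in for the round trip to the crowdsourcing terminal; neither name comes from the application.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class TextBox:
    box_id: str  # hypothetical identifier of a target text box
    text: str    # first target text information (second, after manual review)


def finalize_result(
    first_result: List[TextBox],
    abnormal_ids: Set[str],
    ask_crowd: Callable[[List[TextBox]], Dict[str, str]],
) -> List[TextBox]:
    """Return the recognition result to feed back to the user (steps S4 to S6)."""
    if not abnormal_ids:
        # Step S6: nothing abnormal, so the first target recognition result is final.
        return first_result
    # Step S4: send only the abnormal boxes to the preset (crowdsourcing) terminal.
    pending = [box for box in first_result if box.box_id in abnormal_ids]
    corrections = ask_crowd(pending)  # box_id -> second target text information
    # Merge; a box with no returned correction keeps its original text (assumption).
    return [
        replace(box, text=corrections.get(box.box_id, box.text))
        if box.box_id in abnormal_ids else box
        for box in first_result
    ]
```

Only the abnormal boxes leave the device, which keeps the manual workload proportional to the number of suspect fields rather than to the size of the image.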
In other embodiments, performing text recognition on the image to be recognized based on the preset recognition rule to obtain the first target recognition result of the image to be recognized includes:
d1. Identifying the field regions of the image to be recognized and determining a plurality of first text boxes in the image to be recognized;
d2. Performing multiple random perspective transformations on each first text box to obtain a plurality of second text boxes corresponding to each first text box;
d3. Inputting the plurality of second text boxes corresponding to each first text box into a preset recognition model to obtain a first recognition result for each of those second text boxes;
d4. Selecting, based on the first recognition results of the second text boxes corresponding to each first text box, the target text box of that first text box from among its second text boxes; and
d5. Determining the first target recognition result of the image to be recognized according to the first recognition result of the target text box corresponding to each first text box.
First, the positions of the text fields in the image to be recognized are detected, and the circumscribed rectangle enclosing each text field, i.e., a first text box, is determined.
Then, multiple random perspective transformations are applied to each first text box to obtain the corresponding second text boxes. For example, five random perspective transformations are applied to each first text box, yielding five second text boxes for that first text box. The second text boxes include the first text box.
Next, the OCR recognition model recognizes the first text information and a first confidence for each of the five second text boxes, and the second text box with the highest first confidence is selected as the target text box of the first text box.
Finally, the first target text information of each first text box is determined from the first text information of its target text box, and the first target text information of all first text boxes is aggregated to obtain the first target recognition result.
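Steps d2 to d4 can be sketched as below, assuming each first text box is given as four corner points and that `recognize` is a hypothetical callable wrapping the OCR recognition model and returning (text, confidence). Randomly moving the four corners defines a random perspective warp of the crop; the shift range is an assumed value, and keeping the unmodified first text box as one of the n candidates is one reading of the statement that the second text boxes include the first text box.

```python
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]  # top-left, top-right, bottom-right, bottom-left


def random_perspective_quads(box: Quad, n: int = 5, max_shift: float = 0.05,
                             rng: Optional[random.Random] = None) -> List[Quad]:
    """Return n candidate quads: the original box plus n - 1 random perspective variants.

    Each corner moves independently by up to max_shift (a hypothetical fraction)
    of the box width/height, which amounts to a random projective warp of the crop.
    """
    rng = rng or random.Random()
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    quads: List[Quad] = [box]  # assumption: the first text box is itself a candidate
    for _ in range(n - 1):
        quads.append(tuple(
            (x + rng.uniform(-max_shift, max_shift) * w,
             y + rng.uniform(-max_shift, max_shift) * h)
            for x, y in box))
    return quads


def select_target_box(box: Quad, recognize: Callable[[Quad], Tuple[str, float]],
                      n: int = 5, rng: Optional[random.Random] = None
                      ) -> Tuple[Quad, str, float]:
    """Steps d2 to d4: recognize every candidate and keep the most confident one."""
    best = None
    for quad in random_perspective_quads(box, n, rng=rng):
        text, confidence = recognize(quad)  # d3: hypothetical OCR call
        if best is None or confidence > best[2]:
            best = (quad, text, confidence)  # d4: highest first confidence wins
    return best
```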
It can be understood that the image to be recognized may be captured by the user on the spot, and when the user captures it with a camera, the picture may be distorted owing to the characteristics of the camera itself. Therefore, to further improve recognition accuracy, in other embodiments the method further includes, before step S2:
Performing distortion correction on the image to be recognized based on a preset distortion correction rule to obtain a distortion-corrected image to be recognized.
In this embodiment, performing distortion correction on the image to be recognized based on the preset distortion correction rule to obtain the distortion-corrected image to be recognized includes:
e1. Obtaining the pixel corner points of the image to be recognized and calculating the coordinates of those corner points on the undistorted image;
e2. Calculating a perspective transformation matrix from the coordinates of the pixel corner points on the undistorted image; and
e3. Performing distortion correction on the image to be recognized according to the perspective transformation matrix to generate the distortion-corrected image to be recognized.
In this embodiment, distortion correction is applied to the pixel corner points of the original, distorted image to be recognized to obtain the coordinates of each corner point on the undistorted image. The pixel corner points may be the vertices of the distorted image to be recognized; if the image to be recognized is a quadrilateral, they are its four vertices. Because solving for the perspective transformation matrix requires the corresponding coordinates of at least four points, the coordinates of at least four pixel corner points must be obtained from the distorted image. Taking a two-dimensional code (QR code) image as an example, the coordinates of the four corner points of the code region, i.e., the four vertices of the code, can first be obtained from the original distorted image; the coordinates of these four corner points on the undistorted image are then calculated from the pre-calibrated distortion parameters using the formula [x, y] = K[u, v], where [x, y] are the pixel corner coordinates on the original distorted image, [u, v] are the pixel corner coordinates on the undistorted image, and K is the distortion parameter.
After the perspective transformation matrix is solved, distortion correction can be applied to the image to be recognized to obtain the distortion-corrected image, after which the subsequent recognition, verification, update, and feedback operations are performed.
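Step e2 solves for a 3x3 perspective (homography) matrix from four corner correspondences. With the bottom-right entry fixed to 1, each correspondence (x, y) -> (u, v) yields two linear equations, so four corners give an 8x8 linear system. The dependency-free sketch below assumes `src` holds the corners on the distorted image and `dst` their computed positions on the undistorted image; in practice a library routine such as OpenCV's `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective` would typically carry out e2 and e3.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system a @ x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners (e.g. three of them collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def perspective_matrix(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Solve u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_perspective(m: List[List[float]], point: Point) -> Point:
    """Map one point through the matrix, including the projective division."""
    x, y = point
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)
```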
Because computing the coordinates of pixel corner points on the undistorted image through distortion correction is not a one-to-one mapping, the coordinates calculated on the undistorted image for a corner point of the original distorted image may not be unique. The following embodiments are therefore used to find better coordinates for the pixel corner points on the undistorted image.
In other embodiments, calculating the coordinates of the pixel corner points on the undistorted image includes:
f1. Determining a target pixel on the undistorted image whose coordinates are the same as those of the pixel corner point on the image to be recognized;
f2. Taking the pixels within a circular region centered on the target pixel, with a preset neighborhood radius as its radius, as neighborhood pixels;
f3. Traversing each neighborhood pixel of the target pixel on the undistorted image and calculating the coordinates of each neighborhood pixel on the image to be recognized; and
f4. Determining the coordinates of the pixel corner point on the undistorted image according to the coordinates of the neighborhood pixels on the image to be recognized.
For example, the distance between each neighborhood pixel and the pixel corner point can be calculated from the coordinates of that neighborhood pixel on the original distorted image to be recognized, and the coordinates of the neighborhood pixel with the shortest distance are taken as the coordinates of the pixel corner point on the undistorted image. When determining these coordinates, the neighborhood radius can be set flexibly according to the degree of distortion of the original image: when the distortion is small, a smaller radius can be used so that fewer neighborhood pixels need to be traversed, reducing computation; when the distortion is large, a larger radius can be used so that the optimal pixel can still be found.
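Steps f1 to f4 with the shortest-distance rule can be sketched as follows. The `distort` callable is a hypothetical forward model mapping an undistorted pixel (u, v) to its position on the distorted image to be recognized (for example, one built from the pre-calibrated distortion parameters), and the default radius of 3 pixels is an assumed value.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def refine_corner(corner: Tuple[int, int],
                  distort: Callable[[float, float], Point],
                  radius: int = 3) -> Tuple[int, int]:
    """Search a circular neighborhood on the undistorted image for the best corner position."""
    cx, cy = corner  # f1: target pixel shares the corner's coordinates
    best, best_dist = corner, math.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy > radius * radius:
                continue  # f2: keep only pixels inside the circle
            u, v = cx + dx, cy + dy
            x, y = distort(u, v)  # f3: position of this candidate on the distorted image
            d = math.hypot(x - cx, y - cy)  # distance to the observed corner
            if d < best_dist:
                best, best_dist = (u, v), d
    return best  # f4: candidate that lands closest to the observed corner
```

A larger `radius` enlarges the search quadratically, which matches the trade-off described above between computation and robustness to strong distortion.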
In the text recognition method of the above embodiments, after an instruction carrying an image to be recognized is received from the user, text recognition is performed on the image to obtain a first target recognition result; pictures to be verified are generated from the first target recognition result; the similarity between each picture to be verified and its corresponding target text box is calculated; abnormal text boxes in the first target recognition result are identified according to the similarity and handled as exceptions; the first target recognition result is updated based on the exception-handling result to obtain a second target recognition result; and the second target recognition result is fed back to the user. Adding a verification mechanism after the general OCR recognition process improves the accuracy of the output recognition results and the user experience. Applying random perspective transformations to the text boxes of the image to be recognized and selecting, from the recognition results of the transformed versions, the text information with the highest confidence as the first target text information of the target text box improves text recognition accuracy. Performing distortion correction on the image to be recognized before recognition lays the foundation for accurate text recognition.
This application further provides an electronic device. FIG. 2 is a schematic diagram of a preferred embodiment of the electronic device of this application.
In this embodiment, the electronic device 1 may be a terminal device with data processing capability, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer; the server may be a rack server, a blade server, a tower server, or a cabinet server.
The electronic device 1 includes a memory 11, a processor 12, and a network interface 13.
The memory 11 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as its hard disk. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. Furthermore, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1.
The memory 11 can be used not only to store application software installed on the electronic device 1 and various kinds of data, such as the text recognition program 10, but also to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, used to run program code stored in the memory 11 or to process data, for example, to run the text recognition program 10.
The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices, for example, a client (not shown in the figure). Components 11 to 13 of the electronic device 1 communicate with one another via a communication bus.
FIG. 2 shows only the electronic device 1 with components 11 to 13. Those skilled in the art will understand that the structure shown in FIG. 2 does not limit the electronic device 1, which may include fewer or more components than illustrated, combine certain components, or arrange the components differently.
Optionally, the electronic device 1 may further include a user interface, which may include a display and an input unit such as a keyboard; optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display may also be called a display screen or display unit, and is used to display information processed in the electronic device 1 and to present a visual user interface.
In the embodiment of the electronic device 1 shown in FIG. 2, the memory 11, serving as a computer storage medium, stores the program code of the text recognition program 10, and when the processor 12 executes this program code, the following steps are implemented:
Receiving step: receiving a text recognition instruction issued by a user, the text recognition instruction including an image to be recognized.
The user selects the image to be recognized through an app on the client and issues a text recognition instruction for the selected image. After receiving the instruction from the client, the electronic device 1 performs a text recognition operation on the image to be recognized carried in the instruction.
Recognition step: performing text recognition on the image to be recognized based on a preset recognition rule to obtain a first target recognition result of the image to be recognized, the first target recognition result including a plurality of target text boxes and the first target text information corresponding to each of them.
For example, OCR is performed on the image to be recognized using a pre-trained OCR recognition model, and the recognition result output by the model is taken as the first target recognition result.
Analysis step: generating, based on the first target recognition result, pictures to be verified corresponding to the plurality of target text boxes; inputting the pictures to be verified and the target text boxes into a preset analysis model; and identifying abnormal text boxes in the first target recognition result according to the model output.
To improve text recognition accuracy, after OCR is performed on the image to be recognized to obtain the first target recognition result, the accuracy of that result is verified, which requires identifying the abnormal text boxes in it. In this embodiment, generating the pictures to be verified corresponding to the plurality of target text boxes based on the first target recognition result includes:
a1. Reading the length and width of one target text box in the first target recognition result and creating a new background picture with the same length and width as that target text box; and
a2. Obtaining the first target text information corresponding to the target text box from the first target recognition result, and placing that text information in the background picture in a preset format to generate the picture to be verified corresponding to the target text box.
For example, for a target text box P, its length and width are read and used to create a solid-color background picture P1 of a random color (a light color works best, e.g., white). The first target text information PT corresponding to P is then obtained and converted into the SimSun (Song) typeface as PT1, which is centered on the background picture P1, yielding a picture to be verified P2 with black text on a white background corresponding to the target text box P.
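The layout part of steps a1 and a2 can be sketched without any imaging library: the canvas takes the size of the target text box, and the text is sized to fit and then centered. The sketch assumes each CJK glyph is roughly square and that a SimSun font is available; the actual drawing (for example with Pillow's `ImageDraw`) is left out, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

RGB = Tuple[int, int, int]


@dataclass(frozen=True)
class VerificationImageSpec:
    width: int               # a1: same width as the target text box
    height: int              # a1: same height as the target text box
    background: RGB          # light solid color, white by default
    text: str                # a2: first target text information
    font_name: str           # "SimSun" is an assumption; any installed font would do
    font_size: int
    text_color: RGB
    origin: Tuple[int, int]  # top-left corner of the text, chosen to center it


def build_verification_spec(box_w: int, box_h: int, text: str, font_name: str = "SimSun",
                            background: RGB = (255, 255, 255), text_color: RGB = (0, 0, 0),
                            char_aspect: float = 1.0, fill: float = 0.8) -> VerificationImageSpec:
    """Fit the text inside a box_w x box_h canvas and center it.

    Assumes each glyph is about char_aspect * font_size wide (roughly square for CJK);
    a real renderer should measure the text instead of estimating it.
    """
    n_chars = max(len(text), 1)
    size_by_height = int(box_h * fill)
    size_by_width = int(box_w / (n_chars * char_aspect))
    font_size = max(1, min(size_by_height, size_by_width))
    text_w = int(font_size * char_aspect * n_chars)
    origin = ((box_w - text_w) // 2, (box_h - font_size) // 2)
    return VerificationImageSpec(box_w, box_h, background, text, font_name,
                                 font_size, text_color, origin)
```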
In other embodiments, placing the first target text information corresponding to the target text box in the solid-color background picture in a preset format further includes:
b1. Randomly adjusting the format of the first target text information corresponding to the target text box; and
b2. Placing the randomly formatted first target text information in the background picture.
For example, the font color, font size, typeface, and angle may be modified. These random adjustments increase the variety of the generated pictures to be verified, laying the foundation for subsequent accurate verification.
In this embodiment, after a picture to be verified has been generated for each target text box, each target text box is compared with its corresponding picture to be verified, and a consistency analysis of the two identifies the abnormal text boxes in the first target recognition result.
In this embodiment, the preset analysis model is a convolutional neural network, preferably ResNet-50 (resnet50), and is used to extract features from the target text box and its corresponding picture to be verified. A convolutional neural network for feature extraction is trained in advance; the trained network extracts features from the text box and from the picture to be verified, and the similarity between them is calculated to determine whether the contents of the two pictures are consistent. Text boxes judged to be inconsistent are handled as exceptions.
The analysis model includes a batch input layer, a feature extraction layer, an L2 normalization layer, and a loss function. The loss function may be, but is not limited to, any one of softmax loss, center loss, or triplet loss. Different loss functions place different requirements on the training data.
Taking the triplet loss function as an example, in this embodiment anchor samples, positive samples, and negative samples must be selected within each batch to suit the triplet loss. In the OCR setting, an anchor sample is a field crop taken from the original image, a positive sample is an image generated from the field's content, and a negative sample is an image generated after the field's content has been replaced. When selecting negative samples, each replaced field can be substituted following the order of the Chinese character table.
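As a hedged illustration of the triplet setup, the sketch below shows one common hinge form of the triplet loss, max(0, d(a, p) - d(a, n) + margin), together with one assumed reading of the negative-sample rule: replacing a character of the field with the next character in a character table. The margin value and function names are hypothetical.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Hinge triplet loss on (L2-normalized) embeddings.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def next_char_negative(text: str, charset: str, position: int = 0) -> str:
    """Build negative-sample text by swapping one character for the next one in a character table."""
    if len(charset) < 2:
        raise ValueError("charset must contain at least two characters")
    current = text[position]
    idx = charset.find(current)  # -1 if the character is not in the table
    replacement = charset[(idx + 1) % len(charset)]
    return text[:position] + replacement + text[position + 1:]
```

Anchor = field crop, positive = image rendered from the recognized text, negative = image rendered from `next_char_negative(...)`; training pushes d(a, p) below d(a, n) by at least the margin.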
In other embodiments, to suit the center loss function, n images generated with different sizes, angles, and colors, together with n copies of the cropped field image, are selected for each class of sample. When generating the sample images, a font size interval, an angle interval, and a color interval are set, and transformation parameters are drawn randomly from these intervals to generate the images.
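Sampling the transformation parameters from configured intervals can be sketched as below. The interval bounds are hypothetical values chosen for illustration, and `class_batch` only assembles render recipes for one class rather than drawing images.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RenderParams:
    font_size: int
    angle_deg: float
    color: Tuple[int, int, int]


# Hypothetical intervals; the text only states that such intervals are configured.
FONT_SIZE_RANGE = (16, 48)
ANGLE_RANGE = (-5.0, 5.0)
COLOR_RANGE = (0, 96)  # dark text, assuming a light background


def sample_render_params(rng: random.Random) -> RenderParams:
    """Draw one set of transformation parameters uniformly from the intervals."""
    return RenderParams(
        font_size=rng.randint(*FONT_SIZE_RANGE),  # randint includes both ends
        angle_deg=rng.uniform(*ANGLE_RANGE),
        color=(rng.randint(*COLOR_RANGE), rng.randint(*COLOR_RANGE), rng.randint(*COLOR_RANGE)),
    )


def class_batch(field_text: str, field_crop: object, n: int,
                rng: random.Random) -> List[Tuple[str, object]]:
    """One center-loss class: n generated-image recipes plus n copies of the field crop."""
    generated = [("generated", (field_text, sample_render_params(rng))) for _ in range(n)]
    crops = [("crop", field_crop) for _ in range(n)]
    return generated + crops
```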
During training, the threshold that maximizes the model's accuracy is calculated from the ROC curve. The intended effect of training is that, in the feature space, features of images with the same content move closer and closer together, while features of images with different content move farther and farther apart.
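The threshold search can be sketched as a sweep over the operating points of the ROC curve, scoring each candidate threshold by accuracy on labelled (similarity, same-content) pairs. This is a minimal sketch under the assumption that a pair is predicted to match when its similarity is at least the threshold, consistent with step c3.

```python
import math
from typing import Sequence, Tuple


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) that maximizes accuracy; label 1 means same content."""
    if not scores or len(scores) != len(labels):
        raise ValueError("need equally long, non-empty scores and labels")
    best_t, best_acc = math.inf, -1.0
    # Each distinct score is one ROC operating point; +inf means "predict no match".
    for t in sorted(set(scores)) + [math.inf]:
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

For scores `[0.9, 0.85, 0.4, 0.3]` with labels `[1, 1, 0, 0]`, the sweep picks 0.85, which separates the two classes perfectly.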
In this embodiment, inputting the picture to be verified and the target text box into the preset analysis model, and identifying an abnormal text box in the first target recognition result according to the model output, includes:
c1. Determining, from the model output, the feature vector of the target text box and the feature vector of the picture to be verified;
c2. Calculating the similarity between the feature vectors of the target text box and of the picture to be verified using a preset similarity algorithm; and
c3. When the similarity is less than a preset similarity threshold, determining that the target text box is an abnormal text box.
For example, the preset similarity algorithm may be, but is not limited to, a Euclidean distance algorithm or a cosine similarity algorithm (when a distance is used, a smaller distance corresponds to a higher similarity).
It can be understood that the features extracted from the target text box reflect the original appearance of that region in the image to be recognized, whereas the features extracted from the corresponding picture to be verified reflect the first target text information. Calculating the similarity between the two therefore indicates whether they are consistent. The higher the similarity, the more likely the two match, i.e., the more likely the recognition result is accurate; conversely, the lower the similarity, the less likely the two match, i.e., the less likely the recognition result is accurate. With a similarity threshold set, target text boxes whose similarity is greater than or equal to the threshold are treated as normal text boxes, and target text boxes whose similarity is below the threshold are treated as abnormal text boxes.
Update step: sending the abnormal text box to a preset terminal, receiving the second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on that second target text information to generate a second target recognition result.
After the abnormal text boxes in the first target recognition result are determined, they need to be handled. In this embodiment, the preset terminal is a terminal used by crowdsourcing workers. The abnormal text box is sent to the crowdsourcing workers, who manually identify the second target text information of the abnormal text box and return it to the electronic device 1. Based on the received second target text information, the electronic device 1 updates the first target text information corresponding to that abnormal text box in the first target recognition result, obtaining the second target recognition result.
Feedback step: feeding the second target recognition result back to the user.
After the second target recognition result of the image to be recognized is determined, it is displayed to the user through the client.
In other embodiments, when there is no abnormal text box in the first target recognition result, the first target recognition result is fed back to the user. That is, when it is determined that the first target recognition result contains no abnormal text box, the first target recognition result is used directly as the final recognition result and is displayed to the user through the client.
In other embodiments, performing text recognition on the image to be recognized based on preset recognition rules to obtain the first target recognition result of the image includes:
d1. Identify the field regions of the image to be recognized and determine a plurality of first text boxes of the image;
d2. Apply multiple random perspective transformations to each first text box to obtain a plurality of second text boxes corresponding to that first text box;
d3. Input the second text boxes corresponding to each first text box into a preset recognition model to obtain a first recognition result for each of those second text boxes;
d4. Based on those first recognition results, select from the second text boxes of each first text box the target text box corresponding to that first text box; and
d5. Determine the first target recognition result of the image to be recognized from the first recognition results of the target text boxes corresponding to the first text boxes.
First, the positions of the text fields in the image to be recognized are detected, and the circumscribed rectangle enclosing each text field, that is, a first text box, is determined.
Then, multiple random perspective transformations are applied to each first text box to obtain multiple corresponding second text boxes. For example, applying 5 random perspective transformations to a first text box yields 5 second text boxes for it. The second text boxes include the first text box itself.
Next, the OCR recognition model identifies the first text information and the first confidence of each of the 5 second text boxes, and the second text box with the highest first confidence is selected as the target text box for that first text box.
Finally, the first target text information of each first text box is determined from the first text information of its target text box, and the first target text information of all first text boxes is combined into the first target recognition result.
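Steps d2 to d5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: a "random perspective transformation" is modeled as independently jittering the four corners of the first text box (warping such a quadrilateral to an upright rectangle is a perspective transform), the original box is kept as one of the candidates, and `recognizer` is a hypothetical stand-in for the preset OCR model that returns `(text, confidence)`. The helper names and the `max_shift` parameter are illustrative, and the pixel warping itself is omitted.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Quad = List[Point]  # corners in order: top-left, top-right, bottom-right, bottom-left
Recognizer = Callable[[Quad], Tuple[str, float]]  # hypothetical OCR call: (text, confidence)


def random_perspective_variants(box: Quad, n: int, max_shift: float,
                                rng: random.Random) -> List[Quad]:
    """Return n candidate quads: the original first text box plus n - 1 copies
    whose four corners are moved independently by up to max_shift pixels."""
    variants = [list(box)]  # keep the first text box itself as a candidate
    for _ in range(n - 1):
        variants.append([(x + rng.uniform(-max_shift, max_shift),
                          y + rng.uniform(-max_shift, max_shift))
                         for x, y in box])
    return variants


def recognize_first_box(box: Quad, recognizer: Recognizer, n: int = 5,
                        max_shift: float = 3.0,
                        seed: int = 0) -> Tuple[str, float, Quad]:
    """Steps d2-d4: recognize every variant and keep the most confident one."""
    rng = random.Random(seed)
    candidates = random_perspective_variants(box, n, max_shift, rng)
    scored = [(recognizer(q), q) for q in candidates]
    (text, confidence), target_box = max(scored, key=lambda item: item[0][1])
    return text, confidence, target_box


def first_target_result(boxes: List[Quad], recognizer: Recognizer) -> List[str]:
    """Step d5: collect the text of each first text box's target text box."""
    return [recognize_first_box(b, recognizer)[0] for b in boxes]
```

The fixed `seed` only makes the sketch reproducible; the count of 5 variants mirrors the example above, while the 3-pixel jitter is an arbitrary placeholder.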
It will be appreciated that the image to be recognized may be captured by the user on the spot, and when the user captures it with a camera, the characteristics of the camera itself may distort the image. Therefore, to further improve recognition accuracy, in other embodiments the processor 12, when executing the text recognition program 10, also performs the following step before the recognition step:
Distortion correction is performed on the image to be recognized based on a preset distortion correction rule to obtain a distortion-corrected image to be recognized.
In this embodiment, performing distortion correction on the image to be recognized based on a preset distortion correction rule to obtain the distortion-corrected image includes:
e1. Obtain the pixel corners of the image to be recognized and calculate their coordinates on the undistorted image;
e2. Calculate a perspective transformation matrix from the coordinates of the pixel corners on the undistorted image; and
e3. Perform distortion correction on the image to be recognized according to the perspective transformation matrix to generate the distortion-corrected image to be recognized.
In this embodiment, distortion correction is applied to the pixel corners of the original, distorted image to be recognized to obtain the coordinates of each pixel corner on the undistorted image. The pixel corners may be the vertices of the distorted image to be recognized; if that image is a quadrilateral, they are its four vertices. Because solving the perspective transformation matrix requires the corresponding coordinates of at least four points, the coordinates of at least four pixel corners must be obtained from the distorted image. Taking a two-dimensional (QR) code image as an example, the coordinates of the four pixel corners of the code region, that is, the four vertices of the code, can first be obtained from the original distorted image, and their coordinates on the undistorted image can then be computed with pre-calibrated distortion parameters using [x, y] = K[u, v], where [x, y] are the pixel corner coordinates on the original distorted image, [u, v] are the pixel corner coordinates on the undistorted image, and K is the distortion parameter.
Once the perspective transformation matrix has been solved, distortion correction can be applied to the image to be recognized to obtain the distortion-corrected image, after which the subsequent recognition, verification, update, and feedback operations are performed.
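Step e2 can be illustrated with the standard direct linear solution for a perspective (homography) matrix from four point correspondences, fixing the bottom-right entry to 1 and solving the resulting 8x8 linear system. This is a self-contained sketch, not the applicant's code: it takes the corner coordinates from e1 as given (the calibrated relation [x, y] = K[u, v] is not modeled), and the function names are hypothetical. In practice a library routine such as OpenCV's `getPerspectiveTransform` followed by `warpPerspective` would typically be used for e2 and e3; the per-pixel warp of e3 is omitted here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration (collinear points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def perspective_matrix(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H with H[2][2] = 1 that maps each of four src corners onto dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_perspective(h: List[List[float]], p: Point) -> Point:
    """Map one point through H, including the projective division by w."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Exactly four correspondences determine the eight unknowns; with more corners, a least-squares fit would be the usual generalization.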
Because computing the coordinates of pixel corners on the undistorted image through distortion correction is not a one-to-one mapping, the coordinates computed for a pixel corner of the original distorted image may not be unique. To find better coordinates for the pixel corners on the undistorted image, a neighborhood search can be used.
In other embodiments, calculating the coordinates of the pixel corners on the undistorted image includes:
f1. Determine a target pixel on the undistorted image whose coordinates are the same as those of the pixel corner on the image to be recognized;
f2. Take the pixels inside a circle centered on the target pixel with a preset neighborhood radius as the neighborhood pixels;
f3. Traverse the neighborhood pixels of the target pixel on the undistorted image and calculate the coordinates of each of them on the image to be recognized; and
f4. Determine the coordinates of the pixel corner on the undistorted image from the coordinates of the neighborhood pixels on the image to be recognized.
For example, the distance between each neighborhood pixel and the pixel corner can be calculated from the neighborhood pixel's coordinates on the original distorted image, and the neighborhood pixel with the shortest distance is taken as the pixel corner's coordinates on the undistorted image. The neighborhood radius can be set flexibly according to how strongly the original image is distorted: when the distortion is mild, a smaller radius means fewer neighborhood pixels to traverse and less computation; when the distortion is severe, a larger radius makes it possible to find the optimal pixel.
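Steps f1 to f4, with the shortest-distance rule from the example above, can be sketched as a brute-force search over integer pixels. The source does not specify the distortion model, so `radial_distort` below is a hypothetical one-coefficient radial model (undistorted to distorted) with an assumed center and coefficient `k1`; any forward distortion function with the same signature could be passed in instead.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def radial_distort(p: Point, center: Point, k1: float) -> Point:
    """Hypothetical one-coefficient radial model mapping an undistorted
    pixel to its position in the distorted image."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    scale = 1.0 + k1 * (dx * dx + dy * dy)
    return (center[0] + dx * scale, center[1] + dy * scale)


def undistort_corner(corner: Point, distort: Callable[[Point], Point],
                     radius: int) -> Point:
    # f1: target pixel on the undistorted image with the corner's own coordinates
    tx, ty = round(corner[0]), round(corner[1])
    best, best_dist = (tx, ty), math.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # f2: only pixels inside the circle of the given radius are neighbors
            if dx * dx + dy * dy > radius * radius:
                continue
            candidate = (tx + dx, ty + dy)
            # f3: where this neighbor lands in the distorted image
            mx, my = distort(candidate)
            # f4: keep the neighbor that lands closest to the observed corner
            dist = math.hypot(mx - corner[0], my - corner[1])
            if dist < best_dist:
                best, best_dist = candidate, dist
    return best
```

With a small radius the true undistorted pixel may lie outside the search circle, which is the trade-off between computation and accuracy described above: in the check below, a radius of 1 settles on a nearby but wrong pixel, while a radius of 5 recovers the exact one.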
This application also proposes a text recognition device.
FIG. 3 is a schematic diagram of the modules of a preferred embodiment of the text recognition device of this application.
According to the functions it implements, the text recognition device 2 of this embodiment may include modules 210 to 250. A module, which may also be called a unit, is a series of computer program segments stored in the memory of the electronic device that can be executed by its processor to perform a fixed function.
In an embodiment of the text recognition device 2 of this application, the functions of the modules/units are as follows:
The receiving module 210 is configured to receive a text recognition instruction issued by a user, the instruction including an image to be recognized;
The recognition module 220 is configured to perform text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image, the first target recognition result including a plurality of target text boxes and the first target text information corresponding to them;
The analysis module 230 is configured to generate, based on the first target recognition result, pictures to be verified corresponding to the target text boxes, input the pictures to be verified and the target text boxes into a preset analysis model, and identify abnormal text boxes in the first target recognition result according to the model output;
The update module 240 is configured to send the abnormal text boxes to a preset terminal, receive the second target text information of the abnormal text boxes fed back by the preset terminal, and update the first target recognition result based on that information to generate a second target recognition result;
The feedback module 250 is configured to feed the second target recognition result back to the user.
The functions or operation steps implemented by modules 210 to 250 are similar to those described above and are not repeated here.
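How modules 210 to 250 hand data to one another, including the branch in which no abnormal text box is found, can be sketched as below. This is a hypothetical wiring for illustration only: each callable stands in for a module, and representing a recognition result as a mapping from text box id to text is an assumption not stated in the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Result = Dict[str, str]  # hypothetical shape: target text box id -> recognized text


@dataclass
class TextRecognitionDevice:
    """Hypothetical wiring of modules 210-250; each callable stands in for a module."""
    recognize: Callable[[bytes], Result]                 # recognition module 220
    find_abnormal: Callable[[bytes, Result], List[str]]  # analysis module 230
    ask_terminal: Callable[[str], str]                   # preset terminal used by module 240
    delivered: List[Result] = field(default_factory=list)  # results sent by module 250

    def handle(self, image: bytes) -> Result:
        """Receiving module 210: process one text recognition instruction."""
        first = self.recognize(image)
        abnormal = self.find_abnormal(image, first)
        if abnormal:
            # Update module 240: replace only the abnormal boxes' text.
            result = dict(first)
            for box_id in abnormal:
                result[box_id] = self.ask_terminal(box_id)
        else:
            # No abnormal box: the first target recognition result is final.
            result = first
        self.delivered.append(result)  # feedback module 250
        return result
```

Only the abnormal boxes reach the preset terminal, so the cost of manual correction scales with the number of flagged boxes rather than with the size of the document.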
In addition, an embodiment of this application also proposes a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium includes a text recognition program 10 that, when executed by a processor, implements any step of the text recognition method. The specific implementation of the computer-readable storage medium of this application is substantially the same as the method embodiments above and is not repeated here.
The serial numbers of the embodiments of this application are for description only and do not indicate that any embodiment is better or worse than another.
It should be noted that in this document the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, device, article, or method that includes it.
From the description of the embodiments above, those skilled in the art can clearly understand that the methods of the embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied as a software product stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc), including several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.
Claims (20)
- A text recognition method applicable to an electronic device, wherein the method includes: a receiving step: receiving a text recognition instruction issued by a user, the text recognition instruction including an image to be recognized; a recognition step: performing text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, the first target recognition result including a plurality of target text boxes and first target text information corresponding to the target text boxes; an analysis step: generating, based on the first target recognition result, pictures to be verified corresponding to the target text boxes, inputting the pictures to be verified and the target text boxes into a preset analysis model, and identifying an abnormal text box in the first target recognition result according to the model output; an update step: sending the abnormal text box to a preset terminal, receiving second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on the second target text information of the abnormal text box to generate a second target recognition result; and a first feedback step: feeding the second target recognition result back to the user.
- The text recognition method according to claim 1, wherein the text recognition method further includes a second feedback step: when there is no abnormal text box in the first recognition result, feeding the first recognition result back to the user.
- The text recognition method according to claim 1, wherein generating the pictures to be verified corresponding to the target text boxes based on the first target recognition result includes: reading the length and width of one target text box in the first recognition result and creating a new background picture with the same length and width as the target text box; and obtaining the first target text information corresponding to the target text box in the first recognition result and placing it in the background picture in a preset format to generate the picture to be verified corresponding to the target text box.
- The text recognition method according to claim 3, wherein placing the first target text information corresponding to the target text box in the background picture in a preset format further includes: randomly adjusting the format of the first target text information corresponding to the target text box; and placing the randomly formatted first target text information in the background picture.
- The text recognition method according to any one of claims 1 to 4, wherein inputting the pictures to be verified and the target text boxes into the preset analysis model and identifying an abnormal text box in the first target recognition result according to the model output includes: determining feature vectors of the target text box and of the picture to be verified from the model output; calculating the similarity between the feature vectors of the target text box and the picture to be verified using a preset similarity algorithm; and when the similarity is less than a preset similarity threshold, determining that the target text box is an abnormal text box.
- The text recognition method according to claim 1, wherein the preset similarity algorithm includes, but is not limited to, either a Euclidean distance algorithm or a cosine similarity algorithm.
- The text recognition method according to claim 1, wherein performing text recognition on the image to be recognized based on preset recognition rules to obtain the first target recognition result of the image to be recognized includes: identifying the field regions of the image to be recognized and determining a plurality of first text boxes of the image to be recognized; performing multiple random perspective transformations on each first text box to obtain a plurality of second text boxes corresponding to that first text box; inputting the second text boxes corresponding to each first text box into a preset recognition model to obtain first recognition results of those second text boxes; selecting, based on those first recognition results, the target text box corresponding to each first text box from its second text boxes; and determining the first target recognition result of the image to be recognized according to the first recognition results of the target text boxes corresponding to the first text boxes.
- The text recognition method according to claim 1, wherein, before the recognition step, the method further includes: performing distortion correction on the image to be recognized based on a preset distortion correction rule to obtain a distortion-corrected image to be recognized.
- An electronic device, including a memory and a processor, the memory storing a text recognition program executable on the processor, wherein the text recognition program, when executed by the processor, implements the following steps: a receiving step: receiving a text recognition instruction issued by a user, the text recognition instruction including an image to be recognized; a recognition step: performing text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, the first target recognition result including a plurality of target text boxes and first target text information corresponding to the target text boxes; an analysis step: generating, based on the first target recognition result, pictures to be verified corresponding to the target text boxes, inputting the pictures to be verified and the target text boxes into a preset analysis model, and identifying an abnormal text box in the first target recognition result according to the model output; an update step: sending the abnormal text box to a preset terminal, receiving second target text information of the abnormal text box fed back by the preset terminal, and updating the first target recognition result based on the second target text information of the abnormal text box to generate a second target recognition result; and a first feedback step: feeding the second target recognition result back to the user.
- The electronic device according to claim 9, wherein the text recognition program, when executed by the processor, further implements a second feedback step: when there is no abnormal text box in the first recognition result, feeding the first recognition result back to the user.
- The electronic device according to claim 9, wherein generating the pictures to be verified corresponding to the target text boxes based on the first target recognition result includes: reading the length and width of one target text box in the first recognition result and creating a new background picture with the same length and width as the target text box; and obtaining the first target text information corresponding to the target text box in the first recognition result and placing it in the background picture in a preset format to generate the picture to be verified corresponding to the target text box.
- The electronic device according to claim 11, wherein placing the first target text information corresponding to the target text box in the background picture in a preset format further includes: randomly adjusting the format of the first target text information corresponding to the target text box; and placing the randomly formatted first target text information in the background picture.
- The electronic device according to any one of claims 9 to 12, wherein inputting the pictures to be verified and the target text boxes into the preset analysis model and identifying an abnormal text box in the first target recognition result according to the model output includes: determining feature vectors of the target text box and of the picture to be verified from the model output; calculating the similarity between the feature vectors of the target text box and the picture to be verified using a preset similarity algorithm; and when the similarity is less than a preset similarity threshold, determining that the target text box is an abnormal text box.
- The electronic device according to claim 9, wherein the preset similarity algorithm includes, but is not limited to, either a Euclidean distance algorithm or a cosine similarity algorithm.
- The electronic device according to claim 9, wherein performing text recognition on the image to be recognized based on preset recognition rules to obtain the first target recognition result of the image to be recognized includes: identifying the field regions of the image to be recognized and determining a plurality of first text boxes of the image to be recognized; performing multiple random perspective transformations on each first text box to obtain a plurality of second text boxes corresponding to that first text box; inputting the second text boxes corresponding to each first text box into a preset recognition model to obtain first recognition results of those second text boxes; selecting, based on those first recognition results, the target text box corresponding to each first text box from its second text boxes; and determining the first target recognition result of the image to be recognized according to the first recognition results of the target text boxes corresponding to the first text boxes.
- The electronic device according to claim 9, wherein, before the recognition step, the text recognition program, when executed by the processor, further implements the following step: performing distortion correction on the image to be recognized based on a preset distortion correction rule to obtain a distortion-corrected image to be recognized.
- 一种文本识别装置,其中,所述文本识别装置包括:A text recognition device, wherein the text recognition device includes:接收模块,用于接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;A receiving module, configured to receive a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;识别模块,用于基于预设识别规则对所述待识别图像进行文本识别,得到所述待识别图像的第一目标识别结果,所述第一目标识别结果包括多个目标文本框及所述多个目标文本框对应的第一目标文本信息;The recognition module is configured to perform text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized. The first target recognition result includes multiple target text boxes and the multiple First target text information corresponding to each target text box;分析模块,用于基于所述第一目标识别结果生成所述多个目标文本框对应的待验证图片,将所述待验证图片与所述目标文本框输入预设分析模型中,根据模型输出结果从所述第一目标识别结果中识别出异常文本框;The analysis module is configured to generate pictures to be verified corresponding to the multiple target text boxes based on the first target recognition result, input the pictures to be verified and the target text boxes into a preset analysis model, and output the results according to the model Identifying an abnormal text box from the first target recognition result;更新模块,用于将所述异常文本框发送至预设终端,并接收所述预设终端反馈的所述异常文本框的第二目标文本信息,基于所述异常文本框的第二目标文本信息更新所述第一目标识别结果,生成第二目标识别结果;An update module, configured to send the abnormal text box to a preset terminal, and receive the second target text information of the abnormal text box fed back by the preset terminal, based on the second target text information of the abnormal text box Update the first target recognition result to generate a second target recognition result;第一反馈模块,用于将所述第二目标识别结果反馈至所述用户。The first feedback module is configured to feed back the second target recognition result to the user.
- 一种计算机可读存储介质,其中,所述计算机可读存储介质中包括文本识别程序,所述文本识别程序被处理器执行时,可实现如下步骤:A computer-readable storage medium, wherein the computer-readable storage medium includes a text recognition program, and when the text recognition program is executed by a processor, the following steps can be implemented:接收步骤:接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;识别步骤:基于预设识别规则对所述待识别图像进行文本识别,得到所述待识别图像的第一目标识别结果,所述第一目标识别结果包括多个目标文本框及所述多个目标文本框对应的第一目标文本信息;Recognition step: perform text recognition on the image to be recognized based on preset recognition rules to obtain a first target recognition result of the image to be recognized, and the first target recognition result includes multiple target text boxes and the multiple targets The first target text information corresponding to the text box;分析步骤:基于所述第一目标识别结果生成所述多个目标文本框对应的待验证图片,将所述待验证图片与所述目标文本框输入预设分析模型中,根据模型输出结果从所述第一目标识别结果中识别出异常文本框;Analysis step: generate pictures to be verified corresponding to the plurality of target text boxes based on the first target recognition result, input the pictures to be verified and the target text boxes into a preset analysis model, and obtain the results from all of them according to the model output results. An abnormal text box is recognized in the first target recognition result;更新步骤:将所述异常文本框发送至预设终端,并接收所述预设终端反馈的所述异常文本框的第二目标文本信息,基于所述异常文本框的第二目标文本信息更新所述第一目标识别结果,生成第二目标识别结果;Update step: send the abnormal text box to a preset terminal, and receive the second target text information of the abnormal text box fed back by the preset terminal, and update all information based on the second target text information of the abnormal text box. The first target recognition result is described, and the second target recognition result is generated;第一反馈步骤:将所述第二目标识别结果反馈至所述用户。The first feedback step: feedback the second target recognition result to the user.
- 根据权利要求18所述的计算机可读存储介质,其中,所述基于所述第一目标识别结果生成所述多个目标文本框对应的待验证图片,包括:18. The computer-readable storage medium according to claim 18, wherein said generating the to-be-verified pictures corresponding to the plurality of target text boxes based on the first target recognition result comprises:读取所述第一识别结果中一个所述目标文本框的长宽信息,新建一张与所述目标文本框的长宽信息一致的背景图片;及Read the length and width information of one of the target text boxes in the first recognition result, and create a new background picture consistent with the length and width information of the target text box; and获取所述第一识别结果中所述目标文本框对应的第一目标文本信息,将所述目标文本框对应的第一目标文本信息以预设格式放置在所述背景图片中,生成所述目标文本框对应的待验证图像。Acquire the first target text information corresponding to the target text box in the first recognition result, place the first target text information corresponding to the target text box in the background picture in a preset format, and generate the target The image to be verified corresponding to the text box.
- The computer-readable storage medium according to claim 19, wherein placing the first target text information corresponding to the target text box in the background picture in a preset format further comprises:
  - randomly adjusting the format of the first target text information corresponding to the target text box; and
  - placing the format-adjusted first target text information in the background picture.
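The random format adjustment could be sketched as below. The claim states only that the format is adjusted randomly; the adjustable attributes (font, size ratio, left margin, a small rotation), their ranges and the `CANDIDATE_FONTS` list are assumptions, and `PresetFormat` and `randomize_format` are hypothetical names. A seeded `random.Random` instance keeps the adjustment reproducible for testing.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PresetFormat:
    """Hypothetical base format before random adjustment."""
    font_name: str = "SimSun"
    size_ratio: float = 0.8        # font size as a fraction of the box height
    left_margin: int = 2           # pixels
    rotation_deg: float = 0.0


# Hypothetical candidate fonts; the claim does not name any.
CANDIDATE_FONTS = ("SimSun", "SimHei", "KaiTi", "Microsoft YaHei")


def randomize_format(base: PresetFormat, rng: random.Random) -> PresetFormat:
    """Return a randomly adjusted copy of `base`; `base` itself is unchanged."""
    return replace(
        base,
        font_name=rng.choice(CANDIDATE_FONTS),
        # Jitter the size but keep it within a range that still fits the box.
        size_ratio=min(0.95, max(0.5, base.size_ratio + rng.uniform(-0.15, 0.15))),
        left_margin=base.left_margin + rng.randint(0, 4),
        rotation_deg=rng.uniform(-3.0, 3.0),
    )
```

One plausible motivation, not stated in the claim, is that varying the rendering keeps the analysis model from keying on one fixed font or layout when it compares the picture to be verified with the original text box.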
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010073495.6 | 2020-01-22 | ||
CN202010073495.6A CN111325104B (en) | 2020-01-22 | 2020-01-22 | Text recognition method, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021147221A1 (en) | 2021-07-29 |
Family ID: 71167058
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/093605 WO2021147221A1 (en) | 2020-01-22 | 2020-05-30 | Text recognition method and apparatus, and electronic device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111325104B (en) |
WO (1) | WO2021147221A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898612A (en) * | 2020-06-30 | 2020-11-06 | 北京来也网络科技有限公司 | OCR recognition method and device combining RPA and AI, equipment and medium |
CN111931771B (en) * | 2020-09-16 | 2021-01-01 | 深圳壹账通智能科技有限公司 | Bill content identification method, device, medium and electronic equipment |
CN112132762A (en) * | 2020-09-18 | 2020-12-25 | 北京搜狗科技发展有限公司 | Data processing method and device and recording equipment |
CN113326833B (en) * | 2021-08-04 | 2021-11-16 | 浩鲸云计算科技股份有限公司 | Character recognition improved training method based on center loss |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549881A (en) * | 2018-05-02 | 2018-09-18 | 杭州创匠信息科技有限公司 | The recognition methods of certificate word and device |
CN109919076A (en) * | 2019-03-04 | 2019-06-21 | 厦门商集网络科技有限责任公司 | The method and medium of confirmation OCR recognition result reliability based on deep learning |
US10460114B1 (en) * | 2014-08-26 | 2019-10-29 | Amazon Technologies, Inc. | Identifying visually similar text |
CN110503089A (en) * | 2019-07-03 | 2019-11-26 | 平安科技(深圳)有限公司 | OCR identification model training method, device and computer equipment based on crowdsourcing technology |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
CN110569830B (en) * | 2019-08-01 | 2023-08-22 | 平安科技(深圳)有限公司 | Multilingual text recognition method, device, computer equipment and storage medium |
CN110443773A (en) * | 2019-08-20 | 2019-11-12 | 江西博微新技术有限公司 | File and picture denoising method, server and storage medium based on seal identification |
CN110706221A (en) * | 2019-09-29 | 2020-01-17 | 武汉极意网络科技有限公司 | Verification method, verification device, storage medium and device for customizing pictures |
2020
- 2020-01-22 CN: application CN202010073495.6A, patent CN111325104B, status: Active
- 2020-05-30 WO: application PCT/CN2020/093605, publication WO2021147221A1, status: Application Filing
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114419643A (en) * | 2021-12-20 | 2022-04-29 | 华南理工大学 | Method, system, equipment and storage medium for identifying table structure |
CN116310806A (en) * | 2023-02-28 | 2023-06-23 | 北京理工大学珠海学院 | Intelligent agriculture integrated management system and method based on image recognition |
CN116310806B (en) * | 2023-02-28 | 2023-08-29 | 北京理工大学珠海学院 | Intelligent agriculture integrated management system and method based on image recognition |
CN116597462A (en) * | 2023-03-29 | 2023-08-15 | 天云融创数据科技(北京)有限公司 | Certificate identification method based on OCR |
CN116092087A (en) * | 2023-04-10 | 2023-05-09 | 上海蜜度信息技术有限公司 | OCR (optical character recognition) method, system, storage medium and electronic equipment |
CN116092087B (en) * | 2023-04-10 | 2023-08-08 | 上海蜜度信息技术有限公司 | OCR (optical character recognition) method, system, storage medium and electronic equipment |
CN116939292A (en) * | 2023-09-15 | 2023-10-24 | 天津市北海通信技术有限公司 | Video text content monitoring method and system in rail transit environment |
CN116939292B (en) * | 2023-09-15 | 2023-11-24 | 天津市北海通信技术有限公司 | Video text content monitoring method and system in rail transit environment |
Also Published As
Publication number | Publication date |
---|---|
CN111325104A (en) | 2020-06-23 |
CN111325104B (en) | 2024-07-02 |
Similar Documents
Publication | Title |
---|---|
WO2021147221A1 (en) | Text recognition method and apparatus, and electronic device and storage medium |
CN111126125B (en) | Method, device, equipment and readable storage medium for extracting target text in certificate |
WO2021147219A1 (en) | Image-based text recognition method and apparatus, electronic device, and storage medium |
US20240346069A1 (en) | Recognizing text in image data |
US10657600B2 (en) | Systems and methods for mobile image capture and processing |
WO2019205391A1 (en) | Apparatus and method for generating vehicle damage classification model, and computer readable storage medium |
WO2021012382A1 (en) | Method and apparatus for configuring chat robot, computer device and storage medium |
WO2019071660A1 (en) | Bill information identification method, electronic device, and readable storage medium |
CN109255300B (en) | Bill information extraction method, bill information extraction device, computer equipment and storage medium |
CN111310426B (en) | OCR-based table format recovery method, device and storage medium |
CN110675940A (en) | Pathological image labeling method and device, computer equipment and storage medium |
WO2022156178A1 (en) | Image target comparison method and apparatus, computer device and readable storage medium |
CN113239910B (en) | Certificate identification method, device, equipment and storage medium |
CN112380978B (en) | Multi-face detection method, system and storage medium based on key point positioning |
CN108021863B (en) | Electronic device, age classification method based on image and storage medium |
CN110795714A (en) | Identity authentication method and device, computer equipment and storage medium |
CN112396047B (en) | Training sample generation method and device, computer equipment and storage medium |
CN113486785A (en) | Video face changing method, device, equipment and storage medium based on deep learning |
CN111401326A (en) | Target identity recognition method based on picture recognition, server and storage medium |
CN112581344A (en) | Image processing method and device, computer equipment and storage medium |
CN110717060A (en) | Image mask filtering method and device and storage medium |
CN111695441B (en) | Image document processing method, device and computer readable storage medium |
CN113936286A (en) | Image text recognition method and device, computer equipment and storage medium |
CN110991270B (en) | Text recognition method, device, electronic equipment and storage medium |
CN111539406B (en) | Certificate copy information identification method, server and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20915685; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 20915685; Country of ref document: EP; Kind code of ref document: A1 |