WO2021147219A1 - Image-based text recognition method, apparatus, electronic device and storage medium - Google Patents


Info

Publication number
WO2021147219A1
WO2021147219A1 (PCT/CN2020/093563; CN2020093563W)
Authority
WO
WIPO (PCT)
Prior art keywords
recognition
text
image
recognition result
preset condition
Prior art date
Application number
PCT/CN2020/093563
Other languages
English (en)
French (fr)
Inventor
何嘉欣
刘鹏
刘玉宇
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021147219A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/247 Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to an image-based text recognition method, device, electronic equipment, and computer-readable storage medium.
  • The basic process of existing general-purpose OCR recognition is to first detect the regions where text is located in the picture, draw the circumscribed rectangular box of each region, perform a basic two-dimensional rotation correction on each rectangular box, and then input the cropped blocks into the recognition module to obtain all the text content of the whole picture.
  • The inventor realized that although this process can correct the inclination of the target within a two-dimensional plane, in actual image recognition scenarios the recognition object and the original picture are often not coplanar; in such cases, the image recognition result deviates far from the correct result.
  • this application provides an image-based text recognition method, device, electronic equipment, and computer-readable storage medium, the main purpose of which is to improve the accuracy of text recognition from images.
  • this application provides an image-based text recognition method, which includes:
  • Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
  • First recognition step: inputting the image to be recognized into a preset recognition model to obtain a first recognition result, which includes a plurality of first text boxes;
  • First judging step: judging whether the first recognition result satisfies a first preset condition;
  • Transformation step: when it is determined that the first recognition result does not satisfy the first preset condition, performing multiple transformations on each first text box based on a preset transformation algorithm to obtain a plurality of second text boxes corresponding to each first text box;
  • Second recognition step: inputting the plurality of second text boxes corresponding to the first text box into the recognition model to obtain a plurality of second recognition results corresponding to the first text box;
  • Second judging step: judging whether there is a second recognition result satisfying a second preset condition among the plurality of second recognition results corresponding to the first text box;
  • First generating step: when it is determined that there is a second recognition result that satisfies the second preset condition, determining the target text information corresponding to the first text box based on the second recognition result that satisfies the second preset condition, generating a target recognition result, and showing the target recognition result to the user.
  • An image-based text recognition device, which includes:
  • a receiving module configured to receive a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
  • a first recognition module configured to input the image to be recognized into a preset recognition model to obtain a first recognition result, which includes a plurality of first text boxes;
  • a first judging module configured to judge whether the first recognition result meets a first preset condition;
  • a transformation module configured to, when it is determined that the first recognition result does not meet the first preset condition, perform multiple transformations on each first text box based on a preset transformation algorithm to obtain a plurality of second text boxes corresponding to each first text box;
  • a second recognition module configured to input the plurality of second text boxes corresponding to the first text box into the recognition model to obtain a plurality of second recognition results corresponding to the first text box;
  • a second judging module configured to judge whether there is a second recognition result satisfying a second preset condition among the plurality of second recognition results corresponding to the first text box;
  • a first generating module configured to, when it is determined that there is a second recognition result that satisfies the second preset condition, determine the target text information corresponding to the first text box based on the second recognition result that satisfies the second preset condition, generate a target recognition result, and show the target recognition result to the user.
  • the present application also provides an electronic device, which includes a memory and a processor.
  • the memory stores an image-based text recognition program that can run on the processor.
  • When the image-based text recognition program is executed by the processor, the following steps are implemented:
  • Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
  • First recognition step: inputting the image to be recognized into a preset recognition model to obtain a first recognition result, which includes a plurality of first text boxes;
  • First judging step: judging whether the first recognition result satisfies a first preset condition;
  • Transformation step: when it is determined that the first recognition result does not satisfy the first preset condition, performing multiple transformations on each first text box based on a preset transformation algorithm to obtain a plurality of second text boxes corresponding to each first text box;
  • Second recognition step: inputting the plurality of second text boxes corresponding to the first text box into the recognition model to obtain a plurality of second recognition results corresponding to the first text box;
  • Second judging step: judging whether there is a second recognition result satisfying a second preset condition among the plurality of second recognition results corresponding to the first text box;
  • First generating step: when it is determined that there is a second recognition result that satisfies the second preset condition, determining the target text information corresponding to the first text box based on the second recognition result that satisfies the second preset condition, generating a target recognition result, and showing the target recognition result to the user.
  • The present application also provides a computer-readable storage medium, which includes an image-based text recognition program; when the image-based text recognition program is executed by a processor, the following steps are implemented:
  • Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
  • First recognition step: inputting the image to be recognized into a preset recognition model to obtain a first recognition result, which includes a plurality of first text boxes;
  • First judging step: judging whether the first recognition result satisfies a first preset condition;
  • Transformation step: when it is determined that the first recognition result does not satisfy the first preset condition, performing multiple transformations on each first text box based on a preset transformation algorithm to obtain a plurality of second text boxes corresponding to each first text box;
  • Second recognition step: inputting the plurality of second text boxes corresponding to the first text box into the recognition model to obtain a plurality of second recognition results corresponding to the first text box;
  • Second judging step: judging whether there is a second recognition result satisfying a second preset condition among the plurality of second recognition results corresponding to the first text box;
  • First generating step: when it is determined that there is a second recognition result that satisfies the second preset condition, determining the target text information corresponding to the first text box based on the second recognition result that satisfies the second preset condition, generating a target recognition result, and showing the target recognition result to the user.
  • The image-based text recognition method, device, electronic device, and computer-readable storage medium proposed in this application perform OCR recognition on the image to be recognized after receiving a user instruction carrying the image to be recognized.
  • When the confidence of the recognition result is greater than or equal to the preset confidence threshold, the recognition result is directly fed back to the user as the target recognition result.
  • When the confidence of the recognition result is less than the preset confidence threshold, the image to be recognized is subjected to multiple random perspective transformations, OCR recognition is performed on the transformed results, and the recognition results are analyzed to obtain the target recognition result.
  • FIG. 1 is a flowchart of a preferred embodiment of an image-based text recognition method according to this application;
  • FIG. 2 is a schematic diagram of a preferred embodiment of the electronic device of this application.
  • FIG. 3 is a schematic diagram of modules of a preferred embodiment of the text recognition device of this application.
  • This application provides an image-based text recognition method.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • As shown in FIG. 1, a flowchart of a preferred embodiment of an image-based text recognition method according to this application is provided.
  • In this embodiment, the image-based text recognition method includes steps S1 to S7.
  • Step S1 Receive a text recognition instruction sent by a user, where the text recognition instruction includes an image to be recognized.
  • In the following, the electronic device is used as the execution subject to describe each embodiment of the present application.
  • The user selects the image to be recognized through the APP on the client, and sends a text recognition instruction based on the selected image to be recognized.
  • After receiving the instruction issued by the client, the electronic device performs a text recognition operation on the image to be recognized carried in the instruction.
  • Step S2 Input the image to be recognized into a preset recognition model to obtain a first recognition result, which includes a plurality of first text boxes.
  • The aforementioned preset recognition model is an OCR recognition model. Specifically, the OCR recognition model first detects the positions of the text fields in the image to be recognized and determines the circumscribed rectangular box containing each text field position, that is, the text box, and then recognizes the first text information and the first confidence corresponding to each text box. Here, the confidence is the accuracy corresponding to the text information in the recognition result output by the OCR recognition model: the higher the confidence, the closer the recognized text information is to the real text information in the image to be recognized.
  • In order to improve recognition accuracy, before the text information corresponding to a text box is recognized, it is first determined whether the circumscribed rectangle has a two-dimensional rotation angle; if so, rotation correction is performed on the circumscribed rectangle, and the corrected circumscribed rectangular box serves as the first text box.
  • Step S3 It is judged whether the first recognition result satisfies a first preset condition.
  • the first preset condition includes: the first confidence is greater than or equal to a preset confidence threshold, for example, 0.98.
  • Determining whether the first recognition result meets the first preset condition includes comparing the first confidence of each first text box with the preset confidence threshold.
  • The preset confidence threshold can be adjusted according to actual needs.
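As a concrete illustration of the first judging step, the check below flags the first text boxes whose first confidence falls below the preset threshold; the box structure (a dict with `text` and `confidence` keys) is an illustrative assumption, with 0.98 taken from the example value above:

```python
# Minimal sketch of the first judging step (S3). The box structure and the
# 0.98 threshold value are illustrative assumptions, not the patent's API.
CONFIDENCE_THRESHOLD = 0.98

def boxes_needing_retry(first_recognition_result, threshold=CONFIDENCE_THRESHOLD):
    """Return the first text boxes whose first confidence is below the
    preset confidence threshold; these go on to the transformation step."""
    return [box for box in first_recognition_result
            if box["confidence"] < threshold]

result = [
    {"box_id": 0, "text": "INVOICE", "confidence": 0.995},  # passes S3 directly
    {"box_id": 1, "text": "t0tal",   "confidence": 0.41},   # needs transformations
]
retry = boxes_needing_retry(result)
```

Boxes in `retry` proceed to the transformation step; the rest feed straight into the target recognition result.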
  • Step S4 When it is determined that the first recognition result does not satisfy the first preset condition, perform multiple transformations on the first text box based on a preset transformation algorithm to obtain a corresponding Multiple second text boxes.
  • The preset transformation algorithm is a random perspective transformation algorithm.
  • The random values of the T1 and T2 matrices need to be preset.
  • The image of the second text box obtained by transforming the image of the first text box can then be computed according to the perspective transformation matrix.
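The effect of a perspective transformation matrix on a text box can be sketched in pure Python: a 3x3 homography H maps a pixel (x, y) to ((h11·x + h12·y + h13)/w, (h21·x + h22·y + h23)/w) with w = h31·x + h32·y + h33. The matrix values below are illustrative, not the patent's T1/T2 parameters:

```python
# Apply a 3x3 perspective (homography) matrix to the corners of a text box.
# In the method, the matrix entries would be drawn at random within the
# preset ranges of the T1 and T2 matrices; the values here are illustrative.

def apply_homography(H, point):
    """Map (x, y) through H using homogeneous coordinates."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def transform_box(H, corners):
    """Transform the four corners of a (rectangular) first text box."""
    return [apply_homography(H, p) for p in corners]

# Sanity check: the identity homography leaves the box unchanged.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
box = [(0, 0), (100, 0), (100, 40), (0, 40)]
transformed = transform_box(I, box)
```

A matrix with nonzero h31 or h32 produces the out-of-plane "perspective" warp that plain two-dimensional rotation correction cannot express.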
  • Step S5 Input a plurality of second text boxes corresponding to the first text box into the recognition model to obtain a plurality of second recognition results corresponding to the first text box.
  • The multiple second recognition results corresponding to a first text box include the second text information and second confidence corresponding to each of the multiple second text boxes of that first text box. For example, perform 5 random perspective transformations on each first text box to obtain 5 second text boxes corresponding to one first text box, and use the OCR recognition model to recognize the second text information and second confidence in the 5 second text boxes.
  • Step S6 It is judged whether there is a second recognition result satisfying a second preset condition among the multiple second recognition results corresponding to the first text box.
  • the foregoing second preset condition is: the second confidence is greater than or equal to the preset confidence threshold.
  • Judging whether there is a second recognition result meeting a second preset condition among the multiple second recognition results corresponding to the first text box includes:
  • Step S7 When it is determined that there is a second recognition result that satisfies the second preset condition, the target text information corresponding to the first text box is determined based on the second recognition result that satisfies the second preset condition, and a target is generated The recognition result, and show the target recognition result to the user.
  • For example, take the second text information whose second confidence exceeds the preset confidence threshold, among the multiple second text boxes corresponding to one first text box, as the recognition result of the corresponding first text box, that is, the target text information; the target recognition result generated by summarizing the target text information of all first text boxes is fed back to the user through the display interface of the client.
  • generating the target recognition result based on the second recognition result that satisfies the second preset condition includes:
  • the second text information of the second recognition result corresponding to the highest confidence value is selected from the second recognition results meeting the preset condition as the target text information of the first text box.
  • In another embodiment, the image-based text recognition method includes steps S1 to S6 and step S8.
  • Step S8: When it is determined that there is no second recognition result that satisfies the second preset condition, determine the target text information corresponding to each first text box based on the first recognition result and the multiple second recognition results, generate the target recognition result, and show the target recognition result to the user.
  • Generating a target recognition result based on the first recognition result and the multiple second recognition results includes:
  • the recognition result corresponding to the highest confidence value is selected as the target recognition result.
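The selection logic of steps S6 to S8 amounts to a max-by-confidence choice: prefer the best second result that passes the threshold; if none passes, fall back to the best result overall, including the first recognition result. A minimal sketch, where each result is an assumed (text, confidence) pair:

```python
# Sketch of the selection in steps S6-S8. Each result is an assumed
# (text, confidence) pair; 0.98 is the example threshold from above.
THRESHOLD = 0.98

def pick_target_text(first_result, second_results, threshold=THRESHOLD):
    """Return the target text for one first text box.

    Prefer the highest-confidence second result that meets the second
    preset condition (step S7); otherwise fall back to the highest-
    confidence result among the first result and all second results (S8)."""
    passing = [r for r in second_results if r[1] >= threshold]
    if passing:
        return max(passing, key=lambda r: r[1])[0]
    return max([first_result] + second_results, key=lambda r: r[1])[0]

# Among the passing second results, the highest confidence wins.
s7_case = pick_target_text(("t0tal", 0.41),
                           [("total", 0.99), ("tota1", 0.95), ("total", 0.985)])
# No second result passes: fall back to the best overall result.
s8_case = pick_target_text(("t0tal", 0.41),
                           [("tota1", 0.70), ("totai", 0.90)])
```

The fallback keeps the first recognition result in the candidate pool, so a poor transformation can never make the output worse than the original recognition.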
  • In another embodiment, the image-based text recognition method includes steps S1 to S3 and step S9.
  • Step S9 When it is determined that the first recognition result satisfies the first preset condition, a target recognition result is generated based on the first recognition result, and the target recognition result is displayed to the user.
  • the first recognition result is directly fed back to the user as the target result.
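Putting steps S1 to S9 together, the overall control flow can be sketched as follows; `recognize` stands in for the preset OCR recognition model and `random_perspective` for the preset random perspective transformation, both assumed callables supplied by the caller:

```python
# Control-flow sketch of steps S1-S9. `recognize` and `random_perspective`
# are assumed stand-ins for the OCR model and the random perspective
# transform; they are not the patent's actual implementations.
THRESHOLD = 0.98
NUM_TRANSFORMS = 5  # e.g. five random perspective transforms per box

def recognize_image(boxes, recognize, random_perspective, threshold=THRESHOLD):
    """Return the target text for each detected first text box."""
    target = []
    for box in boxes:
        text, conf = recognize(box)                  # S2: first recognition
        if conf >= threshold:                        # S3/S9: passes directly
            target.append(text)
            continue
        seconds = [recognize(random_perspective(box))  # S4/S5: transform + re-OCR
                   for _ in range(NUM_TRANSFORMS)]
        passing = [r for r in seconds if r[1] >= threshold]  # S6
        if passing:                                  # S7: best passing result
            target.append(max(passing, key=lambda r: r[1])[0])
        else:                                        # S8: best of all results
            target.append(max([(text, conf)] + seconds,
                              key=lambda r: r[1])[0])
    return target

# Demo with stand-in functions: the "model" returns the pair it is given,
# and the "transform" fixes a 0/o confusion with high confidence.
demo_out = recognize_image(
    [("ok", 0.99), ("t0tal", 0.40)],
    recognize=lambda b: b,
    random_perspective=lambda b: (b[0].replace("0", "o"), 0.99),
)
```

The per-box structure makes the expensive transformation-and-retry path run only for boxes whose first confidence falls short.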
  • Inputting the image to be recognized into a preset recognition model to obtain the first recognition result includes:
  • The first candidate recognition result with the highest confidence is selected as the first recognition result.
  • The aforementioned preset number of recognition models includes, but is not limited to, a first recognition model and a second recognition model, whose model structures may be the same or different. For example, the first recognition model is CNN+RNN+CTC, and the second recognition model is CNN+Seq2Seq+Attention.
  • the training data of the first recognition model and the second recognition model must be independent of each other, so that the recognition results of different recognition models are also independent of each other.
  • the training data of the first recognition model only includes letters, symbols, and numbers; the training data of the second recognition model includes Chinese characters, letters, numbers, and so on. The objects that can be accurately identified by different recognition models are different.
  • For text that the first recognition model was not trained to recognize, the confidence of the recognition result obtained by the first recognition model is bound to be very low, and the confidence of the second recognition model will be significantly higher than that of the first recognition model.
  • Likewise, for text that the second recognition model was not trained to recognize, the confidence of the recognition result obtained by the second recognition model is bound to be very low, and the confidence of the first recognition model will be significantly higher than that of the second recognition model.
  • Inputting the multiple second text boxes corresponding to the first text box into the recognition model to obtain multiple second recognition results corresponding to the first text box includes:
  • The second recognition result corresponding to each second text box is generated from the second candidate recognition results of that second text box.
  • Taking one first text box as an example: it corresponds to five second text boxes, and the five second text boxes are input into the first recognition model and the second recognition model respectively.
  • Each second text box thus corresponds to two second candidate recognition results; the candidate with the higher confidence is taken as the second recognition result corresponding to the current second text box, yielding the second recognition results of the five second text boxes corresponding to the current first text box.
  • These second recognition results are used to determine whether any of them meets the second preset condition, and the target text information of the current first text box is determined according to the determination result.
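The per-box candidate selection described above, running each second text box through both recognition models and keeping the higher-confidence candidate, can be sketched as follows; the model callables and the confidence values are illustrative assumptions:

```python
# Sketch of combining two independently trained recognition models.
# `model_a` and `model_b` are assumed callables that return a
# (text, confidence) candidate recognition result for a text box.

def best_candidate(box, model_a, model_b):
    """Recognize one second text box with both models and keep the
    higher-confidence candidate recognition result."""
    return max(model_a(box), model_b(box), key=lambda r: r[1])

# Illustrative case: a letters/digits model is confident on "A12", while a
# model trained mainly on Chinese characters is not, so the first model's
# candidate is kept (confidence values are made up for the example).
chosen = best_candidate("A12",
                        model_a=lambda b: (b, 0.99),
                        model_b=lambda b: (b, 0.30))
```

Because the two models are trained on independent data, their confidences are comparable signals of which model's strengths the text falls into.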
  • the method further includes:
  • Distortion correction is performed on the image to be recognized based on a preset distortion correction rule to obtain the image to be recognized after distortion correction.
  • Performing distortion correction on the image to be recognized based on a preset distortion correction rule to obtain the distortion-corrected image to be recognized includes:
  • The coordinates of each pixel corner on the undistorted image are obtained by performing distortion correction on the pixel corners of the original distorted image to be recognized, where the pixel corners can be the corner points of the distorted image to be recognized; if the image to be recognized is a quadrilateral, they are the four vertices of the quadrilateral.
  • Solving the perspective transformation matrix requires the corresponding coordinates of at least four pixel points; therefore, when obtaining the pixel corners on the distorted image to be recognized, the coordinates of at least four pixel corners need to be obtained.
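The four-point-pair requirement can be made concrete: fixing h33 = 1 leaves eight unknowns in the 3x3 perspective matrix, and each correspondence (x, y) → (u, v) yields two linear equations, so four pairs determine the matrix. A minimal pure-Python solver, offered as an illustrative sketch with no handling of degenerate (e.g. collinear) point sets:

```python
# Each pair (x, y) -> (u, v) gives two equations in the eight unknowns
# h11..h32 (with h33 fixed to 1):
#   h11*x + h12*y + h13 - u*x*h31 - u*y*h32 = u
#   h21*x + h22*y + h23 - v*x*h31 - v*y*h32 = v

def solve_homography(src, dst):
    """Solve the 3x3 perspective matrix mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    n = 8
    # Gaussian elimination with partial pivoting on the 8x8 system.
    M = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1.0]]

# A pure translation by (10, 20) is recovered exactly from four corners.
H = solve_homography([(0, 0), (1, 0), (1, 1), (0, 1)],
                     [(10, 20), (11, 20), (11, 21), (10, 21)])
```

In practice a library routine (e.g. an OpenCV perspective-transform solver) would replace this hand-rolled elimination; the sketch only shows why exactly four corner correspondences suffice.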
  • the perspective transformation of the image to be recognized can be performed to obtain the image to be recognized after distortion correction, and then the subsequent transformation and recognition operations are performed.
  • Because the distortion-correction calculation that maps pixel corners on the distorted image to coordinates on the undistorted image is not a one-to-one mapping, the coordinates calculated on the undistorted image for a pixel corner on the original distorted image may not be unique; a further step is therefore needed to find the optimal coordinates of the pixel corners on the undistorted image.
  • Calculating the coordinates of the pixel corners on the undistorted image includes:
  • the coordinates of the pixel corners on the undistorted image are determined according to the coordinates of the respective neighborhood pixels on the image to be recognized.
  • Specifically, the distance between each neighborhood pixel and the pixel corner can be calculated according to the coordinates of each neighborhood pixel on the original distorted image to be recognized, and the coordinates corresponding to the shortest distance are then determined as the coordinates of the pixel corner on the undistorted image.
  • the neighborhood radius can be flexibly set according to the degree of distortion of the original distorted image to be recognized.
  • When the degree of distortion is small, the neighborhood radius can be set smaller, so that fewer pixels in the neighborhood need to be traversed, which reduces the amount of calculation.
  • When the degree of distortion is large, the neighborhood radius can be set larger, so that the optimal pixel can be found.
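The neighborhood search described above, picking among the candidate corrected coordinates within a radius of the corner estimate the one at the shortest distance, can be sketched as follows; the coordinates, radius, and candidate list are illustrative assumptions:

```python
# Sketch of the neighborhood search used to pick a pixel corner's
# coordinates after distortion correction: among candidates within
# `radius` of the corner estimate, keep the closest one. The mapping
# that produced the candidates and the radius value are assumed.
import math

def refine_corner(estimate, radius, candidates):
    """Return the candidate coordinate within `radius` of the estimate
    that lies at the shortest distance, or None if none is in range."""
    best, best_d = None, float("inf")
    for c in candidates:
        d = math.dist(estimate, c)
        if d <= radius and d < best_d:
            best, best_d = c, d
    return best

# Three candidate corrected coordinates for one corner; the second falls
# outside the radius and the closest in-range candidate wins.
corner = refine_corner((100.0, 50.0), radius=3.0,
                       candidates=[(101.0, 50.5), (102.5, 52.0), (97.5, 49.0)])
```

A smaller radius shrinks the candidate set (less computation, as noted above for mildly distorted images), while a larger radius widens the search for heavily distorted ones.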
  • OCR recognition is performed on the image to be recognized.
  • When the confidence of the recognition result is greater than or equal to the preset confidence threshold, the recognition result is directly fed back to the user as the target recognition result.
  • When the confidence of the recognition result is less than the preset confidence threshold, the image to be recognized is subjected to multiple random perspective transformations, OCR recognition is performed based on the results of the multiple random perspective transformations, and the recognition results are analyzed to obtain the target recognition result.
  • FIG. 2 is a schematic diagram of a preferred embodiment of the electronic device of this application.
  • The electronic device 1 may be a terminal device with data processing functions, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
  • The server may be a rack server, a blade server, a tower server, or a cabinet server.
  • the electronic device 1 includes a memory 11, a processor 12 and a network interface 13.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, and the like.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a hard disk of the electronic device 1.
  • The memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various data installed in the electronic device 1, for example, an image-based text recognition program 10, etc., but also to temporarily store data that has been output or will be output.
  • The processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, and is used to run program code or process data stored in the memory 11, for example, the image-based text recognition program 10.
  • The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices, for example, a client (not shown in the figure).
  • the components 11-13 of the electronic device 1 communicate with each other via a communication bus.
  • FIG. 2 only shows the electronic device 1 with components 11-13. Those skilled in the art can understand that the structure shown in FIG. 2 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • the electronic device 1 may also include a user interface.
  • the user interface may include a display (Display) and an input unit such as a keyboard (Keyboard).
  • the optional user interface may also include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, and the like.
  • the display may also be called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • The memory 11, which is a computer storage medium, stores the program code of the image-based text recognition program 10; when the processor 12 executes the program code of the image-based text recognition program 10, the following steps are implemented:
  • Receiving step: receiving a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
  • the user selects the image to be recognized through the APP on the client, and sends a text recognition instruction based on the selected image to be recognized.
  • After receiving the instruction sent by the client, the electronic device 1 performs a text recognition operation on the image to be recognized carried in the instruction.
  • First recognition step: inputting the image to be recognized into a preset recognition model to obtain a first recognition result, which includes a plurality of first text boxes;
  • The aforementioned preset recognition model is an OCR recognition model. Specifically, the OCR recognition model first detects the positions of the text fields in the image to be recognized and determines the circumscribed rectangular box containing each text field position, that is, the text box, and then recognizes the first text information and the first confidence corresponding to each text box. Here, the confidence is the accuracy corresponding to the text information in the recognition result output by the OCR recognition model: the higher the confidence, the closer the recognized text information is to the real text information in the image to be recognized.
  • In order to improve recognition accuracy, before the text information corresponding to a text box is recognized, it is first determined whether the circumscribed rectangle has a two-dimensional rotation angle; if so, rotation correction is performed on the circumscribed rectangle, and the corrected circumscribed rectangular box serves as the first text box.
  • First judging step: judging whether the first recognition result satisfies a first preset condition;
  • The first preset condition includes: the first confidence is greater than or equal to a preset confidence threshold, for example, 0.98.
  • Determining whether the first recognition result meets the first preset condition includes comparing the first confidence of each first text box with the preset confidence threshold.
  • The preset confidence threshold can be adjusted according to actual needs.
  • Transformation step when it is determined that the first recognition result does not satisfy the first preset condition, perform multiple transformations on the first text box based on a preset transformation algorithm to obtain the corresponding Multiple second text boxes;
  • the preset transformation algorithm is: a random perspective transformation algorithm.
  • the random values of the T 1 and T 2 matrices need to be preset.
  • the image of the second text box after the image transformation of the first text box can be obtained according to the perspective transformation matrix.
  • Second recognition step input multiple second text boxes corresponding to the first text box into the recognition model to obtain multiple second recognition results corresponding to the first text box;
  • the multiple second recognition results corresponding to the first text box include second text information and second confidence levels corresponding to the multiple second text boxes corresponding to the first text box. For example, perform 5 random perspective transformations on each first text box to obtain 5 second text boxes corresponding to one first text box, and use the OCR recognition model to identify the second text information in the 5 second text boxes and The second degree of confidence.
  • The second judging step: judging whether a second recognition result that meets a second preset condition exists among the multiple second recognition results corresponding to the first text box.
  • The second preset condition is: the second confidence is greater than or equal to the preset confidence threshold.
  • Judging whether such a result exists includes: obtaining, from each of the second recognition results, the second confidence corresponding to the second text information and judging whether it exceeds the preset confidence threshold; if any does, a qualifying second recognition result is judged to exist, otherwise none exists.
  • First generation step: when it is determined that a second recognition result satisfying the second preset condition exists, determining the target text information corresponding to the first text box based on that second recognition result, generating a target recognition result, and showing the target recognition result to the user.
  • For example, among the multiple second text boxes of one first text box, the second text information whose second confidence exceeds the preset confidence threshold is taken as the recognition result of that first text box, i.e. the target text information; the target recognition result, generated by aggregating the target text information of every first text box, is fed back to the user through the display interface of the client.
  • When several second recognition results satisfy the second preset condition, generating the target recognition result based on them includes:
  • selecting, from the qualifying second recognition results, the second text information of the result with the highest confidence as the target text information of the first text box.
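A sketch of this selection rule (the list-of-dicts shape and the helper name are assumptions for illustration):

```python
def pick_target_text(second_results, threshold=0.98):
    """From the second recognition results of one first text box, keep
    those whose second confidence meets the threshold, and return the
    second text information of the highest-confidence one; return
    None when no result qualifies, in which case the second
    generation step applies instead."""
    qualifying = [r for r in second_results if r["confidence"] >= threshold]
    if not qualifying:
        return None
    return max(qualifying, key=lambda r: r["confidence"])["text"]
```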
  • Second generation step: when it is determined that no second recognition result satisfies the second preset condition, determining the target text information corresponding to each first text box based on the first recognition result and the multiple second recognition results, generating the target recognition result, and showing it to the user.
  • Generating a target recognition result based on the first recognition result and the multiple second recognition results includes:
  • selecting, from the first recognition result and the multiple second recognition results, the recognition result with the highest confidence as the target recognition result.
  • Third generation step: when it is determined that the first recognition result satisfies the first preset condition, generating a target recognition result based on the first recognition result and displaying it to the user.
  • That is, if the first confidence is greater than or equal to the preset confidence threshold, the first recognition result is directly fed back to the user as the target result.
  • Inputting the image to be recognized into a preset recognition model to obtain the first recognition result includes: inputting the image to be recognized into a preset number of recognition models to obtain the first candidate recognition result of each model; and
  • selecting the first candidate recognition result with the highest first confidence as the first recognition result.
  • The preset number of recognition models includes, but is not limited to, a first recognition model and a second recognition model, whose model structures may be the same or different; for example, the first recognition model is CNN+RNN+CTC and the second recognition model is CNN+Seq2Seq+Attention.
  • The training data of the first recognition model and the second recognition model must be mutually independent, so that the recognition results of the different recognition models are also mutually independent.
  • For example, the training data of the first recognition model includes only letters, symbols, and numbers, while the training data of the second recognition model includes Chinese characters, letters, numbers, and so on; the objects that the different recognition models can accurately identify therefore differ.
  • Understandably, for Chinese-character content in the image to be recognized, the confidence of the result from the first recognition model is necessarily very low while that of the second recognition model is significantly higher; conversely, for symbol content, the confidence of the result from the second recognition model is necessarily very low while that of the first recognition model is significantly higher.
  • Correspondingly, inputting the multiple second text boxes corresponding to a first text box into the recognition model to obtain the multiple second recognition results includes: inputting each second text box into the preset number of recognition models to obtain its second candidate recognition results; selecting, for each second text box, the second candidate recognition result with the highest second confidence as that box's second recognition result; and
  • generating the second recognition result of the first text box based on the second recognition results of its second text boxes.
  • Take one first text box as an example: it corresponds to five second text boxes, which are input into the first recognition model and the second recognition model respectively.
  • Each second text box then corresponds to two second candidate recognition results; the candidate with the higher confidence is taken as the second recognition result of that second text box, yielding the second recognition results of the five second text boxes corresponding to the current first text box.
  • These second recognition results are then tested against the second preset condition, and the second recognition result of the current first text box is determined according to the outcome.
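The per-box ensemble described above can be sketched as follows; the callable interface for the models (each returning a (text, confidence) pair) is an assumption of this sketch, not the patent's API:

```python
def recognize_with_ensemble(box_image, models):
    """Run one second text box through each recognition model (e.g. a
    CNN+RNN+CTC model and a CNN+Seq2Seq+Attention model) and keep the
    candidate recognition result with the highest confidence."""
    candidates = [model(box_image) for model in models]
    return max(candidates, key=lambda tc: tc[1])
```

Because the models are trained on mutually independent data, the weaker model's low confidence lets the stronger model's result win for each content type.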
  • the image to be recognized may be captured by the user in real time, and when the user uses the camera to capture the image to be recognized, the image may be distorted due to the characteristics of the camera itself. Therefore, in order to further improve the accuracy of recognition, in other embodiments, when the processor 12 executes the image-based text recognition program 10, before the conversion step, the following steps may also be implemented:
  • Distortion correction is performed on the image to be recognized based on a preset distortion correction rule to obtain the image to be recognized after distortion correction.
  • Performing distortion correction on the image to be recognized based on a preset distortion correction rule to obtain the distortion-corrected image includes: obtaining the pixel corners of the image to be recognized and calculating their coordinates on the undistorted image; calculating a perspective transformation matrix from those coordinates; and correcting the image to be recognized according to the perspective transformation matrix.
  • The coordinates of each pixel corner on the undistorted image are obtained by performing distortion correction on the pixel corners of the original distorted image to be recognized, where the pixel corners may be the vertices of the distorted image; if the image to be recognized is a quadrilateral, they are the four vertices of the quadrilateral.
  • Because solving the perspective transformation matrix requires the corresponding coordinates of at least four pixel points, the coordinates of at least four pixel corners must be obtained from the distorted image to be recognized.
  • After the perspective transformation matrix is solved, the image to be recognized can be perspective-transformed to obtain the distortion-corrected image, after which the subsequent transformation and recognition operations are performed.
  • Because distortion correction does not map pixel-corner coordinates one-to-one, the coordinates calculated on the undistorted image for a pixel corner of the original distorted image may not be unique; the following steps are therefore used to find better coordinates for each pixel corner on the undistorted image.
  • Calculating the coordinates of a pixel corner on the undistorted image includes: determining, on the undistorted image, a target pixel whose coordinates equal those of the pixel corner on the image to be recognized; taking the pixels within a circle centered on the target pixel with a preset neighborhood radius as neighborhood pixels; traversing the neighborhood pixels and calculating each one's coordinates on the image to be recognized; and determining the corner's coordinates on the undistorted image from those coordinates.
  • For example, the distance between each neighborhood pixel and the pixel corner can be calculated from each neighborhood pixel's coordinates on the original distorted image to be recognized, and the coordinates corresponding to the shortest distance are then determined as the pixel corner's coordinates on the undistorted image.
  • The neighborhood radius can be set flexibly according to the degree of distortion of the original distorted image to be recognized.
  • When the degree of distortion is small, the neighborhood radius can be set smaller, so that fewer neighborhood pixels need to be traversed and the amount of calculation is reduced.
  • When the degree of distortion is large, the neighborhood radius can be set larger, so that the optimal pixel can be found.
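The neighborhood search can be sketched as a nearest-point scan. Here `to_distorted`, which maps an undistorted coordinate back onto the original distorted image, stands in for the camera's calibrated distortion model and is an assumption of this sketch:

```python
import math

def corner_on_undistorted(corner, neighborhood, to_distorted):
    """Among candidate pixels on the undistorted image (the points in
    the preset-radius neighborhood of the target pixel), return the
    one whose position mapped back onto the original distorted image
    lies closest to the known pixel corner `corner`."""
    def distance(p):
        du, dv = to_distorted(p)
        return math.hypot(du - corner[0], dv - corner[1])
    return min(neighborhood, key=distance)
```

A smaller neighborhood means fewer candidates to scan; a larger one tolerates stronger distortion, matching the trade-off described above.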
  • This application also proposes a text recognition device.
  • FIG. 3 is a schematic diagram of the modules of a preferred embodiment of the text recognition device of this application.
  • The text recognition apparatus 2 in this embodiment may include modules 210-270 according to the functions realized.
  • A module may also be called a unit, and refers to a series of computer program segments that can be executed by the processor of the electronic device and that complete fixed functions; the modules are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the receiving module 210 is configured to receive a text recognition instruction issued by a user, where the text recognition instruction includes an image to be recognized;
  • the first recognition module 220 is configured to input the to-be-recognized image into a preset recognition model to obtain a first recognition result, including a plurality of first text boxes;
  • the first judgment module 230 is configured to judge whether the first recognition result meets a first preset condition
  • the transformation module 240 is configured to, when it is determined that the first recognition result does not meet the first preset condition, perform multiple transformations on the first text box based on a preset transformation algorithm to obtain each of the first texts Multiple second text boxes corresponding to the boxes;
  • the second recognition module 250 is configured to input multiple second text boxes corresponding to the first text box into the recognition model to obtain multiple second recognition results corresponding to the first text box;
  • the second judgment module 260 is configured to judge whether there is a second recognition result satisfying a second preset condition among the plurality of second recognition results corresponding to the first text box;
  • the feedback judgment module 270 is configured to determine the target text corresponding to the first text box based on the second recognition result satisfying the second preset condition when it is judged that there is a second recognition result satisfying the second preset condition Information, generate a target recognition result, and show the target recognition result to the user.
  • The functions or operation steps implemented by modules 210-270 are similar to the above and will not be described in detail here.
  • the embodiment of the present application also proposes a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium includes an image-based text recognition program 10, which implements any step of the image-based text recognition method when the image-based text recognition program 10 is executed by a processor.
  • the specific implementation of the computer-readable storage medium of the present application is substantially the same as the foregoing method embodiment, and will not be repeated here.


Abstract

This application relates to artificial intelligence, and in particular to the field of image processing, and discloses an image-based text recognition method, comprising: receiving a text recognition instruction issued by a user and carrying an image to be recognized; inputting the image to be recognized into a preset recognition model to obtain a first recognition result; judging whether the first recognition result satisfies a first preset condition; if not, performing multiple transformations on the first text boxes to obtain multiple second text boxes corresponding to each first text box; inputting the multiple second text boxes corresponding to a first text box into the recognition model to obtain multiple second recognition results corresponding to that first text box; judging whether a second recognition result satisfying a second preset condition exists; and if so, generating a target recognition result based on the second recognition result satisfying the second preset condition and feeding it back to the user. This application also discloses a corresponding apparatus, electronic device, and storage medium. Using this application, the accuracy of text recognition can be improved.

Description

基于图像的文本识别方法、装置、电子设备及存储介质
本申请要求于2020年01月22日提交中国专利局、申请号为202010076369.6、发明名称为“基于图像的文本识别方法、装置及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在申请中。
技术领域
本申请涉及人工智能技术领域,尤其涉及一种基于图像的文本识别方法、装置、电子设备及计算机可读存储介质。
背景技术
现今专用OCR识别已经有一套成熟的算法,分别承担目标文件检测,字段检测和字段识别,这个过程是端到端的,结果将直接输出至用户。
现有的通用OCR识别的基本流程是,首先检测图片中文字所在的区域,画出每个区域的外接矩形框,然后把每个矩形框进行基本的二维旋转矫正后,把切块输入识别模块,由此获得整张图片的全部文本内容。发明人意识到虽然这个流程可以矫正目标在二维平面内的倾斜,然而,在实际的图像识别情景中,经常有识别对象和原图片并不共平面的情况。这种情况下的图像识别结果也会与正确的结果相差甚远。
因此,亟待提供一种能准确从图片中识别文本的方法。
发明内容
鉴于以上内容,本申请提供一种基于图像的文本识别方法、装置、电子设备及计算机可读存储介质,其主要目的在于提高从图像中识别文本的准确性。
为实现上述目的,本申请提供一种基于图像的文本识别方法,该方法包括:
接收步骤:接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;
第一识别步骤:将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框;
第一判断步骤:判断所述第一识别结果是否满足第一预设条件;
变换步骤:当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框;
第二识别步骤:将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果;
第二判断步骤:判断所述第一文本框对应的多个第二识别结果中是否存在满足第二预设条件的第二识别结果;及
第一生成步骤,当判断存在满足所述第二预设条件的第二识别结果时,基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
为实现上述目的,本申请提供一种基于图像的文本识别装置,该装置包括:
接收模块,用于接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;
第一识别模块,用于将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框;
第一判断模块,用于判断所述第一识别结果是否满足第一预设条件;
变换模块,用于当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框;
第二识别模块,用于将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果;
第二判断模块,用于判断所述第一文本框对应的多个第二识别结果中是否存在满足第二预设条件的第二识别结果;及
第一生成步骤,当判断存在满足所述第二预设条件的第二识别结果时,基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
此外,为实现上述目的,本申请还提供一种电子设备,该设备包括:存储器、处理器,所述存储器中存储有可在所述处理器上运行的基于图像的文本识别程序,所述基于图像的文本识别程序被所述处理器执行时可实现如下步骤:
接收步骤:接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;
第一识别步骤:将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框;
第一判断步骤:判断所述第一识别结果是否满足第一预设条件;
变换步骤:当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框;
第二识别步骤:将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果;
第二判断步骤:判断所述第一文本框对应的多个第二识别结果中是否存在满足第二预设条件的第二识别结果;及
第一生成步骤,当判断存在满足所述第二预设条件的第二识别结果时,基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
此外,为实现上述目的,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中包括基于图像的文本识别程序,所述基于图像的文本识别程序被处理器执行时,可实现如下步骤:
接收步骤:接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;
第一识别步骤:将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框;
第一判断步骤:判断所述第一识别结果是否满足第一预设条件;
变换步骤:当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框;
第二识别步骤:将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果;
第二判断步骤:判断所述第一文本框对应的多个第二识别结果中是否存在满足第二预设条件的第二识别结果;及
第一生成步骤,当判断存在满足所述第二预设条件的第二识别结果时,基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
本申请提出的基于图像的文本识别方法、装置、电子设备及计算机可读存储介质，在接收到用户发出的携带待识别图像的指令后，对待识别图像进行OCR识别，当识别结果的置信度大于或等于预设置信度阈值时，直接将识别结果作为目标识别结果反馈给用户，当识别结果的置信度小于预设置信度阈值时，对待识别图像进行多次随机透视变换，并基于多次随机透视变换的结果进行OCR识别，分析识别结果得到目标识别结果，通过采取随机透视变换，增加了变换结果的多样性，避免了待识别图像因三维角度干扰造成的识别准确率下降的问题，从而提高了准确识别的可能，提高用户的使用体验；同时利用多种识别模型对待识别图像进行识别，取置信度最高的识别结果生成目标识别结果，提高了文本识别的准确性；在对待识别图像进行随机透视变换前还对待识别图像进行畸变校正，并基于畸变校正结果进行透视变换，为准确识别文本奠定基础。
附图说明
图1为本申请基于图像的文本识别方法较佳实施例的流程图;
图2为本申请电子设备较佳实施例的示意图;
图3为本申请文本识别装置较佳实施例的模块示意图。
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。
具体实施方式
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请提供一种基于图像的文本识别方法。该方法可以由一个装置执行,该装置可以由软件和/或硬件实现。
参照图1所示,为本申请基于图像的文本识别方法较佳实施例的流程图。
在本申请基于图像的文本识别方法一较佳实施例中,所述基于图像的文本识别方法仅包括:步骤S1-步骤S7。
步骤S1,接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像。
以下以电子设备作为执行主体对本申请各实施例进行说明。
用户通过客户端上的APP选择待识别图像,并基于选择的待识别图像发出文本识别指令。电子设备接收到客户端发出的指令后,对指令中携带的待识别图像执行文本识别操作。
步骤S2,将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框。
上述预设识别模型为OCR识别模型。具体地,OCR识别模型首先检测所述待识别图像中文本字段位置,并确定包含所述文本字段位置的外接矩形框,即,文本框,然后分别识别出每一个文本框对应的第一文本信息及第一置信度。其中,置信度为OCR识别模型输出的识别结果中文本信息对应的准确度,置信度越高,识别出的文本信息越接近待识别图像中的真实文本信息。
在其他实施例中,为了提高识别准确性,在识别出文本框对应的文本信息前,先判断外接矩形框是否存在二维角度,若存在,则对外接矩形框执行旋转校正,将校正后的外接矩形框作为第一文本框。
步骤S3,判断所述第一识别结果是否满足第一预设条件。
在本实施例中,所述第一预设条件包括:所述第一置信度大于或等于预设置信度阈值,例如,0.98。
所述判断所述第一识别结果是否满足第一预设条件,包括:
从所述第一识别结果获取所述第一文本信息对应的第一置信度,判断所述第一置信度是否超过预设置信度阈值;及
若是,则判断所述第一识别结果满足所述第一预设条件,若否,则判断所述第一识别结果不满足所述第一预设条件。
其中,预设置信度阈值可根据实际需求进行调整。
可以理解的是,若第一识别结果中置信度大于或等于预设置信度阈值,则认为识别结果的准确性满足实际需求,无需对识别进行优化。
步骤S4,当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框。
在本实施例中,所述预设变换算法为:随机透视变换算法。
透视变换的本质是将图像投影到一个新的视平面,其通用变换公式为:
$$\begin{bmatrix} x' & y' & w' \end{bmatrix} = \begin{bmatrix} u & v & w \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
(u,v)为第一文本框的图像的像素坐标,(x=x′/w′,y=y′/w′)为变换之后的第二文本框的图像的像素坐标。透视变换矩阵图解如下:
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} T_1 & T_2 \\ T_3 & a_{33} \end{bmatrix}$$
其中，$T_1 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ 表示图像线性变换；$T_2 = \begin{bmatrix} a_{13} & a_{23} \end{bmatrix}^{T}$ 用于产生图像的透视变换；$T_3 = \begin{bmatrix} a_{31} & a_{32} \end{bmatrix}$ 表示图像平移。在变换过程中，需要预设 $T_1$ 和 $T_2$ 矩阵的随机值。
在计算得到透视变换矩阵后,即可根据透视变换矩阵获取第一文本框的图像变换后的第二文本框的图像。
步骤S5,将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果。
其中,所述第一文本框对应的多个第二识别结果,包括所述第一文本框对应的多个第二文本框对应的第二文本信息及第二置信度。例如,对每个第一文本框进行5次随机透视变换,得到一个第一文本框对应的5个第二文本框,利用OCR识别模型识别出5个第二文本框中的第二文本信息及第二置信度。
步骤S6,判断所述第一文本框对应的多个第二识别结果是否存在满足第二预设条件的第二识别结果。
上述第二预设条件为:第二置信度大于或等于预设置信度阈值。
在本实施例中,所述判断所述第一文本框对应的多个第二识别结果是否存在满足第二预设条件的第二识别结果,包括:
分别从所述第一文本框对应的多个第二识别结果中获取所述第二文本信息对应的第二置信度,判断所述第二置信度是否超过预设置信度阈值;及
若是,则判断所述第一文本框对应的多个第二识别结果存在满足所述第二预设条件的第二识别结果,若否,则判断所述第一文本框对应的多个第二识别结果不存在满足所述第二预设条件的第二识别结果。
步骤S7,当判断存在满足所述第二预设条件的第二识别结果时,基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
例如，将一个第一文本框对应的多个第二文本框中第二置信度超过预设置信度阈值的第二文本信息作为对应的第一文本框的识别结果，即目标文本信息，汇总每个第一文本框的目标文本信息生成目标识别结果，通过客户端的展示界面反馈给用户。
在其他实施例中,当存在多个满足所述第二预设条件的第二识别结果时,所述基于所述满足第二预设条件的第二识别结果生成目标识别结果,包括:
从所述满足预设条件的第二识别结果中选择置信度最高值对应的第二识别结果的第二文本信息作为所述第一文本框的目标文本信息。
在其他实施例中,所述基于图像的文本识别方法仅包括:步骤S1-步骤S6、及步骤S8。
步骤S8,当判断不存在满足所述第二预设条件的第二识别结果时,基于所述第一识别结果及所述多个第二识别结果确定所述每个第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
在其他实施例中,所述基于所述第一识别结果及所述多个第二识别结果生成目标识别 结果,包括:
从所述第一识别结果及所述多个第二识别结果中选择置信度最高值对应的识别结果作为目标识别结果。
在其他实施例中,所述基于图像的文本识别方法仅包括:步骤S1-步骤S3、及步骤S9。
步骤S9,当判断所述第一识别结果满足所述第一预设条件时,基于所述第一识别结果生成目标识别结果,并向所述用户展示所述目标识别结果。
若第一置信度大于或等于预设置信度阈值,直接将第一识别结果作为目标结果反馈给用户。
在其他实施例中,为了进一步提高文本识别的准确性,所述将所述待识别图像输入预设识别模型中,得到第一识别结果,包括:
将所述待识别图像输入预设数量的识别模型中,分别得到所述预设数量的识别模型对应的第一备选识别结果;及
从所述预设数量的识别模型对应的第一备选识别结果中选择第一置信度最高者对应的第一备选识别结果作为所述第一识别结果。
上述预设数量的识别模型包括但不仅限于:第一识别模型和第二识别模型;其中,上述第一识别模型和第二识别模型的模型结构可以相同也可以不同,例如,第一识别模型为CNN+RNN+CTC;第二识别模型为:CNN+Seq2Seq+Attention。上述第一识别模型和第二识别模型的训练数据必须是相互独立的,使得不同的识别模型的识别结果也是相互独立的。例如,第一识别模型的训练数据仅包括字母、符号及数字;第二识别模型的训练数据包括汉字、字母、数字等。使得不同的识别模型能准确识别的对象有所区别。
可以理解的是,对于待识别图像中的“汉字内容”,第一识别模型得到的识别结果置信度必然很低,第二识别模型的置信度会明显高于第一识别模型的置信度,对于待识别图像中的“符号内容”,第二识别模型得到的识别结果置信度必然很低,第一识别模型的置信度会明显高于第二识别模型的置信度。
相应地,所述将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果,包括:
分别将所述多个第二文本框依次输入预设数量的识别模型中,分别得到各第二文本框对应的所述预设数量的识别模型对应的第二备选识别结果;
从所述各第二文本框对应的所述预设数量的识别模型对应的第二备选识别结果中选择第二置信度最高者对应的第二备选识别结果作为所述各第二文本框对应的第二识别结果;及
基于所述各第二文本框对应的第二识别结果生成与所述各第二文本框对应的第一文本框的第二识别结果。
需要说明的是,每一个第一文本框对应的多个第二文本框分别输入到上述第一识别模型和第二识别模型会得到每个第二文本框的两个识别结果包括第二文本信息及第二置信度。
同样,以一个第一文本框为例,其对应5个第二文本框,依次将5个第二文本框分别输入第一识别模型及第二识别模型中,每个第二文本框对应的两个第二备选识别结果,取两个备选识别结果中置信度较高者作为当前第二文本框对应的第二识别结果,得到当前第一文本框对应的5个第二文本框的第二识别结果。然后采用上述步骤判断第二识别结果是否满足预设条件,并根据判断结果确定当前第一文本框的第二识别结果。
可以理解的是,所述待识别图像可能是用户即时采集的,在用户采用摄像头采集待识别图像过程中,可能出现由于摄像头自身的特性导致图片出现畸变的情况。因此,为了进一步提高识别的准确性,在其他实施例中,在所述步骤S4之前,该方法还包括:
基于预设畸变校正规则对所述待识别图像进行畸变校正,得到畸变校正后的待识别图像。
在本实施例中,所述基于预设畸变校正规则对所述待识别图像进行畸变校正,得到畸变校正后的待识别图像,包括:
获取所述待识别图像的像素角点,计算所述像素角点在无畸变图像上的坐标;
根据所述像素角点在所述无畸变图像上的坐标计算透视变换矩阵;及
根据所述透视变换矩阵对所述待识别图像进行畸变校正,生成所述畸变校正后的待识别图像。
在本实施例中,通过对原始存在畸变的待识别图像上的像素角点进行畸变矫正,获取各个像素角点在无畸变图像上的坐标,其中,像素角点可以是存在畸变的待识别图像的顶点,如果待识别图像为四边形,则是四边形的四个顶点。由于在计算透视变换矩阵时,至少需要四个像素点的对应坐标才能求解,因而,在获取存在畸变的待识别图像上的像素角点时至少需要获取四个像素角点的坐标。以二维码图像为例,可以先从原始的畸变图像中获取图像中的二维码区域的四个像素角点的坐标,即二维码的四个顶点的坐标,然后根据以下公式采用事先标定好的畸变参数求出四个角点在无畸变图像上的坐标:[x,y]=K[u,v],其中,[x,y]为原始畸变图像上的像素角点坐标,[u,v]为无畸变图像上的像素角点坐标,K为畸变参数。
求解出透视变换矩阵后,即可对待识别图像进行透视变换,得到经过畸变校正后的待识别图像,然后执行后续的变换及识别操作。
由于通过畸变矫正来计算像素角点在无畸变图像上的坐标并不是一一映射的,所以可能针对原始畸变图像上的像素角点计算得到的在无畸变图像上坐标并不是唯一的,为了找到像素角点在无畸变图像上的较优的坐标。
在其他实施例中,所述计算所述像素角点在无畸变图像上的坐标,包括:
首先,在所述无畸变图像上确定一个目标像素点,目标像素点的坐标与所述待识别图像上的像素角点的坐标相同;
然后,确定以所述目标像素点为圆心,预设邻域半径为半径的圆形区域内的像素点,作为邻域像素点;
然后,遍历所述无畸变图像上目标像素点的各个邻域像素点,分别计算所述各个邻域像素点在所述待识别图像上的坐标;及
最后,根据所述各个邻域像素点在所述待识别图像上的坐标确定所述像素角点在所述无畸变图像上的坐标。
例如,可以分别根据各个邻域像素点在原始畸变的待识别图像上的坐标计算各个邻域像素点与像素角点的距离,然后将最短距离对应的坐标确定为所述像素角点在无畸变图像上的坐标。在确定原始畸变的待识别图像上各个像素角点在无畸变图像中的坐标时,可以根据原始畸变的待识别图像的畸变程度去灵活地设置邻域半径,当畸变程度较小时,邻域半径可以设置得小一些,这样需要遍历的邻域像素点少一些,可以减少计算量,当畸变程度较大时,可以将邻域半径设置得大一些,这样便可以找到最优的像素点。
上述实施例提出的基于图像的文本识别方法，在接收到用户发出的携带待识别图像的指令后，对待识别图像进行OCR识别，当识别结果的置信度大于或等于预设置信度阈值时，直接将识别结果作为目标识别结果反馈给用户，当识别结果的置信度小于预设置信度阈值时，对待识别图像进行多次随机透视变换，并基于多次随机透视变换的结果进行OCR识别，分析识别结果得到目标识别结果，通过采取随机透视变换，增加了变换结果的多样性，避免了待识别图像因三维角度干扰造成的识别准确率下降的问题，从而提高了准确识别的可能，提高用户的使用体验；同时利用多种识别模型对待识别图像进行识别，取置信度最高的识别结果生成目标识别结果，提高了文本识别的准确性；在对待识别图像进行随机透视变换前还对待识别图像进行畸变校正，并基于畸变校正结果进行透视变换，为准确识别文本奠定基础。
本申请还提出一种电子设备。参照图2所示,为本申请电子设备较佳实施例的示意图。
在本实施例中,电子设备1可以是服务器、智能手机、平板电脑、便携计算机、桌上型计算机等具有数据处理功能的终端设备,所述服务器可以是机架式服务器、刀片式服务器、塔式服务器或机柜式服务器。
该电子设备1包括存储器11、处理器12及网络接口13。
其中,存储器11至少包括一种类型的可读存储介质,所述可读存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、磁性存储器、磁盘、光盘等。存储器11在一些实施例中可以是所述电子设备1的内部存储单元,例如该电子设备1的硬盘。存储器11在另一些实施例中也可以是所述电子设备1的外部存储设备,例如该电子设备1上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,存储器11还可以既包括该电子设备1的内部存储单元也包括外部存储设备。
存储器11不仅可以用于存储安装于该电子设备1的应用软件及各类数据,例如,基于图像的文本识别程序10等,还可以用于暂时地存储已经输出或者将要输出的数据。
处理器12在一些实施例中可以是一中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器或其他数据处理芯片,用于运行存储器11中存储的程序代码或处理数据,例如,基于图像的文本识别程序10等。
网络接口13可选的可以包括标准的有线接口、无线接口(如WI-FI接口),通常用于在该电子设备1与其他电子设备之间建立通信连接,例如,客户端(图中未标识)。电子设备1的组件11-13通过通信总线相互通信。
图2仅示出了具有组件11-13的电子设备1,本领域技术人员可以理解的是,图2示出的结构并不构成对电子设备1的限定,可以包括比图示更少或者更多的部件,或者组合某些部件,或者不同的部件布置。
可选地,该电子设备1还可以包括用户接口,用户接口可以包括显示器(Display)、输入单元比如键盘(Keyboard),可选的用户接口还可以包括标准的有线接口、无线接口。
可选地,在一些实施例中,显示器可以是LED显示器、液晶显示器、触控式液晶显示器以及有机发光二极管(Organic Light-Emitting Diode,OLED)触摸器等。其中,显示器也可以称为显示屏或显示单元,用于显示在电子设备1中处理的信息以及用于显示可视化的用户界面。
在图2所示的电子设备1实施例中,作为一种计算机存储介质的存储器11中存储基于图像的文本识别程序10的程序代码,处理器12执行基于图像的文本识别程序10的程序代码时,实现如下步骤:
接收步骤:接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;
用户通过客户端上的APP选择待识别图像,并基于选择的待识别图像发出文本识别指令。电子设备1接收到客户端发出的指令后,对指令中携带的待识别图像执行文本识别操作。
第一识别步骤:将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框;
上述预设识别模型为OCR识别模型。具体地,OCR识别模型首先检测所述待识别图像中文本字段位置,并确定包含所述文本字段位置的外接矩形框,即,文本框,然后分别识别出每一个文本框对应的第一文本信息及第一置信度。其中,置信度为OCR识别模型输出的识别结果中文本信息对应的准确度,置信度越高,识别出的文本信息越接近待识别图像中的真实文本信息。
在其他实施例中,为了提高识别准确性,在识别出文本框对应的文本信息前,先判断外接矩形框是否存在二维角度,若存在,则对外接矩形框执行旋转校正,将校正后的外接矩形框作为第一文本框。
第一判断步骤:判断所述第一识别结果是否满足第一预设条件;
在本实施例中,所述第一预设条件包括:所述第一置信度大于或等于预设置信度阈值,例如,0.98。
所述判断所述第一识别结果是否满足第一预设条件,包括:
从所述第一识别结果获取所述第一文本信息对应的第一置信度,判断所述第一置信度是否超过预设置信度阈值;及
若是,则判断所述第一识别结果满足所述第一预设条件,若否,则判断所述第一识别结果不满足所述第一预设条件。
其中,预设置信度阈值可根据实际需求进行调整。
可以理解的是,若第一识别结果中置信度大于或等于预设置信度阈值,则认为识别结果的准确性满足实际需求,无需对识别进行优化。
变换步骤:当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框;
在本实施例中,所述预设变换算法为:随机透视变换算法。
透视变换的本质是将图像投影到一个新的视平面,其通用变换公式为:
$$\begin{bmatrix} x' & y' & w' \end{bmatrix} = \begin{bmatrix} u & v & w \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
(u,v)为第一文本框的图像的像素坐标,(x=x′/w′,y=y′/w′)为变换之后的第二文本框的图像的像素坐标。透视变换矩阵图解如下:
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} T_1 & T_2 \\ T_3 & a_{33} \end{bmatrix}$$
其中，$T_1 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ 表示图像线性变换；$T_2 = \begin{bmatrix} a_{13} & a_{23} \end{bmatrix}^{T}$ 用于产生图像的透视变换；$T_3 = \begin{bmatrix} a_{31} & a_{32} \end{bmatrix}$ 表示图像平移。在变换过程中，需要预设 $T_1$ 和 $T_2$ 矩阵的随机值。
在计算得到透视变换矩阵后,即可根据透视变换矩阵获取第一文本框的图像变换后的第二文本框的图像。
第二识别步骤:将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果;
其中,所述第一文本框对应的多个第二识别结果,包括所述第一文本框对应的多个第二文本框对应的第二文本信息及第二置信度。例如,对每个第一文本框进行5次随机透视变换,得到一个第一文本框对应的5个第二文本框,利用OCR识别模型识别出5个第二文本框中的第二文本信息及第二置信度。
第二判断步骤:判断所述第一文本框对应的多个第二识别结果是否存在满足第二预设条件的第二识别结果;
上述第二预设条件为:第二置信度大于或等于预设置信度阈值。
在本实施例中,所述判断所述第一文本框对应的多个第二识别结果是否存在满足第二预设条件的第二识别结果,包括:
分别从所述第一文本框对应的多个第二识别结果中获取所述第二文本信息对应的第二置信度,判断所述第二置信度是否超过预设置信度阈值;及
若是,则判断所述第一文本框对应的多个第二识别结果存在满足所述第二预设条件的第二识别结果,若否,则判断所述第一文本框对应的多个第二识别结果不存在满足所述第二预设条件的第二识别结果。
第一生成步骤:当判断存在满足所述第二预设条件的第二识别结果时,基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
例如，将一个第一文本框对应的多个第二文本框中第二置信度超过预设置信度阈值的第二文本信息作为对应的第一文本框的识别结果，即目标文本信息，汇总每个第一文本框的目标文本信息生成目标识别结果，通过客户端的展示界面反馈给用户。
在其他实施例中,当存在多个满足所述第二预设条件的第二识别结果时,所述基于所述满足第二预设条件的第二识别结果生成目标识别结果,包括:
从所述满足预设条件的第二识别结果中选择置信度最高值对应的第二识别结果的第二文本信息作为所述第一文本框的目标文本信息。
在其他实施例中,所述处理器12执行所述基于图像的文本识别程序10时,在所述变换步骤之前,还可实现以下步骤:
第二生成步骤:当判断不存在满足所述第二预设条件的第二识别结果时,基于所述第一识别结果及所述多个第二识别结果确定所述每个第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
在其他实施例中,所述基于所述第一识别结果及所述多个第二识别结果生成目标识别结果,包括:
从所述第一识别结果及所述多个第二识别结果中选择置信度最高值对应的识别结果作为目标识别结果。
在其他实施例中,所述处理器12执行所述基于图像的文本识别程序10时,在所述变换步骤之前,还可实现以下步骤:
第三生成步骤:当判断所述第一识别结果满足所述第一预设条件时,基于所述第一识别结果生成目标识别结果,并向所述用户展示所述目标识别结果。
若第一置信度大于或等于预设置信度阈值,直接将第一识别结果作为目标结果反馈给用户。
在其他实施例中,为了进一步提高文本识别的准确性,所述将所述待识别图像输入预设识别模型中,得到第一识别结果,包括:
将所述待识别图像输入预设数量的识别模型中,分别得到所述预设数量的识别模型对应的第一备选识别结果;及
从所述预设数量的识别模型对应的第一备选识别结果中选择第一置信度最高者对应的第一备选识别结果作为所述第一识别结果。
上述预设数量的识别模型包括但不仅限于:第一识别模型和第二识别模型;其中,上述第一识别模型和第二识别模型的模型结构可以相同也可以不同,例如,第一识别模型为CNN+RNN+CTC;第二识别模型为:CNN+Seq2Seq+Attention。上述第一识别模型和第二识别模型的训练数据必须是相互独立的,使得不同的识别模型的识别结果也是相互独立的。例如,第一识别模型的训练数据仅包括字母、符号及数字;第二识别模型的训练数据包括汉字、字母、数字等。使得不同的识别模型能准确识别的对象有所区别。
可以理解的是,对于待识别图像中的“汉字内容”,第一识别模型得到的识别结果置信度必然很低,第二识别模型的置信度会明显高于第一识别模型的置信度,对于待识别图像中的“符号内容”,第二识别模型得到的识别结果置信度必然很低,第一识别模型的置信度会明显高于第二识别模型的置信度。
相应地,所述将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果,包括:
分别将所述多个第二文本框依次输入预设数量的识别模型中,分别得到各第二文本框对应的所述预设数量的识别模型对应的第二备选识别结果;
从所述各第二文本框对应的所述预设数量的识别模型对应的第二备选识别结果中选择第二置信度最高者对应的第二备选识别结果作为所述各第二文本框对应的第二识别结果;及
基于所述各第二文本框对应的第二识别结果生成与所述各第二文本框对应的第一文本框的第二识别结果。
需要说明的是,每一个第一文本框对应的多个第二文本框分别输入到上述第一识别模型和第二识别模型会得到每个第二文本框的两个识别结果包括第二文本信息及第二置信度。
同样,以一个第一文本框为例,其对应5个第二文本框,依次将5个第二文本框分别输入第一识别模型及第二识别模型中,每个第二文本框对应的两个第二备选识别结果,取两个备选识别结果中置信度较高者作为当前第二文本框对应的第二识别结果,得到当前第一文本框对应的5个第二文本框的第二识别结果。然后采用上述步骤判断第二识别结果是否满足预设条件,并根据判断结果确定当前第一文本框的第二识别结果。
可以理解的是,所述待识别图像可能是用户即时采集的,在用户采用摄像头采集待识别图像过程中,可能出现由于摄像头自身的特性导致图片出现畸变的情况。因此,为了进一步提高识别的准确性,在其他实施例中,所述处理器12执行所述基于图像的文本识别程序10时,在所述变换步骤之前,还可实现以下步骤:
基于预设畸变校正规则对所述待识别图像进行畸变校正,得到畸变校正后的待识别图像。
在本实施例中,所述基于预设畸变校正规则对所述待识别图像进行畸变校正,得到畸变校正后的待识别图像,包括:
获取所述待识别图像的像素角点,计算所述像素角点在无畸变图像上的坐标;
根据所述像素角点在所述无畸变图像上的坐标计算透视变换矩阵;及
根据所述透视变换矩阵对所述待识别图像进行畸变校正,生成所述畸变校正后的待识别图像。
在本实施例中,通过对原始存在畸变的待识别图像上的像素角点进行畸变矫正,获取各个像素角点在无畸变图像上的坐标,其中,像素角点可以是存在畸变的待识别图像的顶点,如果待识别图像为四边形,则是四边形的四个顶点。由于在计算透视变换矩阵时,至少需要四个像素点的对应坐标才能求解,因而,在获取存在畸变的待识别图像上的像素角点时至少需要获取四个像素角点的坐标。以二维码图像为例,可以先从原始的畸变图像中获取图像中的二维码区域的四个像素角点的坐标,即二维码的四个顶点的坐标,然后根据以下公式采用事先标定好的畸变参数求出四个角点在无畸变图像上的坐标:[x,y]=K[u,v],其中,[x,y]为原始畸变图像上的像素角点坐标,[u,v]为无畸变图像上的像素角点坐标,K为畸变参数。
求解出透视变换矩阵后,即可对待识别图像进行透视变换,得到经过畸变校正后的待识别图像,然后执行后续的变换及识别操作。
由于通过畸变矫正来计算像素角点在无畸变图像上的坐标并不是一一映射的,所以可能针对原始畸变图像上的像素角点计算得到的在无畸变图像上坐标并不是唯一的,为了找到像素角点在无畸变图像上的较优的坐标。
在其他实施例中,所述计算所述像素角点在无畸变图像上的坐标,包括:
首先,在所述无畸变图像上确定一个目标像素点,目标像素点的坐标与所述待识别图像上的像素角点的坐标相同;
然后,确定以所述目标像素点为圆心,预设邻域半径为半径的圆形区域内的像素点,作为邻域像素点;
然后,遍历所述无畸变图像上目标像素点的各个邻域像素点,分别计算所述各个邻域像素点在所述待识别图像上的坐标;及
最后,根据所述各个邻域像素点在所述待识别图像上的坐标确定所述像素角点在所述无畸变图像上的坐标。
例如,可以分别根据各个邻域像素点在原始畸变的待识别图像上的坐标计算各个邻域像素点与像素角点的距离,然后将最短距离对应的坐标确定为所述像素角点在无畸变图像上的坐标。在确定原始畸变的待识别图像上各个像素角点在无畸变图像中的坐标时,可以根据原始畸变的待识别图像的畸变程度去灵活地设置邻域半径,当畸变程度较小时,邻域半径可以设置得小一些,这样需要遍历的邻域像素点少一些,可以减少计算量,当畸变程度较大时,可以将邻域半径设置得大一些,这样便可以找到最优的像素点。
本申请还提出一种文本识别装置。
参照图3所示,为本申请文本识别装置较佳实施例的模块示意图。
本实施例所述文本识别装置2根据实现的功能可以包括:模块210-模块270。所述模块也可以称之为单元,是指一种能够被电子设备处理器所执行,并且能够完成固定功能的一系列计算机程序段,其存储在电子设备的存储器中。
在本申请文本识别装置2的一实施例中,关于各模块/单元的功能如下:
接收模块210,用于接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;
第一识别模块220,用于将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框;
第一判断模块230,用于判断所述第一识别结果是否满足第一预设条件;
变换模块240,用于当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框;
第二识别模块250,用于将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果;
第二判断模块260,用于判断所述第一文本框对应的多个第二识别结果中是否存在满足第二预设条件的第二识别结果;及
反馈判断模块270,用于当判断存在满足所述第二预设条件的第二识别结果时,基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
所述模块210-270所实现的功能或操作步骤均与上文类似,此处不再详述。
此外，本申请实施例还提出一种计算机可读存储介质，所述计算机可读存储介质可以是非易失性，也可以是易失性。所述计算机可读存储介质中包括基于图像的文本识别程序10，所述基于图像的文本识别程序10被处理器执行时实现所述基于图像的文本识别方法的任意步骤。本申请计算机可读存储介质的具体实施方式与上述方法实施例大致相同，在此不再赘述。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、装置、物品或者方法不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、装置、物品或者方法所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、装置、物品或者方法中还存在另外的相同要素。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在如上所述的一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其它相关的技术领域,均同理包括在本申请的专利保护范围内。

Claims (20)

  1. 一种基于图像的文本识别方法,适用于电子设备,其中,该方法包括:
    接收步骤:接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;
    第一识别步骤:将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框;
    第一判断步骤:判断所述第一识别结果是否满足第一预设条件;
    变换步骤:当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框;
    第二识别步骤:将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果;
    第二判断步骤:判断所述第一文本框对应的多个第二识别结果中是否存在满足第二预设条件的第二识别结果;及
    第一生成步骤,当判断存在满足所述第二预设条件的第二识别结果时,基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
  2. 根据权利要求1所述的基于图像的文本识别方法,其中,所述基于图像的文本识别方法还包括:
    第二生成步骤:当判断不存在满足所述第二预设条件的第二识别结果时,基于所述第一识别结果及所述多个第二识别结果确定所述每个第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
  3. 根据权利要求1所述的基于图像的文本识别方法,其中,所述基于图像的文本识别方法还包括:
    第三生成步骤:当判断所述第一识别结果满足所述第一预设条件时,基于所述第一识别结果生成目标识别结果,并向所述用户展示所述目标识别结果。
  4. 根据权利要求1至3中任意一项所述的基于图像的文本识别方法,其中,在所述变换步骤之前,该方法还包括:
    基于预设畸变校正规则对所述待识别图像进行畸变校正,得到畸变校正后的待识别图像。
  5. 根据权利要求4所述的基于图像的文本识别方法,其中,所述基于预设畸变校正规则对所述待识别图像进行畸变校正,得到畸变校正后的待识别图像,包括:
    获取所述待识别图像的像素角点,计算所述像素角点在无畸变图像上的坐标;
    根据所述像素角点在所述无畸变图像上的坐标计算透视变换矩阵;及
    根据所述透视变换矩阵对所述待识别图像进行畸变校正,生成所述畸变校正后的待识别图像。
  6. 根据权利要求5所述的基于图像的文本识别方法,其中,所述计算所述像素角点在无畸变图像上的坐标,包括:
    在所述无畸变图像上确定一个目标像素点,目标像素点的坐标与所述待识别图像上的像素角点的坐标相同;
    确定以所述目标像素点为圆心,预设邻域半径为半径的圆形区域内的像素点,作为邻域像素点;
    遍历所述无畸变图像上目标像素点的各个邻域像素点,分别计算所述各个邻域像素点在所述待识别图像上的坐标;及
    根据所述各个邻域像素点在所述待识别图像上的坐标确定所述像素角点在所述无畸变图像上的坐标。
  7. 根据权利要求1所述的基于图像的文本识别方法,其中,所述预设变换算法为随机透视变换算法。
  8. 根据权利要求1所述的基于图像的文本识别方法,其中,所述第一识别结果还包括所述多个第一文本框对应的第一文本信息及第一置信度;所述判断所述第一识别结果是否满足第一预设条件,包括:
    从所述第一识别结果获取所述第一文本信息对应的第一置信度,判断所述第一置信度是否超过预设置信度阈值;及
    若是,则判断所述第一识别结果满足所述第一预设条件,若否,则判断所述第一识别结果不满足所述第一预设条件;
    所述第二识别结果包括所述第一文本框对应的多个第二文本框对应的第二文本信息及第二置信度;所述判断所述第一文本框对应的多个第二识别结果是否存在满足第二预设条件的第二识别结果,包括:
    分别从所述第一文本框对应的多个第二识别结果中获取所述第二文本信息对应的第二置信度,判断所述第二置信度是否超过预设置信度阈值;及
    若是,则判断所述第一文本框对应的多个第二识别结果存在满足所述第二预设条件的第二识别结果,若否,则判断所述第一文本框对应的多个第二识别结果不存在满足所述第二预设条件的第二识别结果。
  9. 一种电子设备,其中,该设备包括存储器及处理器,所述存储器中存储有可在所述处理器上运行的基于图像的文本识别程序,所述基于图像的文本识别程序被所述处理器执行时可实现如下步骤:
    接收步骤:接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;
    第一识别步骤:将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框;
    第一判断步骤:判断所述第一识别结果是否满足第一预设条件;
    变换步骤:当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框;
    第二识别步骤:将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果;
    第二判断步骤:判断所述第一文本框对应的多个第二识别结果中是否存在满足第二预设条件的第二识别结果;及
    第一生成步骤,当判断存在满足所述第二预设条件的第二识别结果时,基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
  10. 根据权利要求9所述的电子设备,其中,所述基于图像的文本识别程序被所述处理器执行时还可实现如下步骤:
    第二生成步骤:当判断不存在满足所述第二预设条件的第二识别结果时,基于所述第一识别结果及所述多个第二识别结果确定所述每个第一文本框对应的目标文本信息,生成目标识别结果,并向所述用户展示所述目标识别结果。
  11. 根据权利要求9所述的电子设备,其中,所述基于图像的文本识别程序被所述处理器执行时还可实现如下步骤:
    第三生成步骤:当判断所述第一识别结果满足所述第一预设条件时,基于所述第一识别结果生成目标识别结果,并向所述用户展示所述目标识别结果。
  12. 根据权利要求9至11中任意一项所述的电子设备,其中,在所述变换步骤之前,所述基于图像的文本识别程序被所述处理器执行时还可实现如下步骤:
    基于预设畸变校正规则对所述待识别图像进行畸变校正，得到畸变校正后的待识别图像。
  13. 根据权利要求12所述的电子设备,其中,所述基于预设畸变校正规则对所述待识别图像进行畸变校正,得到畸变校正后的待识别图像,包括:
    获取所述待识别图像的像素角点,计算所述像素角点在无畸变图像上的坐标;
    根据所述像素角点在所述无畸变图像上的坐标计算透视变换矩阵;及
    根据所述透视变换矩阵对所述待识别图像进行畸变校正,生成所述畸变校正后的待识别图像。
  14. 根据权利要求13所述的电子设备,其中,所述计算所述像素角点在无畸变图像上的坐标,包括:
    在所述无畸变图像上确定一个目标像素点,目标像素点的坐标与所述待识别图像上的像素角点的坐标相同;
    确定以所述目标像素点为圆心,预设邻域半径为半径的圆形区域内的像素点,作为邻域像素点;
    遍历所述无畸变图像上目标像素点的各个邻域像素点,分别计算所述各个邻域像素点在所述待识别图像上的坐标;及
    根据所述各个邻域像素点在所述待识别图像上的坐标确定所述像素角点在所述无畸变图像上的坐标。
  15. 根据权利要求9所述的电子设备,其中,所述预设变换算法为随机透视变换算法。
  16. 根据权利要求9所述的电子设备,其中,所述第一识别结果还包括所述多个第一文本框对应的第一文本信息及第一置信度;所述判断所述第一识别结果是否满足第一预设条件,包括:
    从所述第一识别结果获取所述第一文本信息对应的第一置信度,判断所述第一置信度是否超过预设置信度阈值;及
    若是,则判断所述第一识别结果满足所述第一预设条件,若否,则判断所述第一识别结果不满足所述第一预设条件;
    所述第二识别结果包括所述第一文本框对应的多个第二文本框对应的第二文本信息及第二置信度;所述判断所述第一文本框对应的多个第二识别结果是否存在满足第二预设条件的第二识别结果,包括:
    分别从所述第一文本框对应的多个第二识别结果中获取所述第二文本信息对应的第二置信度,判断所述第二置信度是否超过预设置信度阈值;及
    若是,则判断所述第一文本框对应的多个第二识别结果存在满足所述第二预设条件的第二识别结果,若否,则判断所述第一文本框对应的多个第二识别结果不存在满足所述第二预设条件的第二识别结果。
  17. 一种基于图像的文本识别装置,其中,该装置包括:
    接收模块,用于接收用户发出的文本识别指令,所述文本识别指令中包括待识别图像;
    第一识别模块,用于将所述待识别图像输入预设识别模型中,得到第一识别结果,包括多个第一文本框;
    第一判断模块,用于判断所述第一识别结果是否满足第一预设条件;
    变换模块,用于当判断所述第一识别结果不满足所述第一预设条件时,基于预设变换算法对所述第一文本框进行多次变换,得到每个所述第一文本框对应的多个第二文本框;
    第二识别模块,用于将所述第一文本框对应的多个第二文本框输入所述识别模型中,得到所述第一文本框对应的多个第二识别结果;
    第二判断模块,用于判断所述第一文本框对应的多个第二识别结果中是否存在满足第二预设条件的第二识别结果;及
    第一生成模块，用于当判断存在满足所述第二预设条件的第二识别结果时，基于所述满足第二预设条件的第二识别结果确定所述第一文本框对应的目标文本信息，生成目标识别结果，并向所述用户展示所述目标识别结果。
  18. A computer-readable storage medium, wherein the computer-readable storage medium comprises an image-based text recognition program which, when executed by a processor, implements the following steps:
    a receiving step: receiving a text recognition instruction issued by a user, the text recognition instruction comprising an image to be recognized;
    a first recognition step: inputting the image to be recognized into a preset recognition model to obtain a first recognition result comprising a plurality of first text boxes;
    a first judgment step: judging whether the first recognition result satisfies a first preset condition;
    a transform step: when it is judged that the first recognition result does not satisfy the first preset condition, transforming the first text boxes multiple times based on a preset transform algorithm to obtain a plurality of second text boxes corresponding to each first text box;
    a second recognition step: inputting the plurality of second text boxes corresponding to the first text box into the recognition model to obtain a plurality of second recognition results corresponding to the first text box;
    a second judgment step: judging whether a second recognition result satisfying a second preset condition exists among the plurality of second recognition results corresponding to the first text box; and
    a first generation step: when it is judged that a second recognition result satisfying the second preset condition exists, determining target text information corresponding to the first text box based on the second recognition result satisfying the second preset condition, generating a target recognition result, and presenting the target recognition result to the user.
  19. The computer-readable storage medium according to claim 18, wherein the image-based text recognition program, when executed by the processor, further implements the following step:
    a second generation step: when it is judged that no second recognition result satisfying the second preset condition exists, determining the target text information corresponding to each first text box based on the first recognition result and the plurality of second recognition results, generating a target recognition result, and presenting the target recognition result to the user.
  20. The computer-readable storage medium according to claim 18, wherein the image-based text recognition program, when executed by the processor, further implements the following step:
    a third generation step: when it is judged that the first recognition result satisfies the first preset condition, generating a target recognition result based on the first recognition result, and presenting the target recognition result to the user.
PCT/CN2020/093563 2020-01-22 2020-05-30 Image-based text recognition method and apparatus, electronic device and storage medium WO2021147219A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010076369.6A CN111291753B (zh) 2020-01-22 2020-01-22 Image-based text recognition method and apparatus, and storage medium
CN202010076369.6 2020-01-22

Publications (1)

Publication Number Publication Date
WO2021147219A1 true WO2021147219A1 (zh) 2021-07-29

Family

ID=71024405

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093563 WO2021147219A1 (zh) 2020-01-22 2020-05-30 基于图像的文本识别方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN111291753B (zh)
WO (1) WO2021147219A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116092087A (zh) * 2023-04-10 2023-05-09 上海蜜度信息技术有限公司 OCR recognition method, system, storage medium and electronic device
CN116311301A (zh) * 2023-02-17 2023-06-23 北京感易智能科技有限公司 Lineless table recognition method and system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112078593B (zh) * 2020-07-24 2021-12-21 西安电子科技大学 Autonomous driving system and method based on multiple collaborative network models
CN112396050B (zh) * 2020-12-02 2023-09-15 度小满科技(北京)有限公司 Image processing method, device and storage medium
CN116152473B (zh) * 2022-12-26 2023-08-08 深圳市数聚能源科技有限公司 Method for converting a two-dimensional picture into an AR image with reduced black-pixel interference

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5513304A (en) * 1993-04-19 1996-04-30 Xerox Corporation Method and apparatus for enhanced automatic determination of text line dependent parameters
CN103714327A (zh) * 2013-12-30 2014-04-09 上海合合信息科技发展有限公司 Image orientation correction method and system
CN108446698A (zh) * 2018-03-15 2018-08-24 腾讯大地通途(北京)科技有限公司 Method, apparatus, medium and electronic device for detecting text in an image
CN109902768A (zh) * 2019-04-26 2019-06-18 上海肇观电子科技有限公司 Processing of output results of optical character recognition technology
US20190286896A1 (en) * 2018-03-15 2019-09-19 Sureprep, Llc System and method for automatic detection and verification of optical character recognition data
CN110659633A (zh) * 2019-08-15 2020-01-07 坎德拉(深圳)科技创新有限公司 Method, apparatus and storage medium for recognizing text information in images

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592124B (zh) * 2011-01-13 2013-11-27 汉王科技股份有限公司 Geometric correction method and apparatus for text images, and binocular stereo vision system
RU2571379C2 (ru) * 2013-12-25 2015-12-20 Общество с ограниченной ответственностью "Аби Девелопмент" Intelligent processing of an electronic document
US10489645B2 (en) * 2018-03-15 2019-11-26 Sureprep, Llc System and method for automatic detection and verification of optical character recognition data
CN109409366B (zh) * 2018-10-30 2022-04-05 四川长虹电器股份有限公司 Distorted image correction method and apparatus based on corner detection



Also Published As

Publication number Publication date
CN111291753A (zh) 2020-06-16
CN111291753B (zh) 2024-05-28

Similar Documents

Publication Publication Date Title
WO2021147219A1 (zh) Image-based text recognition method and apparatus, electronic device and storage medium
US20210201445A1 (en) Image cropping method
US10657600B2 (en) Systems and methods for mobile image capture and processing
WO2021012382A1 (zh) Method and apparatus for configuring a chatbot, computer device and storage medium
CN108345882B (zh) Method, apparatus, device and computer-readable storage medium for image recognition
CN109685870B (zh) Information annotation method and apparatus, annotation device and storage medium
WO2021147221A1 (zh) Text recognition method and apparatus, electronic device and storage medium
CN108769803B (zh) Recognition method, cropping method, system, device and medium for videos with borders
CN111553251B (zh) Method, apparatus, device and storage medium for detecting missing corners of certificates
WO2022105569A1 (zh) Page orientation recognition method, apparatus, device and computer-readable storage medium
CN108182457B (zh) Method and apparatus for generating information
CN111310426A (zh) OCR-based table layout recovery method, apparatus and storage medium
JP2022541977A (ja) Image labeling method, apparatus, electronic device and storage medium
WO2018152710A1 (zh) Image correction method and apparatus
WO2020228171A1 (zh) Data augmentation method, apparatus and computer-readable storage medium
CN112651399A (zh) Method for detecting text on the same line in a tilted image, and related devices
CN110717060A (zh) Image mask filtering method, apparatus and storage medium
WO2022105120A1 (zh) Image text detection method, apparatus, computer device and storage medium
US9330310B2 (en) Methods and devices for obtaining card information
CN111325104B (zh) Text recognition method, apparatus and storage medium
CN113538291B (zh) Card image tilt correction method, apparatus, computer device and storage medium
CN110321788B (zh) Training data processing method, apparatus, device and computer-readable storage medium
CN110991270B (zh) Text recognition method, apparatus, electronic device and storage medium
CN118158512A (zh) Whiteboard picture correction method, apparatus and storage medium
CN117911720A (zh) Image feature extraction method, image processing method, apparatus and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20916191

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20916191

Country of ref document: EP

Kind code of ref document: A1