WO2020155763A1 - OCR recognition method and electronic device


Info

Publication number: WO2020155763A1
Authority: WO (WIPO, PCT)
Prior art keywords: recognition, information, image, text information, OCR
Application number: PCT/CN2019/117914
Other languages: English (en), French (fr)
Inventors: 许洋, 刘鹏, 王健宗
Original assignee: 平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020155763A1

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F 18/00: Pattern recognition
    • G06V: Image or Video Recognition or Understanding
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/10: Image acquisition
    • Y02D 10/00: Energy efficient computing, e.g., low power processors, power management or thermal management

Definitions

  • This application relates to the field of image recognition, and more specifically, to an OCR recognition method and an electronic device.
  • OCR (Optical Character Recognition) mainly generates text output by recognizing the optical characters displayed on a carrier.
  • Taking OCR recognition of paper documents as an example: by capturing the optical characters of the printed matter on a paper document and recognizing them, data such as text information can be obtained.
  • The OCR recognition methods in the prior art often rely on the characteristics of the recognized object and require personalized template customization. For recognized objects such as bills, newspapers, and teaching materials, and even for optical characters in different font sizes and typefaces, a corresponding optical character recognition template must be customized anew before that specific template can be used for recognition.
  • Customizing an optical character recognition template demands a large amount of training data and a long training time, so the efficiency of template customization is low, and the resulting template is difficult to transfer to other recognition objects. The customized template is also easily affected by factors such as character variation. Because the customized templates applied by prior-art OCR recognition methods depend strongly on the recognized object, OCR recognition efficiency suffers.
  • In view of these problems, this application proposes an OCR recognition method and an electronic device, which can remedy at least one of the following technical defects: long training time; low efficiency of customized recognition templates; difficulty in transferring templates to other recognition objects; susceptibility of customized optical character recognition templates to factors such as character variation; and strong dependence of customized templates on the recognized object, all of which reduce OCR recognition efficiency.
  • This application provides an OCR recognition method, including:
  • acquiring a to-be-recognized image from business-party data;
  • inputting the to-be-recognized image into a general OCR template for recognition, and obtaining the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model, and the general recognition model is trained on field image samples of the business party's various business types; and
  • synthesizing the text information and its corresponding position information into structured recognition data.
  • This application also provides an electronic device, including:
  • a processor;
  • a memory for storing processor-executable instructions;
  • where the processor is configured to execute the steps of the OCR recognition method of any of the above embodiments.
  • This application also provides a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by the processor of a mobile terminal, the mobile terminal can execute the OCR recognition method of any of the foregoing embodiments.
  • This application also provides an OCR recognition device, which includes units for executing the OCR recognition method of this application.
  • This application also provides a computer non-volatile readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to execute the OCR recognition method of this application.
  • Compared with the prior art, the solution provided by this application, an OCR recognition method and its electronic device, acquires the to-be-recognized image from business-party data; inputs the image into a general OCR template for recognition to obtain the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model trained on field image samples of the business party's various business types; and synthesizes the text information and its corresponding position information into structured recognition data.
  • This technical solution can efficiently and quickly recognize the image of an object to be recognized (for example, a contract, invoice, bill, or certificate) through the general OCR template, generate structured recognition data, and complete the recognition from optical characters to text information.
  • The general OCR template used in this application has a short training time and strong adaptability, can adapt to many different objects to be recognized, and offers high recognition accuracy and high overall efficiency.
  • FIG. 1 shows a flowchart of the OCR recognition method in an embodiment of this application;
  • FIG. 2 shows a schematic diagram of the invoice sample recognized by the OCR recognition method in an embodiment of this application;
  • FIG. 3 shows a schematic flowchart of the method for training the general recognition model according to business type in this application;
  • FIG. 4 shows a schematic flowchart of the method for constructing the general recognition model in an embodiment of this application;
  • FIG. 5 shows a schematic flowchart of the method for training the detection model from pre-labeled field sub-images in this application;
  • FIG. 6 shows a schematic flowchart of the method for generating the detection model from line-height information and length information in this application;
  • FIG. 7 shows a schematic flowchart of the method for adjusting model parameters according to the recognition accuracy rate in this application;
  • FIG. 8 shows a schematic flowchart of verifying whether the structured recognition data meets the verification conditions in this application;
  • FIG. 9 shows a schematic diagram of the contract sample recognized by the OCR recognition method in an embodiment of this application;
  • FIG. 10 shows a block diagram of part of the structure related to the terminal according to an embodiment of this application.
  • FIG. 1 shows a flowchart of the OCR recognition method in an embodiment of this application. Here, OCR refers to optical character recognition. The OCR recognition method includes:
  • Step S11: Obtain the to-be-recognized image from business-party data.
  • The OCR recognition method in this application can be applied during OCR template development to produce a general-purpose OCR recognition template. In this context, the business party is the party that needs the OCR template. The to-be-recognized image is the image information of the object to be recognized, obtained by photographing, scanning, or similar means. The OCR recognition method converts the text recorded in optical characters on the to-be-recognized image into text information and outputs it.
  • Step S12: Input the to-be-recognized image into a general OCR template for recognition, and obtain the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model, and the general recognition model is trained on field image samples of the business party's various business types.
  • During recognition, the detection model locates the position of each piece of text information, crops the corresponding region of the to-be-recognized image, and passes it to the general recognition model for text recognition.
  • Step S13: Synthesize the text information and its corresponding position information into structured recognition data.
  • When the detection model locates a piece of text information, it matches that location to structured information. The structured information may describe the category, classification, or characteristics of the text information; in some scenarios it may be an ID number, zip code, card number, identification code, or another label indicating the kind of text content. The general recognition model then recognizes the text information, and structured data is generated by combining the structured information with the text information.
  • To better present the technical solution, a specific scenario and recognition object are used below. FIG. 2 shows the to-be-recognized image of an invoice sample recognized by the OCR recognition method. Applying the method of this application to the invoice sample:
  • First, the to-be-recognized image of the invoice sample is obtained from the business-party data, as shown in FIG. 2. The image is then input into the general OCR template for recognition, and the text information recorded on it and the corresponding position information are obtained.
  • The process includes:
  • The detection model of the general OCR template identifies area A, where the "tax identification number" of the invoice sample is located; this is the corresponding position information. The detection model crops the image corresponding to area A (which may be called the "sub-image to be recognized") and sends it to the general recognition model.
  • The general recognition model recognizes the cropped image of area A and, through the learned mapping between optical characters and text, obtains the text information "12345". The general OCR template can also combine the text information with the structured information matched from the position information to output structured data. In this example, the template matches the position information of area A to the structured information "tax identification number", combines it with the recognized text information "12345", and outputs the structured recognition data "tax identification number: 12345".
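  • As an illustration of step S13, the following is a minimal sketch, in Python, of how detected position labels and recognized text could be combined into structured recognition data. The patent publishes no code; the data structures, field names, and coordinates here are illustrative assumptions.

```python
# A minimal sketch of step S13: synthesizing structured recognition data.
# The region label, box coordinates, and text values are illustrative
# assumptions, not the patent's actual implementation.

from dataclasses import dataclass

@dataclass
class DetectedField:
    label: str    # structured information matched from the position information
    box: tuple    # (x, y, width, height) of the cropped region
    text: str     # text recognized by the general recognition model

def synthesize(fields):
    """Combine each field's structured label with its recognized text."""
    return {f.label: f.text for f in fields}

# Example: area A of the invoice sample in FIG. 2
fields = [DetectedField(label="tax identification number",
                        box=(40, 120, 300, 32), text="12345")]
print(synthesize(fields))  # {'tax identification number': '12345'}
```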
  • Compared with the prior art, which uses traditional OCR recognition models, the above OCR recognition method avoids two problems: the prior art needs a large amount of data to train the models used for positioning and text recognition, and those models must be retrained whenever the recognition object changes. The required training data volume is large and the training time long, which severely constrains OCR recognition efficiency.
  • Because the technical solution of this application adopts a general OCR template that can convert most optical characters into text information, no separate training is needed for each object to be recognized. When building a general OCR template, the already-trained general recognition model can therefore be reused rather than retrained, which saves training time, reduces the amount of training data required, produces an OCR recognition template more quickly, and ultimately improves the overall efficiency of OCR recognition.
  • Referring to FIG. 3, to achieve better OCR recognition, this embodiment also provides a technical solution for training the general recognition model; the OCR recognition method further includes:
  • Step S31: Determine, from the business-party data, each business type handled by the business party.
  • In this step, the business-party data is classified by business type. A business type mainly refers to the type of recognition object or a type related to the business party's business, such as invoice recognition, certificate recognition, textbook recognition, packaging recognition, or instruction-manual recognition.
  • Step S32: Obtain corresponding samples for each business type.
  • Samples are collected per business type, and the proportion of samples per type can be adjusted to match that type's share of the business party's workload. For example, if invoice recognition accounts for 50% of the business party's matters, contract recognition for 30%, and reimbursement-form recognition for 20%, the to-be-recognized images in the samples can be drawn from 50% invoice samples, 30% contract samples, and 20% reimbursement-form samples.
  • Step S33: Train the general recognition model on the samples.
  • The samples of each business type are used to train the general recognition model; covering different business types enables the trained model to effectively recognize the optical characters of objects to be recognized across those types, as the sketch below illustrates.
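  • A minimal sketch of steps S31 to S33: assembling a training set whose composition mirrors the business party's workload, as in the 50/30/20 example above. The sampling scheme and the shape of the sample pool are illustrative assumptions, not the patent's training procedure.

```python
# Assemble a mixed training set by business-type proportions (an assumed
# scheme; the patent does not prescribe a specific sampler).

import random

def build_training_set(samples_by_type, proportions, total):
    """samples_by_type: dict business_type -> list of field image samples;
    proportions: dict business_type -> fraction of the training set."""
    training_set = []
    for business_type, fraction in proportions.items():
        k = int(total * fraction)
        training_set += random.sample(samples_by_type[business_type], k)
    random.shuffle(training_set)
    return training_set

proportions = {"invoice": 0.5, "contract": 0.3, "reimbursement_form": 0.2}
# e.g. train_general_recognition_model(build_training_set(pool, proportions, 10000))
```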
  • To further save training time, this embodiment also provides a technical solution: before training the general recognition model, a recognition model already used for other business types can be obtained and adaptively trained on samples of the different business types to yield the general recognition model. In some scenarios where that recognition model already has a high recognition rate, its recognition rate on samples of the different business types can simply be tested; when the recognition rate reaches the recognition threshold, the model can be used directly as the general recognition model.
  • Referring to FIG. 4, to train a better general recognition model and improve its recognition accuracy, an embodiment of this application also provides an OCR recognition method in which step S33, training the general recognition model on the samples, includes:
  • Step S41: Extract the text feature information of the text information recorded in the training images of the samples.
  • Text feature information is the feature information of the font itself, the carrier of the text information. Since a single training image may contain multiple pieces of text information, pieces that share the same font, and therefore the same text feature information, can be extracted together; when the fonts differ, the specific pieces of text information must first be cropped or labeled. Based on the characteristics of the different fonts, the stylistic traits of each font are removed, and only the feature information of the font that expresses the character's shape, that is, the text feature information, is retained.
  • An embodiment of this application also provides a solution that extracts the main structure of each character in the text information as the font feature information. During this extraction, optical-character features that contribute little to recognizing and confirming the character, such as stroke flourishes, stroke endings, and stroke thickness, are filtered out.
  • Step S42: Obtain the training text information corresponding to the text feature information, analyze the correspondence between the text feature information and the training text information, and obtain mapping information.
  • From the text information that the text feature information represents, the mapping relationship between the text feature information and the training text information is derived.
  • Step S43: Construct the general recognition model according to the mapping information.
  • The general recognition model is built from the mapping information that reflects the relationship between font feature information and text information.
  • A general recognition model constructed from text feature information that captures the main features of characters can effectively recognize text information in different fonts and font sizes.
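  • The following sketch illustrates the idea behind steps S41 to S43: reduce each glyph to a coarse, font-independent shape signature and recognize by looking up the nearest stored signature. The pooling-grid feature extractor and the nearest-neighbor lookup are assumptions standing in for the patent's text feature information and mapping information.

```python
# Glyph -> coarse shape signature -> nearest stored signature (a sketch).

import numpy as np

def glyph_features(glyph, grid=8):
    """Binarize a glyph image and average-pool it onto a grid x grid layout,
    discarding stroke thickness and style; assumes the glyph is at least
    grid pixels in each dimension."""
    binary = (glyph > 127).astype(np.float32)
    h, w = binary.shape
    binary = binary[: h - h % grid, : w - w % grid]
    pooled = binary.reshape(grid, binary.shape[0] // grid,
                            grid, binary.shape[1] // grid).mean(axis=(1, 3))
    return pooled.ravel()

class MappingRecognizer:
    """Steps S42-S43: store the feature -> character mapping and recognize
    a new glyph by its nearest stored feature vector."""
    def __init__(self):
        self.features, self.chars = [], []

    def fit(self, glyphs, chars):
        for g, c in zip(glyphs, chars):
            self.features.append(glyph_features(g))
            self.chars.append(c)

    def recognize(self, glyph):
        f = glyph_features(glyph)
        dists = [np.linalg.norm(f - t) for t in self.features]
        return self.chars[int(np.argmin(dists))]
```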
  • Referring to FIG. 5, to obtain a detection model that locates fields more accurately, an embodiment of this application also provides an OCR recognition method in which the detection model used in step S12, inputting the to-be-recognized image into the general OCR template and obtaining the recorded text information and its corresponding position information, is trained as follows:
  • Step S51: Obtain training images in which the positions of the field sub-images are pre-labeled.
  • When the invoice sample in FIG. 2 is used as a training image, the positions of the field sub-images are labeled on it in advance, for example the "tax identification number" area A and the "unit name" area B in FIG. 2.
  • Step S52: Extract the position feature information of the text information, and construct the detection model according to the position feature information.
  • The position feature information of the text information corresponding to the "tax identification number" and "unit name" is extracted from factors such as the relative distances and margins of area A, area B, and the other areas on the image.
  • Because the position features of text information differ between recognition objects, the detection model must be trained or constructed for each kind of object to be recognized. The detection model locates the text information in the object to be recognized and crops the image at the corresponding position.
  • The lengths of the text information in an object to be recognized are often inconsistent. For example, the "unit name" text in area B of FIG. 2 may vary in length, which affects the region or length the detection model must crop. Likewise, in the invoice sample the text in area A, recording the "tax identification number", differs in length from the text in area C, recording the invoice "header".
  • Referring to FIG. 6, an embodiment of this application therefore provides a technical solution for training a detection model with a variable-length recognition range, so that display areas of different sizes and shapes can be recognized.
  • In this solution, step S52 of the OCR recognition method, extracting the position feature information of the text information and constructing the detection model according to it, includes:
  • Step S61: Segment the training image used for training the detection model according to the line-height information of the text information, obtaining training sub-images.
  • Line-height information can be supplied as input. To improve efficiency, it can also be derived from the layout itself: the line spacing is determined from the arrangement of the optical characters, and the line height is then determined from the line spacing, so the line-height information is extracted from the sample.
  • For example, when the invoice sample in FIG. 2 is used to train the detection model, the technical solution in this application can perform region-wise recognition of the optical characters in each text display area: by extracting the edge lines of the optical characters in the "tax identification number" area A and expanding those edge lines outward by a set margin, the line-height information of the text in area A is obtained.
  • Using the line-height information obtained this way, the training image (the invoice sample in FIG. 2) is segmented: each display area is divided into multiple training sub-images. Taking area A as an example, the image containing area A's optical characters is split into several small segments, generating the training sub-images of area A, as in the sketch below.
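  • A minimal sketch of step S61: once a region's line height is known, the region is sliced into fixed-width training sub-images. The slice width is an assumption.

```python
import numpy as np

def segment_region(region: np.ndarray, slice_width: int = 16):
    """region: 2-D array of shape (line_height, region_width).
    Returns the training sub-images cut along the text line."""
    _, width = region.shape
    return [region[:, x:x + slice_width]
            for x in range(0, width - slice_width + 1, slice_width)]

# e.g. a line-height-32, width-160 crop of area A yields 10 sub-images
print(len(segment_region(np.zeros((32, 160)))))  # 10
```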
  • Step S62: Input the training sub-images into a fully connected network model, and compute each sub-image's confidence by matching the characters in the recognized-character database.
  • After the training sub-images of area A are obtained, they are input into the fully connected network model, which outputs one-dimensional vectors. From these vectors, the confidence of each training sub-image is computed by matching against the characters in the recognized-character database. The confidence indicates how likely the optical character in the training sub-image is to match a character in the database; within a certain probability range, the sub-image can be considered to contain the corresponding database character.
  • The fully connected network model can be constructed using convolutional neural network algorithms.
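  • A minimal sketch of step S62, assuming PyTorch: a small fully connected network scores a training sub-image against a recognized-character database, and the maximum class probability serves as the confidence. The layer sizes, input resolution, character-database size, and softmax-as-confidence choice are all assumptions.

```python
import torch
import torch.nn as nn

CHARSET_SIZE = 3755          # assumed size of the recognized-character database

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32, 512),  # sub-images assumed resized to 32x32
    nn.ReLU(),
    nn.Linear(512, CHARSET_SIZE),
)

def confidence(sub_image):
    """sub_image: tensor of shape (1, 32, 32); returns the maximum
    match probability over the character database."""
    with torch.no_grad():
        probs = torch.softmax(model(sub_image.unsqueeze(0)), dim=1)
    return probs.max().item()

print(confidence(torch.rand(1, 32, 32)))  # near-uniform before training
```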
  • Step S63: Generate the length information of the text information according to the confidences of the training sub-images.
  • The length information of the text can be determined from the confidences of multiple training sub-images. In other words, once the line-height information is fixed, the confidences indicate which training sub-images exhibit the characteristics of characters in the recognized-character database, yielding the length information of area A.
  • The length information obtained from the sub-image confidences makes it possible to separate the "goods information" area E and the "tax amount" area F of the invoice sample in FIG. 2, along with the "goods quantity" and "goods price" regions with recognizable optical characters between them.
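  • A minimal sketch of step S63: with the line height fixed, a field's length is read off as the runs of sub-images whose confidence clears a threshold; separate runs correspond to separate regions, such as areas E and F. The threshold and slice width are assumptions.

```python
def field_extent(confidences, slice_width, threshold=0.8):
    """Return (start_px, length_px) for each confident run of sub-images."""
    runs, start = [], None
    for i, c in enumerate(confidences + [0.0]):   # sentinel ends a final run
        if c >= threshold and start is None:
            start = i
        elif c < threshold and start is not None:
            runs.append((start * slice_width, (i - start) * slice_width))
            start = None
    return runs

# Two confident runs -> two separate fields (e.g. areas E and F in FIG. 2)
print(field_extent([0.9, 0.95, 0.2, 0.1, 0.88, 0.91, 0.9], slice_width=16))
# [(0, 32), (64, 48)]
```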
  • Step S64: Combine the line-height information and length information of the text information into the position feature information of the text information.
  • Together, the line-height and length information determine the position feature information, which indicates the position and extent of area A.
  • Step S65: Construct the detection model according to the position feature information.
  • The detection model is constructed from the invoice sample, the position feature information of the areas (such as area A) from which text information must be extracted, and the mapping relationship between the two.
  • Of course, the above process still requires further training on training images of the same kind as the invoice sample, until the detection accuracy of the detection model reaches the preset requirement.
  • In some cases, for example the "footnote" area D near the edge of the sample, the optical-character edge lines only need to be recognized above area D, which reduces computation and quickly determines the region.
  • In other cases, for example when the text of area D is compact, with no large internal gaps and correspondingly compact optical characters, the extent of area D can also be determined from the optical-character edge lines at its two ends, directly yielding area D's position feature information. However, when regions are closely packed, as between the "goods information" area E and the "tax amount" area F in the invoice sample of FIG. 2, directly extracting optical-character edge lines may merge area E, area F, and the display areas between them into a single region. The solution of steps S61 to S65 in this embodiment overcomes this problem.
  • To achieve better recognition accuracy and precision, this embodiment provides a technical solution for accuracy evaluation and corresponding adjustment of model parameters. Referring to FIG. 7, in the OCR recognition method, after step S13 of synthesizing the text information and its corresponding position information into structured recognition data, the method further includes:
  • Step S71: Evaluate the accuracy of the structured recognition data, obtaining the recognition accuracy rate.
  • The recognition accuracy rate of the structured recognition data is computed; it can be evaluated over the structured recognition data output for multiple to-be-recognized images.
  • Step S72: Adjust the model parameters of the general OCR template according to the recognition accuracy rate, generating an adjusted general OCR template.
  • The relevant recognition parameters of the general recognition model can be adjusted according to the recognition accuracy rate; convolutional neural network algorithms can be used to further optimize the general recognition model within the general OCR template during recognition. A new general OCR template is generated from the parameter-optimized general recognition model and the detection model, and subsequent OCR recognition uses the new template.
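  • A minimal sketch of step S71, assuming field-level exact-match accuracy over a batch of structured outputs; the patent does not fix a particular metric, so this choice is an assumption.

```python
def recognition_accuracy(predictions, ground_truth):
    """Fraction of structured fields whose recognized text matches the truth.
    predictions, ground_truth: lists of {label: text} dicts, one per image."""
    total = correct = 0
    for pred, truth in zip(predictions, ground_truth):
        for label, value in truth.items():
            total += 1
            correct += (pred.get(label) == value)
    return correct / max(total, 1)

acc = recognition_accuracy(
    predictions=[{"tax identification number": "12345"}],
    ground_truth=[{"tax identification number": "12345"}],
)
print(acc)  # 1.0; a low value would trigger parameter adjustment (step S72)
```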
  • To verify the recognition accuracy and precision of the structured recognition data, this embodiment provides a verification scheme. Referring to FIG. 8, after step S13 the method further includes:
  • Step S81: Verify whether the structured recognition data meets the verification conditions.
  • Step S82: If not, input the to-be-recognized image corresponding to the text information that fails the verification conditions into the adjusted general OCR template for re-recognition.
  • Step S83: If so, output the structured recognition data.
  • Whether the structured recognition data meets the verification conditions is checked by means of a verification formula or joint verification. Take the structured recognition data "tax identification number: 12345" as an example: it can be verified manually or by a verification formula corresponding to that kind of structured data.
  • As another example, when the recognized structured data is "ID number: 4401*11999****2459" (to avoid privacy risks, some digits are masked with "*" here; in a real recognition scenario the corresponding digits are present), whether the data was recognized accurately can be determined through the ID-number verification formula. The checked content includes the number of digits of the structured recognition data, its structure, the trailing check code, and so on, as sketched below.
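  • To make the check described above concrete, the following sketch verifies the digit count, structure, and trailing check code of an 18-digit Chinese resident ID number using the standard GB 11643-1999 scheme. Treating this as the exact verification formula meant by the patent is an assumption.

```python
# Standard weights and check-code table for 18-digit Chinese ID numbers.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CODES = "10X98765432"

def verify_id_number(id_number: str) -> bool:
    if len(id_number) != 18 or not id_number[:17].isdigit():
        return False                      # digit-count and structure check
    s = sum(int(d) * w for d, w in zip(id_number[:17], WEIGHTS))
    return CHECK_CODES[s % 11] == id_number[17].upper()  # trailing check code

print(verify_id_number("11010519491231002X"))  # True (a synthetic example number)
```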
  • In addition, structured recognition data can be jointly verified against other types of structured recognition data. For example, given "ID number: 4401*11999****2459" and "registered residence: Tianhe District, Guangzhou, Guangdong Province..." in the same structured recognition data, it can be judged that the first four digits of the ID number were recognized accurately, and step S83 can be executed.
  • If the structured recognition data does not meet the verification conditions, step S82 is executed, and the to-be-recognized image is input into the adjusted general OCR template for a second recognition. Verification can then continue until the recognition accuracy reaches the requirement; otherwise, the corresponding model parameters continue to be adjusted.
  • When multiple pieces of text information are obtained through the general OCR template, in order to produce more structured and readable recognition data, the OCR recognition method of this embodiment further includes, after step S12: recognizing, via the detection model, the relative positions of the multiple pieces of text information on the to-be-recognized image, and stitching the pieces together in order.
  • In this process, the pieces of text information are stitched in order according to the specific positions where they appear (a brief sketch follows the example below).
  • Taking the invoice sample in FIG. 2 as an illustrative example: suppose the "goods information" area E contains multiple entries, say an invoice issued when a user bought fruit: apples, bananas, and snow pears.
  • The paper invoice issued to the user is the object to be recognized. Its image is input into the general OCR template, and three corresponding pieces of text information, "apples", "bananas", and "snow pears", are obtained in area E.
  • These three pieces are stitched according to their relative positions; in this example, the text information is displayed in structured form on three lines according to the positional structured information. For instance, the recognized text information and the corresponding structured information are assembled into an invoice layout for display.
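  • A minimal sketch of the stitching just described: the recognized pieces are ordered by their detected positions (top to bottom, then left to right) and joined line by line. The coordinates are illustrative assumptions.

```python
def stitch(pieces):
    """pieces: list of (text, (x, y)) tuples; returns lines in reading order."""
    ordered = sorted(pieces, key=lambda p: (p[1][1], p[1][0]))  # by y, then x
    return "\n".join(text for text, _ in ordered)

goods = [("bananas", (60, 210)), ("apples", (60, 180)), ("snow pears", (60, 240))]
print(stitch(goods))   # apples / bananas / snow pears, one per line
```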
  • The several items recognized in FIG. 2 can likewise be treated as multiple pieces of text information, which the general OCR template stitches and displays in the structured invoice template according to their relative positions.
  • This embodiment also provides a solution for information that the general OCR template recognizes but cannot render as text. For example, the "signature" area H in the invoice sample of FIG. 2 records the official seal of a certain unit.
  • The general OCR template can capture the image information of area H and vectorize it, generating a seal vector graphic.
  • During stitching and display, the seal vector graphic is stitched into the above invoice template.
  • In another common scenario, the information cannot be displayed directly as text because it is encoded: for example, the password information recorded in the "password area" G of the invoice sample in FIG. 2 is displayed as a two-dimensional code.
  • The general OCR template can capture the image information of area G and obtain the corresponding password information by decoding the two-dimensional code.
  • During stitching and display, the password information is stitched into the above invoice template as plain text or as a regenerated barcode.
  • To correct the positional deviations that arise when multiple pieces of text information are stitched in order by their relative positions, this embodiment, building on the above solution, also provides an OCR recognition method in which, after the step of recognizing the relative positions of the multiple pieces of text information via the detection model and stitching them in order, the method further includes: adjusting the positioning-pitch parameters in the detection model of the general OCR template according to the relative positions of the pieces of text information on the image information.
  • Continuing the example above, the line spacing between the pieces of text information and the spacing between characters on the same line are re-determined from their relative positions on the image, and the positioning-pitch parameters of the detection model are adjusted from this spacing information.
  • When the detection model recognizes a to-be-recognized image, the positioning-pitch parameters are used to locate the spacing between the characters within a piece of text information and the line spacing between multiple pieces, so that the detection model can crop the corresponding image information.
  • The spacing information can be obtained by comparing the display areas of the recognized fields when the displayed font size, line spacing, and per-character spacing are the same.
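  • A minimal sketch of re-estimating the line-spacing part of the positioning-pitch parameters from the stitched pieces' bounding boxes; the box format and the use of a simple average are assumptions.

```python
def estimate_line_spacing(boxes):
    """boxes: list of (x, y, width, height) for stitched pieces, one per line.
    Returns the average vertical gap between consecutive lines."""
    boxes = sorted(boxes, key=lambda b: b[1])
    gaps = [below[1] - (above[1] + above[3])
            for above, below in zip(boxes, boxes[1:])]
    return sum(gaps) / max(len(gaps), 1)

# three one-line pieces ("apples", "bananas", "snow pears")
print(estimate_line_spacing([(60, 180, 90, 24), (60, 210, 110, 24), (60, 240, 120, 24)]))
# 6.0 -> fed back into the detection model's positioning-pitch parameters
```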
  • Correspondingly, this embodiment also provides an electronic device, including:
  • a processor;
  • a memory for storing processor-executable instructions;
  • where the processor is configured to execute the steps of the OCR recognition method of any one of the foregoing embodiments.
  • Besides the exemplary recognition of the invoice sample in FIG. 2 described above, this embodiment can also recognize objects to be recognized such as contracts, bills, and certificates.
  • To better explain the technical solution of this application, this embodiment is now further explained with the to-be-recognized image of the contract sample in FIG. 9.
  • Referring to FIG. 1 together with FIG. 9, the above OCR recognition method includes:
  • Step S11: Obtain the to-be-recognized image from business-party data.
  • The to-be-recognized image of the contract sample is obtained from the business-party data, for example by scanning or photographing.
  • Step S12: Input the to-be-recognized image into a general OCR template for recognition, and obtain the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model, and the general recognition model is trained on field image samples of the business party's various business types.
  • In this process, the to-be-recognized image of the contract sample is input into the general OCR template for recognition, and the text information recorded on it and the corresponding position information are obtained.
  • The detection model in the general OCR template used to recognize contract samples must be trained with the business-party data provided by the business party, which includes training images of the same type as the contract sample.
  • Trained on such images, the detection model can crop the sub-images to be recognized according to the positions of the text information in the contract sample, for the general recognition model to convert optical characters into text information.
  • For a different object to be recognized, only the detection model needs to be retrained; the general recognition model need not be trained repeatedly. For example, before recognizing contract samples, only the detection model requires corresponding training, while the general recognition model used when recognizing invoice samples can be reused.
  • The general OCR template obtains the corresponding text information by recognizing the sub-images of areas such as the "contract name" area I, "party information" area J, "contract body" area K, "signature information" area L, and "signature and date" area M in the contract sample.
  • This involves two processes. First, using the pre-trained mapping relationship, the detection model detects the position information of the relative positions of areas I, J, K, L, and M, and crops the sub-images of those areas. Then, the general recognition model recognizes the optical characters in the cropped sub-images and, according to the mapping between optical characters and text, finally obtains the text information corresponding to each sub-image.
  • The general OCR template can also match structured information according to the relative position of each area; this structured information can be "contract name", "party information", "contract body", "signature information", "signature and date", and other labels corresponding to the recognized areas.
  • Step S13: Synthesize the text information and its corresponding position information into structured recognition data.
  • Structured recognition data is generated from the text information recognized by the general OCR template and the corresponding position information.
  • The detection model obtains the relative positions of the display areas in the contract sample to be recognized and generates a contract template corresponding to the contract sample in FIG. 9.
  • The text information is then written into the contract template at the positions indicated by the position information, producing the structured recognition data.
  • In addition, the structured information matched by the general OCR template can be combined with the recognized text information and its corresponding position information to generate structured recognition data, as sketched below.
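  • A minimal sketch of step S13 for the contract sample: the detected layout of FIG. 9's areas defines a template, and recognized text is written in at each area's position. The area labels follow FIG. 9; the coordinates and texts are illustrative assumptions.

```python
# Detected region layout (area label -> (x, y)) and recognized text are
# combined into a filled contract template, ordered by position.

detected_layout = {
    "contract name":         (120, 40),    # area I
    "party information":     (40, 100),    # area J
    "contract body":         (40, 160),    # area K
    "signature information": (40, 520),    # area L
    "signature and date":    (300, 560),   # area M
}

recognized_text = {
    "contract name": "Equipment Purchase Contract",
    "party information": "Party A: ...  Party B: ...",
}

def fill_template(layout, texts):
    """Write each recognized text into the template at its detected position."""
    return sorted(
        ((pos, label, texts.get(label, "")) for label, pos in layout.items()),
        key=lambda item: (item[0][1], item[0][0]),  # reading order: y, then x
    )

for pos, label, text in fill_template(detected_layout, recognized_text):
    print(f"{pos} {label}: {text}")
```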
  • This embodiment also provides an electronic device, including:
  • a processor;
  • a memory for storing processor-executable instructions;
  • where the processor is configured to execute the steps of the OCR recognition method of any of the above embodiments.
  • The electronic device provided by an embodiment of this application is shown in FIG. 10. For ease of description, only the parts related to this embodiment are shown; for technical details not disclosed, refer to the method portions of the embodiments of this application.
  • The terminal can be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or an in-vehicle computer. Taking a mobile phone as the terminal as an example:
  • FIG. 10 shows a block diagram of part of the structure of a mobile phone related to the terminal provided by an embodiment of this application.
  • Referring to FIG. 10, the mobile phone includes: a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a processor 1080, a power supply 1090, and other components. Those skilled in the art will understand that the structure shown in FIG. 10 does not limit the mobile phone, which may include more or fewer components than shown, combine certain components, or arrange components differently.
  • The RF circuit 1010 can be used to receive and send signals during information transmission and calls. In particular, downlink information from a base station is received and passed to the processor 1080 for processing, and uplink data is sent to the base station.
  • Generally, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
  • The RF circuit 1010 can also communicate with networks and other devices through wireless communication, which can use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, and Short Messaging Service (SMS).
  • The memory 1020 can be used to store software programs and modules. The processor 1080 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1020.
  • The memory 1020 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function and an image playback function); the data storage area may store data created through use of the mobile phone (such as audio data and a phone book).
  • In addition, the memory 1020 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • The input unit 1030 can be used to receive input digit or character information and to generate key signal input related to user settings and function control of the mobile phone.
  • Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032.
  • The touch panel 1031, also known as a touch screen, can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel 1031 with a finger, stylus, or any other suitable object or accessory) and drive the corresponding connected devices according to a preset program.
  • Optionally, the touch panel 1031 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position and the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 1080, and can receive and execute the commands sent by the processor 1080.
  • The touch panel 1031 can be implemented as resistive, capacitive, infrared, surface-acoustic-wave, and other types.
  • Besides the touch panel 1031, the input unit 1030 may also include other input devices 1032, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick, and the like.
  • The display unit 1040 can be used to display information input by the user, information provided to the user, and the various menus of the mobile phone.
  • The display unit 1040 may include a display panel 1041, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
  • Further, the touch panel 1031 can cover the display panel 1041. When the touch panel 1031 detects a touch operation on or near it, it transmits the operation to the processor 1080 to determine the type of touch event, and the processor 1080 then provides the corresponding visual output on the display panel 1041 according to that type.
  • Although in FIG. 10 the touch panel 1031 and the display panel 1041 are shown as two independent components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 1031 and the display panel 1041 can be integrated to implement those functions.
  • The mobile phone may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors.
  • Specifically, the light sensor can include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1041 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1041 and/or the backlight when the mobile phone is moved to the ear.
  • As one kind of motion sensor, the accelerometer can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the phone's posture (such as landscape/portrait switching, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as pedometers and tap detection). Other sensors that can be configured on the mobile phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described further here.
  • The audio circuit 1060, a speaker 1061, and a microphone 1062 can provide an audio interface between the user and the mobile phone.
  • The audio circuit 1060 can transmit the electrical signal converted from received audio data to the speaker 1061, which converts it into a sound signal for output. Conversely, the microphone 1062 converts a collected sound signal into an electrical signal, which the audio circuit 1060 receives and converts into audio data; after the audio data is processed by the processor 1080, it is sent via the RF circuit 1010 to, for example, another mobile phone, or output to the memory 1020 for further processing.
  • WiFi is a short-range wireless transmission technology. Through the WiFi module 1070, the mobile phone can help users send and receive e-mail, browse web pages, and access streaming media, providing wireless broadband Internet access.
  • Although FIG. 10 shows the WiFi module 1070, it is understood that it is not an essential component of the mobile phone and can be omitted as needed without changing the essence of the invention.
  • The processor 1080 is the control center of the mobile phone. It connects the various parts of the entire phone through various interfaces and lines, and performs the phone's functions and processes data by running or executing the software programs and/or modules stored in the memory 1020 and calling the data stored in the memory 1020, thereby monitoring the phone as a whole.
  • Optionally, the processor 1080 may include one or more processing units. Preferably, the processor 1080 may integrate an application processor, which mainly handles the operating system, user interface, and application programs, and a modem processor, which mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 1080.
  • The mobile phone also includes a power supply 1090 (such as a battery) that powers the various components. Preferably, the power supply can be logically connected to the processor 1080 through a power management system, so that charging, discharging, power-consumption management, and other functions are handled by the power management system.
  • Although not shown, the mobile phone may also include a camera, a Bluetooth module, and the like, which are not described further here.
  • In an embodiment of this application, the processor 1080 included in the terminal also has the following functions: acquiring the to-be-recognized image from business-party data; inputting the to-be-recognized image into a general OCR template for recognition to obtain the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model trained on field image samples of the business party's various business types; and synthesizing the text information and its corresponding position information into structured recognition data.
  • It should be understood that the disclosed system, device, and method may be implemented in other ways. The device (electronic device) embodiments described above are only illustrative.
  • The division into units is only a division by logical function; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or as a software functional unit.
  • This embodiment also provides a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by the processor of a mobile terminal, the mobile terminal can execute the OCR recognition method of any of the above embodiments: acquiring the to-be-recognized image from business-party data; inputting the to-be-recognized image into a general OCR template for recognition to obtain the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model trained on field image samples of the business party's various business types; and synthesizing the text information and its corresponding position information into structured recognition data.
  • This technical solution can efficiently and quickly recognize images of objects to be recognized (such as contracts, invoices, bills, and certificates) through the general OCR template, generate structured recognition data, and complete the recognition from optical characters to text information. The general OCR template used in this application has a short training time and strong adaptability, can adapt to many different objects to be recognized, and offers high recognition accuracy and high overall efficiency.
  • A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program. The program can be stored in a computer-readable storage medium, which can include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.

Abstract

An OCR recognition method and an OCR recognition electronic device. The method includes the steps of: acquiring a to-be-recognized image from business-party data (S11); inputting the to-be-recognized image into a general OCR template for recognition to obtain the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model (S12); and synthesizing the text information and its corresponding position information into structured recognition data (S13). Through the general OCR template, the method can efficiently and quickly recognize the image of an object to be recognized (for example, a contract, invoice, bill, or certificate), generate structured recognition data, and complete the recognition from optical characters to text information. The general OCR template it adopts has a short training time and strong adaptability, can adapt to many different objects to be recognized, and offers high recognition accuracy and high overall efficiency of the recognition process.

Description

OCR recognition method and electronic device

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on January 28, 2019, with application number 201910078744.8 and the title "OCR识别方法及其电子设备" (OCR recognition method and electronic device), the entire content of which is incorporated into this application by reference.

Technical Field

This application relates to the field of image recognition, and more specifically, to an OCR recognition method and an electronic device.

Background

OCR (Optical Character Recognition) mainly generates text output by recognizing the optical characters displayed on a carrier. Taking OCR recognition of paper documents as an example: by capturing the optical characters of the printed matter on a paper document and recognizing them, data such as text information can be obtained.

The OCR recognition methods in the prior art often rely on the characteristics of the recognized object and require personalized template customization. For recognized objects such as bills, newspapers, and teaching materials, and even for optical characters in different font sizes and typefaces, a corresponding optical character recognition template must be customized anew before that specific template can be used for recognition.

In the prior-art OCR recognition methods, customizing an optical character recognition template demands a large amount of training data and a long training time; the efficiency of template customization is low, and the template is difficult to transfer to other recognition objects. The customized template is also easily affected by factors such as character variation. Because the customized optical character recognition templates applied by these OCR recognition methods depend strongly on the recognized object, OCR recognition efficiency suffers.

Summary

In view of the above problems, this application proposes an OCR recognition method and an electronic device, which can remedy at least one of the following technical defects: long training time; low efficiency of customized recognition templates; difficulty in transferring templates to other recognition objects; susceptibility of customized optical character recognition templates to factors such as character variation; and the strong dependence of customized templates on the recognized object, which reduces OCR recognition efficiency.

This application provides an OCR recognition method, including:

acquiring a to-be-recognized image from business-party data;

inputting the to-be-recognized image into a general OCR template for recognition, and obtaining the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model, and the general recognition model is trained on field image samples of the business party's various business types; and

synthesizing the text information and its corresponding position information into structured recognition data.

This application also provides an electronic device, including:

a processor; and

a memory for storing processor-executable instructions;

where the processor is configured to execute the steps of the OCR recognition method of any of the above embodiments.

This application also provides a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by the processor of a mobile terminal, the mobile terminal can execute the OCR recognition method of any of the above embodiments.

This application also provides an OCR recognition device, which includes units for executing the OCR recognition method of this application.

This application also provides a computer non-volatile readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to execute the OCR recognition method of this application.

Compared with the prior art, the solution provided by this application, an OCR recognition method and its electronic device, acquires the to-be-recognized image from business-party data; inputs the image into a general OCR template for recognition, obtaining the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model trained on field image samples of the business party's various business types; and synthesizes the text information and its corresponding position information into structured recognition data. This technical solution can efficiently and quickly recognize images of objects to be recognized (for example, contracts, invoices, bills, and certificates) through the general OCR template, generate structured recognition data, and complete the recognition from optical characters to text information. The general OCR template adopted in this application has a short training time and strong adaptability, can adapt to many different objects to be recognized, and achieves high recognition accuracy and high overall efficiency.

These and other aspects of this application will be more concise and comprehensible in the description of the following embodiments.
Brief Description of the Drawings

To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are introduced below.

FIG. 1 shows a flowchart of the OCR recognition method in an embodiment of this application;

FIG. 2 shows a schematic diagram of the invoice sample recognized by the OCR recognition method in an embodiment of this application;

FIG. 3 shows a schematic flowchart of the method for training the general recognition model according to business type in this application;

FIG. 4 shows a schematic flowchart of the method for constructing the general recognition model in an embodiment of this application;

FIG. 5 shows a schematic flowchart of the method for training the detection model from pre-labeled field sub-images in this application;

FIG. 6 shows a schematic flowchart of the method for generating the detection model from line-height information and length information in this application;

FIG. 7 shows a schematic flowchart of the method for adjusting model parameters according to the recognition accuracy rate in this application;

FIG. 8 shows a schematic flowchart of verifying whether the structured recognition data meets the verification conditions in this application;

FIG. 9 shows a schematic diagram of the contract sample recognized by the OCR recognition method in an embodiment of this application;

FIG. 10 shows a block diagram of part of the structure related to the terminal provided by an embodiment of this application.

Detailed Description

The technical solutions in the embodiments of this application are described below with reference to the drawings of those embodiments.

Some of the flows described in the specification, the claims, and the above drawings of this application contain multiple operations that appear in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or in parallel. Operation numbers such as S1 and S21 are used only to distinguish different operations; the numbers themselves do not indicate any execution order. In addition, these flows may include more or fewer operations, and these operations may be executed in order or in parallel. It should be noted that terms such as "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they do not indicate a sequence, nor do they require that the "first" and "second" items be of different types.
Referring to FIG. 1, which shows a flowchart of the OCR recognition method in an embodiment of this application. Here, OCR refers to optical character recognition. The OCR recognition method includes:

Step S11: Obtain the to-be-recognized image from business-party data.

The OCR recognition method in this application can be applied during OCR template development to produce a general-purpose OCR recognition template. In this process, the business party is the party that needs the OCR template. The to-be-recognized image is the image information of the object to be recognized, obtained by photographing, scanning, or similar means. The OCR recognition method converts the text recorded in optical characters on the to-be-recognized image into text information and outputs it.

Step S12: Input the to-be-recognized image into a general OCR template for recognition, and obtain the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model, and the general recognition model is trained on field image samples of the business party's various business types.

In this recognition process, the to-be-recognized image is input into the general OCR template, which includes a detection model and a general recognition model. The detection model locates each piece of text information, crops the corresponding region of the to-be-recognized image, and passes it to the general recognition model for text recognition.

Step S13: Synthesize the text information and its corresponding position information into structured recognition data.

In this process, when the detection model locates a piece of text information, it matches that location to structured information. The structured information may describe the category, classification, or characteristics of the text information; in some scenarios it may be an ID number, zip code, card number, identification code, or another label indicating the kind of text content. Correspondingly, the general recognition model recognizes the text information, and structured data is generated by combining the structured information with the text information.

To better present the technical solution of this application, a specific scenario and recognition object are combined below to explain the solution. Referring to FIG. 2, which shows the to-be-recognized image of the invoice sample recognized by the OCR recognition method. Applying the method of this application to the invoice sample:

First, the to-be-recognized image of the invoice sample is obtained from the business-party data, as shown in FIG. 2.

Then, the to-be-recognized image of the invoice sample is input into the general OCR template for recognition, and the text information recorded on the image and its corresponding position information are obtained.

The process includes:

The detection model of the general OCR template identifies area A, where the "tax identification number" of the invoice sample is located; this is the corresponding position information. The detection model crops the image corresponding to area A (which may be called the "sub-image to be recognized") and sends it to the general recognition model.

The general recognition model of the general OCR template recognizes the cropped image of area A and, through the mapping between optical characters and text, obtains the text information "12345". The general OCR template can also combine the text information with the structured information matched from the position information to output structured data. In this example, the template matches the position information of area A in the invoice sample to the structured information "tax identification number", combines the recognized text information "12345" with it, and outputs the structured recognition data "tax identification number: 12345".

Compared with the prior art, which uses traditional OCR recognition models, the above OCR recognition method avoids two problems: the prior art requires a large amount of data to train the models used for positioning and text recognition, and those models must be retrained whenever the recognition object changes. The required training data volume is large and the training time long, which severely constrains OCR recognition efficiency. Because the technical solution of this application adopts a general OCR template that can convert most optical characters from optical information to text information, no training is needed for each individual object to be recognized. When building a general OCR template, the already-trained general recognition model can be reused instead of trained separately, which saves training time, reduces the required amount of training data, forms an OCR recognition template more quickly, and ultimately improves the overall efficiency of OCR recognition.
Referring to FIG. 3: to achieve better OCR recognition and improve the recognition effect, this embodiment also provides a technical solution for training the general recognition model; the OCR recognition method further includes:

Step S31: Determine, from the business-party data, each business type handled by the business party.

In this process, the business-party data from the business party is classified by business type. A business type mainly refers to the type of recognition object or a type related to the business party's business, such as invoice recognition, certificate recognition, textbook recognition, packaging recognition, or instruction-manual recognition.

Step S32: Obtain corresponding samples for each business type.

Samples of the corresponding business type are obtained for each business type, and each business type supplies a corresponding number of samples. To train the general recognition model better, the proportion of samples per business type can be adjusted to match that type's share of the business party's workload.

For example, suppose the business party's invoice-recognition matters account for 50%, contract-recognition matters for 30%, and reimbursement-form-recognition matters for 20%. Then the to-be-recognized images in the samples can be drawn from 50% invoice samples, 30% contract samples, and 20% reimbursement-form samples.

Step S33: Train the general recognition model on the samples.

The above samples of each business type are used to train the general recognition model; samples of different business types enable the trained model to effectively recognize the optical characters of objects to be recognized across those business types.

To further save training time, this embodiment also provides a technical solution: before training the general recognition model, a recognition model already applied to other business types can be obtained and adaptively trained on samples of the different business types to produce the general recognition model. Of course, in some scenarios where that recognition model already has a high recognition rate, its recognition rate on samples of the different business types can be tested; when the recognition rate reaches the recognition threshold, the model can be adopted directly as the general recognition model.

Referring to FIG. 4: to train a better general recognition model and improve its recognition accuracy, an embodiment of this application further provides an OCR recognition method in which step S33, training the general recognition model on the samples, includes:

Step S41: Extract the text feature information of the text information recorded in the training images of the samples.

In this process, text feature information is extracted from the text information recorded in the training images of the samples. Text feature information is the feature information of the font itself, the carrier that conveys the text information. Since a single training image may contain multiple pieces of text information, pieces that share the same font, and therefore the same text feature information, can be extracted together. When a training image contains multiple pieces of text information in different fonts, the specific pieces must first be cropped or labeled. Based on the characteristics of the different fonts, the stylistic traits of each font are removed, and only the feature information of the font that expresses the character's shape, that is, the text feature information, is retained.

Referring to FIG. 2: when the invoice sample serves as a training image for constructing the general recognition model, the "tax identification number" content in area A may be displayed in optical characters in size-4 KaiTi font, while the "unit name" in area B may be displayed in optical characters in small-size-4 HeiTi font. In this case, the optical characters corresponding to the text information are extracted, the font features associated with HeiTi and KaiTi are filtered out, and the font size is scaled down or up proportionally, according to the area the optical characters occupy, to a suitable scale. After normalizing the display scale and filtering out the font features, the remaining features of how the characters are laid out as optical characters constitute the text feature information.

The same text rendered in running script, regular script, bold, and other typefaces produces different optical characters. Even the same text content in the same typeface differs under different font sizes, optical capture conditions, and environmental conditions. Therefore, an embodiment of this application also provides a solution that extracts the main structure of each character in the text information as the font feature information. When the main structure of each character is extracted, optical-character features that contribute little to recognizing and confirming the character, such as stroke flourishes, stroke endings, and stroke thickness, are filtered out.

Step S42: Obtain the training text information corresponding to the text feature information, analyze the correspondence between the text feature information and the training text information, and obtain mapping information.

From the text information that the text feature information represents, and the relationship between the two, the mapping relationship between the text feature information and the training text information is obtained.

Step S43: Construct the general recognition model according to the mapping information.

The general recognition model is constructed from the mapping information that reflects the mapping relationship between font feature information and text information.

A general recognition model constructed by extracting text feature information that reflects the main features of the characters can effectively recognize text information in different fonts and font sizes.
Referring to FIG. 5: to obtain a detection model that locates field positions more effectively and to improve its detection accuracy, an embodiment of this application further provides an OCR recognition method in which step S12, inputting the to-be-recognized image into the general OCR template for recognition and obtaining the recorded text information and its corresponding position information, includes:

Step S51: Obtain training images in which the positions of the field sub-images are pre-labeled.

When the invoice sample in FIG. 2 is used as a training image, the positions of the field sub-images are labeled on the invoice sample in advance, for example the "tax identification number" area A and the "unit name" area B in FIG. 2.

Step S52: Extract the position feature information of the text information, and construct the detection model according to the position feature information.

The position feature information of the text information corresponding to the "tax identification number" and "unit name" above is extracted from factors such as the relative distances and margins of area A, area B, and the other areas on the image.

Because the position features corresponding to the text information differ between recognition objects, the detection model must be trained or constructed for each kind of object to be recognized. The detection model locates the text information in the object to be recognized and crops the image at the corresponding position.

The lengths of the text information in an object to be recognized are often inconsistent. For example, the "unit name" text in area B of FIG. 2 may vary in length, which affects the region or length the detection model must crop from the to-be-recognized image. In the invoice sample of FIG. 2, the length of the text information in area A, which records the "tax identification number", differs from that of area C, which records the invoice "header".

For this reason, referring to FIG. 6, an embodiment of this application also provides a technical solution for training a detection model with a variable-length recognition range, so that display areas of different sizes and shapes can be recognized. In the OCR recognition method, step S52, extracting the position feature information of the text information and constructing the detection model according to it, includes:

Step S61: Segment the training image used for training the detection model according to the line-height information of the text information, obtaining training sub-images.

In this process, the line-height information can be obtained through input. Further, to improve efficiency, a technical solution can be used in which the line spacing is determined from the arrangement of the optical characters and the line height is then determined from the line spacing, so that the line-height information is extracted from the sample. To better explain this embodiment: when the invoice sample in FIG. 2 serves as a sample for training the detection model, the technical solution in this application can perform region-wise recognition of the optical characters in each text display area. For example, by extracting the edge lines of the optical characters in the "tax identification number" area A and expanding those edge lines outward by a set margin, the line-height information of the text information in area A is obtained.

Using the line-height information obtained by the above method, the training image (the invoice sample in FIG. 2) is segmented: the display areas are divided into multiple training sub-images. Taking area A as an example, the image containing area A's optical characters is split into several small segments, generating the training sub-images of area A.

Step S62: Input the training sub-images into a fully connected network model, and compute the confidence of each training sub-image by matching the characters in the recognized-character database.

After the training sub-images corresponding to area A are obtained, the multiple training sub-images of area A are input into the fully connected network model, which outputs one-dimensional vectors. From these vectors, the confidence of each training sub-image is computed by matching against the characters in the recognized-character database. The confidence indicates how likely the optical character in the training sub-image is to match a character in the recognized-character database; within a certain probability range, the training sub-image can be considered to contain the corresponding database character. The fully connected network model can be constructed with convolutional neural network algorithms.

Step S63: Generate the length information of the text information according to the confidences of the training sub-images.

In this process, the length information of the text information can be determined from the confidences of multiple training sub-images. In other words, once the line-height information is determined, the confidences of the multiple training sub-images indicate which sub-images exhibit the characteristics of characters in the recognized-character database, yielding the length information of area A. Simply put, matching against the characters in the recognized-character database reveals whether area A's training sub-images contain recognizable characters. Through the length information that step S63 obtains from the sub-image confidences, the "goods information" area E and the "tax amount" area F of the invoice sample in FIG. 2, together with the "goods quantity" and "goods price" regions with recognizable optical characters between them, can be separated.

Step S64: Combine the line-height information and length information of the text information into the position feature information of the text information.

Together, the line-height information and length information determine the position feature information of the text information, which indicates the position and extent of area A.

Step S65: Construct the detection model according to the position feature information.

The detection model is constructed from the invoice sample, the position feature information of area A (from which text information must be extracted), and the mapping relationship between the two. Of course, the above process also requires further training with training images of the same kind as the invoice sample, until the detection accuracy of the detection model reaches the preset requirement.

In some cases, for example the "footnote" area D: since area D is close to the edge of the sample, the optical-character edge lines only need to be recognized above area D, which reduces computation and quickly determines the corresponding region.

In some cases, for example when the text information of area D is compact, with no large internal gaps and correspondingly compact optical characters, the extent of area D can also be determined from the optical-character edge lines at its two ends, directly yielding area D's position feature information. However, when, as in the invoice sample of FIG. 2, the region from the "goods information" area E to the "tax amount" area F is closely packed, directly extracting optical-character edge lines may merge area E, area F, and the display areas between them into a single region. In this embodiment, the solution of steps S61 to S65 above overcomes this problem.
为了实现更好的识别准确率和识别精度,本实施例中提供一种准确性评估和对应调整模型参数的技术方案。请参考图7,OCR识别方法中,所述将所述文本信息及其对应的位置信息合成结构化识别数据的步骤S13之后,还包括:
步骤S71:对所述结构化识别数据进行准确性评估,得到识别准确率。
根据上述情形,计算结构化识别数据的识别准确率。识别准确率可以根据多张待识别图像输出的结构化识别数据进行评估。
步骤S72:根据识别准确率调整通用OCR模版的模型参数,生成调整后的通用OCR模版。
根据上述识别准确率调整通用OCR模版的模型参数。其中可以根据识别准确率调整通用识别模型的相关识别参数,此时可以采用神经卷积网络算法,在识别的过程中进一步优化通用OCR模版中的通用识别模型。根据参数优化后的通用识别模型与检测模型生成新的通用OCR模版。后续的OCR识别采用新的通用OCR模版。
To verify the recognition accuracy and precision of the structured recognition data produced by the OCR recognition method, this embodiment provides a verification solution. Referring to FIG. 8, in the OCR recognition method, after step S13 of synthesizing the text information and its corresponding position information into structured recognition data, the method further includes:
Step S81: verifying whether the structured recognition data satisfies a verification condition.
Step S82: if not, inputting the image to be recognized corresponding to the text information in the structured recognition data that does not satisfy the verification condition into the adjusted general OCR template for re-recognition.
Step S83: if so, outputting the structured recognition data.
Whether the structured recognition data satisfies the verification condition is checked by means of a check formula or joint verification. After the structured recognition data is obtained, take the structured recognition data "taxpayer identification number: 12345" as an example: it can be checked manually or with the check formula corresponding to that type of structured recognition data.
As another example, when the recognized structured recognition data is "ID number: 4401*11999****2459" (to avoid privacy risks, some digits are masked with "*" here; in an actual recognition scenario the corresponding digits are present), the check formula for ID numbers can determine whether the structured recognition data was recognized accurately. The check covers the number of digits, the structure, and the trailing check code of the structured recognition data.
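For the ID-number case, the trailing check code follows the published GB 11643-1999 standard for mainland Chinese 18-digit ID numbers, so it can be sketched concretely; only the simplified structure check and the function name are assumptions here.

```python
# Minimal sketch: verify digit count, structure, and the trailing check code
# of an 18-digit ID number. Weights and the check-code table follow
# GB 11643-1999: sum(digit * weight) mod 11 indexes into "10X98765432".
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CODES = "10X98765432"

def verify_id_number(id_number: str) -> bool:
    if len(id_number) != 18 or not id_number[:17].isdigit():
        return False                      # wrong length or structure
    total = sum(int(d) * w for d, w in zip(id_number[:17], WEIGHTS))
    return CHECK_CODES[total % 11] == id_number[17].upper()
```

A record that fails this check would be routed back through step S82 for re-recognition.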
In addition, the structured recognition data can be jointly checked against other types of structured recognition data. For example, given "ID number: 4401*11999****2459" and "registered residence: Tianhe District, Guangzhou, Guangdong Province ..." in the structured recognition data, it can be determined that the first four digits of the ID number were recognized accurately, and step S83 can be executed.
If the structured recognition data does not satisfy the verification condition, step S82 is executed: the image to be recognized is input into the adjusted general OCR template described above for a second recognition. Further, verification can continue in this way until the recognition accuracy rate meets the requirement; otherwise the corresponding model parameters continue to be adjusted.
When multiple segments of the text information are obtained through the general OCR template, in order to obtain structured recognition data that is better structured and more readable, the OCR recognition method provided in this embodiment further includes, after step S12 of inputting the image to be recognized into the general OCR template for recognition and obtaining the text information recorded on the image and its corresponding position information: obtaining, through the detection model, the relative positions of the multiple segments of text information on the image to be recognized, and stitching the segments together in order.
In the above process, when recognition of the image through the general OCR template yields multiple segments of text information, the segments are stitched together in order according to the specific positions where they appear. Take the invoice sample in FIG. 2 as an illustrative example. The "goods information" region E may contain multiple segments of goods information, for instance an invoice issued to a user who bought fruit: apples, bananas, and pears. The paper invoice issued to the user is then the object to be recognized; inputting the image of this invoice into the general OCR template yields three corresponding segments of text information in region E: "apples, bananas, pears". The three segments are stitched according to their relative positions; in this example the text information is displayed in structured form on three lines according to the structured information of the relative positions, for example assembled into an invoice layout based on the recognized text information and the corresponding structured information. Beyond this, the items recognized in FIG. 2 can likewise serve as multiple segments of text information, which the general OCR template stitches and displays in a structured invoice template according to their relative positions. This embodiment further provides a scheme for information that the general OCR template recognizes but cannot display as text. For example, when the "seal" region H of the invoice sample in FIG. 2 records the official seal of an organization, the general OCR template can capture the image information of region H, vectorize it, and generate a seal vector graphic, which is stitched into the invoice template during assembly and display. In another common scenario, when the general OCR template recognizes information that cannot be displayed as text, such as the password information recorded as a QR code in the "password region" G of the invoice sample in FIG. 2, the general OCR template can capture the image information of region G and obtain the corresponding password information by decoding the QR code. During assembly and display, the password information is stitched into the invoice template either as plain text or as a regenerated barcode.
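Position-ordered stitching of this kind might be sketched as follows, assuming each recognized segment carries the (x, y) origin of the box returned by the detection model; the function name and the row-grouping tolerance are illustrative assumptions.

```python
# Minimal sketch: sort segments top-to-bottom then left-to-right, grouping
# segments whose vertical positions fall within a tolerance onto one line.
def stitch_segments(segments, row_tolerance: int = 10) -> str:
    """segments: (text, (x, y)) pairs; returns the stitched, ordered text."""
    ordered = sorted(segments, key=lambda s: (s[1][1], s[1][0]))
    lines, current, last_y = [], [], None
    for text, (x, y) in ordered:
        if last_y is not None and abs(y - last_y) > row_tolerance:
            lines.append(" ".join(current))   # start a new line
            current = []
        current.append(text)
        last_y = y
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)
```

For the fruit example, three boxes stacked vertically in region E come out as three lines, one per item.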
To correct the position deviation that arises when the multiple segments of text information are stitched in order according to their relative positions on the image to be recognized, this embodiment, building on the above scheme, further provides an OCR recognition method in which, after the step of obtaining the relative positions of the multiple segments of text information on the image through the detection model and stitching them in order, the method further includes: adjusting a positioning-spacing parameter in the detection model of the general OCR template according to the relative positions of the multiple segments of text information on the image information.
Still using the above example, the line spacing between the segments "apples, bananas, pears" and the spacing between characters on the same line are re-determined from the relative positions of the segments on the image information. The positioning-spacing parameter of the detection model in the general OCR template is adjusted according to this spacing information. The positioning-spacing parameter can be used by the detection model, when recognizing an image to be recognized, to locate the spacing between individual characters within a segment of text information and the line spacing between segments, making it easier for the detection model to crop the corresponding image information. In the above process, the spacing information can be obtained by comparing the display regions of the corresponding recognized fields, given that the recognized text information has the same display size, line spacing, and per-character spacing.
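A hedged sketch of re-deriving the line-spacing part of that parameter from recognized boxes follows; the (x, y, width, height) box convention and the function name are assumptions of the illustration.

```python
# Minimal sketch: estimate the line-spacing component of the
# positioning-spacing parameter from the vertical gaps between
# consecutive segment boxes.
def estimate_line_spacing(boxes) -> float:
    """boxes: (x, y, width, height) tuples, one per recognized segment."""
    boxes = sorted(boxes, key=lambda b: b[1])          # top-to-bottom
    gaps = [nxt[1] - (prev[1] + prev[3])               # gap between lines
            for prev, nxt in zip(boxes, boxes[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0
```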
This embodiment correspondingly provides an electronic device, including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the OCR recognition method of any one of the above embodiments.
Besides the exemplary use described above of recognizing the invoice sample in FIG. 2, this embodiment can also recognize objects to be recognized such as contracts, bills, and certificates. To better explain the technical solution of this application, this embodiment is now further explained with reference to the image to be recognized of the contract sample in FIG. 9.
Referring to FIG. 1 together with FIG. 9, the OCR recognition method described above includes:
Step S11: acquiring an image to be recognized from business-party data.
The image to be recognized of the contract sample in the business-party data is acquired; it can be obtained by scanning, photographing, or similar means.
Step S12: inputting the image to be recognized into the general OCR template for recognition, to obtain the text information recorded on the image to be recognized and its corresponding position information; the general OCR template includes a detection model and a general recognition model, the general recognition model being trained on field image samples from the business party's various business types.
In the above process, the image to be recognized of the contract sample is input into the general OCR template for recognition, yielding the text information recorded on the contract sample image and its corresponding position information. The detection model in the general OCR template used to recognize the contract sample needs to be trained on the business-party data provided by the business party, which includes training images of the same type as the contract sample. Trained on such images, the detection model can crop the corresponding sub-images to be recognized according to the positions of the text information in the contract sample, for the general recognition model to perform the recognition from optical characters to text information. In this application, only the detection model needs to be retrained for each different object to be recognized; the general recognition model does not need to be trained again. For example, before recognizing the contract sample, only the detection model needs to be trained accordingly; the general recognition model can be the one from the general OCR template used when recognizing invoice samples.
In the above process, the general OCR template obtains the corresponding text information by recognizing the sub-images to be recognized in regions of the contract sample such as the "contract title" region I, the "parties information" region J, the "contract body" region K, the "seal information" region L, and the "signature and date" region M. This comprises two stages. First, the detection model in the general OCR template uses the pre-trained mapping relationship to detect the position information of the relative positions of the "contract title" region I, the "parties information" region J, the "contract body" region K, the "seal information" region L, and the "signature and date" region M, and crops the sub-images to be recognized of those regions. Then, the general recognition model in the general OCR template recognizes the optical characters in those sub-images and, based on the mapping relationship between optical characters and text, finally obtains the text information corresponding to each sub-image. The general OCR template can also match structured information according to the relative position of each region; this structured information can be "contract title", "parties information", "contract body", "seal information", "signature and date", and other information corresponding to the recognized regions.
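The two-stage detect-then-recognize flow can be condensed into the following sketch, which assumes a detection model exposing a detect method that returns labeled boxes and a recognition model exposing a recognize method for a cropped sub-image; both interfaces and all names are assumptions of this illustration.

```python
# Minimal sketch of the two-stage pipeline: stage 1 detects and crops field
# sub-images; stage 2 recognizes the optical characters in each crop.
def recognize_document(image, detection_model, recognition_model):
    results = []
    for label, (x, y, w, h) in detection_model.detect(image):
        sub_image = image[y: y + h, x: x + w]           # crop the field
        text = recognition_model.recognize(sub_image)   # optical chars -> text
        results.append({"field": label, "text": text,
                        "position": (x, y, w, h)})
    return results
```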
Step S13: synthesizing the text information and its corresponding position information into structured recognition data.
Structured recognition data is generated from the text information recognized by the general OCR template described above and its corresponding position information. The detection model obtains the relative positions of the display regions in the contract sample to be recognized and generates a contract template corresponding to the contract sample in FIG. 9. The text information is then written into the contract template at the positions indicated by the position information, generating the structured recognition data.
In addition, structured recognition data can also be generated by combining the structured information matched by the general OCR template with the recognized text information and its corresponding position information.
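Synthesizing the structured recognition data might then amount to folding the recognized fields into one record keyed by the matched structured information, as in the sketch below; the dictionary layout is an assumed illustration, not the patent's storage format.

```python
# Minimal sketch: fold recognized fields into one structured record, keeping
# positions so text can be written back into the document template.
def synthesize_structured_data(results) -> dict:
    return {r["field"]: {"text": r["text"], "position": r["position"]}
            for r in results}

# Usage under the same assumptions:
# record = synthesize_structured_data(
#     recognize_document(image, detection_model, recognition_model))
```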
This embodiment also provides an electronic device, including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the OCR recognition method of any one of the above embodiments.
The electronic device provided by the embodiments of this application is shown in FIG. 10. For ease of description, only the parts related to the embodiments of this application are shown; for specific technical details not disclosed, refer to the method portions of the embodiments of this application. The terminal may be any terminal device including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or an in-vehicle computer; a mobile phone is taken as an example below.
FIG. 10 is a block diagram of part of the structure of a mobile phone related to the terminal provided by the embodiments of this application. Referring to FIG. 10, the phone includes components such as a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a processor 1080, and a power supply 1090. Those skilled in the art will understand that the phone structure shown in FIG. 10 does not limit the phone, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The components of the phone are described below with reference to FIG. 10.
The RF circuit 1010 can be used for receiving and sending signals when transmitting and receiving information or during a call; in particular, it receives downlink information from the base station and passes it to the processor 1080 for processing, and it sends uplink data to the base station. Typically, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), and a duplexer. In addition, the RF circuit 1010 can also communicate with networks and other devices through wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, and Short Messaging Service (SMS).
The memory 1020 can be used to store software programs and modules; the processor 1080 executes the various functional applications and data processing of the phone by running the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a program storage area and a data storage area: the program storage area can store the operating system and the application programs required by at least one function (such as a sound playback function and an image playback function); the data storage area can store data created through use of the phone (such as audio data and a phone book). In addition, the memory 1020 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 1030 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the phone. Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also called a touch screen, can collect the user's touch operations on or near it (such as operations performed by the user with a finger, a stylus, or any other suitable object or accessory on or near the touch panel 1031) and drive the corresponding connection devices according to a preset program. Optionally, the touch panel 1031 may include two parts: a touch detection device and a touch controller. The touch detection device detects the orientation of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 1080, and can receive and execute commands from the processor 1080. The touch panel 1031 can be implemented in resistive, capacitive, infrared, surface acoustic wave, and other types. Besides the touch panel 1031, the input unit 1030 may also include other input devices 1032, which may specifically include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick.
The display unit 1040 can be used to display information input by the user or provided to the user, as well as the phone's various menus. The display unit 1040 may include a display panel 1041, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1031 may cover the display panel 1041: when the touch panel 1031 detects a touch operation on or near it, it passes the operation to the processor 1080 to determine the type of touch event, after which the processor 1080 provides a corresponding visual output on the display panel 1041 according to the type of touch event. Although in FIG. 10 the touch panel 1031 and the display panel 1041 are two separate components implementing the phone's input and output functions, in some embodiments the touch panel 1031 and the display panel 1041 may be integrated to implement both functions.
The phone may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensors may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 1041 according to the ambient light, and the proximity sensor can turn off the display panel 1041 and/or the backlight when the phone is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in every direction (generally along three axes) and, when stationary, the magnitude and direction of gravity; it can be used for applications that recognize the phone's posture (such as portrait/landscape switching, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection). Other sensors that can also be fitted to the phone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described further here.
The audio circuit 1060, a loudspeaker 1061, and a microphone 1062 can provide an audio interface between the user and the phone. The audio circuit 1060 can transmit the electrical signal converted from received audio data to the loudspeaker 1061, which converts it into a sound signal for output; conversely, the microphone 1062 converts collected sound signals into electrical signals, which the audio circuit 1060 receives and converts into audio data; after the audio data is output to the processor 1080 for processing, it is sent via the RF circuit 1010 to, for example, another phone, or output to the memory 1020 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1070 the phone can help the user send and receive email, browse web pages, and access streaming media, providing wireless broadband Internet access. Although FIG. 10 shows the WiFi module 1070, it is understood that it is not an essential component of the phone and can be omitted as needed without changing the essence of the invention.
The processor 1080 is the control center of the phone; it connects all parts of the entire phone through various interfaces and lines, and performs the phone's various functions and processes data by running or executing the software programs and/or modules stored in the memory 1020 and invoking the data stored in the memory 1020, thereby monitoring the phone as a whole. Optionally, the processor 1080 may include one or more processing units; preferably, the processor 1080 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, and application programs, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 1080.
The phone also includes a power supply 1090 (such as a battery) that powers the components; preferably, the power supply can be logically connected to the processor 1080 through a power management system, which implements functions such as charge management, discharge management, and power consumption management.
Although not shown, the phone may also include a camera, a Bluetooth module, and the like, which are not described further here.
In the embodiments of this application, the processor 1080 included in the terminal also has the following functions:
acquiring an image to be recognized from business-party data;
inputting the image to be recognized into the general OCR template for recognition, to obtain the text information recorded on the image to be recognized and its corresponding position information, where the general OCR template includes a detection model and a general recognition model, the general recognition model being trained on field image samples from the business party's various business types; and
synthesizing the text information and its corresponding position information into structured recognition data.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus (electronic device) embodiments described above are merely illustrative; for instance, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
This embodiment also provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by the processor of a mobile terminal, the mobile terminal can perform the OCR recognition method of any of the above embodiments.
In the OCR recognition method and its electronic device of this application, the technical solution of acquiring an image to be recognized from business-party data; inputting the image into the general OCR template for recognition to obtain the text information recorded on the image and its corresponding position information, where the general OCR template includes a detection model and a general recognition model trained on field image samples from the business party's various business types; and synthesizing the text information and its corresponding position information into structured recognition data makes it possible to recognize images of objects to be recognized (such as contracts, invoices, bills, and certificates) efficiently and quickly through the general OCR template, generate structured recognition data, and complete the recognition from optical characters to text information. The general OCR template adopted in this application has a short training time, strong adaptability to many different objects to be recognized, high recognition accuracy, and high overall efficiency.
Those of ordinary skill in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The electronic device provided by this application has been described in detail above. Those of ordinary skill in the art, following the ideas of the embodiments of this application, will find changes in specific implementations and in the scope of application; in summary, the contents of this specification should not be construed as limiting this application.

Claims (20)

  1. An OCR recognition method, where OCR refers to optical character recognition, characterized by comprising:
    acquiring an image to be recognized from business-party data;
    inputting the image to be recognized into a general OCR template for recognition, to obtain text information recorded on the image to be recognized and its corresponding position information; wherein the general OCR template comprises a detection model and a general recognition model, the detection model is used to identify the position of the corresponding text information and, after cropping the corresponding position of the image to be recognized, pass it to the general recognition model for text recognition, and the general recognition model is trained on field image samples from the business party's various business types; and
    synthesizing the text information and its corresponding position information into structured recognition data.
  2. The OCR recognition method according to claim 1, characterized by further comprising:
    determining, from the business-party data, each business type handled by the business party;
    acquiring corresponding samples for each business type; and
    training the general recognition model with the samples.
  3. The OCR recognition method according to claim 2, characterized in that the step of training the general recognition model with the samples comprises:
    extracting character feature information of the text information recorded in training images of the samples;
    acquiring training text information corresponding to the character feature information, and analyzing the correspondence between the character feature information and the training text information to obtain mapping information; and
    constructing the general recognition model according to the mapping information.
  4. The OCR recognition method according to claim 1, characterized in that, before the step of inputting the image to be recognized into the general OCR template for recognition to obtain the text information recorded on the image to be recognized and its corresponding position information, the method further comprises:
    acquiring a training image with pre-annotated field sub-image positions; and
    extracting position feature information of the text information, and constructing the detection model according to the position feature information.
  5. The OCR recognition method according to claim 4, characterized in that the step of extracting the position feature information of the text information and constructing the detection model according to the position feature information comprises:
    segmenting the training image used to train the detection model according to line-height information of the text information, to obtain training sub-images;
    inputting the training sub-images into a fully connected network model, and calculating the confidence of each training sub-image by recognizing characters from a character database;
    generating length information of the text information according to the confidences of the training sub-images;
    generating position feature information of the text information from the line-height information and the length information of the text information; and
    constructing the detection model according to the position feature information.
  6. The OCR recognition method according to claim 1, characterized in that, after the step of synthesizing the text information and its corresponding position information into structured recognition data, the method further comprises:
    performing an accuracy evaluation on the structured recognition data to obtain a recognition accuracy rate; and
    adjusting model parameters of the general OCR template according to the recognition accuracy rate, to generate an adjusted general OCR template.
  7. The OCR recognition method according to claim 6, characterized in that, after the step of synthesizing the text information and its corresponding position information into structured recognition data, the method further comprises:
    verifying whether the structured recognition data satisfies a verification condition;
    if so, outputting the structured recognition data; and
    if not, inputting the image to be recognized corresponding to the text information in the structured recognition data that does not satisfy the verification condition into the adjusted general OCR template for re-recognition.
  8. The OCR recognition method according to claim 1, characterized in that, after the step of inputting the image to be recognized into the general OCR template for recognition to obtain the text information recorded on the image to be recognized and its corresponding position information, when multiple segments of the text information are obtained through the general OCR template, the method further comprises: obtaining, through the detection model, the relative positions of the multiple segments of text information on the image to be recognized, and stitching the multiple segments of text information together in order.
  9. The OCR recognition method according to claim 8, characterized in that, after the step of obtaining, through the detection model, the relative positions of the multiple segments of text information on the image to be recognized and stitching the multiple segments of text information together in order, the method further comprises: adjusting a positioning-spacing parameter in the detection model of the general OCR template according to the relative positions of the multiple segments of text information on the image information.
  10. An electronic device, characterized by comprising:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to perform the following steps:
    acquiring an image to be recognized from business-party data;
    inputting the image to be recognized into a general OCR template for recognition, to obtain text information recorded on the image to be recognized and its corresponding position information; wherein the general OCR template comprises a detection model and a general recognition model, the detection model is used to identify the position of the corresponding text information and, after cropping the corresponding position of the image to be recognized, pass it to the general recognition model for text recognition, and the general recognition model is trained on field image samples from the business party's various business types; and
    synthesizing the text information and its corresponding position information into structured recognition data.
  11. The electronic device according to claim 10, characterized in that the processor is further configured to perform the following steps:
    determining, from the business-party data, each business type handled by the business party;
    acquiring corresponding samples for each business type; and
    training the general recognition model with the samples.
  12. The electronic device according to claim 11, characterized in that, when training the general recognition model with the samples, the processor specifically performs the following steps:
    extracting character feature information of the text information recorded in training images of the samples;
    acquiring training text information corresponding to the character feature information, and analyzing the correspondence between the character feature information and the training text information to obtain mapping information; and
    constructing the general recognition model according to the mapping information.
  13. The electronic device according to claim 10, characterized in that, before performing the step of inputting the image to be recognized into the general OCR template for recognition to obtain the text information recorded on the image to be recognized and its corresponding position information, the processor further performs the following steps:
    acquiring a training image with pre-annotated field sub-image positions; and
    extracting position feature information of the text information, and constructing the detection model according to the position feature information.
  14. The electronic device according to claim 13, characterized in that, when performing the step of extracting the position feature information of the text information and constructing the detection model according to the position feature information, the processor specifically performs the following steps:
    segmenting the training image used to train the detection model according to line-height information of the text information, to obtain training sub-images;
    inputting the training sub-images into a fully connected network model, and calculating the confidence of each training sub-image by recognizing characters from a character database;
    generating length information of the text information according to the confidences of the training sub-images;
    generating position feature information of the text information from the line-height information and the length information of the text information; and
    constructing the detection model according to the position feature information.
  15. The electronic device according to claim 10, characterized in that, after performing the step of synthesizing the text information and its corresponding position information into structured recognition data, the processor further performs the following steps:
    performing an accuracy evaluation on the structured recognition data to obtain a recognition accuracy rate; and
    adjusting model parameters of the general OCR template according to the recognition accuracy rate, to generate an adjusted general OCR template.
  16. The electronic device according to claim 15, characterized in that, after performing the step of synthesizing the text information and its corresponding position information into structured recognition data, the processor further performs the following steps:
    verifying whether the structured recognition data satisfies a verification condition;
    if so, outputting the structured recognition data; and
    if not, inputting the image to be recognized corresponding to the text information in the structured recognition data that does not satisfy the verification condition into the adjusted general OCR template for re-recognition.
  17. The electronic device according to claim 10, characterized in that, after performing the step of inputting the image to be recognized into the general OCR template for recognition to obtain the text information recorded on the image to be recognized and its corresponding position information, when multiple segments of the text information are obtained through the general OCR template, the processor further performs the following step: obtaining, through the detection model, the relative positions of the multiple segments of text information on the image to be recognized, and stitching the multiple segments of text information together in order.
  18. The electronic device according to claim 17, characterized in that, after obtaining, through the detection model, the relative positions of the multiple segments of text information on the image to be recognized and stitching the multiple segments of text information together in order, the processor further performs the following step: adjusting a positioning-spacing parameter in the detection model of the general OCR template according to the relative positions of the multiple segments of text information on the image information.
  19. An optical character recognition (OCR) recognition apparatus, characterized by comprising units for performing the method according to any one of claims 1 to 9.
  20. A non-volatile computer-readable storage medium, characterized in that the non-volatile computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 9.
PCT/CN2019/117914 2019-01-28 2019-11-13 OCR recognition method and electronic device WO2020155763A1

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910078744.8A CN109919014B 2019-01-28 2019-01-28 OCR recognition method and electronic device
CN201910078744.8 2019-01-28
