CN111612157B - Training method, character recognition device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN111612157B
CN111612157B (application CN202010440288.XA)
Authority
CN
China
Prior art keywords
characters, character, text, machine learning, learning model
Prior art date
Legal status
Active
Application number
CN202010440288.XA
Other languages
Chinese (zh)
Other versions
CN111612157A (en)
Inventor
梁宇
许春阳
程芃森
陈航
张冬
崔凯铜
黄勇
Current Assignee
Sichuan Silence Information Technology Co ltd
Original Assignee
Sichuan Silence Information Technology Co ltd
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2023-06-30
Application filed by Sichuan Silence Information Technology Co ltd
Priority to CN202010440288.XA
Publication of CN111612157A
Application granted
Publication of CN111612157B

Classifications

    • G06N20/00 Machine learning
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/08 Neural networks; learning methods
    • G06V30/153 Character recognition; segmentation of character regions using recognition of characters or words
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

The present application provides a training method, a character recognition method and device, a storage medium, and an electronic device. The machine learning model is trained with sample images of various colors, character sizes, degrees of blur, and character inclination angles, so that the trained machine learning model can adapt to images to be recognized in different styles, which improves the adaptability of the machine learning model to different scenes.

Description

Training method, character recognition device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data processing, and in particular, to a training method, a character recognition method and device, a storage medium, and an electronic device.
Background
With the continuous development of OCR (Optical Character Recognition) technology, its application fields are becoming increasingly rich (for example, identity card information recognition, picture information extraction, financial information extraction, license plate information recognition, etc.). In practical applications, however, the recognition effect is easily limited by the scene of the image to be recognized: for pictures from some scenes the recognition effect is good, while for pictures from other scenes it is relatively poor.
Disclosure of Invention
To overcome at least one of the deficiencies in the prior art, a first object of the embodiments of the present application is to provide a training method applied to an electronic device configured with an untrained machine learning model, the method comprising:
acquiring sample images containing multiple colors, multiple character sizes, multiple degrees of blur, and multiple character inclination angles, wherein each sample image carries characters of a specific language type, and each sample image carries the same number of characters of that language type;
and training the machine learning model with the sample images to recognize the characters of the specific language type in the sample images, so that the trained machine learning model can be used to recognize characters of the specific language type from the sample images.
Optionally, the sample images are crawled from a variety of network platforms.
Optionally, the specific language type is Cantonese.
A second object of the embodiments of the present application is to provide a character recognition method applied to an electronic device, where the electronic device is configured with a machine learning model trained by the above training method and a dictionary file, and the machine learning model comprises, in sequence, a convolutional layer carrying a residual network, a recurrent network layer, and a transcription layer, the method comprising:
acquiring an image to be recognized;
recognizing the characters of the specific language type in the image to be recognized sequentially through the convolutional layer carrying the residual network, the recurrent network layer, and the transcription layer, to obtain character codes of the characters;
and indexing the dictionary file according to the character codes to determine the character information corresponding to the character codes.
Optionally, the method further comprises:
and verifying the character information through a whole-word-masking technique, and correcting characters in the character information that do not fit its context.
Optionally, the electronic device is further configured with a verification model, and the step of verifying the character information through the whole-word-masking technique and correcting characters that do not fit the context of the character information comprises:
masking part of the characters in the character information to obtain character information to be verified;
predicting, through the verification model, the characters at the masked positions according to the context of the character information to be verified, to obtain predicted characters;
and comparing the predicted characters with the characters at the masked positions, and correcting characters in the character information that do not fit its context.
A third object of the embodiments of the present application is to provide a character recognition device applied to an electronic device, where the electronic device is configured with a machine learning model trained by the above training method and a dictionary file, the machine learning model comprising, in sequence, a convolutional layer carrying a residual network, a recurrent network layer, and a transcription layer, and the character recognition device comprises:
an image acquisition module, configured to acquire an image to be recognized;
a code acquisition module, configured to recognize characters of the specific language type in the image to be recognized sequentially through the convolutional layer carrying the residual network, the recurrent network layer, and the transcription layer, to obtain character codes of the characters;
and a character index module, configured to index the dictionary file according to the character codes and determine the character information corresponding to the character codes.
Optionally, the character recognition device further includes:
a character verification module, configured to verify the character information through a whole-word-masking technique and correct characters in the character information that do not fit its context.
A fourth object of the embodiments of the present application is to provide a storage medium storing a computer program which, when executed by a processor, implements the above character recognition method.
A fifth object of the embodiments of the present application is to provide an electronic device comprising a memory and a processor, where the memory stores machine-executable instructions executable by the processor, and the machine-executable instructions, when executed by the processor, implement the above character recognition method.
Compared with the prior art, the application has the following beneficial effects:
The present application provides a training method, a character recognition method and device, a storage medium, and an electronic device. The machine learning model is trained with sample images of various colors, character sizes, degrees of blur, and character inclination angles, so that the trained machine learning model can adapt to images to be recognized in different styles, which improves the adaptability of the machine learning model to different scenes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of a training method provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a machine learning model according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating steps of a text recognition method according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Reference numerals: 100 - electronic device; 110 - software virtual device; 120 - memory; 130 - processor; 1101 - image acquisition module; 1102 - code acquisition module; 1103 - character index module; 1104 - character verification module.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
As described in the Background section, with the development of OCR (Optical Character Recognition) technology, its application scenarios are becoming increasingly rich (e.g., identity card information recognition, picture information extraction, financial information extraction, license plate information recognition, etc.). In practical applications, however, the recognition effect is easily limited by the scene of the image to be recognized: for pictures from some scenes the recognition effect is good, while for pictures from other scenes it is relatively poor.
In view of this, the embodiment of the application provides a training method applied to an electronic device. Wherein the electronic device is configured with an untrained machine learning model. Referring to fig. 1, a flowchart of steps of the training method according to an embodiment of the present application will be described in detail below.
Step S110, acquiring sample images containing multiple colors, multiple character sizes, multiple degrees of blur, and multiple character inclination angles, wherein each sample image carries characters of a specific language type, and each sample image carries the same number of characters of that language type.
Sample images of multiple colors include images with different character colors as well as images with different background colors. Sample images with multiple character inclination angles include images in which the characters are tilted at different angles relative to the image as a whole, as well as images in which the characters are tilted at different angles within the depicted picture content itself.
Step S120, training the machine learning model with the sample images to recognize the characters of the specific language type in the sample images, so that the trained machine learning model can be used to recognize characters of the specific language type from the sample images.
Because the sample images include multiple colors, multiple character sizes, multiple degrees of blur, and multiple character inclination angles, the trained machine learning model can adapt to images to be recognized in different styles, which improves the adaptability of the machine learning model to different scenes.
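The application does not state how such varied samples are produced; purely as an illustration, the sketch below synthesizes sample images that vary in character color, background color, character size, degree of blur, and inclination angle, assuming the Pillow imaging library and a locally available font file (the font path, sample text, and parameter ranges are assumptions):

    # Illustrative sketch only: synthesizing sample images that vary in colour,
    # character size, blur and inclination angle. Paths, fonts and parameter
    # ranges are assumptions, not values taken from the application.
    import random
    from PIL import Image, ImageDraw, ImageFont, ImageFilter

    def make_sample(text, font_path="NotoSansCJK-Regular.otf", out_size=(320, 64)):
        font_size = random.randint(18, 48)                      # multiple character sizes
        fg = tuple(random.randint(0, 255) for _ in range(3))    # character colour
        bg = tuple(random.randint(0, 255) for _ in range(3))    # background colour
        img = Image.new("RGB", out_size, bg)
        draw = ImageDraw.Draw(img)
        draw.text((10, 8), text, fill=fg, font=ImageFont.truetype(font_path, font_size))
        angle = random.uniform(-15, 15)                          # character inclination angle
        img = img.rotate(angle, expand=False, fillcolor=bg)
        radius = random.uniform(0.0, 2.5)                        # degree of blur
        return img.filter(ImageFilter.GaussianBlur(radius))

    # Each generated sample carries the same number of characters of the target language type.
    samples = [make_sample("早晨你好嗎") for _ in range(1000)]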
Further, in order to enable the trained machine learning model to adapt to text content written in different expression styles, crawler technology can be used to crawl sample images from a variety of network platforms, where the specific platforms can be adjusted to actual requirements.
It should be understood that the wording and sentence construction on an official platform are more formal than on a public forum. Therefore, by crawling sample images from a variety of network platforms, the machine learning model is trained to adapt both to formal official language and to popular, spoken-style expressions.
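As a sketch of this crawling approach only (the platform URLs below are placeholders, not platforms named in the application), candidate sample images could be collected as follows:

    # Illustrative crawler sketch: collecting candidate sample images from several
    # network platforms. The URLs are placeholders.
    import os
    import requests
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    PLATFORM_PAGES = [
        "https://example-official-site.example/news",   # formal, official wording
        "https://example-forum.example/board",          # colloquial, spoken-style wording
    ]

    def crawl_images(pages, out_dir="samples"):
        os.makedirs(out_dir, exist_ok=True)
        count = 0
        for page in pages:
            html = requests.get(page, timeout=10).text
            for img in BeautifulSoup(html, "html.parser").find_all("img"):
                src = img.get("src")
                if not src:
                    continue
                data = requests.get(urljoin(page, src), timeout=10).content
                with open(os.path.join(out_dir, f"sample_{count}.jpg"), "wb") as f:
                    f.write(data)
                count += 1
        return count

    crawl_images(PLATFORM_PAGES)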
As one possible implementation, the characters of the specific language type are Cantonese.
The embodiments of the present application also provide a character recognition method applied to an electronic device. The electronic device is configured with a machine learning model trained by the above training method and a dictionary file, where the dictionary file records a plurality of characters and the codes of those characters. Referring to fig. 2, the machine learning model includes, in sequence, a convolutional layer carrying a residual network, a recurrent network layer, and a transcription layer.
Referring to fig. 3, which is a flowchart of the steps of the character recognition method according to an embodiment of the present application, the method is described in detail below.
Step S210, acquiring an image to be recognized.
Step S220, recognizing the characters of the specific language type in the image to be recognized sequentially through the convolutional layer carrying the residual network, the recurrent network layer, and the transcription layer, to obtain character codes of the characters.
The convolutional layer is used to extract feature information from the image to be recognized. The residual network in the convolutional layer alleviates the gradient vanishing, gradient explosion, and network degradation caused by an excessive number of hidden layers in the machine learning model; it also improves the efficiency of training the machine learning model.
The recurrent layer is used to analyze the semantic information in the characters, extract the sequence information within it, and output a character coding sequence of indefinite length. For example, if the text in the image to be recognized is "hello", the text corresponding to the character coding sequence output by the recurrent layer may be "hheeellooo".
The transcription layer (Connectionist Temporal Classification, CTC) performs de-duplication and integration on the character coding sequence output by the recurrent layer and converts it into the final character codes. For example, after the repeated characters in "hheeellooo" are removed, "hello" is output.
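The application does not prescribe concrete layer sizes; purely as an illustrative sketch of the structure just described (a convolutional layer with a residual block, a recurrent layer, and per-frame outputs for CTC-style transcription), a minimal PyTorch model could look as follows, where all channel counts and dimensions are assumptions:

    # Minimal CRNN-style sketch of the described structure (illustrative sizes only).
    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.conv2(self.relu(self.conv1(x)))
            return self.relu(out + x)            # shortcut eases gradient flow

    class CRNN(nn.Module):
        def __init__(self, num_classes, img_height=32):
            super().__init__()
            self.cnn = nn.Sequential(            # convolutional layer with residual network
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
                ResidualBlock(64),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
            )
            feat_h = img_height // 4
            self.rnn = nn.LSTM(128 * feat_h, 256, bidirectional=True, batch_first=True)
            self.fc = nn.Linear(512, num_classes)    # num_classes includes the CTC blank

        def forward(self, x):                    # x: (batch, 1, H, W)
            f = self.cnn(x)                      # (batch, C, H/4, W/4)
            b, c, h, w = f.size()
            f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # one feature vector per time step
            seq, _ = self.rnn(f)                 # recurrent network layer
            return self.fc(seq).log_softmax(dim=2)           # per-frame scores for CTC decoding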
Step S230, indexing the dictionary file according to the character codes, and determining the character information corresponding to the character codes.
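The format of the dictionary file is not specified in the application, so the following minimal sketch assumes a plain mapping from character codes to characters; it shows the de-duplication performed by the transcription layer together with the dictionary indexing of step S230 (the blank index and the toy dictionary are illustrative):

    # Greedy CTC-style decoding plus dictionary lookup (illustrative only).
    BLANK = 0  # assumed index of the CTC blank symbol

    def ctc_greedy_decode(frame_indices):
        """Collapse repeated symbols, then drop blanks: e.g. the frame sequence
        for 'hheeellooo' becomes the final code sequence for 'hello'."""
        codes, prev = [], None
        for idx in frame_indices:
            if idx != prev and idx != BLANK:
                codes.append(idx)
            prev = idx
        return codes

    def lookup(codes, dictionary):
        """Index the dictionary file (here: a dict from character code to character)."""
        return "".join(dictionary.get(c, "?") for c in codes)

    # Usage with an assumed toy dictionary:
    dictionary = {1: "h", 2: "e", 3: "l", 4: "o"}
    frames = [1, 1, 2, 2, 2, 0, 3, 3, 0, 3, 4, 4, 4]
    print(lookup(ctc_greedy_decode(frames), dictionary))  # -> "hello"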
In this neural network model, a residual network is added to the convolutional layer, so that the problems of gradient vanishing, gradient explosion, and network degradation caused by an excessive number of hidden layers in the machine learning model are alleviated, which improves the recognition accuracy of the trained machine learning model.
In order to further verify the recognized character information, the electronic device verifies the character information through a whole-word-masking technique and corrects characters in the character information that do not fit its context.
As one possible implementation, the electronic device is further configured with a verification model. The electronic device randomly masks part of the characters in the character information to obtain character information to be verified; predicts, through the verification model, the characters at the masked positions according to the context of the character information to be verified, to obtain predicted characters; and compares the predicted characters with the characters at the masked positions, correcting characters in the character information that do not fit its context.
For example, for a given masked position, if the predicted character differs from the original character at that position, the original character is replaced by the predicted character. In this way, the verification model can improve the accuracy of recognizing characters in the image to be recognized.
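A hedged illustration of this verification loop is given below: it masks one character at a time and lets a BERT-style masked-language model predict it from context via the Hugging Face fill-mask pipeline. The model name, the single-character masking strategy, and the confidence threshold are assumptions rather than details given in the application.

    # Illustrative verification sketch: mask characters one at a time, predict them
    # from context, and replace characters whose prediction disagrees with the
    # recognized text. Model name and threshold are assumptions.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="hfl/chinese-bert-wwm")  # assumed whole-word-masking BERT

    def verify(text, min_score=0.9):
        corrected = list(text)
        for i, ch in enumerate(text):
            masked = text[:i] + fill_mask.tokenizer.mask_token + text[i + 1:]
            best = fill_mask(masked)[0]                  # top prediction for the masked position
            if best["token_str"] != ch and best["score"] >= min_score:
                corrected[i] = best["token_str"]         # character does not fit the context
        return "".join(corrected)

    print(verify("今天天汽很好"))  # e.g. "汽" may be corrected by the model to "气"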
It should be understood that the electronic device used to train the machine learning model and the electronic device used to perform character recognition may be the same electronic device or different electronic devices.
For the electronic device, please refer to the block diagram of the electronic device 100 shown in fig. 4. The electronic device 100 comprises a software virtual device 110, a memory 120, and a processor 130. The memory 120, the processor 130, and the other elements are communicatively coupled to one another, directly or indirectly, to enable transmission or interaction of data. For example, these components may be electrically connected to each other via one or more communication buses or signal lines. The software virtual device 110 includes at least one software functional module that may be stored in the memory 120 in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 100. The processor 130 is configured to execute the executable modules stored in the memory 120, such as the software functional modules and computer programs included in the software virtual device 110.
The memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), etc. The memory 120 is configured to store a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip with signal processing capabilities. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or executed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The embodiments of the present application also provide a character recognition device applied to an electronic device. The electronic device is configured with a machine learning model trained by the above training method and a dictionary file, and the machine learning model comprises, in sequence, a convolutional layer carrying a residual network, a recurrent network layer, and a transcription layer. Referring to fig. 5, the character recognition device includes:
the image acquisition module 1101 is configured to acquire an image to be identified.
In the embodiment of the present application, the image acquisition module 1101 is configured to perform step S210 in fig. 3, and for a detailed description of the image acquisition module 1101, reference may be made to the detailed description of step S210.
The code acquisition module 1102 is configured to recognize, sequentially through the convolutional layer carrying the residual network, the recurrent network layer, and the transcription layer, the characters of the specific language type in the image to be recognized, and obtain the character code of each character.
In the embodiment of the present application, the code acquisition module 1102 is configured to perform step S220 in fig. 3, and for a detailed description of the code acquisition module 1102, reference may be made to the detailed description of step S220.
The character index module 1103 is configured to index the dictionary file according to the character codes, and determine the character information corresponding to the character codes.
In the embodiment of the present application, the character index module 1103 is configured to perform step S230 in fig. 3, and for a detailed description of the character index module 1103, reference may be made to the detailed description of step S230.
Optionally, referring to fig. 5 again, the character recognition device further includes:
The character verification module 1104 is configured to verify the character information through a whole-word-masking technique, and correct characters in the character information that do not fit its context.
The embodiments of the present application also provide a storage medium storing a computer program which, when executed by a processor, implements the above character recognition method.
The embodiments of the present application also provide an electronic device comprising a memory and a processor, where the memory stores machine-executable instructions executable by the processor, and the machine-executable instructions, when executed by the processor, implement the above character recognition method.
In summary, the present application provides a training method, a character recognition method and device, a storage medium, and an electronic device. The machine learning model is trained with sample images of various colors, character sizes, degrees of blur, and character inclination angles, so that the trained machine learning model can adapt to images to be recognized in different styles, which improves the adaptability of the machine learning model to different scenes.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only memory (ROM), a random access memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (4)

1. A character recognition method, characterized by being applied to an electronic device, wherein the electronic device is configured with a verification model and an untrained machine learning model;
acquiring sample images containing multiple colors, multiple character sizes, multiple degrees of blur, and multiple character inclination angles, wherein each sample image carries characters of a specific language type, and each sample image carries the same number of characters of that language type;
training the machine learning model with the sample images to recognize the characters of the specific language type in the sample images, to obtain a trained machine learning model and a dictionary file, wherein the trained machine learning model comprises, in sequence, a convolutional layer carrying a residual network, a recurrent network layer, and a transcription layer, the method comprising the following steps:
acquiring an image to be recognized;
recognizing the characters of the specific language type in the image to be recognized sequentially through the convolutional layer carrying the residual network, the recurrent network layer, and the transcription layer, to obtain character codes of the characters;
indexing the dictionary file according to the character codes to determine the character information corresponding to the character codes;
masking part of the characters in the character information to obtain character information to be verified;
predicting, through the verification model, the characters at the masked positions according to the context of the character information to be verified, to obtain predicted characters;
and comparing the predicted characters with the characters at the masked positions, and correcting characters in the character information that do not fit its context.
2. A character recognition device, characterized by being applied to an electronic device, wherein the electronic device is configured with a verification model and an untrained machine learning model;
the electronic device acquires sample images containing multiple colors, multiple character sizes, multiple degrees of blur, and multiple character inclination angles, wherein each sample image carries characters of a specific language type, and each sample image carries the same number of characters of that language type;
the machine learning model is trained with the sample images to recognize the characters of the specific language type in the sample images, to obtain a trained machine learning model and a dictionary file, the trained machine learning model comprising, in sequence, a convolutional layer carrying a residual network, a recurrent network layer, and a transcription layer, and the character recognition device comprises:
an image acquisition module, configured to acquire an image to be recognized;
a code acquisition module, configured to recognize characters of the specific language type in the image to be recognized sequentially through the convolutional layer carrying the residual network, the recurrent network layer, and the transcription layer, to obtain character codes of the characters;
a character index module, configured to index the dictionary file according to the character codes and determine the character information corresponding to the character codes;
a character verification module, configured to mask part of the characters in the character information to obtain character information to be verified;
predict, through the verification model, the characters at the masked positions according to the context of the character information to be verified, to obtain predicted characters;
and compare the predicted characters with the characters at the masked positions, and correct characters in the character information that do not fit its context.
3. A storage medium storing a computer program which, when executed by a processor, implements the character recognition method of claim 1.
4. An electronic device comprising a memory and a processor, the memory storing machine-executable instructions executable by the processor, wherein the machine-executable instructions, when executed by the processor, implement the character recognition method of claim 1.
CN202010440288.XA 2020-05-22 2020-05-22 Training method, character recognition device, storage medium and electronic equipment Active CN111612157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010440288.XA CN111612157B (en) 2020-05-22 2020-05-22 Training method, character recognition device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010440288.XA CN111612157B (en) 2020-05-22 2020-05-22 Training method, character recognition device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111612157A CN111612157A (en) 2020-09-01
CN111612157B (en) 2023-06-30

Family

ID=72198963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010440288.XA Active CN111612157B (en) 2020-05-22 2020-05-22 Training method, character recognition device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111612157B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669204B (en) * 2021-01-04 2024-05-03 北京金山云网络技术有限公司 Image processing method, training method and device of image processing model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6342298B2 (en) * 2014-10-31 2018-06-13 株式会社東芝 Character recognition device, image display device, image search device, character recognition method and program

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5257328A (en) * 1991-04-04 1993-10-26 Fuji Xerox Co., Ltd. Document recognition device
JPH0636066A (en) * 1992-07-20 1994-02-10 Nippon Telegr & Teleph Corp <Ntt> Confirming and correcting process system of character recognizing device
TW442952B (en) * 1998-12-25 2001-06-23 Fujitsu Ltd Flash memory device having mask ROM cells for self-test
FR2946773A1 (en) * 2009-06-12 2010-12-17 Bertrand Labaye Method for recognition of e.g. text information, related to visually impaired user, in image processing field, involves recognizing information belonging to marked zone by graphical beacon if predefined characteristic is detected in image
JP2011076481A (en) * 2009-09-30 2011-04-14 Fujitsu Ltd Verification device, verification method, verification program, and preparing device
JP2013045436A (en) * 2011-08-26 2013-03-04 Fuji Xerox Co Ltd Character recognition device, character recognition result processing system and program
CN104156652A (en) * 2014-05-28 2014-11-19 东莞盛世科技电子实业有限公司 Method for verifying password code in fuzzy mode and code verifying equipment
CN107315955A (en) * 2016-04-27 2017-11-03 百度在线网络技术(北京)有限公司 File security recognition methods and device
JP2018045359A (en) * 2016-09-13 2018-03-22 富士ゼロックス株式会社 Image processing device and image processing program
US10444945B1 (en) * 2016-10-10 2019-10-15 United Services Automobile Association Systems and methods for ingesting and parsing datasets generated from disparate data sources
EP3330909A1 (en) * 2016-12-05 2018-06-06 Sap Se Systems and methods for integrated cargo inspection
CA3064226A1 (en) * 2018-07-11 2020-01-11 Illumina, Inc. Deep learning-based framework for identifying sequence patterns that cause sequence-specific errors (sses)
AU2018101531A4 (en) * 2018-10-14 2018-11-15 Chang, Zhihan Mr Stock forecast model based on text news by random forest
CN109389091A (en) * 2018-10-22 2019-02-26 重庆邮电大学 The character identification system and method combined based on neural network and attention mechanism
GB201915436D0 (en) * 2018-12-24 2019-12-11 Adobe Inc Identifying target objects using scale-diverse segmentation neural networks
CN110046618A (en) * 2019-04-08 2019-07-23 东南大学 Licence plate recognition method based on machine learning and maximum extreme value stability region
CN110489551A (en) * 2019-07-16 2019-11-22 哈尔滨工程大学 A kind of writer identification method based on writing habit
CN110363190A (en) * 2019-07-26 2019-10-22 中国工商银行股份有限公司 A kind of character recognition method, device and equipment
CN110598687A (en) * 2019-09-18 2019-12-20 上海眼控科技股份有限公司 Vehicle identification code detection method and device and computer equipment
CN110738050A (en) * 2019-10-16 2020-01-31 北京小米智能科技有限公司 Text recombination method, device and medium based on word segmentation and named entity recognition
CN111159997A (en) * 2019-12-31 2020-05-15 无锡识凌科技有限公司 Intelligent verification method for enterprise bid document

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李文轩; 孙季丰. Road-sign text image recognition algorithm based on a compound-optimized deep Boltzmann machine (基于复合优化的深度玻尔兹曼机的路牌文字图像识别算法). 计算机工程与科学 (Computer Engineering & Science), 2018, (01), pp. 83-89. *

Also Published As

Publication number Publication date
CN111612157A (en) 2020-09-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant