WO2023093361A1 - Image character recognition model training method, and image character recognition method and apparatus - Google Patents

Image character recognition model training method, and image character recognition method and apparatus Download PDF

Info

Publication number
WO2023093361A1
WO2023093361A1 (PCT/CN2022/125575)
Authority
WO
WIPO (PCT)
Prior art keywords
character
image
model
masked
training
Prior art date
Application number
PCT/CN2022/125575
Other languages
French (fr)
Chinese (zh)
Inventor
范湉湉
黄灿
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023093361A1 publication Critical patent/WO2023093361A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the field of image processing, in particular to an image character recognition model training method, image character recognition method and device.
  • OCR (Optical Character Recognition) is a technology that recognizes and analyzes images containing characters to obtain the characters in those images; using OCR technology, the character information in an image can be obtained.
  • OCR technology can first determine the character areas in an image, then segment those character areas, and finally recognize the characters in the character areas. At present, the accuracy of recognizing characters in an image using OCR technology is relatively low.
  • the embodiments of the present application provide an image character recognition model training method, image character recognition method and device, which can more accurately recognize characters in an image.
  • the embodiment of the present application provides a method for training an image character recognition model, the method comprising:
  • inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to mask at least one of the masked characters;
  • inputting the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • training an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
  • the embodiment of the present application provides a method for image character recognition, the method comprising:
  • inputting the image to be recognized into the image character recognition model, where the image to be recognized includes at least one character to be recognized and the image character recognition model is trained based on the image character recognition model training method described in the first aspect above; and obtaining the recognition result output by the image character recognition model.
  • the embodiment of the present application provides an image character recognition model training device, the device comprising:
  • the first input unit is configured to input the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to mask at least one of the masked characters;
  • the second input unit is configured to input the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • a training unit, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
  • an image character recognition device comprising:
  • the input unit is used to input the image to be recognized into the image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in the first aspect above;
  • the recognition unit is configured to obtain the recognition result output by the image character recognition model.
  • the embodiment of the present application provides an electronic device, including:
  • one or more processors;
  • a storage device on which one or more programs are stored, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any one of the first aspect and the second aspect.
  • an embodiment of the present application provides a computer-readable medium, on which a computer program is stored, wherein, when the program is executed by a processor, the method according to any one of the first aspect and the second aspect is implemented.
  • an embodiment of the present application provides a computer program product, the computer program product including computer programs/instructions, where, when the computer programs/instructions are executed by a processor, the method described in any one of the first aspect and the second aspect is implemented.
  • The image character recognition model training method, image character recognition method, and apparatus provided by the embodiments of the present application use a training image and the character identifier corresponding to the training image to train an image character recognition model composed of a first model and a second model.
  • the training image includes masked character areas and displayed character areas.
  • Training the image character recognition model with a training image that includes masked character areas enables the model to better extract bidirectional context information, so the image character recognition model obtained by such training has higher accuracy. Recognizing an image to be recognized with the trained image character recognition model yields more accurate characters contained in the image to be recognized.
  • FIG. 1 is a schematic framework diagram of an exemplary application scenario provided by an embodiment of the present application
  • FIG. 2 is a flowchart of an image character recognition model training method provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an original image provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a training image provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another training image provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of a training image character recognition model provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an image character recognition model training device provided in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image character recognition device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a basic structure of an electronic device provided by an embodiment of the present application.
  • An embodiment of the present application provides an image character recognition model training method, an image character recognition method, and an apparatus, which use a training image and the character identifier corresponding to the training image to train an image character recognition model composed of a first model and a second model.
  • the training image includes masked character areas and displayed character areas.
  • Training the image character recognition model with a training image that includes masked character areas enables the model to better extract bidirectional context information, so the image character recognition model obtained by such training has higher accuracy. Recognizing an image to be recognized with the trained image character recognition model yields more accurate characters contained in the image to be recognized.
  • FIG. 1 is a schematic framework diagram of an exemplary application scenario provided by an embodiment of the present application.
  • the image to be recognized 101 includes at least one character.
  • the image to be recognized 101 that needs to be recognized is input into the image character recognition model 102 that has been trained, and the recognition result 103 output by the image character recognition model 102 is obtained.
  • the recognition result includes at least one character recognized by the image character recognition model.
  • the schematic framework diagram shown in FIG. 1 is only one example in which the embodiments of the present application can be implemented.
  • the scope of applicability of the embodiments of the present application is not limited by any aspect of this framework.
  • FIG. 2 is a flowchart of an image character recognition model training method provided by an embodiment of the present application; as shown in FIG. 2, the method may include S201-S203:
  • S201: Input the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one displayed character, and the masked character area is used to mask at least one masked character.
  • Training images are images used to train the image character recognition model.
  • the training images include at least two characters.
  • the training images include character regions.
  • the character area is the area that a character occupies in the training image.
  • some characters in the training image are masked characters, that is, characters that are not displayed.
  • the training image can be obtained by performing character masking on the original image.
  • characters in intermediate positions of the training image can be set as masked characters, so that the image character recognition model can extract context information on both sides of the characters.
  • the training image includes at least one masked character area and at least one displayed character area.
  • the masked character area may include at least one masked character.
  • A masked character area can be represented as a fully black area. The displayed character area includes at least one displayed character.
  • FIG. 3 is a schematic diagram of an original image provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a training image provided by an embodiment of the present application.
  • The training image is obtained by masking the two characters "in" and "line" in the original image.
  • The training image includes two masked character areas and eleven displayed character areas.
  • The masked character areas are the character areas where the two masked characters "in" and "line" are located; each masked character area includes one character.
  • The training image shown in FIG. 4 is obtained by masking the original image at character granularity.
  • the original image can also be masked based on word segmentation granularity.
  • Each masked character area is used to mask one masked word segment.
  • A masked word segment includes at least two masked characters. Character identifiers are used to identify the masked characters in the masked word segments.
  • FIG. 5 is a schematic diagram of another training image provided by an embodiment of the present application, where the training image is obtained by performing word segmentation on the text composed of the characters in the original image and then masking at least one of the resulting word segments.
  • the training image shown in Fig. 5 includes one masked character region and six displayed character regions.
  • The masked character area consists of the character areas where the characters of the masked word segment "online" are located.
  • The six displayed character areas consist of the character areas where the characters of the six word segments "simple", "easy to use", "of", "picture", "production" and "software" are located.
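  • As a rough illustration of the masking described above, the following Python sketch (not part of the patent; the bounding boxes, the segment grouping, and the helper names are assumptions introduced here) blacks out character regions of an original image at either character granularity or word-segmentation granularity:

```python
from typing import List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) of one character region, an assumed representation

def mask_regions(image: np.ndarray, boxes: List[Box]) -> np.ndarray:
    """Return a copy of the image with the given character regions fully blacked out."""
    masked = image.copy()
    for x0, y0, x1, y1 in boxes:
        masked[y0:y1, x0:x1] = 0  # masked character areas are represented as fully black areas
    return masked

def build_training_image(image: np.ndarray,
                         char_boxes: List[Box],
                         segments: List[List[int]],
                         mask_segment: bool) -> Tuple[np.ndarray, List[int]]:
    """Mask either one character in a middle position (character granularity) or one whole
    word segment (word-segmentation granularity); return the training image and the indices
    of the masked characters, which later map to character identifiers."""
    if mask_segment:
        # mask one word segment, e.g. the last one (such as "online" in the example of FIG. 5)
        masked_idx = segments[-1]
    else:
        # mask a character in an intermediate position so context exists on both sides
        masked_idx = [len(char_boxes) // 2]
    boxes = [char_boxes[i] for i in masked_idx]
    return mask_regions(image, boxes), masked_idx
```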
  • the displayed character corresponds to the first feature vector
  • the masked character corresponds to the second feature vector.
  • the first model may be formed by an encoder.
  • the encoder can be a transformer.
  • the embodiment of the present application provides a specific implementation method of inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model. Please refer to the following for details.
  • S202 Input the second feature vector of the masked character into the second model, and obtain the predicted character of the masked character output by the second model.
  • The second model is used to classify the input feature vector and determine the character corresponding to that feature vector. The second feature vector output by the first model is input into the second model to obtain the predicted character corresponding to the second feature vector output by the second model.
  • the second model is composed of a classifier.
  • the classifier may be a linear classifier.
  • S203 Train an image character recognition model according to the character identification and predicted characters corresponding to the training image, the image character recognition model includes a first model and a second model, and the character identification is used to identify masked characters.
  • Character identifiers correspond to the training image.
  • The character identifiers are used to identify the masked characters included in each masked character area in the training image.
  • the character identification is the character serial number of the masked character in the preset dictionary.
  • The character identifiers correspond to the masked characters included in each masked character area of the training image.
  • For the training image shown in FIG. 4, the training image corresponds to two character identifiers, which respectively identify the two masked characters "in" and "line".
  • For the training image shown in FIG. 5, the training image corresponds to one character identifier, which identifies the masked word segment "online", that is, the two characters "in" and "line".
  • Character identifiers thus identify the correct masked characters. Based on the character identifiers and the predicted characters, the model parameters of the first model and the second model can be adjusted to train the image character recognition model composed of the first model and the second model.
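  • The patent does not specify a concrete loss. The sketch below illustrates one plausible realization (an assumption of this edit, not the patent's specification) in which the character identifiers are the dictionary indices of the masked characters and both models are adjusted by minimizing a cross-entropy loss between the predicted characters and those identifiers:

```python
import torch
import torch.nn.functional as F

# Hypothetical preset dictionary: each character maps to its serial number (its character identifier).
# The contents here are placeholders; a real dictionary would cover the full character vocabulary.
preset_dictionary = {ch: idx for idx, ch in enumerate(["<pad>", "a", "b", "c"])}

def training_loss(predicted_logits: torch.Tensor, masked_chars: list) -> torch.Tensor:
    """predicted_logits: (num_masked, vocab_size) scores from the second model, one row per masked position.
    masked_chars: the ground-truth masked characters; their dictionary indices act as character identifiers."""
    char_ids = torch.tensor([preset_dictionary[ch] for ch in masked_chars])
    # Compare predicted characters with the character identifiers; backpropagating this loss
    # adjusts the parameters of both the first model and the second model.
    return F.cross_entropy(predicted_logits, char_ids)
```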
  • When the image character recognition model satisfies a preset condition, it may be determined that training of the image character recognition model is complete.
  • The preset condition may be that the number of training iterations of the image character recognition model reaches a count threshold, or that the character prediction accuracy of the image character recognition model reaches an accuracy threshold.
  • the image character recognition model can learn context information during training, and the accuracy of the trained image character recognition model can be improved.
  • the first model includes an encoder and a visual word embedding layer.
  • the visual word embedding layer is used to extract the word vectors corresponding to the displayed characters and the masked characters.
  • A1 Input the displayed character area and the masked character area into the visual word embedding layer to obtain the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character.
  • the visual word embedding layer is used to convert characters in the input character region into corresponding word vectors.
  • the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character output by the visual word embedding layer can be obtained.
  • the first model further includes a region-of-interest alignment layer.
  • the training image can be processed by using the ROI alignment layer to obtain the displayed character area and the masked character area. Input the training image into the ROI alignment layer, and obtain the displayed character area and the masked character area output by the ROI alignment layer.
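  • The patent does not give an implementation of the region-of-interest alignment layer. One common way to realize it (an assumption here, not the patent's specification) is torchvision's roi_align, which extracts a fixed-size feature for every character bounding box:

```python
import torch
from torchvision.ops import roi_align

def extract_character_regions(feature_map: torch.Tensor,
                              char_boxes: torch.Tensor,
                              image_size: int) -> torch.Tensor:
    """feature_map: (1, C, H, W) float features of the training image.
    char_boxes: (K, 4) float boxes (x0, y0, x1, y1) in image coordinates, one per character area,
    covering both displayed and masked character areas.
    Returns (K, C, 7, 7) fixed-size region features, one per character area; the 7x7 output size
    is an illustrative assumption."""
    scale = feature_map.shape[-1] / float(image_size)  # maps image coordinates onto the feature map
    return roi_align(feature_map, [char_boxes], output_size=(7, 7),
                     spatial_scale=scale, aligned=True)
```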
  • FIG. 6 is a schematic diagram of a training image character recognition model provided by an embodiment of the present application.
  • The training image is input into the region-of-interest alignment layer, that is, the "RoI Align" layer in FIG. 6, which segments the training image to obtain the displayed character areas and the masked character areas in the training image.
  • The displayed character areas are the character areas where the eleven displayed characters "simplified", "single", "good", "used", "de", "picture", "piece", "made", "made", "soft" and "piece" are located.
  • the masked character area is the character area where "in” and "line” are located.
  • The displayed character areas and the masked character areas are input into the visual word embedding layer, that is, the "Visual Token Embeddings" layer in FIG. 6, to obtain the first word vector corresponding to each displayed character and the second word vector corresponding to each masked character.
  • The first word vectors corresponding to the displayed characters are "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10" and "v11".
  • the second word vectors corresponding to each masked character are "vmask1" and "vmask2" respectively.
  • A2 Input the first word vector and the second word vector into the encoder to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the encoder.
  • the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character output by the visual word embedding layer are input into the encoder of the first model.
  • The encoder, that is, the "Transformer Encoder" in FIG. 6, outputs the first feature vector corresponding to each displayed character and the second feature vector corresponding to each masked character.
  • The second feature vectors are input into the second model, that is, the "Linear Classifier" in FIG. 6, to obtain the corresponding first predicted character and second predicted character.
  • the model parameters of the second model and the model parameters of the first model can be adjusted to realize the training of the image character recognition model.
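  • Putting the pieces of FIG. 6 together, the following hedged sketch shows what the first model (visual token embeddings plus Transformer encoder) and the second model (a linear classifier) might look like; the dimensions, layer counts, and module names are illustrative assumptions rather than values taken from the patent:

```python
import torch
import torch.nn as nn

class ImageCharRecognitionModel(nn.Module):
    """Sketch of the first model (visual token embeddings + Transformer encoder)
    followed by the second model (a linear classifier); all sizes are illustrative."""
    def __init__(self, region_dim: int = 7 * 7 * 256, d_model: int = 512, vocab_size: int = 6000):
        super().__init__()
        # first model: visual word embedding layer + Transformer encoder
        self.visual_token_embedding = nn.Linear(region_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
        # second model: linear classifier over the preset character dictionary
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, region_features: torch.Tensor, masked_positions: torch.Tensor) -> torch.Tensor:
        """region_features: (1, K, region_dim) flattened features of the K character areas,
        displayed and masked, in reading order. masked_positions: indices of the masked areas.
        Returns (num_masked, vocab_size) logits over the predicted characters."""
        word_vectors = self.visual_token_embedding(region_features)   # e.g. v1..v11, vmask1, vmask2
        feature_vectors = self.encoder(word_vectors)                  # first and second feature vectors
        masked_features = feature_vectors[0, masked_positions]        # keep only the second feature vectors
        return self.classifier(masked_features)                       # predicted characters (logits)
```

  • In this sketch only the second feature vectors at the masked positions are passed to the linear classifier, mirroring S201-S203, while the first feature vectors of the displayed characters are still computed so the encoder can attend to context on both sides of the masked areas.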
  • Based on the image character recognition model trained by the above method, an embodiment of the present application also provides an image character recognition method, including: inputting the image to be recognized into the image character recognition model, and obtaining the recognition result output by the image character recognition model.
  • the image to be recognized is an image requiring character recognition.
  • the image to be recognized includes at least one character to be recognized.
  • the image character recognition model is trained by the above-mentioned image character recognition model training method.
  • the image character recognition model consists of a first model and a second model.
  • the first model extracts and generates feature vectors corresponding to each character based on each character area in the image to be recognized.
  • the second model generates a recognition result based on the feature vector corresponding to the character.
  • the recognition result includes the recognized character corresponding to the character to be recognized obtained after the image character recognition model recognizes the image to be recognized.
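  • For the recognition stage, a minimal inference sketch (again an assumption of this edit, reusing the hypothetical model and dictionary from the training sketches above) could look as follows: the trained model predicts a dictionary index for each character position, and the indices are mapped back to characters to form the recognition result:

```python
import torch

# Placeholder dictionary consistent with the earlier training sketch; a real dictionary would
# be the preset character dictionary used when building the character identifiers.
preset_dictionary = {"<pad>": 0, "a": 1, "b": 2, "c": 3}
index_to_char = {idx: ch for ch, idx in preset_dictionary.items()}

@torch.no_grad()
def recognize(model, region_features: torch.Tensor, positions: torch.Tensor) -> str:
    """Run the trained image character recognition model on the character regions of an image
    to be recognized and decode the predicted dictionary indices back into characters."""
    model.eval()
    logits = model(region_features, positions)           # (num_positions, vocab_size)
    indices = logits.argmax(dim=-1).tolist()             # most likely dictionary index per position
    return "".join(index_to_char[i] for i in indices)    # recognition result: the recognized characters
```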
  • the embodiment of the present application also provides an image character recognition model training device.
  • the image character recognition model training device will be described below with reference to the accompanying drawings.
  • FIG. 7 is a schematic structural diagram of an image character recognition model training device provided by an embodiment of the present application. As shown in Figure 7, this image character recognition model training device comprises:
  • the first input unit 701 is configured to input the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to mask at least one of the masked characters;
  • the second input unit 702 is configured to input the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • a training unit 703, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
  • The masked character area is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to identify the masked word segment.
  • the first model is formed by an encoder.
  • the first model further includes a visual word embedding layer
  • the first input unit 701 includes:
  • the first input subunit is used to input the display character area and the masked character area into the visual word embedding layer to obtain the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character;
  • the second input subunit is configured to input the first word vector and the second word vector into the encoder to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model.
  • the first model further includes an ROI alignment layer, and the displayed character area and the masked character area are obtained by inputting the training image into the ROI alignment layer.
  • the character identifier is a character sequence number of the masked character in a preset dictionary.
  • the embodiment of the present application also provides an image character recognition device.
  • the image character recognition device will be described below in conjunction with the accompanying drawings.
  • FIG. 8 this figure is a schematic structural diagram of an image character recognition device provided by an embodiment of the present application. As shown in Figure 8, the image character recognition device includes:
  • the input unit 801 is used to input the image to be recognized into the image character recognition model
  • the recognition unit 802 is configured to obtain a recognition result output by the image character recognition model.
  • the image to be recognized is an image requiring character recognition.
  • the image to be recognized includes at least one character to be recognized.
  • the image character recognition model is trained by the above-mentioned image character recognition model training method.
  • the image character recognition model consists of a first model and a second model.
  • the first model extracts and generates feature vectors corresponding to each character based on each character area in the image to be recognized.
  • the second model generates a recognition result based on the feature vector corresponding to the character.
  • the recognition result includes the recognized character corresponding to the character to be recognized obtained after the image character recognition model recognizes the image to be recognized.
  • Based on the image character recognition model training method and the image character recognition method provided by the above embodiments, the present application also provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the image character recognition model training method or the image character recognition method described in the above embodiments.
  • FIG. 9 shows a schematic structural diagram of an electronic device 900 suitable for implementing an embodiment of the present application.
  • The terminal devices in the embodiments of the present application may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (portable Android devices, i.e. tablet computers), PMPs (Portable Media Players) and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs (television sets) and desktop computers.
  • the electronic device shown in FIG. 9 is only an example, and should not limit the functions and scope of use of this embodiment of the present application.
  • The electronic device 900 may include a processing device (such as a central processing unit or a graphics processing unit) 901, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 into a random access memory (RAM) 903.
  • The RAM 903 also stores various programs and data necessary for the operation of the electronic device 900.
  • the processing device 901, ROM 902, and RAM 903 are connected to each other through a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904 .
  • The following devices may be connected to the I/O interface 905: an input device 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; an output device 907 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; a storage device 908 including, for example, a magnetic tape and a hard disk; and a communication device 909.
  • the communication means 909 may allow the electronic device 900 to perform wireless or wired communication with other devices to exchange data. While FIG. 9 shows electronic device 900 having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 909 , or from storage means 908 , or from ROM 902 .
  • When the computer program is executed by the processing device 901, the above-mentioned functions defined in the methods of the embodiments of the present application are executed.
  • The electronic device provided in this embodiment of the present application belongs to the same inventive concept as the image character recognition model training method and the image character recognition method provided in the above embodiments, and has the same beneficial effects as the above embodiments.
  • An embodiment of the present application provides a computer storage medium on which a computer program is stored, where, when the program is executed by a processor, the image character recognition model training method or the image character recognition method described in any of the above embodiments is implemented.
  • the computer-readable medium mentioned above in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (Hypertext Transfer Protocol), and can be interconnected with digital data communication in any form or medium (for example, a communication network).
  • HTTP Hyper Text Transfer Protocol
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is made to execute the above-mentioned image character recognition model training method or image character recognition method.
  • Computer program code for carrying out the operations of the present application may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • Internet service provider such as AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application may be implemented by means of software or by means of hardware.
  • the name of the unit/module does not constitute a limitation on the unit itself under certain circumstances, for example, the voice data collection module can also be described as a "data collection module”.
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs System on Chips
  • CPLD Complex Programmable Logic Device
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM compact disk read only memory
  • magnetic storage or any suitable combination of the foregoing.
  • Example 1 provides a method for training an image character recognition model, the method comprising:
  • inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to mask at least one of the masked characters;
  • inputting the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • training an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
  • Example 2 provides an image character recognition model training method
  • the masked character area is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to identify the masked word segment.
  • Example 3 provides a method for training an image character recognition model, the first model is composed of an encoder.
  • Example 4 provides a method for training an image character recognition model
  • the first model further includes a visual word embedding layer
  • inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model includes: inputting the displayed character area and the masked character area into the visual word embedding layer to obtain the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character; and inputting the first word vector and the second word vector into the encoder to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the encoder.
  • Example 5 provides a method for training an image character recognition model
  • the first model further includes a region-of-interest alignment layer
  • the displayed character region and the masked character region are obtained by inputting the training image into the ROI alignment layer.
  • Example 6 provides a method for training an image character recognition model, where the character identifier is the character sequence number of the masked character in the preset dictionary.
  • Example 7 provides an image character recognition method, the method comprising:
  • inputting the image to be recognized into the image character recognition model to obtain the recognition result output by the image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is obtained by training based on the image character recognition model training method described in any one of [Example 1]-[Example 6].
  • Example 8 provides an image character recognition model training device, the device comprising:
  • the first input unit is configured to input the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, and the training image includes at least one displayed character an area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to cover at least one of the masked characters;
  • the second input unit is configured to input the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • a training unit configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, the image character recognition model includes the first model and the second model, so The character identifier is used to identify the masked character.
  • Example 9 provides an image character recognition model training device, where the masked character area is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to identify the masked word segment.
  • Example 10 provides an image character recognition model training device, the first model is composed of an encoder.
  • Example Eleven provides an image character recognition model training device, the first model also includes a visual word embedding layer, and the first input unit includes:
  • the first input subunit is used to input the display character area and the masked character area into the visual word embedding layer to obtain the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character;
  • the second input subunit is configured to input the first word vector and the second word vector into the encoder to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model.
  • Example 12 provides an image character recognition model training device, the first model further includes a region of interest alignment layer, the display character region and the masked character region is obtained by inputting the training image into the ROI alignment layer.
  • Example 13 provides an image character recognition model training device, where the character identifier is the character sequence number of the masked character in the preset dictionary.
  • Example Fourteen provides an image character recognition device, the device comprising:
  • the input unit is used to input the image to be recognized into the image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is obtained by training based on the image character recognition model training method described in any one of [Example 1]-[Example 6];
  • the recognition unit is configured to obtain the recognition result output by the image character recognition model.
  • Example 15 provides an electronic device, including:
  • one or more processors;
  • a storage device on which one or more programs are stored, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any one of [Example 1]-[Example 7].
  • Example 16 provides a computer-readable medium on which a computer program is stored, where, when the program is executed by a processor, the method described in any one of [Example 1]-[Example 7] is implemented.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the differences from other embodiments, and the same and similar parts of each embodiment can be referred to each other.
  • As for the system or device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively simple; for relevant details, refer to the description of the method part.
  • At least one (item) means one or more, and “multiple” means two or more.
  • "And/or" is used to describe the association relationship of associated objects, indicating that three kinds of relationships may exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • the character “/” generally indicates that the contextual objects are an “or” relationship.
  • At least one of the following” or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • "At least one item (piece) of a, b or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c can be single or multiple.
  • RAM random access memory
  • ROM read-only memory
  • EPROM electrically programmable ROM
  • EEPROM electrically erasable programmable ROM
  • registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

Abstract

An image character recognition model training method, and an image character recognition method and apparatus. An image character recognition model consisting of a first model and a second model is trained by using a training image and a character identifier corresponding to the training image. The training image comprises a masked character area and a displayed character area. The image character recognition model is trained by using the training image comprising the masked character area, such that the image character recognition model can better extract bidirectional context information, and the image character recognition model obtained by such training has high accuracy. An image to be recognized is recognized by using the trained image character recognition model, so that a more accurate character comprised in the image to be recognized can be obtained.

Description

Image character recognition model training method, image character recognition method and apparatus

This application claims priority to the Chinese patent application with application number 202111415332.2, entitled "Image Character Recognition Model Training Method, Image Character Recognition Method and Apparatus", filed with the China Patent Office on November 25, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of image processing, and in particular to an image character recognition model training method, an image character recognition method, and an apparatus.

Background Art

OCR (Optical Character Recognition) is a technology that recognizes and analyzes images containing characters to obtain the characters in the images. Using OCR technology, the character information in an image can be obtained.

OCR technology can first determine the character areas in an image, then segment those character areas, and finally recognize the characters in the character areas. At present, the accuracy of recognizing characters in an image using OCR technology is relatively low.
Summary of the Invention

In view of this, the embodiments of the present application provide an image character recognition model training method, an image character recognition method, and an apparatus, which can recognize characters in an image more accurately.

Based on this, the technical solutions provided by the embodiments of the present application are as follows:

In a first aspect, an embodiment of the present application provides an image character recognition model training method, the method comprising:

inputting a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one displayed character, and the masked character area is used to mask at least one masked character;

inputting the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model; and

training an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.

In a second aspect, an embodiment of the present application provides an image character recognition method, the method comprising:

inputting an image to be recognized into an image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in the first aspect above; and

obtaining the recognition result output by the image character recognition model.
In a third aspect, an embodiment of the present application provides an image character recognition model training apparatus, the apparatus comprising:

a first input unit, configured to input a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one displayed character, and the masked character area is used to mask at least one masked character;

a second input unit, configured to input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model; and

a training unit, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.

In a fourth aspect, an embodiment of the present application provides an image character recognition apparatus, the apparatus comprising:

an input unit, configured to input an image to be recognized into an image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in the first aspect above; and

a recognition unit, configured to obtain the recognition result output by the image character recognition model.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:

one or more processors; and

a storage device on which one or more programs are stored,

where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any one of the first aspect and the second aspect.

In a sixth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, where, when the program is executed by a processor, the method described in any one of the first aspect and the second aspect is implemented.

In a seventh aspect, an embodiment of the present application provides a computer program product, the computer program product including computer programs/instructions, where, when the computer programs/instructions are executed by a processor, the method described in any one of the first aspect and the second aspect is implemented.

It can be seen that the embodiments of the present application have the following beneficial effects:

The image character recognition model training method, image character recognition method, and apparatus provided by the embodiments of the present application use a training image and the character identifier corresponding to the training image to train an image character recognition model composed of a first model and a second model. The training image includes a masked character area and a displayed character area. Training the image character recognition model with a training image that includes a masked character area enables the model to better extract bidirectional context information, so the image character recognition model obtained by such training has higher accuracy. Recognizing an image to be recognized with the trained image character recognition model yields more accurate characters contained in the image to be recognized.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic framework diagram of an exemplary application scenario provided by an embodiment of the present application;

FIG. 2 is a flowchart of an image character recognition model training method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of an original image provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a training image provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of another training image provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of training an image character recognition model provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an image character recognition model training apparatus provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an image character recognition apparatus provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of the basic structure of an electronic device provided by an embodiment of the present application.
具体实施方式Detailed ways
为使本申请的上述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施方式对本申请实施例作进一步详细的说明。In order to make the above objects, features and advantages of the present application more obvious and understandable, the embodiments of the present application will be further described in detail below in conjunction with the accompanying drawings and specific implementation methods.
为便于理解本申请提供的技术方案,下面将先对本申请涉及的背景技术进行说明。In order to facilitate the understanding of the technical solution provided by this application, the background technology involved in this application will first be described below.
在对传统的OCR技术进行研究后发现,目前基于深度学习的OCR技术主要分为以CRNN(Convolutional Recurrent Neural Network,卷积递归神经网络)为代表的CTC(Connectionist Temporal Classification,连接时序分类)方法和以Transformer为代表的Attention方法。上述两种方法均仅能提取单向信息,训练得到的字符识别模型不够准确。After studying the traditional OCR technology, it is found that the current OCR technology based on deep learning is mainly divided into CTC (Connectionist Temporal Classification, connection timing classification) method and Attention method represented by Transformer. The above two methods can only extract one-way information, and the character recognition model trained is not accurate enough.
On this basis, the embodiments of the present application provide an image character recognition model training method, an image character recognition method, and an apparatus, in which an image character recognition model composed of a first model and a second model is trained by using a training image and the character identifiers corresponding to the training image. The training image includes masked character regions and displayed character regions. Training the image character recognition model with a training image that includes masked character regions enables the model to better extract bidirectional context information, so the trained model achieves higher accuracy. Recognizing an image to be recognized with the trained image character recognition model therefore yields more accurate characters from the image to be recognized.
To facilitate understanding of the technical solutions provided by the embodiments of the present application, the image character recognition method provided by the embodiments of the present application is described below with reference to the accompanying drawings. Referring to FIG. 1, the figure is a schematic framework diagram of an exemplary application scenario provided by an embodiment of the present application.
In practical applications, the image to be recognized 101 includes at least one character. The image to be recognized 101 is input into the trained image character recognition model 102, and the recognition result 103 output by the image character recognition model 102 is obtained. The recognition result includes at least one character recognized by the image character recognition model.
Those skilled in the art can understand that the framework shown in FIG. 1 is only one example in which the embodiments of the present application can be implemented. The scope of applicability of the embodiments of the present application is not limited by any aspect of this framework.
Based on the above description, the image character recognition model training method provided by the present application is first described in detail below with reference to the accompanying drawings.
Referring to FIG. 2, the figure is a flowchart of an image character recognition model training method provided by an embodiment of the present application. As shown in FIG. 2, the method may include S201-S203:
S201: Input the training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character region and at least one masked character region, the displayed character region includes at least one displayed character, and the masked character region is used to mask at least one masked character.
The training image is an image used to train the image character recognition model. The training image includes at least two characters.
The training image includes character regions. A character region is the area a character occupies in the training image. To extract context information, some characters in the training image are masked characters, that is, characters that are not displayed. The training image can be obtained by performing character masking on an original image. It should be noted that characters located in the middle of the character sequence in the training image may be set as masked characters, which makes it easier for the image character recognition model to extract character context information. Correspondingly, the training image includes at least one masked character region and at least one displayed character region. A masked character region may include at least one masked character and may be represented by an all-black region. A displayed character region includes at least one displayed character.
As an example, refer to FIG. 3, a schematic diagram of an original image provided by an embodiment of the present application, and FIG. 4, a schematic diagram of a training image provided by an embodiment of the present application. The training image is obtained by performing character masking on the two characters "在" and "线" in the original image. The training image includes two masked character regions and eleven displayed character regions. The masked character regions are the character regions where the two masked characters "在" and "线" are located, and each masked character region includes one character. The training image shown in FIG. 4 masks the original image at character granularity. In addition, the original image may also be masked at word-segment granularity, in which each masked character region is used to mask one masked word segment, a masked word segment includes at least two masked characters, and the character identifier is used to mark the masked characters in the masked word segment. Referring to FIG. 5, the figure is a schematic diagram of another training image provided by an embodiment of the present application. This training image is obtained by performing word segmentation on the text composed of the characters in the original image and then masking at least one of the word segments. The training image shown in FIG. 5 includes one masked character region and six displayed character regions. The masked character region is composed of the character regions occupied by the characters of "在线", and the six displayed character regions are composed of the character regions occupied by the characters of the six word segments "简单", "好用", "的", "图片", "制作", and "软件".
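Purely for illustration, the masking step described above can be sketched as painting selected character regions black, at either character or word-segment granularity. The box format, the function names, and the use of NumPy are assumptions made for this sketch and are not part of the disclosed method.

```python
# A minimal sketch, assuming each character region is an axis-aligned box
# (x1, y1, x2, y2) and that masking paints the region solid black,
# matching the all-black masked character regions described above.
from typing import List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]

def mask_regions(image: np.ndarray, boxes: List[Box]) -> np.ndarray:
    """Return a training image with the given character regions blacked out."""
    masked = image.copy()
    for x1, y1, x2, y2 in boxes:
        masked[y1:y2, x1:x2] = 0  # all-black masked character region
    return masked

def build_training_image(image: np.ndarray,
                         char_boxes: List[Box],
                         mask_ids: List[int]) -> np.ndarray:
    """Character-granularity masking: mask_ids selects the characters to hide."""
    return mask_regions(image, [char_boxes[i] for i in mask_ids])

def build_training_image_by_segment(image: np.ndarray,
                                    segment_boxes: List[List[Box]],
                                    segment_id: int) -> np.ndarray:
    """Word-segment granularity: hide every character box of one word segment."""
    return mask_regions(image, segment_boxes[segment_id])
```

For the example of FIG. 4, mask_ids would select the boxes of "在" and "线"; for the example of FIG. 5, segment_id would select the word segment "在线".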
The training image is input into the first model to obtain the feature vectors, output by the first model, corresponding to the characters included in the training image. The displayed characters correspond to first feature vectors, and the masked characters correspond to second feature vectors.
In a possible implementation, the first model may be composed of an encoder. The encoder may be a Transformer.
The embodiments of the present application provide a specific implementation of inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model; see below for details.
S202: Input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model.
The second model is used to classify an input feature vector and determine the character corresponding to the feature vector. The second feature vector output by the first model is input into the second model to obtain the predicted character corresponding to the second feature vector.
In a possible implementation, the second model is composed of a classifier. The classifier may specifically be a linear classifier.
S203: Train the image character recognition model according to the character identifier corresponding to the training image and the predicted character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
The character identifier corresponds to the training image and is used to identify the masked character included in each masked character region of the training image. In a possible implementation, the character identifier is the index of the masked character in a preset dictionary.
It should be noted that the character identifiers correspond to the number of masked characters included in each masked character region of the training image. For the training image shown in FIG. 4, there are two character identifiers, identifying the two characters "在" and "线" respectively. For the training image shown in FIG. 5, there is one character identifier, identifying the masked word segment "在线", that is, the two characters "在" and "线".
The character identifier identifies the correct masked character. Based on the character identifier and the predicted character, the model parameters of the first model and the second model can be adjusted, thereby training the image character recognition model composed of the first model and the second model.
In a possible implementation, when the image character recognition model satisfies a preset condition, it may be determined that training of the image character recognition model is complete. The preset condition may be that the number of training iterations of the image character recognition model reaches an iteration threshold, or that the prediction accuracy of the image character recognition model reaches an accuracy threshold.
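As a non-authoritative sketch of S201-S203, the training step can be expressed as a classification objective over the masked positions, with the character identifiers serving as class labels. The use of PyTorch, the cross-entropy loss, the assumed first-model interface returning the two groups of feature vectors, and the toy preset dictionary below are all assumptions made for illustration; the present application does not prescribe a particular loss or framework.

```python
# A minimal sketch, assuming PyTorch and a preset dictionary mapping each
# character to an integer index (the "character identifier" described above).
import torch
import torch.nn.functional as F

char_dict = {"在": 0, "线": 1, "简": 2, "单": 3}  # hypothetical preset dictionary

def training_step(first_model, second_model, optimizer,
                  training_image, masked_char_ids):
    """One update: predict the masked characters and compare them with the
    character identifiers of the training image."""
    # First model: feature vectors for displayed and masked character regions.
    first_vecs, second_vecs = first_model(training_image)
    # Second model: classify the masked-character feature vectors.
    logits = second_model(second_vecs)            # (num_masked, dict_size)
    targets = torch.tensor(masked_char_ids)       # character identifiers
    loss = F.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```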
Based on the content of S201-S203 above, it can be seen that using images that include masked character regions as training images allows the image character recognition model to learn context information during training, which improves the accuracy of the trained image character recognition model.
In a possible implementation, the first model includes an encoder and a visual word embedding layer. The visual word embedding layer is used to extract the word vectors corresponding to the displayed characters and the masked characters. Inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model includes the following two steps:
A1: Input the displayed character regions and the masked character regions into the visual word embedding layer to obtain the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters.
The visual word embedding layer is used to convert the characters in the input character regions into corresponding word vectors. By inputting the displayed character regions and the masked character regions of the training image into the visual word embedding layer, the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters output by the visual word embedding layer can be obtained.
In a possible implementation, the first model further includes a region-of-interest alignment layer. The training image can be processed by the region-of-interest alignment layer to obtain the displayed character regions and the masked character regions: the training image is input into the region-of-interest alignment layer, and the displayed character regions and the masked character regions output by the region-of-interest alignment layer are obtained.
Referring to FIG. 6, the figure is a schematic diagram of training an image character recognition model provided by an embodiment of the present application. The training image is first input into the region-of-interest alignment layer, that is, the "RoI Align" layer in FIG. 6, and the training image is segmented to obtain the displayed character regions and the masked character regions. The displayed character regions are the character regions where "简", "单", "好", "用", "的", "图", "片", "制", "作", "软", and "件" are located. The masked character regions are the character regions where "在" and "线" are located.
The obtained displayed character regions and masked character regions are then input into the visual word embedding layer, that is, "Visual Token Embeddings" in FIG. 6, to obtain the first word vector corresponding to each displayed character and the second word vector corresponding to each masked character. The first word vectors corresponding to the displayed characters are "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10", and "v11". The second word vectors corresponding to the masked characters are "vmask1" and "vmask2".
A2: Input the first word vectors and the second word vectors into the encoder to obtain the first feature vectors corresponding to the displayed characters and the second feature vectors corresponding to the masked characters output by the encoder.
The first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters output by the visual word embedding layer are input into the encoder of the first model. Referring to FIG. 6, the first word vectors and the second word vectors are input into the encoder, that is, the "transformer encoder", to obtain the first feature vectors "h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8", "h9", "h10", and "h11" corresponding to the displayed characters, and the second feature vectors "hmask1" and "hmask2" corresponding to the masked characters.
Further, the second feature vectors are input into the second model, that is, the "linear classifier" in FIG. 6, to obtain the corresponding first predicted character and second predicted character. Finally, based on the first predicted character and its corresponding character identifier, and the second predicted character and its corresponding character identifier, the model parameters of the second model and of the first model can be adjusted, thereby training the image character recognition model.
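To make the data flow of FIG. 6 concrete, the following is a minimal sketch of the first model (region-of-interest alignment layer, visual word embedding layer, Transformer encoder) and the second model (linear classifier). The layer sizes, the use of torchvision's roi_align with explicit character boxes, and the module names are assumptions made for this sketch, not the claimed implementation.

```python
# A minimal sketch, assuming PyTorch/torchvision; boxes follow the
# (batch_index, x1, y1, x2, y2) convention expected by roi_align.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class FirstModel(nn.Module):
    def __init__(self, d_model=256, roi_size=8):
        super().__init__()
        # Visual word embedding layer: project each aligned character region
        # (3 x roi_size x roi_size pixels) into a d_model-dimensional word vector.
        self.visual_token_embedding = nn.Linear(3 * roi_size * roi_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
        self.roi_size = roi_size

    def forward(self, image, char_boxes):
        # Region-of-interest alignment layer: crop and align each character region.
        regions = roi_align(image, char_boxes,
                            output_size=(self.roi_size, self.roi_size))
        tokens = self.visual_token_embedding(regions.flatten(1))   # (N, d_model)
        # Transformer encoder: contextualize all character tokens jointly, so a
        # masked position can attend to the characters on both of its sides.
        return self.encoder(tokens.unsqueeze(0)).squeeze(0)        # (N, d_model)

class SecondModel(nn.Module):
    """Linear classifier over the preset dictionary."""
    def __init__(self, d_model=256, dict_size=6000):
        super().__init__()
        self.classifier = nn.Linear(d_model, dict_size)

    def forward(self, feature_vectors):
        return self.classifier(feature_vectors)
```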
Based on the image character recognition model training method provided by the above embodiments, an embodiment of the present application further provides an image character recognition method, including:
inputting an image to be recognized into the image character recognition model;
obtaining the recognition result output by the image character recognition model.
The image to be recognized is an image on which character recognition needs to be performed, and it includes at least one character to be recognized.
The image character recognition model is trained by the image character recognition model training method described above and is composed of the first model and the second model. The first model extracts and generates the feature vector corresponding to each character based on each character region in the image to be recognized. The second model generates the recognition result based on the feature vectors corresponding to the characters.
The recognition result includes the recognized characters corresponding to the characters to be recognized, obtained after the image character recognition model recognizes the image to be recognized.
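A minimal inference sketch under the same assumed interfaces as the sketches above: every character region of the image to be recognized is encoded by the first model and classified by the second model, and the predicted dictionary indices are mapped back to characters. The argmax decoding and the index_to_char mapping are illustrative assumptions, not part of the disclosed method.

```python
# A minimal sketch: at inference time no region is masked; every character
# region of the image to be recognized is classified against the dictionary.
import torch

index_to_char = {0: "在", 1: "线"}  # hypothetical inverse of the preset dictionary

@torch.no_grad()
def recognize(first_model, second_model, image, char_boxes):
    feature_vectors = first_model(image, char_boxes)
    logits = second_model(feature_vectors)
    char_ids = logits.argmax(dim=-1).tolist()
    return "".join(index_to_char[i] for i in char_ids)  # recognition result
```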
Based on the image character recognition model training method provided by the above method embodiments, an embodiment of the present application further provides an image character recognition model training apparatus, which is described below with reference to the accompanying drawings.
Referring to FIG. 7, the figure is a schematic structural diagram of an image character recognition model training apparatus provided by an embodiment of the present application. As shown in FIG. 7, the image character recognition model training apparatus includes:
a first input unit 701, configured to input a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character region and at least one masked character region, the displayed character region includes at least one displayed character, and the masked character region is used to mask at least one masked character;
a second input unit 702, configured to input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model;
a training unit 703, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
In a possible implementation, the masked character region is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to mark the masked word segment.
In a possible implementation, the first model is composed of an encoder.
In a possible implementation, the first model further includes a visual word embedding layer, and the first input unit 701 includes:
a first input subunit, configured to input the displayed character regions and the masked character regions into the visual word embedding layer to obtain the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters;
a second input subunit, configured to input the first word vectors and the second word vectors into the encoder to obtain the first feature vectors corresponding to the displayed characters and the second feature vectors corresponding to the masked characters output by the first model.
In a possible implementation, the first model further includes a region-of-interest alignment layer, and the displayed character regions and the masked character regions are obtained by inputting the training image into the region-of-interest alignment layer.
In a possible implementation, the character identifier is the index of the masked character in a preset dictionary.
Based on the image character recognition method provided by the above method embodiments, an embodiment of the present application further provides an image character recognition apparatus, which is described below with reference to the accompanying drawings.
Referring to FIG. 8, the figure is a schematic structural diagram of an image character recognition apparatus provided by an embodiment of the present application. As shown in FIG. 8, the image character recognition apparatus includes:
an input unit 801, configured to input an image to be recognized into an image character recognition model;
a recognition unit 802, configured to obtain the recognition result output by the image character recognition model.
The image to be recognized is an image on which character recognition needs to be performed, and it includes at least one character to be recognized.
The image character recognition model is trained by the image character recognition model training method described above and is composed of the first model and the second model. The first model extracts and generates the feature vector corresponding to each character based on each character region in the image to be recognized. The second model generates the recognition result based on the feature vectors corresponding to the characters.
The recognition result includes the recognized characters corresponding to the characters to be recognized, obtained after the image character recognition model recognizes the image to be recognized.
Based on the image character recognition model training method and the image character recognition method provided by the above method embodiments, the present application further provides an electronic device, including: one or more processors; and a storage apparatus storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image character recognition model training method or the image character recognition method described in the above embodiments.
Referring now to FIG. 9, it shows a schematic structural diagram of an electronic device 900 suitable for implementing an embodiment of the present application. The terminal device in the embodiments of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (portable Android devices, tablet computers), PMPs (Portable Media Players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs (televisions) and desktop computers. The electronic device shown in FIG. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 9, the electronic device 900 may include a processing apparatus (for example, a central processing unit or a graphics processing unit) 901, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage apparatus 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the electronic device 900. The processing apparatus 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Generally, the following apparatuses may be connected to the I/O interface 905: an input apparatus 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 907 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 908 including, for example, a magnetic tape and a hard disk; and a communication apparatus 909. The communication apparatus 909 may allow the electronic device 900 to perform wireless or wired communication with other devices to exchange data. Although FIG. 9 shows the electronic device 900 with various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present application includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, where the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 909, or installed from the storage apparatus 908, or installed from the ROM 902. When the computer program is executed by the processing apparatus 901, the above functions defined in the methods of the embodiments of the present application are executed.
The electronic device provided by this embodiment of the present application and the image character recognition model training method and image character recognition method provided by the above embodiments belong to the same inventive concept. For technical details not described in detail in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
Based on the image character recognition model training method and the image character recognition method provided by the above method embodiments, an embodiment of the present application provides a computer storage medium on which a computer program is stored, where the program, when executed by a processor, implements the image character recognition model training method or the image character recognition method described in any of the above embodiments.
It should be noted that the above computer-readable medium in the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, the computer-readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present application, the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. The propagated data signal may take multiple forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program used by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (Hyper Text Transfer Protocol), and may be interconnected with digital data communication (for example, a communication network) in any form or medium. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to execute the above image character recognition model training method or image character recognition method.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof. The above programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the accompanying drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts and the combinations of blocks in the block diagrams and/or flowcharts may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present application may be implemented by software or by hardware. In some cases, the name of a unit/module does not constitute a limitation on the unit itself; for example, a voice data collection module may also be described as a "data collection module".
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present application, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present application, [Example 1] provides an image character recognition model training method, the method including:
inputting a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character region and at least one masked character region, the displayed character region includes at least one displayed character, and the masked character region is used to mask at least one masked character;
inputting the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model;
training an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
According to one or more embodiments of the present application, [Example 2] provides an image character recognition model training method, where the masked character region is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to mark the masked word segment.
According to one or more embodiments of the present application, [Example 3] provides an image character recognition model training method, where the first model is composed of an encoder.
According to one or more embodiments of the present application, [Example 4] provides an image character recognition model training method, where the first model further includes a visual word embedding layer, and inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model includes:
inputting the displayed character regions and the masked character regions into the visual word embedding layer to obtain the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters;
inputting the first word vectors and the second word vectors into the encoder to obtain the first feature vectors corresponding to the displayed characters and the second feature vectors corresponding to the masked characters output by the first model.
According to one or more embodiments of the present application, [Example 5] provides an image character recognition model training method, where the first model further includes a region-of-interest alignment layer, and the displayed character region and the masked character region are obtained by inputting the training image into the region-of-interest alignment layer.
According to one or more embodiments of the present application, [Example 6] provides an image character recognition model training method, where the character identifier is the index of the masked character in a preset dictionary.
According to one or more embodiments of the present application, [Example 7] provides an image character recognition method, the method including:
inputting an image to be recognized into an image character recognition model to obtain a recognition result output by the image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in any one of the above [Example 1] to [Example 6].
According to one or more embodiments of the present application, [Example 8] provides an image character recognition model training apparatus, the apparatus including:
a first input unit, configured to input a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character region and at least one masked character region, the displayed character region includes at least one displayed character, and the masked character region is used to mask at least one masked character;
a second input unit, configured to input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model;
a training unit, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
According to one or more embodiments of the present application, [Example 9] provides an image character recognition model training apparatus, where the masked character region is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to mark the masked word segment.
According to one or more embodiments of the present application, [Example 10] provides an image character recognition model training apparatus, where the first model is composed of an encoder.
According to one or more embodiments of the present application, [Example 11] provides an image character recognition model training apparatus, where the first model further includes a visual word embedding layer, and the first input unit includes:
a first input subunit, configured to input the displayed character regions and the masked character regions into the visual word embedding layer to obtain the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters;
a second input subunit, configured to input the first word vectors and the second word vectors into the encoder to obtain the first feature vectors corresponding to the displayed characters and the second feature vectors corresponding to the masked characters output by the first model.
According to one or more embodiments of the present application, [Example 12] provides an image character recognition model training apparatus, where the first model further includes a region-of-interest alignment layer, and the displayed character regions and the masked character regions are obtained by inputting the training image into the region-of-interest alignment layer.
According to one or more embodiments of the present application, [Example 13] provides an image character recognition model training apparatus, where the character identifier is the index of the masked character in a preset dictionary.
According to one or more embodiments of the present application, [Example 14] provides an image character recognition apparatus, the apparatus including:
an input unit, configured to input an image to be recognized into an image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in any one of the above [Example 1] to [Example 6];
a recognition unit, configured to obtain the recognition result output by the image character recognition model.
According to one or more embodiments of the present application, [Example 15] provides an electronic device, including:
one or more processors;
a storage apparatus on which one or more programs are stored,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any one of [Example 1] to [Example 7].
According to one or more embodiments of the present application, [Example 16] provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any one of [Example 1] to [Example 7].
It should be noted that the embodiments in this specification are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for the same or similar parts between the embodiments, reference may be made to one another. For the system or apparatus disclosed in an embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively brief, and for relevant details, reference may be made to the description of the method part.
It should be understood that, in the present application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" is used to describe the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or multiple items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.
It should also be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device including that element.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be directly implemented by hardware, by a software module executed by a processor, or by a combination of the two. The software module may be placed in a random access memory (RAM), an internal memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application will not be limited to the embodiments shown herein, but shall conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. An image character recognition model training method, the method comprising:
    inputting a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, wherein the training image comprises at least one displayed character region and at least one masked character region, the displayed character region comprises at least one displayed character, and the masked character region is used to mask at least one masked character;
    inputting the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model;
    training an image character recognition model according to a character identifier corresponding to the training image and the predicted character of the masked character, wherein the image character recognition model comprises the first model and the second model, and the character identifier is used to identify the masked character.
  2. 根据权利要求1所述的方法,其中,所述遮蔽字符区域用于对遮蔽分词进行遮蔽,所述遮蔽分词包括至少两个遮蔽字符,所述字符标识用于标记所述遮蔽分词。The method according to claim 1, wherein the masked character area is used to mask a masked participle, the masked participle includes at least two masked characters, and the character identifier is used to mark the masked participle.
  3. 根据权利要求1所述的方法,其中,所述第一模型由编码器构成。The method of claim 1, wherein the first model is formed by an encoder.
  4. 根据权利要求3所述的方法,其中,所述第一模型还包括视觉词嵌入层,所述将训练图像输入第一模型,得到所述第一模型输出的显示字符对应的第一特征向量和遮蔽字符对应的第二特征向量,包括:The method according to claim 3, wherein the first model further includes a visual word embedding layer, and the training image is input into the first model to obtain the first feature vector and the corresponding first feature vector of the displayed characters output by the first model The second feature vector corresponding to the masked character includes:
    将显示字符区域和遮蔽字符区域输入所述视觉词嵌入层,得到所述显示字符对应的第一字向量和所述遮蔽字符对应的第二字向量;Input the display character region and the shielding character region into the visual word embedding layer to obtain the first word vector corresponding to the display character and the second word vector corresponding to the shielding character;
    将所述第一字向量和所述第二字向量输入所述编码器,得到所述第一模型输出的显示字符对应的第一特征向量和遮蔽字符对应的第二特征向量。Inputting the first word vector and the second word vector into the encoder to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model.
  5. 根据权利要求4所述的方法,其中,所述第一模型还包括感兴趣区域对齐层,所述显示字符区域和所述遮蔽字符区域是将所述训练图像输入所述感兴趣区域对齐层得到的。The method according to claim 4, wherein the first model further comprises a region-of-interest alignment layer, and the displayed character region and the masked character region are obtained by inputting the training image into the region-of-interest alignment layer of.
  6. The method according to any one of claims 1-5, wherein the character identifier is a sequence number of the masked character in a preset dictionary.
  7. An image character recognition method, the method comprising:
    inputting an image to be recognized into an image character recognition model, wherein the image to be recognized comprises at least one character to be recognized, and the image character recognition model is obtained by training based on the image character recognition model training method according to any one of claims 1-6; and
    obtaining a recognition result output by the image character recognition model.
  8. An image character recognition model training apparatus, the apparatus comprising:
    a first input unit, configured to input a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, wherein the training image comprises at least one displayed character region and at least one masked character region, the displayed character region comprises at least one of the displayed characters, and the masked character region is used to mask at least one of the masked characters;
    a second input unit, configured to input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model; and
    a training unit, configured to train the image character recognition model according to a character identifier corresponding to the training image and the predicted character of the masked character region, wherein the image character recognition model comprises the first model and the second model, and the character identifier is used to identify the masked character.
  9. An image character recognition apparatus, the apparatus comprising:
    an input unit, configured to input an image to be recognized into an image character recognition model, wherein the image to be recognized comprises at least one character to be recognized, and the image character recognition model is obtained by training based on the image character recognition model training method according to any one of claims 1-6; and
    a recognition unit, configured to obtain a recognition result output by the image character recognition model.
  10. An electronic device, comprising:
    one or more processors; and
    a storage device having one or more programs stored thereon,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-6, or the method according to claim 7.
  11. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6, or the method according to claim 7.
  12. A computer program product comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1-6, or the method according to claim 7.
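
The following is a minimal, illustrative PyTorch sketch of the masked-character training procedure set out in claims 1-6 and the recognition step of claim 7. It is not the applicant's implementation: the flattened-patch embedding standing in for the claimed region-of-interest alignment layer, the Transformer encoder configuration, the dictionary size, and all module and variable names (FirstModel, SecondModel, train_step, VOCAB_SIZE, and so on) are assumptions introduced only for demonstration.

```python
# Illustrative sketch only; sizes, names, and the toy data are assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE = 5000   # assumed size of the preset character dictionary (claim 6)
EMBED_DIM = 256     # assumed feature dimension

class FirstModel(nn.Module):
    """First model: visual word embedding layer + encoder (claims 3-4)."""
    def __init__(self, region_pixels=32 * 32, embed_dim=EMBED_DIM):
        super().__init__()
        # Visual word embedding layer: maps each character region (here a
        # flattened pixel patch) to a word vector.
        self.visual_word_embedding = nn.Linear(region_pixels, embed_dim)
        # Encoder over the sequence of displayed and masked character regions.
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, regions):
        # regions: (batch, num_regions, region_pixels)
        word_vectors = self.visual_word_embedding(regions)
        return self.encoder(word_vectors)  # (batch, num_regions, embed_dim)

class SecondModel(nn.Module):
    """Second model: predicts the masked character over the preset dictionary."""
    def __init__(self, embed_dim=EMBED_DIM, vocab_size=VOCAB_SIZE):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, vocab_size)

    def forward(self, masked_features):
        return self.classifier(masked_features)  # logits over the dictionary

def train_step(first_model, second_model, optimizer,
               regions, masked_positions, char_ids):
    """One training step: feature vectors -> predicted characters -> loss.

    regions:          (batch, num_regions, region_pixels) character regions.
    masked_positions: (batch, num_masked) indices of masked-character regions.
    char_ids:         (batch, num_masked) character identifiers, i.e. the
                      sequence numbers of the masked characters in the dictionary.
    """
    features = first_model(regions)                              # feature vectors
    masked_feats = torch.gather(                                 # second feature vectors
        features, 1,
        masked_positions.unsqueeze(-1).expand(-1, -1, features.size(-1)))
    logits = second_model(masked_feats)                          # predicted characters
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB_SIZE), char_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    first_model, second_model = FirstModel(), SecondModel()
    optimizer = torch.optim.Adam(
        list(first_model.parameters()) + list(second_model.parameters()), lr=1e-4)
    # Toy batch: 2 images, 8 character regions each, 3 of them masked.
    regions = torch.randn(2, 8, 32 * 32)
    masked_positions = torch.tensor([[1, 4, 6], [0, 2, 7]])
    char_ids = torch.randint(0, VOCAB_SIZE, (2, 3))
    print("loss:", train_step(first_model, second_model, optimizer,
                              regions, masked_positions, char_ids))
    # Recognition (claim 7, roughly): run the trained models on an image's
    # character regions and take the argmax over the dictionary.
    predictions = second_model(first_model(regions)).argmax(-1)
    print("predicted character ids:", predictions.shape)
```

In this reading, the character identifier of claim 6 is simply the index of the masked character in a preset dictionary, so masked-character prediction reduces to cross-entropy classification over that dictionary; how the regions are actually extracted (claim 5's region-of-interest alignment) is omitted here and is one of the assumptions noted above.
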
PCT/CN2022/125575 2021-11-25 2022-10-17 Image character recognition model training method, and image character recognition method and apparatus WO2023093361A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111415332.2A CN114049632A (en) 2021-11-25 2021-11-25 Image character recognition model training method, image character recognition method and device
CN202111415332.2 2021-11-25

Publications (1)

Publication Number Publication Date
WO2023093361A1 (en)

Family

ID=80210959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125575 WO2023093361A1 (en) 2021-11-25 2022-10-17 Image character recognition model training method, and image character recognition method and apparatus

Country Status (2)

Country Link
CN (1) CN114049632A (en)
WO (1) WO2023093361A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049632A (en) * 2021-11-25 2022-02-15 北京有竹居网络技术有限公司 Image character recognition model training method, image character recognition method and device
CN116363663A (en) * 2023-04-03 2023-06-30 北京百度网讯科技有限公司 Image processing method, image recognition method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10445569B1 (en) * 2016-08-30 2019-10-15 A9.Com, Inc. Combination of heterogeneous recognizer for image-based character recognition
CN110674876A (en) * 2019-09-25 2020-01-10 北京猎户星空科技有限公司 Character detection method and device, electronic equipment and computer readable medium
CN110941945A (en) * 2019-12-02 2020-03-31 百度在线网络技术(北京)有限公司 Language model pre-training method and device
CN111639598A (en) * 2020-05-29 2020-09-08 济南博观智能科技有限公司 License plate recognition method, license plate recognition device, license plate recognition equipment and storage medium
CN112801085A (en) * 2021-02-09 2021-05-14 沈阳麟龙科技股份有限公司 Method, device, medium and electronic equipment for recognizing characters in image
CN114049632A (en) * 2021-11-25 2022-02-15 北京有竹居网络技术有限公司 Image character recognition model training method, image character recognition method and device

Also Published As

Publication number Publication date
CN114049632A (en) 2022-02-15

Similar Documents

Publication Publication Date Title
WO2023093361A1 (en) Image character recognition model training method, and image character recognition method and apparatus
WO2023273985A1 (en) Method and apparatus for training speech recognition model and device
CN110826567B (en) Optical character recognition method, device, equipment and storage medium
JP2023547917A (en) Image segmentation method, device, equipment and storage medium
CN111046677B (en) Method, device, equipment and storage medium for obtaining translation model
WO2023083142A1 (en) Sentence segmentation method and apparatus, storage medium, and electronic device
WO2022252881A1 (en) Image processing method and apparatus, and readable medium and electronic device
CN112364829B (en) Face recognition method, device, equipment and storage medium
CN112364860A (en) Training method and device of character recognition model and electronic equipment
WO2023138314A1 (en) Object attribute recognition method and apparatus, readable storage medium, and electronic device
CN113313064A (en) Character recognition method and device, readable medium and electronic equipment
WO2021088790A1 (en) Display style adjustment method and apparatus for target device
CN112883968B (en) Image character recognition method, device, medium and electronic equipment
WO2023143016A1 (en) Feature extraction model generation method and apparatus, and image feature extraction method and apparatus
WO2023142914A1 (en) Date recognition method and apparatus, readable medium and electronic device
WO2023072015A1 (en) Method and apparatus for generating character style image, device, and storage medium
WO2022116819A1 (en) Model training method and apparatus, machine translation method and apparatus, and device and storage medium
CN115270717A (en) Method, device, equipment and medium for detecting vertical position
CN114445813A (en) Character recognition method, device, equipment and medium
WO2023143107A1 (en) Character recognition method and apparatus, device, and medium
WO2023134433A1 (en) Font generation method and apparatus, and device
WO2023011397A1 (en) Method for generating acoustic features, training speech models and speech recognition, and device
WO2023174075A1 (en) Training method and apparatus for content detection model, and content detection method and apparatus
WO2023000782A1 (en) Method and apparatus for acquiring video hotspot, readable medium, and electronic device
WO2023138361A1 (en) Image processing method and apparatus, and readable storage medium and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897444

Country of ref document: EP

Kind code of ref document: A1