WO2023093361A1 - Image character recognition model training method, and image character recognition method and apparatus - Google Patents

Image character recognition model training method, and image character recognition method and apparatus Download PDF

Info

Publication number
WO2023093361A1
WO2023093361A1 (PCT/CN2022/125575)
Authority
WO
WIPO (PCT)
Prior art keywords
character
image
model
masked
training
Prior art date
Application number
PCT/CN2022/125575
Other languages
French (fr)
Chinese (zh)
Inventor
范湉湉
黄灿
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023093361A1 publication Critical patent/WO2023093361A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the field of image processing, in particular to an image character recognition model training method, image character recognition method and device.
  • OCR (Optical Character Recognition) is a technology that recognizes and analyzes images containing characters to obtain the characters in those images; using OCR technology, the character information in an image can be obtained.
  • OCR technology can first determine the character areas in an image, then segment those character areas, and finally recognize the characters in the character areas. At present, the accuracy of recognizing characters in an image using OCR technology is relatively low.
  • the embodiments of the present application provide an image character recognition model training method, image character recognition method and device, which can more accurately recognize characters in an image.
  • the embodiment of the present application provides a method for training an image character recognition model, the method comprising:
  • inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to mask at least one of the masked characters;
  • inputting the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • training an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
  • the embodiment of the present application provides a method for image character recognition, the method comprising:
  • inputting the image to be recognized into the image character recognition model, where the image to be recognized includes at least one character to be recognized and the image character recognition model is trained based on the image character recognition model training method described in the first aspect above; and obtaining the recognition result output by the image character recognition model.
  • the embodiment of the present application provides an image character recognition model training device, the device comprising:
  • the first input unit is configured to input the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to mask at least one of the masked characters;
  • the second input unit is configured to input the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • a training unit, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
  • an image character recognition device comprising:
  • the input unit is used to input the image to be recognized into the image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in the first aspect above;
  • the recognition unit is configured to obtain the recognition result output by the image character recognition model.
  • the embodiment of the present application provides an electronic device, including:
  • one or more processors;
  • a storage device on which one or more programs are stored, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any one of the first aspect and the second aspect.
  • an embodiment of the present application provides a computer-readable medium, on which a computer program is stored, wherein, when the program is executed by a processor, the method according to any one of the first aspect and the second aspect is implemented.
  • an embodiment of the present application provides a computer program product, the computer program product including computer programs/instructions, where, when the computer programs/instructions are executed by a processor, the method described in any one of the first aspect and the second aspect is implemented.
  • The image character recognition model training method, image character recognition method, and apparatus provided by the embodiments of the present application use a training image and the character identifier corresponding to the training image to train an image character recognition model composed of a first model and a second model.
  • the training image includes masked character areas and displayed character areas.
  • Training the image character recognition model with a training image that includes masked character areas enables the model to better extract bidirectional context information, so the image character recognition model obtained by such training has higher accuracy. Recognizing an image to be recognized with the trained image character recognition model yields more accurate characters contained in the image to be recognized.
  • FIG. 1 is a schematic framework diagram of an exemplary application scenario provided by an embodiment of the present application
  • FIG. 2 is a flowchart of an image character recognition model training method provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an original image provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a training image provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another training image provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of a training image character recognition model provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an image character recognition model training device provided in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image character recognition device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a basic structure of an electronic device provided by an embodiment of the present application.
  • An embodiment of the present application provides an image character recognition model training method, an image character recognition method, and an apparatus, which use a training image and the character identifier corresponding to the training image to train an image character recognition model composed of a first model and a second model.
  • the training image includes masked character areas and displayed character areas.
  • Training the image character recognition model with a training image that includes masked character areas enables the model to better extract bidirectional context information, so the image character recognition model obtained by such training has higher accuracy. Recognizing an image to be recognized with the trained image character recognition model yields more accurate characters contained in the image to be recognized.
  • FIG. 1 is a schematic framework diagram of an exemplary application scenario provided by an embodiment of the present application.
  • the image to be recognized 101 includes at least one character.
  • the image to be recognized 101 that needs to be recognized is input into the image character recognition model 102 that has been trained, and the recognition result 103 output by the image character recognition model 102 is obtained.
  • the recognition result includes at least one character recognized by the image character recognition model.
  • the schematic framework diagram shown in FIG. 1 is only one example in which the embodiments of the present application can be implemented.
  • the scope of applicability of the embodiments of the present application is not limited by any aspect of this framework.
  • FIG. 2 is a flowchart of an image character recognition model training method provided by an embodiment of the present application; as shown in FIG. 2, the method may include S201-S203:
  • S201: Input the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one displayed character, and the masked character area is used to mask at least one masked character.
  • Training images are images used to train the image character recognition model.
  • the training images include at least two characters.
  • the training images include character regions.
  • the character area is the area that a character occupies in the training image.
  • some characters in the training image are masked characters, that is, characters that are not displayed.
  • the training image can be obtained by performing character masking on the original image.
  • characters in intermediate positions of the training image can be set as masked characters, so that the image character recognition model can extract context information on both sides of the characters.
  • the training image includes at least one masked character area and at least one displayed character area.
  • the masked character area may include at least one masked character.
  • A masked character area can be represented as a fully black area. The displayed character area includes at least one displayed character.
  • FIG. 3 is a schematic diagram of an original image provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a training image provided by an embodiment of the present application.
  • The training image is obtained by masking the two characters "in" and "line" in the original image.
  • The training image includes two masked character areas and eleven displayed character areas.
  • The masked character areas are the character areas where the two masked characters "in" and "line" are located; each masked character area includes one character.
  • The training image shown in FIG. 4 is obtained by masking the original image at character granularity.
  • the original image can also be masked based on word segmentation granularity.
  • Each masked character area is used to mask one masked word segment.
  • A masked word segment includes at least two masked characters. Character identifiers are used to identify the masked characters in the masked word segments.
  • FIG. 5 is a schematic diagram of another training image provided by an embodiment of the present application, where the training image is obtained by performing word segmentation on the text composed of the characters in the original image and then masking at least one of the resulting word segments.
  • the training image shown in Fig. 5 includes one masked character region and six displayed character regions.
  • The masked character area consists of the character areas where the characters of the masked word segment "online" are located.
  • The six displayed character areas consist of the character areas where the characters of the six word segments "simple", "easy to use", "of", "picture", "production" and "software" are located.
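  • As a rough illustration of the masking described above, the following Python sketch (not part of the patent; the bounding boxes, the segment grouping, and the helper names are assumptions introduced here) blacks out character regions of an original image at either character granularity or word-segmentation granularity:

```python
from typing import List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) of one character region, an assumed representation

def mask_regions(image: np.ndarray, boxes: List[Box]) -> np.ndarray:
    """Return a copy of the image with the given character regions fully blacked out."""
    masked = image.copy()
    for x0, y0, x1, y1 in boxes:
        masked[y0:y1, x0:x1] = 0  # masked character areas are represented as fully black areas
    return masked

def build_training_image(image: np.ndarray,
                         char_boxes: List[Box],
                         segments: List[List[int]],
                         mask_segment: bool) -> Tuple[np.ndarray, List[int]]:
    """Mask either one character in a middle position (character granularity) or one whole
    word segment (word-segmentation granularity); return the training image and the indices
    of the masked characters, which later map to character identifiers."""
    if mask_segment:
        # mask one word segment, e.g. the last one (such as "online" in the example of FIG. 5)
        masked_idx = segments[-1]
    else:
        # mask a character in an intermediate position so context exists on both sides
        masked_idx = [len(char_boxes) // 2]
    boxes = [char_boxes[i] for i in masked_idx]
    return mask_regions(image, boxes), masked_idx
```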
  • the displayed character corresponds to the first feature vector
  • the masked character corresponds to the second feature vector.
  • the first model may be formed by an encoder.
  • the encoder can be a transformer.
  • the embodiment of the present application provides a specific implementation method of inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model. Please refer to the following for details.
  • S202 Input the second feature vector of the masked character into the second model, and obtain the predicted character of the masked character output by the second model.
  • The second model is used to classify the input feature vector and determine the character corresponding to that feature vector. The second feature vector output by the first model is input into the second model to obtain the predicted character corresponding to the second feature vector output by the second model.
  • the second model is composed of a classifier.
  • the classifier may be a linear classifier.
  • S203 Train an image character recognition model according to the character identification and predicted characters corresponding to the training image, the image character recognition model includes a first model and a second model, and the character identification is used to identify masked characters.
  • Character identifiers correspond to the training image.
  • The character identifiers are used to identify the masked characters included in each masked character area in the training image.
  • the character identification is the character serial number of the masked character in the preset dictionary.
  • The character identifiers correspond to the masked characters included in each masked character area of the training image.
  • For the training image shown in FIG. 4, the training image corresponds to two character identifiers, which respectively identify the two masked characters "in" and "line".
  • For the training image shown in FIG. 5, the training image corresponds to one character identifier, which identifies the masked word segment "online", that is, the two characters "in" and "line".
  • Character identifiers thus identify the correct masked characters. Based on the character identifiers and the predicted characters, the model parameters of the first model and the second model can be adjusted to train the image character recognition model composed of the first model and the second model.
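  • The patent does not specify a concrete loss. The sketch below illustrates one plausible realization (an assumption of this edit, not the patent's specification) in which the character identifiers are the dictionary indices of the masked characters and both models are adjusted by minimizing a cross-entropy loss between the predicted characters and those identifiers:

```python
import torch
import torch.nn.functional as F

# Hypothetical preset dictionary: each character maps to its serial number (its character identifier).
# The contents here are placeholders; a real dictionary would cover the full character vocabulary.
preset_dictionary = {ch: idx for idx, ch in enumerate(["<pad>", "a", "b", "c"])}

def training_loss(predicted_logits: torch.Tensor, masked_chars: list) -> torch.Tensor:
    """predicted_logits: (num_masked, vocab_size) scores from the second model, one row per masked position.
    masked_chars: the ground-truth masked characters; their dictionary indices act as character identifiers."""
    char_ids = torch.tensor([preset_dictionary[ch] for ch in masked_chars])
    # Compare predicted characters with the character identifiers; backpropagating this loss
    # adjusts the parameters of both the first model and the second model.
    return F.cross_entropy(predicted_logits, char_ids)
```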
  • When the image character recognition model satisfies a preset condition, it may be determined that training of the image character recognition model is complete.
  • The preset condition may be that the number of training iterations of the image character recognition model reaches a count threshold, or that the character prediction accuracy of the image character recognition model reaches an accuracy threshold.
  • the image character recognition model can learn context information during training, and the accuracy of the trained image character recognition model can be improved.
  • the first model includes an encoder and a visual word embedding layer.
  • the visual word embedding layer is used to extract the word vectors corresponding to the displayed characters and the masked characters.
  • A1 Input the displayed character area and the masked character area into the visual word embedding layer to obtain the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character.
  • the visual word embedding layer is used to convert characters in the input character region into corresponding word vectors.
  • the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character output by the visual word embedding layer can be obtained.
  • the first model further includes a region-of-interest alignment layer.
  • the training image can be processed by using the ROI alignment layer to obtain the displayed character area and the masked character area. Input the training image into the ROI alignment layer, and obtain the displayed character area and the masked character area output by the ROI alignment layer.
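  • The patent does not give an implementation of the region-of-interest alignment layer. One common way to realize it (an assumption here, not the patent's specification) is torchvision's roi_align, which extracts a fixed-size feature for every character bounding box:

```python
import torch
from torchvision.ops import roi_align

def extract_character_regions(feature_map: torch.Tensor,
                              char_boxes: torch.Tensor,
                              image_size: int) -> torch.Tensor:
    """feature_map: (1, C, H, W) float features of the training image.
    char_boxes: (K, 4) float boxes (x0, y0, x1, y1) in image coordinates, one per character area,
    covering both displayed and masked character areas.
    Returns (K, C, 7, 7) fixed-size region features, one per character area; the 7x7 output size
    is an illustrative assumption."""
    scale = feature_map.shape[-1] / float(image_size)  # maps image coordinates onto the feature map
    return roi_align(feature_map, [char_boxes], output_size=(7, 7),
                     spatial_scale=scale, aligned=True)
```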
  • FIG. 6 is a schematic diagram of a training image character recognition model provided by an embodiment of the present application.
  • The training image is input into the region-of-interest alignment layer, that is, the "RoI Align" layer in FIG. 6, which segments the training image to obtain the displayed character areas and the masked character areas in the training image.
  • The displayed character areas are the character areas where the eleven displayed characters "simplified", "single", "good", "used", "de", "picture", "piece", "made", "made", "soft" and "piece" are located.
  • the masked character area is the character area where "in” and "line” are located.
  • The displayed character areas and the masked character areas are input into the visual word embedding layer, that is, the "Visual Token Embeddings" layer in FIG. 6, to obtain the first word vector corresponding to each displayed character and the second word vector corresponding to each masked character.
  • The first word vectors corresponding to the displayed characters are "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10" and "v11".
  • the second word vectors corresponding to each masked character are "vmask1" and "vmask2" respectively.
  • A2 Input the first word vector and the second word vector into the encoder to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the encoder.
  • the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character output by the visual word embedding layer are input into the encoder of the first model.
  • The encoder, that is, the "Transformer Encoder" in FIG. 6, outputs the first feature vector corresponding to each displayed character and the second feature vector corresponding to each masked character.
  • The second feature vectors are input into the second model, that is, the "Linear Classifier" in FIG. 6, to obtain the corresponding first predicted character and second predicted character.
  • the model parameters of the second model and the model parameters of the first model can be adjusted to realize the training of the image character recognition model.
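  • Putting the pieces of FIG. 6 together, the following hedged sketch shows what the first model (visual token embeddings plus Transformer encoder) and the second model (a linear classifier) might look like; the dimensions, layer counts, and module names are illustrative assumptions rather than values taken from the patent:

```python
import torch
import torch.nn as nn

class ImageCharRecognitionModel(nn.Module):
    """Sketch of the first model (visual token embeddings + Transformer encoder)
    followed by the second model (a linear classifier); all sizes are illustrative."""
    def __init__(self, region_dim: int = 7 * 7 * 256, d_model: int = 512, vocab_size: int = 6000):
        super().__init__()
        # first model: visual word embedding layer + Transformer encoder
        self.visual_token_embedding = nn.Linear(region_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
        # second model: linear classifier over the preset character dictionary
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, region_features: torch.Tensor, masked_positions: torch.Tensor) -> torch.Tensor:
        """region_features: (1, K, region_dim) flattened features of the K character areas,
        displayed and masked, in reading order. masked_positions: indices of the masked areas.
        Returns (num_masked, vocab_size) logits over the predicted characters."""
        word_vectors = self.visual_token_embedding(region_features)   # e.g. v1..v11, vmask1, vmask2
        feature_vectors = self.encoder(word_vectors)                  # first and second feature vectors
        masked_features = feature_vectors[0, masked_positions]        # keep only the second feature vectors
        return self.classifier(masked_features)                       # predicted characters (logits)
```

  • In this sketch only the second feature vectors at the masked positions are passed to the linear classifier, mirroring S201-S203, while the first feature vectors of the displayed characters are still computed so the encoder can attend to context on both sides of the masked areas.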
  • Based on the image character recognition model trained by the above method, an embodiment of the present application also provides an image character recognition method, including: inputting the image to be recognized into the image character recognition model, and obtaining the recognition result output by the image character recognition model.
  • the image to be recognized is an image requiring character recognition.
  • the image to be recognized includes at least one character to be recognized.
  • the image character recognition model is trained by the above-mentioned image character recognition model training method.
  • the image character recognition model consists of a first model and a second model.
  • the first model extracts and generates feature vectors corresponding to each character based on each character area in the image to be recognized.
  • the second model generates a recognition result based on the feature vector corresponding to the character.
  • the recognition result includes the recognized character corresponding to the character to be recognized obtained after the image character recognition model recognizes the image to be recognized.
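  • For the recognition stage, a minimal inference sketch (again an assumption of this edit, reusing the hypothetical model and dictionary from the training sketches above) could look as follows: the trained model predicts a dictionary index for each character position, and the indices are mapped back to characters to form the recognition result:

```python
import torch

# Placeholder dictionary consistent with the earlier training sketch; a real dictionary would
# be the preset character dictionary used when building the character identifiers.
preset_dictionary = {"<pad>": 0, "a": 1, "b": 2, "c": 3}
index_to_char = {idx: ch for ch, idx in preset_dictionary.items()}

@torch.no_grad()
def recognize(model, region_features: torch.Tensor, positions: torch.Tensor) -> str:
    """Run the trained image character recognition model on the character regions of an image
    to be recognized and decode the predicted dictionary indices back into characters."""
    model.eval()
    logits = model(region_features, positions)           # (num_positions, vocab_size)
    indices = logits.argmax(dim=-1).tolist()             # most likely dictionary index per position
    return "".join(index_to_char[i] for i in indices)    # recognition result: the recognized characters
```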
  • the embodiment of the present application also provides an image character recognition model training device.
  • the image character recognition model training device will be described below with reference to the accompanying drawings.
  • FIG. 7 is a schematic structural diagram of an image character recognition model training device provided by an embodiment of the present application. As shown in Figure 7, this image character recognition model training device comprises:
  • the first input unit 701 is configured to input the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to mask at least one of the masked characters;
  • the second input unit 702 is configured to input the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • a training unit 703, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
  • The masked character area is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to identify the masked word segment.
  • the first model is formed by an encoder.
  • the first model further includes a visual word embedding layer
  • the first input unit 701 includes:
  • the first input subunit is used to input the display character area and the masked character area into the visual word embedding layer to obtain the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character;
  • the second input subunit is configured to input the first word vector and the second word vector into the encoder to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model.
  • the first model further includes an ROI alignment layer, and the displayed character area and the masked character area are obtained by inputting the training image into the ROI alignment layer.
  • the character identifier is a character sequence number of the masked character in a preset dictionary.
  • the embodiment of the present application also provides an image character recognition device.
  • the image character recognition device will be described below in conjunction with the accompanying drawings.
  • FIG. 8 this figure is a schematic structural diagram of an image character recognition device provided by an embodiment of the present application. As shown in Figure 8, the image character recognition device includes:
  • the input unit 801 is used to input the image to be recognized into the image character recognition model
  • the recognition unit 802 is configured to obtain a recognition result output by the image character recognition model.
  • the image to be recognized is an image requiring character recognition.
  • the image to be recognized includes at least one character to be recognized.
  • the image character recognition model is trained by the above-mentioned image character recognition model training method.
  • the image character recognition model consists of a first model and a second model.
  • the first model extracts and generates feature vectors corresponding to each character based on each character area in the image to be recognized.
  • the second model generates a recognition result based on the feature vector corresponding to the character.
  • the recognition result includes the recognized character corresponding to the character to be recognized obtained after the image character recognition model recognizes the image to be recognized.
  • Based on the image character recognition model training method and the image character recognition method provided by the above embodiments, the present application also provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the image character recognition model training method or the image character recognition method described in the above embodiments.
  • FIG. 9 shows a schematic structural diagram of an electronic device 900 suitable for implementing an embodiment of the present application.
  • The terminal devices in the embodiments of the present application may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (portable Android devices, i.e. tablet computers), PMPs (Portable Media Players) and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs (television sets) and desktop computers.
  • the electronic device shown in FIG. 9 is only an example, and should not limit the functions and scope of use of this embodiment of the present application.
  • The electronic device 900 may include a processing device (such as a central processing unit or a graphics processing unit) 901, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 into a random access memory (RAM) 903.
  • The RAM 903 also stores various programs and data necessary for the operation of the electronic device 900.
  • the processing device 901, ROM 902, and RAM 903 are connected to each other through a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904 .
  • The following devices may be connected to the I/O interface 905: an input device 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; an output device 907 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; a storage device 908 including, for example, a magnetic tape and a hard disk; and a communication device 909.
  • the communication means 909 may allow the electronic device 900 to perform wireless or wired communication with other devices to exchange data. While FIG. 9 shows electronic device 900 having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 909 , or from storage means 908 , or from ROM 902 .
  • When the computer program is executed by the processing device 901, the above-mentioned functions defined in the methods of the embodiments of the present application are executed.
  • The electronic device provided in this embodiment of the present application belongs to the same inventive concept as the image character recognition model training method and the image character recognition method provided in the above embodiments, and has the same beneficial effects as the above embodiments.
  • An embodiment of the present application provides a computer storage medium on which a computer program is stored, where, when the program is executed by a processor, the image character recognition model training method or the image character recognition method described in any of the above embodiments is implemented.
  • the computer-readable medium mentioned above in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (Hypertext Transfer Protocol), and can be interconnected with digital data communication in any form or medium (for example, a communication network).
  • HTTP Hyper Text Transfer Protocol
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is made to execute the above-mentioned image character recognition model training method or image character recognition method.
  • Computer program code for carrying out the operations of the present application may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • Internet service provider such as AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application may be implemented by means of software or by means of hardware.
  • the name of the unit/module does not constitute a limitation on the unit itself under certain circumstances, for example, the voice data collection module can also be described as a "data collection module”.
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs System on Chips
  • CPLD Complex Programmable Logic Device
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM compact disk read only memory
  • magnetic storage or any suitable combination of the foregoing.
  • Example 1 provides a method for training an image character recognition model, the method comprising:
  • inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to mask at least one of the masked characters;
  • inputting the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • training an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
  • Example 2 provides an image character recognition model training method
  • the masked character area is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to identify the masked word segment.
  • Example 3 provides a method for training an image character recognition model, the first model is composed of an encoder.
  • Example 4 provides a method for training an image character recognition model
  • the first model further includes a visual word embedding layer
  • inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model includes: inputting the displayed character area and the masked character area into the visual word embedding layer to obtain the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character; and inputting the first word vector and the second word vector into the encoder to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the encoder.
  • Example 5 provides a method for training an image character recognition model
  • the first model further includes a region-of-interest alignment layer
  • the displayed character region and the masked character region are obtained by inputting the training image into the ROI alignment layer.
  • Example 6 provides a method for training an image character recognition model, where the character identifier is the character sequence number of the masked character in the preset dictionary.
  • Example 7 provides an image character recognition method, the method comprising:
  • inputting the image to be recognized into the image character recognition model to obtain the recognition result output by the image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is obtained by training based on the image character recognition model training method described in any one of [Example 1]-[Example 6].
  • Example 8 provides an image character recognition model training device, the device comprising:
  • the first input unit is configured to input the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model, and the training image includes at least one displayed character an area and at least one masked character area, the displayed character area includes at least one of the displayed characters, and the masked character area is used to cover at least one of the masked characters;
  • the second input unit is configured to input the second feature vector of the masked character into the second model to obtain the predicted character of the masked character output by the second model;
  • a training unit configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, the image character recognition model includes the first model and the second model, so The character identifier is used to identify the masked character.
  • Example 9 provides an image character recognition model training device, where the masked character area is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to identify the masked word segment.
  • Example 10 provides an image character recognition model training device, the first model is composed of an encoder.
  • Example Eleven provides an image character recognition model training device, the first model also includes a visual word embedding layer, and the first input unit includes:
  • the first input subunit is used to input the display character area and the masked character area into the visual word embedding layer to obtain the first word vector corresponding to the displayed character and the second word vector corresponding to the masked character;
  • the second input subunit is configured to input the first word vector and the second word vector into the encoder to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model.
  • Example 12 provides an image character recognition model training device, the first model further includes a region of interest alignment layer, the display character region and the masked character region is obtained by inputting the training image into the ROI alignment layer.
  • Example 13 provides an image character recognition model training device, where the character identifier is the character sequence number of the masked character in the preset dictionary.
  • Example Fourteen provides an image character recognition device, the device comprising:
  • the input unit is used to input the image to be recognized into the image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is obtained by training based on the image character recognition model training method described in any one of [Example 1]-[Example 6];
  • the recognition unit is configured to obtain the recognition result output by the image character recognition model.
  • Example 15 provides an electronic device, including:
  • one or more processors;
  • a storage device on which one or more programs are stored, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any one of [Example 1]-[Example 7].
  • Example 16 provides a computer-readable medium on which a computer program is stored, where, when the program is executed by a processor, the method described in any one of [Example 1]-[Example 7] is implemented.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the differences from other embodiments, and the same and similar parts of each embodiment can be referred to each other.
  • As for the system or device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively simple; for relevant details, refer to the description of the method part.
  • At least one (item) means one or more, and “multiple” means two or more.
  • "And/or" is used to describe the association relationship of associated objects, indicating that three kinds of relationships may exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • the character “/” generally indicates that the contextual objects are an “or” relationship.
  • At least one of the following” or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • "At least one item (piece) of a, b or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c can be single or multiple.
  • RAM random access memory
  • ROM read-only memory
  • EPROM electrically programmable ROM
  • EEPROM electrically erasable programmable ROM
  • registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

Abstract

An image character recognition model training method, and an image character recognition method and apparatus. An image character recognition model consisting of a first model and a second model is trained by using a training image and a character identifier corresponding to the training image. The training image comprises a masked character area and a displayed character area. The image character recognition model is trained by using the training image comprising the masked character area, such that the image character recognition model can better extract bidirectional context information, and the image character recognition model obtained by such training has high accuracy. An image to be recognized is recognized by using the trained image character recognition model, so that a more accurate character comprised in the image to be recognized can be obtained.

Description

Image character recognition model training method, image character recognition method and apparatus

This application claims priority to the Chinese patent application with application number 202111415332.2, entitled "Image Character Recognition Model Training Method, Image Character Recognition Method and Apparatus", filed with the China Patent Office on November 25, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of image processing, and in particular to an image character recognition model training method, an image character recognition method, and an apparatus.

Background Art

OCR (Optical Character Recognition) is a technology that recognizes and analyzes images containing characters to obtain the characters in the images. Using OCR technology, the character information in an image can be obtained.

OCR technology can first determine the character areas in an image, then segment those character areas, and finally recognize the characters in the character areas. At present, the accuracy of recognizing characters in an image using OCR technology is relatively low.
Summary of the Invention

In view of this, the embodiments of the present application provide an image character recognition model training method, an image character recognition method, and an apparatus, which can recognize characters in an image more accurately.

Based on this, the technical solutions provided by the embodiments of the present application are as follows:

In a first aspect, an embodiment of the present application provides an image character recognition model training method, the method comprising:

inputting a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one displayed character, and the masked character area is used to mask at least one masked character;

inputting the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model; and

training an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.

In a second aspect, an embodiment of the present application provides an image character recognition method, the method comprising:

inputting an image to be recognized into an image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in the first aspect above; and

obtaining the recognition result output by the image character recognition model.
In a third aspect, an embodiment of the present application provides an image character recognition model training apparatus, the apparatus comprising:

a first input unit, configured to input a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character area and at least one masked character area, the displayed character area includes at least one displayed character, and the masked character area is used to mask at least one masked character;

a second input unit, configured to input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model; and

a training unit, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.

In a fourth aspect, an embodiment of the present application provides an image character recognition apparatus, the apparatus comprising:

an input unit, configured to input an image to be recognized into an image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in the first aspect above; and

a recognition unit, configured to obtain the recognition result output by the image character recognition model.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:

one or more processors; and

a storage device on which one or more programs are stored,

where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any one of the first aspect and the second aspect.

In a sixth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, where, when the program is executed by a processor, the method described in any one of the first aspect and the second aspect is implemented.

In a seventh aspect, an embodiment of the present application provides a computer program product, the computer program product including computer programs/instructions, where, when the computer programs/instructions are executed by a processor, the method described in any one of the first aspect and the second aspect is implemented.

It can be seen that the embodiments of the present application have the following beneficial effects:

The image character recognition model training method, image character recognition method, and apparatus provided by the embodiments of the present application use a training image and the character identifier corresponding to the training image to train an image character recognition model composed of a first model and a second model. The training image includes a masked character area and a displayed character area. Training the image character recognition model with a training image that includes a masked character area enables the model to better extract bidirectional context information, so the image character recognition model obtained by such training has higher accuracy. Recognizing an image to be recognized with the trained image character recognition model yields more accurate characters contained in the image to be recognized.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic framework diagram of an exemplary application scenario provided by an embodiment of the present application;

FIG. 2 is a flowchart of an image character recognition model training method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of an original image provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a training image provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of another training image provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of training an image character recognition model provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an image character recognition model training apparatus provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an image character recognition apparatus provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of the basic structure of an electronic device provided by an embodiment of the present application.
具体实施方式Detailed ways
为使本申请的上述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施方式对本申请实施例作进一步详细的说明。In order to make the above objects, features and advantages of the present application more obvious and understandable, the embodiments of the present application will be further described in detail below in conjunction with the accompanying drawings and specific implementation methods.
为便于理解本申请提供的技术方案,下面将先对本申请涉及的背景技术进行说明。In order to facilitate the understanding of the technical solution provided by this application, the background technology involved in this application will first be described below.
在对传统的OCR技术进行研究后发现,目前基于深度学习的OCR技术主要分为以CRNN(Convolutional Recurrent Neural Network,卷积递归神经网络)为代表的CTC(Connectionist Temporal Classification,连接时序分类)方法和以Transformer为代表的Attention方法。上述两种方法均仅能提取单向信息,训练得到的字符识别模型不够准确。After studying the traditional OCR technology, it is found that the current OCR technology based on deep learning is mainly divided into CTC (Connectionist Temporal Classification, connection timing classification) method and Attention method represented by Transformer. The above two methods can only extract one-way information, and the character recognition model trained is not accurate enough.
On this basis, the embodiments of the present application provide an image character recognition model training method, an image character recognition method, and an apparatus, in which an image character recognition model composed of a first model and a second model is trained by using a training image and the character identifiers corresponding to the training image. The training image includes masked character regions and displayed character regions. Training the image character recognition model with a training image that includes masked character regions enables the model to better extract bidirectional context information, so the trained model achieves higher accuracy. Recognizing an image to be recognized with the trained image character recognition model therefore yields more accurate characters from the image to be recognized.
To facilitate understanding of the technical solutions provided by the embodiments of the present application, the image character recognition method provided by the embodiments of the present application is described below with reference to the accompanying drawings. Referring to FIG. 1, the figure is a schematic framework diagram of an exemplary application scenario provided by an embodiment of the present application.
In practical applications, the image to be recognized 101 includes at least one character. The image to be recognized 101 is input into the trained image character recognition model 102, and the recognition result 103 output by the image character recognition model 102 is obtained. The recognition result includes at least one character recognized by the image character recognition model.
Those skilled in the art can understand that the framework shown in FIG. 1 is only one example in which the embodiments of the present application can be implemented. The scope of applicability of the embodiments of the present application is not limited by any aspect of this framework.
Based on the above description, the image character recognition model training method provided by the present application is first described in detail below with reference to the accompanying drawings.
Referring to FIG. 2, the figure is a flowchart of an image character recognition model training method provided by an embodiment of the present application. As shown in FIG. 2, the method may include S201-S203:
S201: Input the training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character region and at least one masked character region, the displayed character region includes at least one displayed character, and the masked character region is used to mask at least one masked character.
The training image is an image used to train the image character recognition model. The training image includes at least two characters.
The training image includes character regions. A character region is the area a character occupies in the training image. To extract context information, some characters in the training image are masked characters, that is, characters that are not displayed. The training image can be obtained by performing character masking on an original image. It should be noted that characters located in the middle of the character sequence in the training image may be set as masked characters, which makes it easier for the image character recognition model to extract character context information. Correspondingly, the training image includes at least one masked character region and at least one displayed character region. A masked character region may include at least one masked character and may be represented by an all-black region. A displayed character region includes at least one displayed character.
As an example, refer to FIG. 3, a schematic diagram of an original image provided by an embodiment of the present application, and FIG. 4, a schematic diagram of a training image provided by an embodiment of the present application. The training image is obtained by performing character masking on the two characters "在" and "线" in the original image. The training image includes two masked character regions and eleven displayed character regions. The masked character regions are the character regions where the two masked characters "在" and "线" are located, and each masked character region includes one character. The training image shown in FIG. 4 masks the original image at character granularity. In addition, the original image may also be masked at word-segment granularity, in which each masked character region is used to mask one masked word segment, a masked word segment includes at least two masked characters, and the character identifier is used to mark the masked characters in the masked word segment. Referring to FIG. 5, the figure is a schematic diagram of another training image provided by an embodiment of the present application. This training image is obtained by performing word segmentation on the text composed of the characters in the original image and then masking at least one of the word segments. The training image shown in FIG. 5 includes one masked character region and six displayed character regions. The masked character region is composed of the character regions occupied by the characters of "在线", and the six displayed character regions are composed of the character regions occupied by the characters of the six word segments "简单", "好用", "的", "图片", "制作", and "软件".
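Purely for illustration, the masking step described above can be sketched as painting selected character regions black, at either character or word-segment granularity. The box format, the function names, and the use of NumPy are assumptions made for this sketch and are not part of the disclosed method.

```python
# A minimal sketch, assuming each character region is an axis-aligned box
# (x1, y1, x2, y2) and that masking paints the region solid black,
# matching the all-black masked character regions described above.
from typing import List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]

def mask_regions(image: np.ndarray, boxes: List[Box]) -> np.ndarray:
    """Return a training image with the given character regions blacked out."""
    masked = image.copy()
    for x1, y1, x2, y2 in boxes:
        masked[y1:y2, x1:x2] = 0  # all-black masked character region
    return masked

def build_training_image(image: np.ndarray,
                         char_boxes: List[Box],
                         mask_ids: List[int]) -> np.ndarray:
    """Character-granularity masking: mask_ids selects the characters to hide."""
    return mask_regions(image, [char_boxes[i] for i in mask_ids])

def build_training_image_by_segment(image: np.ndarray,
                                    segment_boxes: List[List[Box]],
                                    segment_id: int) -> np.ndarray:
    """Word-segment granularity: hide every character box of one word segment."""
    return mask_regions(image, segment_boxes[segment_id])
```

For the example of FIG. 4, mask_ids would select the boxes of "在" and "线"; for the example of FIG. 5, segment_id would select the word segment "在线".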
The training image is input into the first model to obtain the feature vectors, output by the first model, corresponding to the characters included in the training image. The displayed characters correspond to first feature vectors, and the masked characters correspond to second feature vectors.
In a possible implementation, the first model may be composed of an encoder. The encoder may be a Transformer.
The embodiments of the present application provide a specific implementation of inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model; see below for details.
S202: Input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model.
The second model is used to classify an input feature vector and determine the character corresponding to the feature vector. The second feature vector output by the first model is input into the second model to obtain the predicted character corresponding to the second feature vector.
In a possible implementation, the second model is composed of a classifier. The classifier may specifically be a linear classifier.
S203: Train the image character recognition model according to the character identifier corresponding to the training image and the predicted character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
The character identifier corresponds to the training image and is used to identify the masked character included in each masked character region of the training image. In a possible implementation, the character identifier is the index of the masked character in a preset dictionary.
It should be noted that the character identifiers correspond to the number of masked characters included in each masked character region of the training image. For the training image shown in FIG. 4, there are two character identifiers, identifying the two characters "在" and "线" respectively. For the training image shown in FIG. 5, there is one character identifier, identifying the masked word segment "在线", that is, the two characters "在" and "线".
The character identifier identifies the correct masked character. Based on the character identifier and the predicted character, the model parameters of the first model and the second model can be adjusted, thereby training the image character recognition model composed of the first model and the second model.
In a possible implementation, when the image character recognition model satisfies a preset condition, it may be determined that training of the image character recognition model is complete. The preset condition may be that the number of training iterations of the image character recognition model reaches an iteration threshold, or that the prediction accuracy of the image character recognition model reaches an accuracy threshold.
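As a non-authoritative sketch of S201-S203, the training step can be expressed as a classification objective over the masked positions, with the character identifiers serving as class labels. The use of PyTorch, the cross-entropy loss, the assumed first-model interface returning the two groups of feature vectors, and the toy preset dictionary below are all assumptions made for illustration; the present application does not prescribe a particular loss or framework.

```python
# A minimal sketch, assuming PyTorch and a preset dictionary mapping each
# character to an integer index (the "character identifier" described above).
import torch
import torch.nn.functional as F

char_dict = {"在": 0, "线": 1, "简": 2, "单": 3}  # hypothetical preset dictionary

def training_step(first_model, second_model, optimizer,
                  training_image, masked_char_ids):
    """One update: predict the masked characters and compare them with the
    character identifiers of the training image."""
    # First model: feature vectors for displayed and masked character regions.
    first_vecs, second_vecs = first_model(training_image)
    # Second model: classify the masked-character feature vectors.
    logits = second_model(second_vecs)            # (num_masked, dict_size)
    targets = torch.tensor(masked_char_ids)       # character identifiers
    loss = F.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```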
Based on the content of S201-S203 above, it can be seen that using images that include masked character regions as training images allows the image character recognition model to learn context information during training, which improves the accuracy of the trained image character recognition model.
In a possible implementation, the first model includes an encoder and a visual word embedding layer. The visual word embedding layer is used to extract the word vectors corresponding to the displayed characters and the masked characters. Inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model includes the following two steps:
A1: Input the displayed character regions and the masked character regions into the visual word embedding layer to obtain the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters.
The visual word embedding layer is used to convert the characters in the input character regions into corresponding word vectors. By inputting the displayed character regions and the masked character regions of the training image into the visual word embedding layer, the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters output by the visual word embedding layer can be obtained.
In a possible implementation, the first model further includes a region-of-interest alignment layer. The training image can be processed by the region-of-interest alignment layer to obtain the displayed character regions and the masked character regions: the training image is input into the region-of-interest alignment layer, and the displayed character regions and the masked character regions output by the region-of-interest alignment layer are obtained.
Referring to FIG. 6, the figure is a schematic diagram of training an image character recognition model provided by an embodiment of the present application. The training image is first input into the region-of-interest alignment layer, that is, the "RoI Align" layer in FIG. 6, and the training image is segmented to obtain the displayed character regions and the masked character regions. The displayed character regions are the character regions where "简", "单", "好", "用", "的", "图", "片", "制", "作", "软", and "件" are located. The masked character regions are the character regions where "在" and "线" are located.
The obtained displayed character regions and masked character regions are then input into the visual word embedding layer, that is, "Visual Token Embeddings" in FIG. 6, to obtain the first word vector corresponding to each displayed character and the second word vector corresponding to each masked character. The first word vectors corresponding to the displayed characters are "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10", and "v11". The second word vectors corresponding to the masked characters are "vmask1" and "vmask2".
A2: Input the first word vectors and the second word vectors into the encoder to obtain the first feature vectors corresponding to the displayed characters and the second feature vectors corresponding to the masked characters output by the encoder.
The first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters output by the visual word embedding layer are input into the encoder of the first model. Referring to FIG. 6, the first word vectors and the second word vectors are input into the encoder, that is, the "transformer encoder", to obtain the first feature vectors "h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8", "h9", "h10", and "h11" corresponding to the displayed characters, and the second feature vectors "hmask1" and "hmask2" corresponding to the masked characters.
Further, the second feature vectors are input into the second model, that is, the "linear classifier" in FIG. 6, to obtain the corresponding first predicted character and second predicted character. Finally, based on the first predicted character and its corresponding character identifier, and the second predicted character and its corresponding character identifier, the model parameters of the second model and of the first model can be adjusted, thereby training the image character recognition model.
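To make the data flow of FIG. 6 concrete, the following is a minimal sketch of the first model (region-of-interest alignment layer, visual word embedding layer, Transformer encoder) and the second model (linear classifier). The layer sizes, the use of torchvision's roi_align with explicit character boxes, and the module names are assumptions made for this sketch, not the claimed implementation.

```python
# A minimal sketch, assuming PyTorch/torchvision; boxes follow the
# (batch_index, x1, y1, x2, y2) convention expected by roi_align.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class FirstModel(nn.Module):
    def __init__(self, d_model=256, roi_size=8):
        super().__init__()
        # Visual word embedding layer: project each aligned character region
        # (3 x roi_size x roi_size pixels) into a d_model-dimensional word vector.
        self.visual_token_embedding = nn.Linear(3 * roi_size * roi_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
        self.roi_size = roi_size

    def forward(self, image, char_boxes):
        # Region-of-interest alignment layer: crop and align each character region.
        regions = roi_align(image, char_boxes,
                            output_size=(self.roi_size, self.roi_size))
        tokens = self.visual_token_embedding(regions.flatten(1))   # (N, d_model)
        # Transformer encoder: contextualize all character tokens jointly, so a
        # masked position can attend to the characters on both of its sides.
        return self.encoder(tokens.unsqueeze(0)).squeeze(0)        # (N, d_model)

class SecondModel(nn.Module):
    """Linear classifier over the preset dictionary."""
    def __init__(self, d_model=256, dict_size=6000):
        super().__init__()
        self.classifier = nn.Linear(d_model, dict_size)

    def forward(self, feature_vectors):
        return self.classifier(feature_vectors)
```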
Based on the image character recognition model training method provided by the above embodiments, an embodiment of the present application further provides an image character recognition method, including:
inputting an image to be recognized into the image character recognition model;
obtaining the recognition result output by the image character recognition model.
The image to be recognized is an image on which character recognition needs to be performed, and it includes at least one character to be recognized.
The image character recognition model is trained by the image character recognition model training method described above and is composed of the first model and the second model. The first model extracts and generates the feature vector corresponding to each character based on each character region in the image to be recognized. The second model generates the recognition result based on the feature vectors corresponding to the characters.
The recognition result includes the recognized characters corresponding to the characters to be recognized, obtained after the image character recognition model recognizes the image to be recognized.
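A minimal inference sketch under the same assumed interfaces as the sketches above: every character region of the image to be recognized is encoded by the first model and classified by the second model, and the predicted dictionary indices are mapped back to characters. The argmax decoding and the index_to_char mapping are illustrative assumptions, not part of the disclosed method.

```python
# A minimal sketch: at inference time no region is masked; every character
# region of the image to be recognized is classified against the dictionary.
import torch

index_to_char = {0: "在", 1: "线"}  # hypothetical inverse of the preset dictionary

@torch.no_grad()
def recognize(first_model, second_model, image, char_boxes):
    feature_vectors = first_model(image, char_boxes)
    logits = second_model(feature_vectors)
    char_ids = logits.argmax(dim=-1).tolist()
    return "".join(index_to_char[i] for i in char_ids)  # recognition result
```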
Based on the image character recognition model training method provided by the above method embodiments, an embodiment of the present application further provides an image character recognition model training apparatus, which is described below with reference to the accompanying drawings.
Referring to FIG. 7, the figure is a schematic structural diagram of an image character recognition model training apparatus provided by an embodiment of the present application. As shown in FIG. 7, the image character recognition model training apparatus includes:
a first input unit 701, configured to input a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character region and at least one masked character region, the displayed character region includes at least one displayed character, and the masked character region is used to mask at least one masked character;
a second input unit 702, configured to input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model;
a training unit 703, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
In a possible implementation, the masked character region is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to mark the masked word segment.
In a possible implementation, the first model is composed of an encoder.
In a possible implementation, the first model further includes a visual word embedding layer, and the first input unit 701 includes:
a first input subunit, configured to input the displayed character regions and the masked character regions into the visual word embedding layer to obtain the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters;
a second input subunit, configured to input the first word vectors and the second word vectors into the encoder to obtain the first feature vectors corresponding to the displayed characters and the second feature vectors corresponding to the masked characters output by the first model.
In a possible implementation, the first model further includes a region-of-interest alignment layer, and the displayed character regions and the masked character regions are obtained by inputting the training image into the region-of-interest alignment layer.
In a possible implementation, the character identifier is the index of the masked character in a preset dictionary.
Based on the image character recognition method provided by the above method embodiments, an embodiment of the present application further provides an image character recognition apparatus, which is described below with reference to the accompanying drawings.
Referring to FIG. 8, the figure is a schematic structural diagram of an image character recognition apparatus provided by an embodiment of the present application. As shown in FIG. 8, the image character recognition apparatus includes:
an input unit 801, configured to input an image to be recognized into an image character recognition model;
a recognition unit 802, configured to obtain the recognition result output by the image character recognition model.
The image to be recognized is an image on which character recognition needs to be performed, and it includes at least one character to be recognized.
The image character recognition model is trained by the image character recognition model training method described above and is composed of the first model and the second model. The first model extracts and generates the feature vector corresponding to each character based on each character region in the image to be recognized. The second model generates the recognition result based on the feature vectors corresponding to the characters.
The recognition result includes the recognized characters corresponding to the characters to be recognized, obtained after the image character recognition model recognizes the image to be recognized.
Based on the image character recognition model training method and the image character recognition method provided by the above method embodiments, the present application further provides an electronic device, including: one or more processors; and a storage apparatus storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image character recognition model training method or the image character recognition method described in the above embodiments.
Referring now to FIG. 9, it shows a schematic structural diagram of an electronic device 900 suitable for implementing an embodiment of the present application. The terminal device in the embodiments of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (portable Android devices, tablet computers), PMPs (Portable Media Players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs (televisions) and desktop computers. The electronic device shown in FIG. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 9, the electronic device 900 may include a processing apparatus (for example, a central processing unit or a graphics processing unit) 901, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage apparatus 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the electronic device 900. The processing apparatus 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Generally, the following apparatuses may be connected to the I/O interface 905: an input apparatus 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 907 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 908 including, for example, a magnetic tape and a hard disk; and a communication apparatus 909. The communication apparatus 909 may allow the electronic device 900 to perform wireless or wired communication with other devices to exchange data. Although FIG. 9 shows the electronic device 900 with various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present application includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, where the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 909, or installed from the storage apparatus 908, or installed from the ROM 902. When the computer program is executed by the processing apparatus 901, the above functions defined in the methods of the embodiments of the present application are executed.
The electronic device provided by this embodiment of the present application and the image character recognition model training method and image character recognition method provided by the above embodiments belong to the same inventive concept. For technical details not described in detail in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
Based on the image character recognition model training method and the image character recognition method provided by the above method embodiments, an embodiment of the present application provides a computer storage medium on which a computer program is stored, where the program, when executed by a processor, implements the image character recognition model training method or the image character recognition method described in any of the above embodiments.
It should be noted that the above computer-readable medium in the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, the computer-readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present application, the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. The propagated data signal may take multiple forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program used by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (Hyper Text Transfer Protocol), and may be interconnected with digital data communication (for example, a communication network) in any form or medium. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to execute the above image character recognition model training method or image character recognition method.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof. The above programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the accompanying drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts and the combinations of blocks in the block diagrams and/or flowcharts may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present application may be implemented by software or by hardware. In some cases, the name of a unit/module does not constitute a limitation on the unit itself; for example, a voice data collection module may also be described as a "data collection module".
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present application, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present application, [Example 1] provides an image character recognition model training method, the method including:
inputting a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character region and at least one masked character region, the displayed character region includes at least one displayed character, and the masked character region is used to mask at least one masked character;
inputting the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model;
training an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
According to one or more embodiments of the present application, [Example 2] provides an image character recognition model training method, where the masked character region is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to mark the masked word segment.
According to one or more embodiments of the present application, [Example 3] provides an image character recognition model training method, where the first model is composed of an encoder.
According to one or more embodiments of the present application, [Example 4] provides an image character recognition model training method, where the first model further includes a visual word embedding layer, and inputting the training image into the first model to obtain the first feature vector corresponding to the displayed character and the second feature vector corresponding to the masked character output by the first model includes:
inputting the displayed character regions and the masked character regions into the visual word embedding layer to obtain the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters;
inputting the first word vectors and the second word vectors into the encoder to obtain the first feature vectors corresponding to the displayed characters and the second feature vectors corresponding to the masked characters output by the first model.
According to one or more embodiments of the present application, [Example 5] provides an image character recognition model training method, where the first model further includes a region-of-interest alignment layer, and the displayed character region and the masked character region are obtained by inputting the training image into the region-of-interest alignment layer.
According to one or more embodiments of the present application, [Example 6] provides an image character recognition model training method, where the character identifier is the index of the masked character in a preset dictionary.
According to one or more embodiments of the present application, [Example 7] provides an image character recognition method, the method including:
inputting an image to be recognized into an image character recognition model to obtain a recognition result output by the image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in any one of the above [Example 1] to [Example 6].
According to one or more embodiments of the present application, [Example 8] provides an image character recognition model training apparatus, the apparatus including:
a first input unit, configured to input a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, where the training image includes at least one displayed character region and at least one masked character region, the displayed character region includes at least one displayed character, and the masked character region is used to mask at least one masked character;
a second input unit, configured to input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model;
a training unit, configured to train an image character recognition model according to the character identifier corresponding to the training image and the predicted character of the masked character, where the image character recognition model includes the first model and the second model, and the character identifier is used to identify the masked character.
According to one or more embodiments of the present application, [Example 9] provides an image character recognition model training apparatus, where the masked character region is used to mask a masked word segment, the masked word segment includes at least two masked characters, and the character identifier is used to mark the masked word segment.
According to one or more embodiments of the present application, [Example 10] provides an image character recognition model training apparatus, where the first model is composed of an encoder.
According to one or more embodiments of the present application, [Example 11] provides an image character recognition model training apparatus, where the first model further includes a visual word embedding layer, and the first input unit includes:
a first input subunit, configured to input the displayed character regions and the masked character regions into the visual word embedding layer to obtain the first word vectors corresponding to the displayed characters and the second word vectors corresponding to the masked characters;
a second input subunit, configured to input the first word vectors and the second word vectors into the encoder to obtain the first feature vectors corresponding to the displayed characters and the second feature vectors corresponding to the masked characters output by the first model.
According to one or more embodiments of the present application, [Example 12] provides an image character recognition model training apparatus, where the first model further includes a region-of-interest alignment layer, and the displayed character regions and the masked character regions are obtained by inputting the training image into the region-of-interest alignment layer.
According to one or more embodiments of the present application, [Example 13] provides an image character recognition model training apparatus, where the character identifier is the index of the masked character in a preset dictionary.
According to one or more embodiments of the present application, [Example 14] provides an image character recognition apparatus, the apparatus including:
an input unit, configured to input an image to be recognized into an image character recognition model, where the image to be recognized includes at least one character to be recognized, and the image character recognition model is trained based on the image character recognition model training method described in any one of the above [Example 1] to [Example 6];
a recognition unit, configured to obtain the recognition result output by the image character recognition model.
According to one or more embodiments of the present application, [Example 15] provides an electronic device, including:
one or more processors;
a storage apparatus on which one or more programs are stored,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any one of [Example 1] to [Example 7].
According to one or more embodiments of the present application, [Example 16] provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any one of [Example 1] to [Example 7].
It should be noted that the embodiments in this specification are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for the same or similar parts between the embodiments, reference may be made to one another. For the system or apparatus disclosed in an embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively brief, and for relevant details, reference may be made to the description of the method part.
It should be understood that, in the present application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" is used to describe the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or multiple items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.
It should also be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device including that element.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be directly implemented by hardware, by a software module executed by a processor, or by a combination of the two. The software module may be placed in a random access memory (RAM), an internal memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application will not be limited to the embodiments shown herein, but shall conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. An image character recognition model training method, the method comprising:
    inputting a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, wherein the training image comprises at least one displayed character region and at least one masked character region, the displayed character region comprises at least one displayed character, and the masked character region is used to mask at least one masked character;
    inputting the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model;
    training an image character recognition model according to a character identifier corresponding to the training image and the predicted character of the masked character, wherein the image character recognition model comprises the first model and the second model, and the character identifier is used to identify the masked character.
  2. 根据权利要求1所述的方法,其中,所述遮蔽字符区域用于对遮蔽分词进行遮蔽,所述遮蔽分词包括至少两个遮蔽字符,所述字符标识用于标记所述遮蔽分词。The method according to claim 1, wherein the masked character area is used to mask a masked participle, the masked participle includes at least two masked characters, and the character identifier is used to mark the masked participle.
  3. 根据权利要求1所述的方法,其中,所述第一模型由编码器构成。The method of claim 1, wherein the first model is formed by an encoder.
  4. 根据权利要求3所述的方法,其中,所述第一模型还包括视觉词嵌入层,所述将训练图像输入第一模型,得到所述第一模型输出的显示字符对应的第一特征向量和遮蔽字符对应的第二特征向量,包括:The method according to claim 3, wherein the first model further includes a visual word embedding layer, and the training image is input into the first model to obtain the first feature vector and the corresponding first feature vector of the displayed characters output by the first model The second feature vector corresponding to the masked character includes:
    将显示字符区域和遮蔽字符区域输入所述视觉词嵌入层,得到所述显示字符对应的第一字向量和所述遮蔽字符对应的第二字向量;Input the display character region and the shielding character region into the visual word embedding layer to obtain the first word vector corresponding to the display character and the second word vector corresponding to the shielding character;
    将所述第一字向量和所述第二字向量输入所述编码器,得到所述第一模型输出的显示字符对应的第一特征向量和遮蔽字符对应的第二特征向量。Inputting the first word vector and the second word vector into the encoder to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model.
  5. 根据权利要求4所述的方法,其中,所述第一模型还包括感兴趣区域对齐层,所述显示字符区域和所述遮蔽字符区域是将所述训练图像输入所述感兴趣区域对齐层得到的。The method according to claim 4, wherein the first model further comprises a region-of-interest alignment layer, and the displayed character region and the masked character region are obtained by inputting the training image into the region-of-interest alignment layer of.
  6. The method according to any one of claims 1-5, wherein the character identifier is a sequence number of the masked character in a preset dictionary.
  7. An image character recognition method, the method comprising:
    inputting an image to be recognized into an image character recognition model, wherein the image to be recognized comprises at least one character to be recognized, and the image character recognition model is obtained by training based on the image character recognition model training method according to any one of claims 1-6; and
    obtaining a recognition result output by the image character recognition model.
  8. An image character recognition model training apparatus, the apparatus comprising:
    a first input unit, configured to input a training image into a first model to obtain a first feature vector corresponding to a displayed character and a second feature vector corresponding to a masked character output by the first model, wherein the training image comprises at least one displayed character region and at least one masked character region, the displayed character region comprises at least one of the displayed characters, and the masked character region is used to mask at least one of the masked characters;
    a second input unit, configured to input the second feature vector of the masked character into a second model to obtain a predicted character of the masked character output by the second model; and
    a training unit, configured to train the image character recognition model according to a character identifier corresponding to the training image and the predicted character of the masked character region, wherein the image character recognition model comprises the first model and the second model, and the character identifier is used to identify the masked character.
  9. An image character recognition apparatus, the apparatus comprising:
    an input unit, configured to input an image to be recognized into an image character recognition model, wherein the image to be recognized comprises at least one character to be recognized, and the image character recognition model is obtained by training based on the image character recognition model training method according to any one of claims 1-6; and
    a recognition unit, configured to obtain a recognition result output by the image character recognition model.
  10. An electronic device, comprising:
    one or more processors; and
    a storage device having one or more programs stored thereon,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-6, or the method according to claim 7.
  11. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6, or the method according to claim 7.
  12. A computer program product comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1-6, or the method according to claim 7.
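
The following is a minimal, illustrative PyTorch sketch of the masked-character training procedure set out in claims 1-6 and the recognition step of claim 7. It is not the applicant's implementation: the flattened-patch embedding standing in for the claimed region-of-interest alignment layer, the Transformer encoder configuration, the dictionary size, and all module and variable names (FirstModel, SecondModel, train_step, VOCAB_SIZE, and so on) are assumptions introduced only for demonstration.

```python
# Illustrative sketch only; sizes, names, and the toy data are assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE = 5000   # assumed size of the preset character dictionary (claim 6)
EMBED_DIM = 256     # assumed feature dimension

class FirstModel(nn.Module):
    """First model: visual word embedding layer + encoder (claims 3-4)."""
    def __init__(self, region_pixels=32 * 32, embed_dim=EMBED_DIM):
        super().__init__()
        # Visual word embedding layer: maps each character region (here a
        # flattened pixel patch) to a word vector.
        self.visual_word_embedding = nn.Linear(region_pixels, embed_dim)
        # Encoder over the sequence of displayed and masked character regions.
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, regions):
        # regions: (batch, num_regions, region_pixels)
        word_vectors = self.visual_word_embedding(regions)
        return self.encoder(word_vectors)  # (batch, num_regions, embed_dim)

class SecondModel(nn.Module):
    """Second model: predicts the masked character over the preset dictionary."""
    def __init__(self, embed_dim=EMBED_DIM, vocab_size=VOCAB_SIZE):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, vocab_size)

    def forward(self, masked_features):
        return self.classifier(masked_features)  # logits over the dictionary

def train_step(first_model, second_model, optimizer,
               regions, masked_positions, char_ids):
    """One training step: feature vectors -> predicted characters -> loss.

    regions:          (batch, num_regions, region_pixels) character regions.
    masked_positions: (batch, num_masked) indices of masked-character regions.
    char_ids:         (batch, num_masked) character identifiers, i.e. the
                      sequence numbers of the masked characters in the dictionary.
    """
    features = first_model(regions)                              # feature vectors
    masked_feats = torch.gather(                                 # second feature vectors
        features, 1,
        masked_positions.unsqueeze(-1).expand(-1, -1, features.size(-1)))
    logits = second_model(masked_feats)                          # predicted characters
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB_SIZE), char_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    first_model, second_model = FirstModel(), SecondModel()
    optimizer = torch.optim.Adam(
        list(first_model.parameters()) + list(second_model.parameters()), lr=1e-4)
    # Toy batch: 2 images, 8 character regions each, 3 of them masked.
    regions = torch.randn(2, 8, 32 * 32)
    masked_positions = torch.tensor([[1, 4, 6], [0, 2, 7]])
    char_ids = torch.randint(0, VOCAB_SIZE, (2, 3))
    print("loss:", train_step(first_model, second_model, optimizer,
                              regions, masked_positions, char_ids))
    # Recognition (claim 7, roughly): run the trained models on an image's
    # character regions and take the argmax over the dictionary.
    predictions = second_model(first_model(regions)).argmax(-1)
    print("predicted character ids:", predictions.shape)
```

In this reading, the character identifier of claim 6 is simply the index of the masked character in a preset dictionary, so masked-character prediction reduces to cross-entropy classification over that dictionary; how the regions are actually extracted (claim 5's region-of-interest alignment) is omitted here and is one of the assumptions noted above.
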
PCT/CN2022/125575 2021-11-25 2022-10-17 Image character recognition model training method, and image character recognition method and apparatus WO2023093361A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111415332.2A CN114049632A (en) 2021-11-25 2021-11-25 Image character recognition model training method, image character recognition method and device
CN202111415332.2 2021-11-25

Publications (1)

Publication Number Publication Date
WO2023093361A1 (en)

Family

ID=80210959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125575 WO2023093361A1 (en) 2021-11-25 2022-10-17 Image character recognition model training method, and image character recognition method and apparatus

Country Status (2)

Country Link
CN (1) CN114049632A (en)
WO (1) WO2023093361A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049632A (en) * 2021-11-25 2022-02-15 北京有竹居网络技术有限公司 Image character recognition model training method, image character recognition method and device
CN116363663A (en) * 2023-04-03 2023-06-30 北京百度网讯科技有限公司 Image processing method, image recognition method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10445569B1 (en) * 2016-08-30 2019-10-15 A9.Com, Inc. Combination of heterogeneous recognizer for image-based character recognition
CN110674876A (en) * 2019-09-25 2020-01-10 北京猎户星空科技有限公司 Character detection method and device, electronic equipment and computer readable medium
CN110941945A (en) * 2019-12-02 2020-03-31 百度在线网络技术(北京)有限公司 Language model pre-training method and device
CN111639598A (en) * 2020-05-29 2020-09-08 济南博观智能科技有限公司 License plate recognition method, license plate recognition device, license plate recognition equipment and storage medium
CN112801085A (en) * 2021-02-09 2021-05-14 沈阳麟龙科技股份有限公司 Method, device, medium and electronic equipment for recognizing characters in image
CN114049632A (en) * 2021-11-25 2022-02-15 北京有竹居网络技术有限公司 Image character recognition model training method, image character recognition method and device

Also Published As

Publication number Publication date
CN114049632A (en) 2022-02-15

Similar Documents

Publication Publication Date Title
WO2023093361A1 (en) Image character recognition model training method, and image character recognition method and apparatus
WO2023273985A1 (en) Method and apparatus for training speech recognition model and device
CN110826567B (en) Optical character recognition method, device, equipment and storage medium
JP2023547917A (en) Image segmentation method, device, equipment and storage medium
CN111046677B (en) Method, device, equipment and storage medium for obtaining translation model
WO2023083142A1 (en) Sentence segmentation method and apparatus, storage medium, and electronic device
WO2022252881A1 (en) Image processing method and apparatus, and readable medium and electronic device
CN112364829B (en) Face recognition method, device, equipment and storage medium
CN112364860A (en) Training method and device of character recognition model and electronic equipment
WO2023138314A1 (en) Object attribute recognition method and apparatus, readable storage medium, and electronic device
CN113313064A (en) Character recognition method and device, readable medium and electronic equipment
WO2021088790A1 (en) Display style adjustment method and apparatus for target device
CN112883968B (en) Image character recognition method, device, medium and electronic equipment
WO2023143016A1 (en) Feature extraction model generation method and apparatus, and image feature extraction method and apparatus
WO2023142914A1 (en) Date recognition method and apparatus, readable medium and electronic device
WO2023072015A1 (en) Method and apparatus for generating character style image, device, and storage medium
WO2022116819A1 (en) Model training method and apparatus, machine translation method and apparatus, and device and storage medium
CN115270717A (en) Method, device, equipment and medium for detecting vertical position
CN114445813A (en) Character recognition method, device, equipment and medium
WO2023143107A1 (en) Character recognition method and apparatus, device, and medium
WO2023134433A1 (en) Font generation method and apparatus, and device
WO2023011397A1 (en) Method for generating acoustic features, training speech models and speech recognition, and device
WO2023174075A1 (en) Training method and apparatus for content detection model, and content detection method and apparatus
WO2023000782A1 (en) Method and apparatus for acquiring video hotspot, readable medium, and electronic device
WO2023138361A1 (en) Image processing method and apparatus, and readable storage medium and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897444

Country of ref document: EP

Kind code of ref document: A1