WO2023143107A1 - Character recognition method and apparatus, device, and medium - Google Patents

Character recognition method and apparatus, device, and medium

Info

Publication number
WO2023143107A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample image
sample
network model
character recognition
image
Prior art date
Application number
PCT/CN2023/072001
Other languages
French (fr)
Chinese (zh)
Inventor
毛晓飞
黄灿
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司 filed Critical 北京有竹居网络技术有限公司
Publication of WO2023143107A1 publication Critical patent/WO2023143107A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the field of computer technology, in particular to a character recognition method, device, equipment and medium.
  • OCR Optical Character Recognition
  • Typically, the OCR recognition model is generated through a supervised training method. During the training process, manually annotated sample data must be collected and then used for training. To improve the recognition accuracy of the OCR recognition model, a large amount of sample data has to be collected, which requires considerable manual labeling effort and increases the training cost.
  • the embodiments of the present application provide a character recognition method, apparatus, device, and medium, so as to enable model training with unlabeled sample data and reduce training costs.
  • a character recognition method comprising:
  • the text image to be processed includes text information to be recognized
  • the character recognition network model is generated by using training samples
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image
  • the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image
  • the width of each sub-sample image in the plurality of sub-sample images is the same
  • the width of each sub-sample image is smaller than the width of the first sample image
  • the first sample image includes text information.
  • in the second aspect of the embodiments of the present application, a character recognition apparatus is provided, which includes:
  • an acquisition unit configured to acquire a text image to be processed, the text image to be processed includes text information to be recognized;
  • a processing unit configured to input the text image to be processed into the character recognition network model to obtain an output result, the output result including the text information to be recognized; wherein the character recognition network model is generated by training with training samples;
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
  • in the third aspect of the embodiments of the present application, an electronic device is provided, which includes: a processor and a memory;
  • said memory for storing instructions or computer programs
  • the processor is configured to execute the instructions or computer programs in the memory, so that the electronic device executes the character recognition method described in the first aspect.
  • a computer-readable storage medium is provided, and instructions are stored in the computer-readable storage medium, and when the instructions are run on a device, the device executes the method described in the first aspect.
  • a computer program product which causes the computer to execute the character recognition method described in the first aspect when the computer program product is run on a computer.
  • character recognition is realized by using a pre-trained character recognition network model.
  • the character recognition network model is generated by training with the first sample image and multiple sub-sample images corresponding to the first sample image.
  • the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image
  • the width of each sub-sample image in the plurality of sub-sample images is the same, and the width of each sub-sample image is smaller than the width of the first sample image.
  • because the model is trained by aligning local features (the multiple sub-sample images) with the overall feature (the first sample image) rather than with manual annotations, the character recognition network model reduces labeling costs and improves training efficiency.
  • input the text image to be processed into the character recognition network model so that the character recognition network model can completely extract the features of the text information in the image, and perform recognition based on the extracted features to obtain the output result.
  • Fig. 1 is a flowchart of a character recognition method provided by an embodiment of the present application;
  • Fig. 2 is a flowchart of a method for training a character recognition network model provided by an embodiment of the present application;
  • Fig. 3 is a schematic diagram of a division operation provided by an embodiment of the present application;
  • Fig. 4 is a flowchart of another method for training a character recognition network model provided by an embodiment of the present application;
  • Fig. 5 is a structural diagram of a character recognition apparatus provided by an embodiment of the present application;
  • Fig. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • OCR refers to the technology of analyzing and recognizing image files containing text data to obtain text.
  • Typically, the OCR recognition model is generated by a supervised training method. During the training process, manually annotated sample data must be collected and then used for training. To improve the recognition accuracy of the OCR recognition model, a large amount of sample data has to be collected, which requires considerable manual labeling effort and increases the training cost.
  • the embodiment of the present application provides a character recognition method, which is realized by using a pre-trained character recognition network model.
  • the character recognition network model is trained and generated based on the first sample image and the multiple sub-sample images corresponding to the first sample image. Wherein, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, and the width of each sub-sample image is smaller than the width of the first sample image.
  • that is, when training the character recognition network model, the present application does not require manual annotation of the first sample image; instead, the character recognition network model is trained by aligning local features (the multiple sub-sample images) with the overall feature (the first sample image), which reduces labeling costs and improves training efficiency.
  • input the text image to be processed into the character recognition network model so that the character recognition network model can completely extract the features of the text information in the image, and perform recognition based on the extracted features to obtain the output result.
  • FIG. 1 is a flowchart of a character recognition method provided by an embodiment of the present application.
  • the method specifically includes the following steps:
  • S101 Acquire a text image to be processed, where the text image to be processed includes text information to be recognized.
  • the text information to be recognized may be characters to be recognized, including Chinese characters, English words, English letters, numbers and symbols.
  • S102 Input the text image to be processed into the character recognition network model to obtain an output result, the output result including text information to be recognized.
  • after the text image to be processed is obtained, in order to obtain the text information to be recognized included in the text image to be processed, the text image to be processed is input into the pre-trained character recognition network model, so that the text information to be recognized is output through the processing of the character recognition network model.
  • the character recognition network model is generated by using training samples.
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image.
  • the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
  • the text image to be processed is input into the character recognition network model, and the character recognition network model can extract the features of the text image and perform recognition according to the extracted features, so as to output the text information in the text image.
  • the trained character recognition network model can extract the complete features of the text information in the image, and perform text recognition based on the extracted complete features to obtain output results and improve the accuracy of recognition.
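  • For illustration only, the following is a minimal inference sketch of this usage; the saved-model file name, the preprocessing choices, and the decode_to_text helper are hypothetical placeholders, since the application does not fix a concrete architecture, storage format, or API.

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Hypothetical artifact name; the application does not specify how the
# trained character recognition network model is stored.
model = torch.load("character_recognition_model.pt", map_location="cpu")
model.eval()

preprocess = T.Compose([T.Grayscale(), T.ToTensor()])

image = Image.open("text_image_to_process.png")      # text image to be processed
with torch.no_grad():
    output = model(preprocess(image).unsqueeze(0))   # output result
# decode_to_text is a hypothetical helper mapping the model output to the
# recognized text information.
# recognized_text = decode_to_text(output)
```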
  • FIG. 2 is a flow chart of training a character recognition network model provided by an embodiment of the present application.
  • the method mainly includes the following steps:
  • S201 Acquire a first sample image and multiple sub-sample images corresponding to the first sample image.
  • the training samples include the first sample image and a plurality of sub-sample images corresponding to the first sample image, wherein both the first sample image and the plurality of sub-sample images include text information.
  • the training samples may be obtained by first acquiring the first sample image and then dividing the first sample image into multiple sub-sample images; the first sample image and its corresponding multiple sub-sample images form a training sample, that is, each sub-sample image is part of the first sample image.
  • the height of each sub-sample image is the same as that of the first sample image
  • the width of each sub-sample image is smaller than the width of the first sample image
  • the width of each sub-sample image is the same.
  • the first sample image may be divided into multiple sub-sample images in the following manner: first determine a division parameter, and then use the division parameter to divide the first sample image multiple times, thereby obtaining the multiple sub-sample images.
  • the ratio of the width of each sub-sample image to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1. That is, the first sample image is divided into multiple sub-sample images whose width is the division parameter multiplied by the width of the first sample image.
  • FIG. 3 is a schematic diagram of a division operation in an embodiment of the present application.
  • the width of the first sample image is width
  • the division parameter is K, that is, the width of each sub-sample image is K*width
  • the value range of K is 75%-95%.
  • the first sample image is divided into sub-sample image 1, sub-sample image 2, ..., sub-sample image n, wherein the height of each of sub-sample image 1, sub-sample image 2, ..., sub-sample image n is the same as that of the first sample image, and the width of each is K*width.
  • the starting position of each division operation may be set at a fixed interval, or may be determined randomly, which is not limited in this embodiment of the present application.
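  • As a concrete illustration of the division operation, the sketch below divides a first sample image held as a NumPy array of shape (height, width[, channels]); the random starting positions and the role of the division parameter K follow the description above, while the function name and defaults are assumptions.

```python
import numpy as np

def divide_first_sample_image(first_sample, k=0.9, n_sub=4, rng=None):
    """Divide the first sample image into sub-sample images.

    Each sub-sample keeps the full height, has width K * width (0 < K < 1),
    and starts at a randomly chosen horizontal position; fixed-interval
    starting positions would work equally well.
    """
    rng = np.random.default_rng() if rng is None else rng
    width = first_sample.shape[1]
    sub_width = int(k * width)
    sub_samples = []
    for _ in range(n_sub):
        start = int(rng.integers(0, width - sub_width + 1))
        sub_samples.append(first_sample[:, start:start + sub_width])
    return sub_samples
```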
  • S202 Input the first sample image and the multiple sub-sample images into the initial network model respectively, to obtain a first feature vector set and a second feature vector set, wherein the dimensions of the first feature vector set are the same as the dimensions of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information.
  • after acquiring the training samples, which include the first sample image and the multiple sub-sample images obtained by dividing the first sample image, the initial network model can be trained.
  • the first sample image is input into the initial network model, and the first feature vector set corresponding to the first sample image is obtained.
  • the order of inputting the first sample image and the multiple sub-sample images corresponding to the first sample image into the initial network model is not limited; that is, the first feature vector set corresponding to the first sample image may be obtained first, or the second feature vector set corresponding to the multiple sub-sample images may be obtained first.
  • S203 Determine a first loss amount according to the first feature vector set and the second feature vector set.
  • the first loss amount may be determined according to the first feature vector set and the second feature vector set.
  • a contrastive loss function may be used to calculate the first loss amount between the first feature vector set and the second feature vector set.
  • the contrastive loss function is usually used to represent the degree of matching between samples, and can also be used to train a model for extracting features. Usually, for two originally similar training samples, the two feature vectors obtained after feature extraction by the model should still be similar in the feature space; for two originally dissimilar samples, the two feature vectors obtained after feature extraction by the model should still be dissimilar in the feature space. Accordingly, when two similar samples undergo feature extraction, the loss amount between the two feature vectors calculated using the loss function should also be small.
  • the loss function can calculate the distance between two feature vectors, such as the Euclidean distance, to determine the loss amount between the two feature vectors, so as to judge the quality of the training of the initial network model.
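  • The application does not give a closed-form loss, so the following is only one plausible reading of the first loss amount: the mean Euclidean distance between the feature vectors of the first sample image and those of its sub-sample images, assuming both sets share the same feature dimension.

```python
import torch

def first_loss_amount(first_feature_set, second_feature_set):
    """Mean Euclidean distance between the two feature vector sets.

    first_feature_set: (m, d) feature vectors of the first sample image.
    second_feature_set: (n, d) feature vectors of the sub-sample images.
    Similar inputs should give a small loss, dissimilar inputs a large one.
    """
    return torch.cdist(first_feature_set, second_feature_set, p=2).mean()
```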
  • S204 Adjust the parameters of the initial network model based on the first loss amount, and re-execute the step of inputting the first sample image and the multiple sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than the first threshold, so as to obtain the character recognition network model.
  • the sub-sample images are obtained by dividing the first sample image, so the multiple sub-sample images have a high degree of similarity with the first sample image. Therefore, after feature extraction by the initial network model, the obtained first feature vector set and second feature vector set should have a high similarity; that is, the first loss amount determined based on the first feature vector set and the second feature vector set should be small.
  • if the first loss amount is greater than or equal to the first threshold, it indicates that the similarity between the first feature vector set and the second feature vector set obtained based on the initial network model does not meet the requirements, and that the recognition effect of the initial network model after training does not meet the requirements; it is then necessary to adjust the parameters of the initial network model and continue to use the first sample image and the corresponding sub-sample images to train the initial network model. That is, the step of inputting the first sample image and the multiple sub-sample images into the initial network model and the subsequent training process are re-executed until the first loss amount determined based on the first feature vector set and the second feature vector set is less than the first threshold, thus obtaining the final character recognition network model.
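  • Putting S201-S204 together, a minimal self-supervised training loop might look like the sketch below; initial_network, the optimizer settings, and first_threshold are illustrative assumptions, and first_loss_amount refers to the earlier sketch.

```python
import torch

def train_initial_network(initial_network, sample_images, k=0.9, n_sub=4,
                          first_threshold=0.05, max_epochs=100, lr=1e-4):
    """Align sub-sample (local) features with first-sample (global) features;
    no manual annotation of the first sample images is needed."""
    optimizer = torch.optim.Adam(initial_network.parameters(), lr=lr)
    loss = torch.tensor(float("inf"))
    for _ in range(max_epochs):
        for first_sample in sample_images:            # (1, H, W) image tensors
            width = first_sample.shape[-1]
            sub_width = int(k * width)
            starts = torch.randint(0, width - sub_width + 1, (n_sub,))
            sub_samples = torch.stack(
                [first_sample[..., s:s + sub_width] for s in starts.tolist()])
            first_features = initial_network(first_sample.unsqueeze(0))  # 1st set
            second_features = initial_network(sub_samples)               # 2nd set
            loss = first_loss_amount(first_features, second_features)    # see above
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < first_threshold:    # stop once the first loss amount is small
            break
    return initial_network                   # the trained character recognition model
```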
  • the value of the division parameter is fixed, that is, in the same training sample, the width of each sub-sample image is the same.
  • between different training samples, the value of the division parameter can be different; that is, the width of the sub-sample images can differ between different training samples.
  • set the value range of the division parameter to 75%-95%
  • set the value interval of the division parameter to 5%
  • the possible values of the division parameter are then 75%, 80%, 85%, 90%, and 95%.
  • Using different training samples to train the initial network model can improve the accuracy of feature extraction by the character recognition network model, and facilitate subsequent text recognition using the trained character recognition network model.
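  • One way to realize the varying division parameter described above (assuming the value is drawn uniformly from the 75%-95% range with a 5% interval) is sketched below.

```python
import random

DIVISION_PARAMETERS = [0.75, 0.80, 0.85, 0.90, 0.95]  # 75%-95%, 5% interval

def sample_division_parameter():
    # One division parameter per training sample; it stays fixed within that sample.
    return random.choice(DIVISION_PARAMETERS)
```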
  • the embodiment of the present application provides a possible implementation: data enhancement processing is performed on the first sample image to obtain a processed first sample image, and the first sample image after data enhancement processing is input into the initial network model to obtain the first feature vector set.
  • the data enhancement processing methods include: rotation, flip transformation, noise perturbation, etc., and this embodiment does not limit the specific manner of data enhancement.
  • the first sample image that has undergone data enhancement processing can also be divided into multiple sub-sample images, and the initial network model can be trained using the data-enhanced first sample image and the corresponding multiple sub-sample images, so as to improve the accuracy of the character recognition network model.
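  • A minimal data-enhancement sketch using torchvision transforms is given below; the specific transforms (a small rotation, a horizontal flip, Gaussian noise) are only examples of the rotation, flip-transformation, and noise-perturbation families mentioned above, not the exact processing used by the application.

```python
import torch
import torchvision.transforms as T

def add_gaussian_noise(img_tensor, std=0.02):
    # Simple noise perturbation on a tensor image with values in [0, 1].
    return (img_tensor + std * torch.randn_like(img_tensor)).clamp(0.0, 1.0)

# Rotation, flip transformation, and noise perturbation applied to a PIL image.
enhance = T.Compose([
    T.RandomRotation(degrees=3),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Lambda(add_gaussian_noise),
])
# enhanced_first_sample = enhance(first_sample_pil_image)
```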
  • when training the character recognition network model, it is not necessary to manually label the first sample image; instead, the character recognition network model is trained by aligning local features (the multiple sub-sample images) with the overall features (the first sample image), so that the character recognition network model can extract the complete features of the text information in an image and then use those complete text features for character recognition. This not only reduces the cost of labeling and improves training efficiency, but also lays the foundation for subsequent text recognition and improves recognition accuracy.
  • the above process of training the character recognition network mainly trains the encoder in the character recognition network model; that is, when a text image is input into the character recognition network model, the encoder first preprocesses the text image, including digitization, geometric transformation, normalization, smoothing and other steps, then performs feature extraction on the preprocessed text image, and the feature vector corresponding to the text image is obtained through the output of the fully connected layer.
  • by training the encoder in the character recognition network model, the features extracted by the encoder from the text image can be made more accurate, and the feature vector corresponding to the text image can therefore be more accurate.
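  • For concreteness, the sketch below shows what such an encoder could look like; the layer sizes, the preprocessing assumptions (grayscale input normalized to [0, 1]), and the module names are illustrative rather than the architecture actually claimed.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Toy encoder: preprocessed text image -> feature vector.

    A small CNN backbone followed by a fully connected output layer;
    adaptive pooling lets full-width and sub-sample-width inputs share
    the same network. All sizes are arbitrary.
    """
    def __init__(self, feature_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        self.fc = nn.Linear(64, feature_dim)   # fully connected output layer

    def forward(self, x):                       # x: (batch, 1, height, width)
        return self.fc(self.backbone(x).flatten(1))
```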
  • the embodiment of the present application also provides a preferred implementation method, that is, to train the decoder in the character recognition network model.
  • the main function of the decoder is to decode the feature vectors output by the encoder, so as to identify the text information corresponding to the feature vectors.
  • FIG. 4 is a flow chart of another training character recognition network model provided by the embodiment of the present application.
  • the method mainly includes the following steps:
  • S401 Acquire a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information, and the annotation is used to reflect the text information.
  • the second sample image with an annotation is obtained, and the annotation reflects the text information of the second sample image; the annotation is used for comparison with the text information of the second sample image recognized by the character recognition network model, and the character recognition network model is then trained according to the comparison result.
  • S402 Input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes text information.
  • the character recognition network model acquires text information corresponding to the second sample image by performing feature extraction and feature recognition on the second sample image.
  • S403 Determine a second loss amount based on the recognition result and the annotation of the second sample image.
  • the second loss amount is determined based on the annotation of the second sample image and the recognition result, wherein the second loss amount represents the difference between the annotated text information of the second sample image and the text information recognized by the character recognition network model.
  • S404 Adjust the parameters of the character recognition network model based on the second loss amount, and re-execute inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is smaller than the second threshold.
  • when the second loss amount is large, it indicates that the text information in the second sample image recognized by the character recognition network model is quite different from the annotated text information, and it is necessary to adjust the parameters of the character recognition network model and retrain it; that is, re-execute the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than the second threshold.
  • in this way, the accuracy with which the character recognition network model recognizes text information based on the extracted features can be improved.
  • after training, the character recognition network model can be used for character recognition: the text image to be processed is input into the character recognition network model to obtain the output result, which includes the text information to be recognized.
  • since the character recognition network model has been preliminarily trained by the training method shown in Figure 2, it can already realize feature extraction and basic recognition functions; therefore, when using second sample images for further training, the training can be completed without obtaining a large number of labeled second sample images, which reduces the training cost and improves the recognition accuracy.
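  • A minimal sketch of this second, supervised stage is shown below; the per-character cross-entropy formulation, the data-loader shape, and second_threshold are assumptions made for illustration, since the application does not fix a loss function for this stage.

```python
import torch
import torch.nn as nn

def fine_tune(recognition_model, labeled_loader, second_threshold=0.1,
              max_epochs=50, lr=1e-4):
    """Supervised fine-tuning on annotated second sample images.

    labeled_loader yields (images, targets), where targets are per-character
    class indices derived from the annotations.
    """
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(recognition_model.parameters(), lr=lr)
    second_loss = torch.tensor(float("inf"))
    for _ in range(max_epochs):
        for images, targets in labeled_loader:
            logits = recognition_model(images)         # recognition result
            second_loss = criterion(logits, targets)   # second loss amount
            optimizer.zero_grad()
            second_loss.backward()
            optimizer.step()
        if second_loss.item() < second_threshold:      # until second loss < threshold
            break
    return recognition_model
```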
  • FIG. 5 is a structural diagram of a character recognition device provided by an embodiment of the present application.
  • the apparatus 500 may include: an acquiring unit 501 and a processing unit 502 .
  • An acquisition unit 501 configured to acquire a text image to be processed, the text image to be processed includes text information to be recognized;
  • the processing unit 502 is configured to input the text image to be processed into the character recognition network model to obtain an output result, the output result including the text information to be recognized; wherein the character recognition network model is generated by training with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
  • the processing unit 502 is specifically configured to: input the first sample image and the plurality of sub-sample images into the initial network model respectively, to obtain a first feature vector set and a second feature vector set, where the dimensions of the first feature vector set are the same as the dimensions of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set; and adjust the parameters of the initial network model based on the first loss amount, re-executing the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than a first threshold, so as to obtain the character recognition network model.
  • the processing unit 502 is specifically configured to determine a division parameter, and use the division parameter to divide the first sample image multiple times to obtain the multiple sub-sample images, where the ratio of the width of each sub-sample image in the plurality of sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
  • the processing unit 502 is specifically configured to, for each division operation, determine a starting position of the division in the first sample image, and divide the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
  • the value range of the division parameter is 75%-95%.
  • the acquiring unit 501 is further configured to acquire a second sample image and an annotation corresponding to the second sample image, the second sample image includes text information, and the annotation is used to reflect the text information;
  • the processing unit 502 is further configured to input the second sample image into the character recognition network model to obtain a recognition result, the recognition result including the text information; determine a second loss amount based on the recognition result and the annotation of the second sample image; and adjust the parameters of the character recognition network model based on the second loss amount, re-executing the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than the second threshold.
  • the processing unit 502 is specifically configured to perform data enhancement processing on the first sample image to obtain a processed first sample image, and input the processed first sample image into the initial network model to obtain the first feature vector set.
  • the processing unit 502 is specifically configured to use a contrastive loss function to calculate the first loss amount between the first feature vector set and the second feature vector set.
  • FIG. 6 shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present application.
  • the terminal device in the embodiments of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs, desktop computers, and the like.
  • the electronic device shown in FIG. 6 is only an example, and should not limit the functions and scope of use of this embodiment of the present application.
  • the electronic device 600 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 601, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • a processing device such as a central processing unit, a graphics processing unit, etc.
  • RAM memory
  • various programs and data necessary for the operation of the electronic device 600 are also stored.
  • the processing device 601, ROM 602, and RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604 .
  • the following devices can be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 609.
  • the communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While FIG. 6 shows electronic device 600 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 609 , or from storage means 608 , or from ROM 602 .
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the method of the embodiment of the present application are executed.
  • the electronic device provided by this embodiment of the present application and the character recognition method provided by the above-mentioned embodiments belong to the same inventive concept.
  • for technical details not described in detail in this embodiment, reference may be made to the above-mentioned embodiments, and this embodiment has the same beneficial effects as the above-mentioned embodiments.
  • An embodiment of the present application provides a computer-readable medium, on which a computer program is stored, wherein, when the program is executed by a processor, the method described in any of the foregoing embodiments is implemented.
  • the computer-readable medium mentioned above in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (Hypertext Transfer Protocol), and can be interconnected with any form or medium of digital data communication (e.g., a communication network).
  • HTTP Hyper Text Transfer Protocol
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is made to execute the above-mentioned method.
  • Computer program code for carrying out the operations of this application may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through an Internet connection provided by an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • Internet service provider such as AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that includes one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application may be implemented by means of software or by means of hardware.
  • the name of the unit/module does not constitute a limitation on the unit itself under certain circumstances, for example, the voice data collection module can also be described as a "data collection module”.
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs System on Chips
  • CPLD Complex Programmable Logic Device
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM compact disk read only memory
  • a character recognition method which may include:
  • the text image to be processed includes text information to be recognized
  • the character recognition network model is generated by using training samples
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image
  • the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image
  • the width of each sub-sample image in the plurality of sub-sample images is the same
  • the width of each sub-sample image is smaller than the width of the first sample image
  • the first sample image includes text information.
  • the training process of the character recognition network model includes:
  • the first sample image and the multiple sub-sample images are respectively re-input into the initial network model and the subsequent training process is re-executed until the first loss amount is less than the first threshold, and the character recognition network model is obtained.
  • the acquisition process of the plurality of sub-sample images includes:
  • the ratio of the width of each sub-sample image in the multiple sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
  • using the division parameter to divide the first sample image multiple times to obtain the multiple sub-sample images includes:
  • the value range of the division parameter is 75%-95%.
  • the method further includes:
  • the second sample image includes text information, and the annotation is used to reflect the text information
  • the inputting the first sample image into the initial network model to obtain the first feature vector set includes:
  • the determining the first loss amount according to the first feature vector set and the second feature vector set includes:
  • a first loss amount between the first set of feature vectors and the second set of feature vectors is calculated using a contrastive loss function.
  • a character recognition device which may include:
  • an acquisition unit configured to acquire a text image to be processed, the text image to be processed includes text information to be recognized;
  • a processing unit configured to input the text image to be processed into the character recognition network model to obtain an output result, the output result including the text information to be recognized; wherein the character recognition network model is generated by training with training samples;
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
  • the processing unit is specifically configured to input the first sample image and the plurality of sub-sample images respectively into the initial network model to obtain a first feature vector set and a second feature vector set, the dimensions of the first feature vector set being the same as the dimensions of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set being feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set; and adjust the parameters of the initial network model based on the first loss amount, re-executing the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than the first threshold, so as to obtain the character recognition network model.
  • the processing unit is specifically configured to determine a division parameter, and use the division parameter to perform multiple divisions on the first sample image to obtain the plurality of sub-sample images, the ratio of the width of each sub-sample image in the plurality of sub-sample images to the width of the first sample image being equal to the division parameter, and the division parameter being greater than 0 and less than 1.
  • the processing unit is specifically configured to determine a starting position of division in the first sample image for each division operation, and divide the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
  • the value range of the division parameter is 75%-95%.
  • the acquiring unit is further configured to acquire a second sample image and an annotation corresponding to the second sample image, the second sample image includes text information, and the annotation uses to reflect the text information;
  • the processing unit is further configured to input the second sample image into the character recognition network model to obtain a recognition result, the recognition result including the text information; determine a second loss amount based on the recognition result and the annotation of the second sample image; and adjust the parameters of the character recognition network model based on the second loss amount, re-executing the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than the second threshold.
  • the processing unit is specifically configured to perform data enhancement processing on the first sample image to obtain a processed first sample image, and input the processed first sample image into the initial network model to obtain the first feature vector set.
  • the processing unit is specifically configured to use a contrastive loss function to calculate a first loss amount between the first feature vector set and the second feature vector set.
  • an electronic device includes: a processor and a memory;
  • said memory for storing instructions or computer programs
  • the processor is configured to execute the instructions or computer programs in the memory, so that the electronic device executes the character recognition method.
  • a computer-readable storage medium is provided. Instructions are stored in the computer-readable storage medium. When the instructions are run on a device, the device is made to execute the character recognition method.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the differences from other embodiments, and the same and similar parts of each embodiment can be referred to each other.
  • the description is relatively simple, and for the related part, please refer to the description of the method part.
  • At least one (item) means one or more, and “multiple” means two or more.
  • “And/or” is used to describe the association relationship of associated objects, indicating that there can be three types of relationships, for example, “A and/or B” can mean: only A exists, only B exists, and A and B exist at the same time , where A and B can be singular or plural.
  • the character “/” generally indicates that the contextual objects are an “or” relationship.
  • "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • At least one item (piece) of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
  • RAM random access memory
  • ROM read-only memory
  • EPROM electrically programmable ROM
  • EEPROM electrically erasable programmable ROM
  • registers, hard disk, removable disk, CD-ROM, or any other known form of storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a character recognition method, which is implemented using a pre-trained character recognition network model during character recognition. The character recognition network model is generated by means of training using a first sample image and multiple sub-sample images corresponding to the first sample image. The height of each sub-sample image in the multiple sub-sample images is the same as the height of the first sample image, the width of each sub-sample image is identical, and the width of each sub-sample image is smaller than the width of the first sample image. In the present application, manual annotation does not need to be carried out on the first sample image, and the character recognition network model is trained by means of aligning partial features (the multiple sub-sample images) and an overall feature (the first sample image), thereby reducing annotation costs and improving training efficiency. In practical use, by inputting a text image to the character recognition network model, features of text information in the image can be completely extracted, and recognition carried out according to the features to obtain an output result.

Description

A character recognition method, apparatus, device, and medium
This application claims priority to Chinese patent application No. 202210114334.6, entitled "A Character Recognition Method, Apparatus, Device, and Medium", filed on January 30, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a character recognition method, apparatus, device, and medium.
Background Art
Optical Character Recognition (OCR) refers to the technology of analyzing and recognizing image files containing text data to obtain the text, and is an important aspect of research and application in the field of automatic recognition technology.
Typically, an OCR recognition model is generated through a supervised training method. During training, manually annotated sample data must be collected and then used for training. To improve the recognition accuracy of the OCR recognition model, a large amount of sample data has to be collected, which requires considerable manual labeling effort and increases the training cost.
Summary of the Invention
In view of this, the embodiments of the present application provide a character recognition method, apparatus, device, and medium, so as to enable model training with unlabeled sample data and reduce training costs.
To achieve the above purpose, the technical solutions provided by the embodiments of the present application are as follows:
In a first aspect of the embodiments of the present application, a character recognition method is provided, the method comprising:
acquiring a text image to be processed, the text image to be processed including text information to be recognized;
inputting the text image to be processed into a character recognition network model to obtain an output result, the output result including the text information to be recognized;
wherein the character recognition network model is generated by training with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
In a second aspect of the embodiments of the present application, a character recognition apparatus is provided, the apparatus comprising:
an acquisition unit, configured to acquire a text image to be processed, the text image to be processed including text information to be recognized;
a processing unit, configured to input the text image to be processed into the character recognition network model to obtain an output result, the output result including the text information to be recognized; wherein the character recognition network model is generated by training with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
In a third aspect of the embodiments of the present application, an electronic device is provided, the device comprising: a processor and a memory;
the memory is configured to store instructions or a computer program;
the processor is configured to execute the instructions or computer program in the memory, so that the electronic device performs the character recognition method described in the first aspect.
In a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, in which instructions are stored; when the instructions are run on a device, the device is caused to perform the character recognition method described in the first aspect.
In a fifth aspect of the embodiments of the present application, a computer program product is provided, which, when run on a computer, causes the computer to perform the character recognition method described in the first aspect.
It can be seen that the embodiments of the present application have the following beneficial effects:
In the embodiments of the present application, character recognition is performed using a pre-trained character recognition network model, which is generated by training with a first sample image and multiple sub-sample images corresponding to the first sample image. The height of each sub-sample image in the multiple sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the multiple sub-sample images is the same, and the width of each sub-sample image is smaller than the width of the first sample image. That is, when training the character recognition network model, the present application does not require manual annotation of the first sample image; instead, the model is trained by aligning local features (the multiple sub-sample images) with the overall feature (the first sample image), which reduces labeling costs and improves training efficiency. In actual use, the text image to be processed is input into the character recognition network model, so that the model can completely extract the features of the text information in the image and perform recognition based on the extracted features to obtain the output result.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in this application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of a character recognition method provided by an embodiment of the present application;
Fig. 2 is a flowchart of a method for training a character recognition network model provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a division operation provided by an embodiment of the present application;
Fig. 4 is a flowchart of another method for training a character recognition network model provided by an embodiment of the present application;
Fig. 5 is a structural diagram of a character recognition apparatus provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description of Embodiments
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
OCR refers to the technology of analyzing and recognizing image files containing text to obtain the text. Usually, an OCR recognition model is generated by a supervised training method: during training, manually labeled sample data needs to be collected, and the model is then trained with that sample data. To improve the recognition accuracy of the OCR recognition model, a large amount of sample data has to be collected, and labeling it manually consumes considerable manpower, which increases the training cost.
Based on this, the embodiments of the present application provide a character recognition method implemented with a pre-trained character recognition network model. The character recognition network model is trained based on a first sample image and a plurality of sub-sample images corresponding to the first sample image, where the height of each sub-sample image is the same as the height of the first sample image, the widths of the sub-sample images are the same, and the width of each sub-sample image is smaller than the width of the first sample image. In other words, when training the character recognition network model, the first sample image does not need to be manually labeled; instead, the model is trained by aligning local features (the sub-sample images) with global features (the first sample image), which reduces labeling cost and improves training efficiency. In actual use, the text image to be processed is input into the character recognition network model, so that the model can fully extract the features of the text information in the image and perform recognition based on the extracted features to obtain the output result.
The character recognition method provided by the embodiments of the present application is described below with reference to the drawings. Referring to Fig. 1, Fig. 1 is a flowchart of a character recognition method provided by an embodiment of the present application.
The method specifically includes the following steps:
S101: Acquire a text image to be processed, where the text image to be processed includes text information to be recognized.
The text information to be recognized may be characters to be recognized, including Chinese characters, English words, English letters, numbers, symbols, and the like.
S102: Input the text image to be processed into a character recognition network model to obtain an output result, where the output result includes the text information to be recognized.
In this embodiment, after the text image to be processed is acquired, in order to obtain the text information to be recognized included in the text image, the text image is input into the pre-trained character recognition network model, so that the text information to be recognized is output through the processing of the character recognition network model.
The character recognition network model is trained with training samples. The training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, where the height of each sub-sample image is the same as the height of the first sample image, the widths of the sub-sample images are the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
In the embodiments of the present application, when training the character recognition network model, the first sample image does not need to be manually labeled; instead, the character recognition network model is trained by aligning local features (the sub-sample images) with global features (the first sample image), which reduces labeling cost and improves training efficiency.
After training of the character recognition network model is completed, the text image to be processed is input into the character recognition network model, the model extracts features from the text image and performs recognition based on the extracted features, and the text information in the text image is output.
It can be seen that the trained character recognition network model can extract the complete features of the text information in an image and perform text recognition based on those complete features to obtain the output result, which improves recognition accuracy.
The process of training the character recognition network model in the present application is described below with reference to the drawings. Referring to Fig. 2, Fig. 2 is a flowchart of training a character recognition network model provided by an embodiment of the present application.
The method mainly includes the following steps:
S201: Acquire a first sample image and a plurality of sub-sample images corresponding to the first sample image.
Before the character recognition network model is trained, training samples for training an initial network model need to be acquired. The training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, where the first sample image and the sub-sample images include text information. The training samples may be acquired by obtaining a first sample image without manual labeling and then dividing the first sample image into a plurality of sub-sample images; the first sample image and its corresponding sub-sample images form a training sample, that is, each sub-sample image is a part of the first sample image. The height of each sub-sample image is the same as the height of the first sample image, the width of each sub-sample image is smaller than the width of the first sample image, and the widths of the sub-sample images are the same.
In a possible implementation, the first sample image may be divided into the plurality of sub-sample images as follows: a division parameter is determined first, and the first sample image is then divided multiple times with the division parameter to obtain the plurality of sub-sample images. The ratio of the width of each sub-sample image to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1. That is, the first sample image is divided into a plurality of sub-sample images whose width is determined by the division parameter.
When the division is performed, for each division operation, a starting position of the division may first be determined in the first sample image, and the first sample image is then divided according to the starting position of each division and the division parameter, thereby obtaining the plurality of sub-sample images. Referring to Fig. 3, Fig. 3 is a schematic diagram of a division operation in an embodiment of the present application. In this scenario, the width of the first sample image is width and the division parameter is K, that is, the width of each sub-sample image is K*width, where K ranges from 75% to 95%. For each division operation, the starting position of the division is determined in the first sample image, and a sub-sample image with a width of K*width is cut out. As shown in Fig. 3, the first sample image is divided into sub-sample image 1, sub-sample image 2, ..., sub-sample image n, where sub-sample image 1, sub-sample image 2, ..., sub-sample image n have the same height as the first sample image and each has a width of K*width.
It should be noted that the starting positions of the division operations may be spaced at a fixed interval or determined randomly, which is not limited in the embodiments of the present application.
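As a minimal sketch of this division step (an illustration rather than part of the application; the function and parameter names are assumptions), the following Python code crops a text-line image of height H and width W into several sub-sample images of width K*W with randomly chosen starting positions:

```python
import numpy as np

def divide_sample(image: np.ndarray, k: float, num_crops: int, rng=None):
    """Divide a text-line image (H x W x C) into sub-sample images.

    Each sub-sample keeps the full height and has width k * W,
    where 0 < k < 1 is the division parameter described above.
    Starting positions are chosen at random along the width.
    """
    assert 0.0 < k < 1.0, "division parameter must be in (0, 1)"
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    crop_w = int(round(k * w))
    starts = rng.integers(0, w - crop_w + 1, size=num_crops)
    return [image[:, s:s + crop_w] for s in starts]

# Example: a first sample image of height 32 and width 256, K = 0.8, 4 sub-samples
sample = np.zeros((32, 256, 3), dtype=np.uint8)
sub_samples = divide_sample(sample, k=0.8, num_crops=4)
```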
S202: Input the first sample image and the plurality of sub-sample images into an initial network model respectively to obtain a first feature vector set and a second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information.
After the training samples including the first sample image and the plurality of sub-sample images obtained by dividing the first sample image are acquired, the initial network model can be trained. The first sample image is input into the initial network model to obtain the first feature vector set corresponding to the first sample image, and the plurality of sub-sample images obtained by the division are input into the initial network model to obtain the second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and both the first feature vector set and the second feature vector set are feature vectors of the text information in the sample images.
It should be noted that the order in which the first sample image and its corresponding sub-sample images are input into the initial network model is not limited; that is, the first feature vector set corresponding to the first sample image may be obtained first, or the second feature vector set corresponding to the plurality of sub-sample images may be obtained first.
S203: Determine a first loss amount according to the first feature vector set and the second feature vector set.
After the first feature vector set and the second feature vector set are obtained based on the initial network model, the first loss amount may be determined according to the first feature vector set and the second feature vector set.
In a possible implementation, a contrastive loss function may be used to calculate the first loss amount between the first feature vector set and the second feature vector set. A loss function is usually used to measure how well samples match, and it can also be used to train a feature extraction model. Normally, for two training samples that are originally similar, the two feature vectors obtained after feature extraction by the model remain similar in the feature space; for two samples that are originally dissimilar, the two feature vectors remain dissimilar in the feature space. Accordingly, when two similar samples undergo feature extraction, the loss amount between the two feature vectors calculated by the loss function should be small. The loss function can determine the loss amount between two feature vectors by computing the distance between them, such as the Euclidean distance, so as to judge how well the initial network model has been trained.
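As a hedged illustration of such a contrastive loss (the application only names a contrastive loss and the Euclidean distance; the margin-based form, PyTorch, and the function signature below are assumptions), one possible sketch is:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(global_feats, local_feats, label, margin=1.0):
    """Contrastive loss between whole-image features and sub-image features.

    global_feats, local_feats: tensors of shape (batch, dim).
    label: 1 where the pair should be similar (a sub-sample image cut from
           the same first sample image), 0 where it should be dissimilar.
    """
    dist = F.pairwise_distance(global_feats, local_feats)       # Euclidean distance
    positive = label * dist.pow(2)                               # pull similar pairs together
    negative = (1 - label) * F.relu(margin - dist).pow(2)        # push dissimilar pairs apart
    return 0.5 * (positive + negative).mean()
```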
S204: Adjust the parameters of the initial network model based on the first loss amount, and re-execute the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than a first threshold, so as to obtain the character recognition network model.
In this embodiment, the sub-sample images are obtained by dividing the first sample image, so the plurality of sub-sample images have a high similarity with the first sample image. Therefore, after feature extraction by the initial network model, the obtained first feature vector set and second feature vector set should also have a high similarity, that is, the first loss amount determined based on the first feature vector set and the second feature vector set should be small. When the first loss amount is greater than or equal to the first threshold, it indicates that the similarity between the first feature vector set and the second feature vector set obtained based on the initial network model does not meet the requirement and the recognition effect of the trained initial network model is not yet satisfactory, so the parameters of the initial network model need to be adjusted and the initial network model continues to be trained with the first sample image and its corresponding sub-sample images. That is, the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process are re-executed until the first loss amount determined based on the first feature vector set and the second feature vector set is less than the first threshold, so as to obtain the final character recognition network model.
It should be noted that during a single training pass of the initial network model, the value of the division parameter is fixed, that is, within the same training sample, the widths of the sub-sample images are the same. When the initial network model is trained again with the first sample image and its corresponding sub-sample images, the value of the division parameter may differ, that is, the widths of the sub-sample images may differ between different training samples. For example, the value range of the division parameter may be set to 75%-95% with a step of 5%, so that across different training samples the possible values of the division parameter are 75%, 80%, 85%, 90%, and 95%. Training the initial network model with different training samples can improve the accuracy of the features extracted by the character recognition network model, which facilitates subsequent text recognition with the trained character recognition network model.
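Putting these steps together, one hypothetical pretraining loop under the above assumptions (the encoder, optimizer, stopping threshold, and data set are placeholders, and contrastive_loss is the helper sketched earlier) might look like this:

```python
import torch

def pretrain(encoder, dataset, optimizer,
             k_values=(0.75, 0.80, 0.85, 0.90, 0.95),
             num_crops=4, threshold=0.01, max_epochs=100):
    """Self-supervised pretraining that aligns whole-image and sub-image features.

    dataset is assumed to be a list of unlabeled text-line images as
    tensors of shape (C, H, W); contrastive_loss is the sketch given above.
    """
    for epoch in range(max_epochs):
        k = k_values[epoch % len(k_values)]             # one fixed division parameter per pass
        total = 0.0
        for image in dataset:
            _, _, w = image.shape
            crop_w = int(round(k * w))
            starts = torch.randint(0, w - crop_w + 1, (num_crops,))
            crops = torch.stack([image[:, :, s:s + crop_w] for s in starts.tolist()])
            global_feat = encoder(image.unsqueeze(0))    # first feature vector set, shape (1, dim)
            local_feats = encoder(crops)                 # second feature vector set, shape (num_crops, dim)
            label = torch.ones(num_crops)                # crops come from the same first sample image
            loss = contrastive_loss(global_feat.expand_as(local_feats), local_feats, label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total / max(len(dataset), 1) < threshold:     # stop once the first loss falls below the first threshold
            break
    return encoder
```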
In practical applications, the samples for training the initial network model may be insufficient, so the trained character recognition network model may not be accurate enough, which manifests as a large first loss amount between the first feature vector set and the second feature vector set determined based on the character recognition network model. The embodiments of the present application therefore provide a possible implementation in which data enhancement processing is performed on the first sample image to obtain a processed first sample image, and the first sample image that has undergone data enhancement processing is input into the initial network model to obtain the first feature vector set. The data enhancement processing methods include rotation, flip transformation, noise perturbation, and the like; this embodiment does not limit the specific manner of data enhancement.
In addition, the first sample image that has undergone data enhancement processing may also be divided into a plurality of sub-sample images, and the initial network model may be trained with the enhanced first sample image and its corresponding sub-sample images, so as to improve the accuracy of the character recognition network model.
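A minimal sketch of such data enhancement, using torchvision transforms as stand-ins for the rotation, flip transformation, and noise perturbation mentioned above (the specific transforms and magnitudes are assumptions, not prescribed by the application):

```python
import torch
import torchvision.transforms as T

def enhance(image: torch.Tensor) -> torch.Tensor:
    """Apply simple data enhancement to a text-line image tensor (C, H, W) with values in [0, 1]."""
    transform = T.Compose([
        T.RandomRotation(degrees=3),            # slight rotation keeps the text readable
        T.RandomHorizontalFlip(p=0.1),          # occasional flip transformation
    ])
    enhanced = transform(image)
    noise = 0.01 * torch.randn_like(enhanced)    # noise perturbation
    return (enhanced + noise).clamp(0.0, 1.0)
```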
When training the character recognition network model, the embodiments of the present application do not require manual labeling of the first sample image; instead, the character recognition network model is trained by aligning local features (the sub-sample images) with global features (the first sample image), so that the model can extract the complete features of the text information in an image and then use those complete text features for character recognition. This reduces labeling cost and improves training efficiency, and it also lays a foundation for subsequent text recognition, improving recognition accuracy.
The training process described in the above embodiments mainly trains the encoder in the character recognition network model. That is, after the text image to be processed is input into the character recognition network model, the encoder first preprocesses the text image, including steps such as digitization, geometric transformation, normalization, and smoothing, then performs feature extraction on the preprocessed text image, and outputs the feature vector corresponding to the text image through a fully connected layer. By training the encoder in the character recognition network model, the features that the encoder extracts from the text image become more accurate, so the feature vector corresponding to the text image becomes more accurate.
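For concreteness, a toy encoder consistent with this description (convolutional feature extraction followed by a fully connected output layer; the layer sizes and pooling choice are assumptions, not the application's architecture) might be:

```python
import torch
import torch.nn as nn

class TextLineEncoder(nn.Module):
    """Toy encoder: convolutional feature extraction plus a fully connected output layer."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),          # pools away the variable width
        )
        self.fc = nn.Linear(64, feat_dim)          # fully connected layer outputs the feature vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.backbone(x).flatten(1)
        return self.fc(h)
```

Such an encoder could be plugged into the pretraining sketch above, since the adaptive pooling makes it indifferent to the differing widths of the first sample image and the sub-sample images.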
To further improve the accuracy with which the character recognition network model recognizes text information, the embodiments of the present application also provide a preferred implementation in which the decoder in the character recognition network model is trained. The main function of the decoder is to decode the feature vector output by the encoder and recognize the text information corresponding to the feature vector. By training the decoder in the character recognition network model, the accuracy of recognizing text information can be improved. The process of training the character recognition network model to recognize text information is described below with reference to the drawings.
Referring to Fig. 4, Fig. 4 is a flowchart of another method of training a character recognition network model provided by an embodiment of the present application.
The method mainly includes the following steps:
S401: Acquire a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information and the annotation is used to reflect the text information.
In this embodiment, in order to further train the decoder of the character recognition network model, a second sample image with an annotation is acquired. The annotation reflects the text information of the second sample image; it is compared with the text information of the second sample image recognized by the character recognition network model, and the character recognition network model is then trained according to the comparison result.
S402: Input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information.
The character recognition network model obtains the text information corresponding to the second sample image by performing feature extraction and feature recognition on the second sample image.
S403: Determine a second loss amount based on the recognition result and the annotation of the second sample image.
After the recognition result output by the character recognition network model is obtained, the second loss amount is determined based on the annotation of the second sample image and the recognition result, where the second loss amount represents the difference between the text information of the second sample image and the text information recognized by the character recognition network model.
S404: Adjust the parameters of the character recognition network model based on the second loss amount, and re-execute the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than a second threshold.
When the second loss amount is large, it indicates that the text information in the second sample image recognized by the character recognition network model differs considerably from the annotated text information, so the parameters of the character recognition network model need to be adjusted and the model needs to be retrained; that is, the step of inputting the second sample image into the character recognition network model and the subsequent training process are re-executed until the second loss amount is less than the second threshold.
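As a hedged illustration of this supervised stage, assuming a model whose decoder outputs per-time-step character logits and using a CTC loss as one possible choice of second loss (the application does not name a specific loss, so this is only an example formulation):

```python
import torch
import torch.nn as nn

def finetune_step(model, images, targets, target_lengths, optimizer):
    """One supervised update with annotated second sample images.

    model(images) is assumed to return logits of shape (T, batch, num_classes);
    targets holds the annotated character indices, target_lengths their lengths.
    """
    ctc_loss = nn.CTCLoss(blank=0)
    logits = model(images)
    log_probs = logits.log_softmax(dim=2)
    input_lengths = torch.full((images.size(0),), logits.size(0), dtype=torch.long)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)   # second loss amount
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```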
By training the decoder in the character recognition network model, the accuracy with which the character recognition network model recognizes text information based on features can be improved.
When the trained character recognition network model meets the requirement, it can be used for character recognition: the text image to be processed is input into the character recognition network model to obtain an output result, and the output result includes the text information to be recognized.
In this embodiment, since the character recognition network model has already been preliminarily trained with the training method shown in Fig. 2 and can therefore perform feature extraction and basic recognition, the further training with the second sample image can be completed without acquiring a large number of annotated second sample images, which reduces the training cost and improves recognition accuracy.
Based on the above method embodiments, the embodiments of the present application provide a device and an electronic device for implementing the above method, which are described below with reference to the drawings.
Referring to Fig. 5, Fig. 5 is a structural diagram of a character recognition device provided by an embodiment of the present application. As shown in Fig. 5, the device 500 may include an acquisition unit 501 and a processing unit 502.
The acquisition unit 501 is configured to acquire a text image to be processed, where the text image to be processed includes text information to be recognized.
The processing unit 502 is configured to input the text image to be processed into a character recognition network model to obtain an output result, where the output result includes the text information to be recognized. The character recognition network model is trained with training samples; the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image; the height of each sub-sample image is the same as the height of the first sample image; the widths of the sub-sample images are the same, and the width of each sub-sample image is smaller than the width of the first sample image; and the first sample image includes text information.
In a specific implementation, the processing unit 502 is specifically configured to: input the first sample image and the plurality of sub-sample images into an initial network model respectively to obtain a first feature vector set and a second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set, adjust the parameters of the initial network model based on the first loss amount, and re-execute the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than a first threshold, so as to obtain the character recognition network model.
In a specific implementation, the processing unit 502 is specifically configured to determine a division parameter and divide the first sample image multiple times with the division parameter to obtain the plurality of sub-sample images, where the ratio of the width of each sub-sample image to the width of the sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
In a specific implementation, the processing unit 502 is specifically configured to, for each division operation, determine a starting position of the division in the first sample image, and divide the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
In a specific implementation, the value range of the division parameter is 75%-95%.
In a specific implementation, the acquisition unit 501 is further configured to acquire a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information and the annotation is used to reflect the text information.
The processing unit 502 is further configured to: input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information; determine a second loss amount based on the recognition result and the annotation of the second sample image, adjust the parameters of the character recognition network model based on the second loss amount, and re-execute the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than a second threshold.
In a specific implementation, the processing unit 502 is specifically configured to perform data enhancement processing on the first sample image to obtain a processed first sample image, and input the processed first sample image into the initial network model to obtain the first feature vector set.
In a specific implementation, the processing unit 502 is specifically configured to calculate the first loss amount between the first feature vector set and the second feature vector set with a contrastive loss function.
It should be noted that, for the implementation of each unit in this embodiment, reference may be made to the relevant description in the above method embodiments, and details are not repeated here.
Referring to Fig. 6, Fig. 6 shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present application. The terminal device in the embodiments of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (portable android devices, i.e., tablet computers), PMPs (Portable Media Players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs (televisions) and desktop computers. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in Fig. 6, the electronic device 600 may include a processing apparatus (such as a central processing unit or a graphics processing unit) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following apparatuses may be connected to the I/O interface 605: an input apparatus 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 608 including, for example, a magnetic tape and a hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows the electronic device 600 with various apparatuses, it should be understood that it is not required to implement or have all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present application include a computer program product that includes a computer program carried on a non-transitory computer-readable medium, and the computer program contains program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above functions defined in the methods of the embodiments of the present application are executed.
The electronic device provided by the embodiments of the present application and the character recognition method provided by the above embodiments belong to the same inventive concept. For technical details not described in detail in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
The embodiments of the present application provide a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any of the above embodiments.
It should be noted that the computer-readable medium mentioned above in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained in the computer-readable medium may be transmitted by any appropriate medium, including but not limited to a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (Hyper Text Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to execute the above method.
The computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof. The programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet with the help of an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of the blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The name of a unit/module does not constitute a limitation on the unit itself in some cases; for example, a voice data collection module may also be described as a "data collection module".
The functions described above herein may be executed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), and the like.
In the context of the present application, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present application, a character recognition method is provided, and the method may include:
acquiring a text image to be processed, where the text image to be processed includes text information to be recognized;
inputting the text image to be processed into a character recognition network model to obtain an output result, where the output result includes the text information to be recognized;
where the character recognition network model is trained with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image is the same as the height of the first sample image, the widths of the sub-sample images are the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
According to one or more embodiments of the present application, the training process of the character recognition network model includes:
inputting the first sample image and the plurality of sub-sample images into an initial network model respectively to obtain a first feature vector set and a second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information;
determining a first loss amount according to the first feature vector set and the second feature vector set, adjusting the parameters of the initial network model based on the first loss amount, and re-executing the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than a first threshold, so as to obtain the character recognition network model.
According to one or more embodiments of the present application, the process of acquiring the plurality of sub-sample images includes:
determining a division parameter, and dividing the first sample image multiple times with the division parameter to obtain the plurality of sub-sample images, where the ratio of the width of each sub-sample image to the width of the sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
According to one or more embodiments of the present application, dividing the sample image multiple times with the division parameter to obtain the plurality of sub-sample images includes:
for each division operation, determining a starting position of the division in the first sample image;
dividing the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
According to one or more embodiments of the present application, the value range of the division parameter is 75%-95%.
According to one or more embodiments of the present application, the method further includes:
acquiring a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information and the annotation is used to reflect the text information;
inputting the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information;
determining a second loss amount based on the recognition result and the annotation of the second sample image, adjusting the parameters of the character recognition network model based on the second loss amount, and re-executing the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than a second threshold.
According to one or more embodiments of the present application, inputting the first sample image into the initial network model to obtain the first feature vector set includes:
performing data enhancement processing on the first sample image to obtain a processed first sample image;
inputting the processed first sample image into the initial network model to obtain the first feature vector set.
According to one or more embodiments of the present application, determining the first loss amount according to the first feature vector set and the second feature vector set includes:
calculating the first loss amount between the first feature vector set and the second feature vector set with a contrastive loss function.
According to one or more embodiments of the present application, a character recognition device is provided, and the device may include:
an acquisition unit, configured to acquire a text image to be processed, where the text image to be processed includes text information to be recognized;
a processing unit, configured to input the text image to be processed into a character recognition network model to obtain an output result, where the output result includes the text information to be recognized; where the character recognition network model is trained with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image is the same as the height of the first sample image, the widths of the sub-sample images are the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
在本申请的一个或多个实施例中,所述处理单元,具体用于将所述第一样本图像、所 述多个子样本图像分别输入初始网络模型,获取第一特征向量集、第二特征向量集,所述第一特征向量集的维数与所述第二特征向量集的维数相同,所述第一特征向量集中的特征向量和所述第二特征向量集中的特征向量为所述文本信息的特征向量;根据所述第一特征向量集和所述第二特征向量集确定第一损失量,并基于所述第一损失量对所述初始网络模型的参数进行调整,重新执行将所述第一样本图像、所述多个子样本图像分别输入初始网络模型以及后续训练过程,直至第一损失量小于第一阈值,获得所述字符识别网络模型。In one or more embodiments of the present application, the processing unit is specifically configured to convert the first sample image, the The plurality of sub-sample images are respectively input into the initial network model to obtain a first feature vector set and a second feature vector set, the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the second feature vector set The feature vectors in a feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set, and Adjust the parameters of the initial network model based on the first loss amount, and re-input the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount If it is smaller than the first threshold, the character recognition network model is obtained.
在本申请的一个或多个实施例中,所述处理单元,具体用于确定划分参数,并利用所述划分参数对所述第一样本图像进行多次划分,获得所述多个子样本图像,所述多个子样本图像中每个子样本图像的宽度与所述样本图像的宽度的比值等于所述划分参数,所述划分参数大于0且小于1。In one or more embodiments of the present application, the processing unit is specifically configured to determine a division parameter, and use the division parameter to perform multiple divisions on the first sample image to obtain the plurality of sub-sample images , the ratio of the width of each sub-sample image in the plurality of sub-sample images to the width of the sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
在本申请的一个或多个实施例中,所述处理单元,具体用于针对每次划分操作,在所述第一样本图像中确定划分的起始位置;根据所述起始位置以及所述划分参数对所述第一样本图像进行划分,获得所述多个子样本图像。In one or more embodiments of the present application, the processing unit is specifically configured to determine a starting position of division in the first sample image for each division operation; according to the starting position and the The first sample image is divided according to the division parameter to obtain the plurality of sub-sample images.
In one or more embodiments of the present application, the value of the division parameter ranges from 75% to 95%.
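An illustrative sketch of this division operation follows; the number of crops and the use of a freshly drawn random starting position for every crop are assumptions, and only the same-height, reduced-width crops and the 75%-95% range come from the description above.

import random
import torch

def make_subsamples(first_sample, num_crops=4, division_param=None):
    # first_sample: a text-line image tensor of shape (C, H, W).
    # Each crop keeps the full height; its width equals division_param * W.
    if division_param is None:
        division_param = random.uniform(0.75, 0.95)    # preferred range from the description
    _, _, width = first_sample.shape
    crop_width = int(width * division_param)
    crops = []
    for _ in range(num_crops):
        # Starting position of the division, chosen anew for every division operation.
        start = random.randint(0, width - crop_width)
        crops.append(first_sample[:, :, start:start + crop_width])
    return crops                                        # all crops share the same height and width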
In one or more embodiments of the present application, the acquisition unit is further configured to acquire a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information and the annotation is used to reflect the text information;
the processing unit is further configured to input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information; determine a second loss amount based on the recognition result and the annotation of the second sample image, adjust the parameters of the character recognition network model based on the second loss amount, and re-execute the step of inputting the second sample image into the character recognition network model and the subsequent training process, until the second loss amount is less than a second threshold.
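A corresponding sketch of this supervised fine-tuning stage is given below; the choice of recognition loss (for example a CTC-style loss comparing the recognition result with the annotation), the optimizer and the threshold are assumptions, since the description above only fixes the loop structure.

import torch

def finetune(char_recognition_model, labeled_loader, recognition_loss,
             second_threshold=0.05, lr=1e-5, epochs=20):
    # Supervised stage on labeled (second sample image, annotation) pairs.
    optimizer = torch.optim.Adam(char_recognition_model.parameters(), lr=lr)
    loss = None
    for _ in range(epochs):
        for second_sample, annotation in labeled_loader:
            recognition_result = char_recognition_model(second_sample)
            loss = recognition_loss(recognition_result, annotation)   # "second loss amount"
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss is not None and loss.item() < second_threshold:       # stop once below the second threshold
            break
    return char_recognition_model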
In one or more embodiments of the present application, the processing unit is specifically configured to perform data enhancement processing on the first sample image to obtain a processed first sample image, and to input the processed first sample image into the initial network model to obtain the first feature vector set.
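The description above does not name specific enhancement operations; the following pipeline (color jitter, blur and a slight rotation, built with torchvision) is therefore only one plausible example of such data enhancement, with parameters chosen for illustration.

from torchvision import transforms

# Illustrative data enhancement for the first sample image; the concrete
# operations and their parameters are assumptions, not taken from the description.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),
    transforms.RandomRotation(degrees=3),
])

# processed_first_sample = augment(first_sample)  # then fed to the initial network model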
In one or more embodiments of the present application, the processing unit is specifically configured to calculate the first loss amount between the first feature vector set and the second feature vector set by using a contrastive loss function.
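The description above does not fix a particular contrastive loss; an InfoNCE-style formulation is one common choice and is sketched below, where row i of the first feature vector set (full image i) and row i of the second feature vector set (its crops) form a positive pair and the remaining rows of the batch act as negatives; the symmetric form and the temperature value are assumptions.

import torch
import torch.nn.functional as F

def contrastive_loss(first_set, second_set, temperature=0.1):
    # first_set, second_set: (B, D) feature vector sets of equal dimension.
    first = F.normalize(first_set, dim=1)
    second = F.normalize(second_set, dim=1)
    logits = first @ second.t() / temperature             # (B, B) similarity matrix
    targets = torch.arange(first.size(0), device=first.device)
    # Symmetric cross-entropy: each view must identify its own counterpart.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2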
According to one or more embodiments of the present application, an electronic device is provided, the device including a processor and a memory;
the memory is configured to store instructions or a computer program;
the processor is configured to execute the instructions or the computer program in the memory, so that the electronic device performs the character recognition method described above.
According to one or more embodiments of the present application, a computer-readable storage medium is provided, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a device, the device is caused to perform the character recognition method described above.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. As for the system or apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts may be found in the description of the method.
It should be understood that in this application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device comprising that element.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

  1. A character recognition method, wherein the method comprises:
    acquiring a text image to be processed, the text image to be processed comprising text information to be recognized;
    inputting the text image to be processed into a character recognition network model to obtain an output result, the output result comprising the text information to be recognized;
    wherein the character recognition network model is generated by training with training samples, the training samples comprising a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each of the plurality of sub-sample images being the same as the height of the first sample image, the widths of the plurality of sub-sample images being the same, the width of each sub-sample image being smaller than the width of the first sample image, and the first sample image comprising text information.
  2. The method according to claim 1, wherein the training process of the character recognition network model comprises:
    inputting the first sample image and the plurality of sub-sample images into an initial network model respectively, to obtain a first feature vector set and a second feature vector set, wherein the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information;
    determining a first loss amount according to the first feature vector set and the second feature vector set, adjusting the parameters of the initial network model based on the first loss amount, and re-executing the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process, until the first loss amount is less than a first threshold, to obtain the character recognition network model.
  3. The method according to claim 2, wherein the process of obtaining the plurality of sub-sample images comprises:
    determining a division parameter, and dividing the first sample image multiple times by using the division parameter to obtain the plurality of sub-sample images, wherein the ratio of the width of each of the plurality of sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
  4. The method according to claim 3, wherein dividing the first sample image multiple times by using the division parameter to obtain the plurality of sub-sample images comprises:
    for each division operation, determining a starting position of the division in the first sample image;
    dividing the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
  5. The method according to claim 3 or 4, wherein the value of the division parameter ranges from 75% to 95%.
  6. The method according to claim 2, wherein the method further comprises:
    acquiring a second sample image and an annotation corresponding to the second sample image, the second sample image comprising text information, and the annotation being used to reflect the text information;
    inputting the second sample image into the character recognition network model to obtain a recognition result, the recognition result comprising the text information;
    determining a second loss amount based on the recognition result and the annotation of the second sample image, adjusting the parameters of the character recognition network model based on the second loss amount, and re-executing the step of inputting the second sample image into the character recognition network model and the subsequent training process, until the second loss amount is less than a second threshold.
  7. The method according to claim 2, wherein inputting the first sample image into the initial network model to obtain the first feature vector set comprises:
    performing data enhancement processing on the first sample image to obtain a processed first sample image;
    inputting the processed first sample image into the initial network model to obtain the first feature vector set.
  8. The method according to claim 2, wherein determining the first loss amount according to the first feature vector set and the second feature vector set comprises:
    calculating the first loss amount between the first feature vector set and the second feature vector set by using a contrastive loss function.
  9. A character recognition apparatus, wherein the apparatus comprises:
    an acquisition unit, configured to acquire a text image to be processed, the text image to be processed comprising text information to be recognized;
    a processing unit, configured to input the text image to be processed into a character recognition network model to obtain an output result, the output result comprising the text information to be recognized; wherein the character recognition network model is generated by training with training samples, the training samples comprising a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each of the plurality of sub-sample images being the same as the height of the first sample image, the widths of the plurality of sub-sample images being the same, the width of each sub-sample image being smaller than the width of the first sample image, and the first sample image comprising text information.
  10. An electronic device, wherein the device comprises a processor and a memory;
    the memory being configured to store instructions or a computer program;
    the processor being configured to execute the instructions or the computer program in the memory, so that the electronic device performs the character recognition method according to any one of claims 1-8.
  11. A computer-readable storage medium, wherein instructions are stored in the computer-readable storage medium, and when the instructions are run on a device, the device is caused to perform the character recognition method according to any one of claims 1-8.
  12. A computer program product, wherein, when the computer program product is run on a computer, the computer is caused to perform the character recognition method according to any one of claims 1-8.
PCT/CN2023/072001 2022-01-30 2023-01-13 Character recognition method and apparatus, device, and medium WO2023143107A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210114334.6 2022-01-30
CN202210114334.6A CN114445812A (en) 2022-01-30 2022-01-30 Character recognition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
WO2023143107A1 true WO2023143107A1 (en) 2023-08-03

Family

ID=81370879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/072001 WO2023143107A1 (en) 2022-01-30 2023-01-13 Character recognition method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN114445812A (en)
WO (1) WO2023143107A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445812A (en) * 2022-01-30 2022-05-06 北京有竹居网络技术有限公司 Character recognition method, device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803090A (en) * 2016-12-05 2017-06-06 中国银联股份有限公司 A kind of image-recognizing method and device
CN108875722A (en) * 2017-12-27 2018-11-23 北京旷视科技有限公司 Character recognition and identification model training method, device and system and storage medium
CN111695385A (en) * 2019-03-15 2020-09-22 杭州海康威视数字技术股份有限公司 Text recognition method, device and equipment
WO2021081562A2 (en) * 2021-01-20 2021-04-29 Innopeak Technology, Inc. Multi-head text recognition model for multi-lingual optical character recognition
CN113111871A (en) * 2021-04-21 2021-07-13 北京金山数字娱乐科技有限公司 Training method and device of text recognition model and text recognition method and device
CN113887442A (en) * 2021-09-29 2022-01-04 招商银行股份有限公司 OCR training data generation method, device, equipment and medium
CN114445812A (en) * 2022-01-30 2022-05-06 北京有竹居网络技术有限公司 Character recognition method, device, equipment and medium

Also Published As

Publication number Publication date
CN114445812A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
US20230394671A1 (en) Image segmentation method and apparatus, and device, and storage medium
WO2023083142A1 (en) Sentence segmentation method and apparatus, storage medium, and electronic device
CN110826567B (en) Optical character recognition method, device, equipment and storage medium
WO2022247562A1 (en) Multi-modal data retrieval method and apparatus, and medium and electronic device
WO2022037419A1 (en) Audio content recognition method and apparatus, and device and computer-readable medium
WO2023138314A1 (en) Object attribute recognition method and apparatus, readable storage medium, and electronic device
CN112883968B (en) Image character recognition method, device, medium and electronic equipment
WO2023093361A1 (en) Image character recognition model training method, and image character recognition method and apparatus
CN112364829B (en) Face recognition method, device, equipment and storage medium
CN113378586B (en) Speech translation method, translation model training method, device, medium, and apparatus
WO2022111347A1 (en) Information processing method and apparatus, electronic device, and storage medium
WO2023029904A1 (en) Text content matching method and apparatus, electronic device, and storage medium
WO2023142914A1 (en) Date recognition method and apparatus, readable medium and electronic device
WO2023143107A1 (en) Character recognition method and apparatus, device, and medium
WO2023142913A1 (en) Video processing method and apparatus, readable medium and electronic device
WO2023005729A1 (en) Speech information processing method and apparatus, and electronic device
CN114494709A (en) Feature extraction model generation method, image feature extraction method and device
CN111128131B (en) Voice recognition method and device, electronic equipment and computer readable storage medium
WO2023130925A1 (en) Font recognition method and apparatus, readable medium, and electronic device
CN111312224B (en) Training method and device of voice segmentation model and electronic equipment
WO2023134433A1 (en) Font generation method and apparatus, and device
WO2023000782A1 (en) Method and apparatus for acquiring video hotspot, readable medium, and electronic device
WO2023065895A1 (en) Text recognition method and apparatus, readable medium, and electronic device
CN112837672A (en) Method and device for determining conversation affiliation, electronic equipment and storage medium
US20240096347A1 (en) Method and apparatus for determining speech similarity, and program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23745990

Country of ref document: EP

Kind code of ref document: A1