CN114445812A - Character recognition method, device, equipment and medium


Info

Publication number
CN114445812A
Authority
CN
China
Prior art keywords
sample image
sample
sub
network model
character recognition
Legal status
Pending
Application number
CN202210114334.6A
Other languages
Chinese (zh)
Inventor
毛晓飞 (Mao Xiaofei)
黄灿 (Huang Can)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202210114334.6A
Publication of CN114445812A
Priority to PCT/CN2023/072001 (published as WO2023143107A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiments of the present application disclose a character recognition method that performs recognition with a pre-trained character recognition network model. The model is generated by training with a first sample image and a plurality of sub-sample images corresponding to the first sample image, where each sub-sample image has the same height as the first sample image, all sub-sample images have the same width, and that width is smaller than the width of the first sample image. Because the first sample image requires no manual labeling and the model is trained by aligning local features (the sub-sample images) with global features (the first sample image), labeling cost is reduced and training efficiency is improved. In actual use, a text image is input into the character recognition network model, which completely extracts the features of the text information in the image and performs recognition on those features to obtain an output result.

Description

Character recognition method, device, equipment and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a character recognition method, apparatus, device, and medium.
Background
Optical Character Recognition (OCR) is a technology that analyzes and recognizes an image file containing text data to obtain the characters it contains; it is an important branch of research and application in the field of automatic recognition technology.
Generally, an OCR recognition model is generated by supervised training: manually labeled sample data must first be collected, and the model is then trained with that data. To improve the recognition accuracy of the OCR model, a large amount of sample data must be collected, and manually labeling it consumes considerable manpower, which increases the training cost.
Disclosure of Invention
In view of this, embodiments of the present application provide a character recognition method, apparatus, device, and medium, so that model training can be performed with unlabeled sample data and the training cost is reduced.
In order to achieve the above purpose, the technical solutions provided in the embodiments of the present application are as follows:
in a first aspect of embodiments of the present application, a character recognition method is provided, where the method includes:
acquiring a text image to be processed, wherein the text image to be processed comprises text information to be recognized;
inputting the text image to be processed into a character recognition network model to obtain an output result, wherein the output result comprises the text information to be recognized;
the character recognition network model is generated by training with a training sample; the training sample includes a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the sub-sample images all have the same width, and that width is smaller than the width of the first sample image; the first sample image includes text information.
In a second aspect of the embodiments of the present application, there is provided a character recognition apparatus, including:
the device comprises an acquisition unit and a processing unit, wherein the acquisition unit is used for acquiring a text image to be processed, and the text image to be processed comprises text information to be recognized;
the processing unit is used for inputting the text image to be processed into a character recognition network model to obtain an output result, and the output result comprises the text information to be recognized; the character recognition network model is generated by training with a training sample, the training sample includes a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the sub-sample images all have the same width, and that width is smaller than the width of the first sample image; the first sample image includes text information.
In a third aspect of embodiments of the present application, there is provided an electronic device, including: a processor and a memory;
the memory for storing instructions or computer programs;
the processor is configured to execute the instructions or the computer program in the memory, so as to enable the electronic device to execute the character recognition method of the first aspect.
In a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored therein instructions that, when run on a device, cause the device to perform the character recognition method of the first aspect.
In a fifth aspect of embodiments of the present application, there is provided a computer program product, which, when run on a computer, causes the computer to execute the character recognition method of the first aspect.
Therefore, the embodiment of the application has the following beneficial effects:
in the embodiments of the present application, character recognition is performed with a pre-trained character recognition network model, which is generated by training with a first sample image and a plurality of sub-sample images corresponding to the first sample image. Each sub-sample image has the same height as the first sample image; all sub-sample images have the same width, and that width is smaller than the width of the first sample image. That is, when the character recognition network model is trained, the first sample image does not need to be manually labeled, and the model is trained by aligning local features (the sub-sample images) with global features (the first sample image), so that labeling cost is reduced and training efficiency is improved. In actual use, the text image to be processed is input into the character recognition network model, which completely extracts the features of the text information in the image and performs recognition based on the extracted features to obtain an output result.
Drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below obviously show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a character recognition method according to an embodiment of the present application;
FIG. 2 is a flowchart of training a character recognition network model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a partitioning operation according to an embodiment of the present application;
FIG. 4 is a flow chart of another method for training a character recognition network model according to an embodiment of the present disclosure;
fig. 5 is a structural diagram of a character recognition apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only a part of the embodiments of the present application, not all of them. All other embodiments derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
OCR refers to a technique of analyzing and recognizing an image file containing text data to obtain characters. Generally, an OCR recognition model is generated by supervised training: manually labeled sample data must first be collected, and the model is then trained with that data. To improve the recognition accuracy of the OCR model, a large amount of sample data must be collected, and manually labeling it consumes considerable manpower and increases the training cost.
Based on this, the embodiments of the present application provide a character recognition method implemented with a pre-trained character recognition network model, where the model is generated based on a first sample image and a plurality of sub-sample images corresponding to the first sample image. Each sub-sample image has the same height as the first sample image; all sub-sample images have the same width, and that width is smaller than the width of the first sample image. That is, when the character recognition network model is trained, the first sample image does not need to be manually labeled, and the model is trained by aligning local features (the sub-sample images) with global features (the first sample image), so that labeling cost is reduced and training efficiency is improved. In actual use, the text image to be processed is input into the character recognition network model, which completely extracts the features of the text information in the image and performs recognition based on the extracted features to obtain an output result.
The character recognition method provided by the embodiment of the present application will be described below with reference to the drawings. Referring to fig. 1, fig. 1 is a flowchart of a character recognition method according to an embodiment of the present disclosure.
The method specifically comprises the following steps:
s101: and acquiring a text image to be processed, wherein the text image to be processed comprises text information to be identified.
The text information to be recognized can be characters to be recognized, including Chinese characters, English words, English letters, numbers, symbols and the like.
S102: and inputting the text image to be processed into a character recognition network model to obtain an output result, wherein the output result comprises text information to be recognized.
In this embodiment, after the text image to be processed is acquired, in order to obtain the text information to be recognized contained in it, the text image is input into a pre-trained character recognition network model, and the text information to be recognized is output after processing by the model.
The character recognition network model is generated by training with a training sample. The training sample includes a first sample image and a plurality of sub-sample images corresponding to the first sample image; the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the sub-sample images all have the same width, and that width is smaller than the width of the first sample image. The first sample image includes text information.
When the character recognition network model is trained, the first sample image does not need to be manually labeled; the model is trained by aligning local features (the sub-sample images) with global features (the first sample image), which reduces labeling cost and improves training efficiency.
After training of the character recognition network model is completed, the text image to be processed is input into the model; the model extracts features of the text image and performs recognition based on the extracted features, thereby outputting the text information in the text image.
In this way, the trained character recognition network model can extract the complete features of the text information in the image and perform text recognition based on those complete features, obtaining the output result with improved recognition accuracy.
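To make the inference flow of S101-S102 concrete, the following is a minimal Python sketch, assuming PyTorch and torchvision; the function name, the grayscale conversion, and the idea that the trained model maps an image tensor directly to its output result are illustrative assumptions rather than details fixed by the application:

```python
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def recognize(model: torch.nn.Module, image_path: str):
    """Run one to-be-processed text image through a trained character
    recognition network model; the model is assumed to return the
    recognized text information as its output result."""
    image = to_tensor(Image.open(image_path).convert("L")).unsqueeze(0)  # (1, 1, H, W)
    model.eval()
    with torch.no_grad():
        return model(image)  # output result containing the text information
```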
The process of training the character recognition network model according to the present application will be described with reference to the accompanying drawings. Referring to fig. 2, fig. 2 is a flowchart of training a character recognition network model according to an embodiment of the present disclosure.
The method mainly comprises the following steps:
s201: and acquiring the first sample image and a plurality of sub-sample images corresponding to the first sample image.
Before the character recognition network model is trained, a training sample for training an initial network model must first be obtained. The training sample includes a first sample image and a plurality of sub-sample images corresponding to the first sample image, each of which includes text information. The training sample may be obtained by acquiring a first sample image without manual labeling and dividing it into a plurality of sub-sample images; the first sample image and its corresponding sub-sample images together form the training sample, that is, each sub-sample image is a part of the first sample image. The height of each sub-sample image is the same as the height of the first sample image, the width of each sub-sample image is smaller than the width of the first sample image, and all sub-sample images have the same width.
In a possible implementation, the first sample image may be divided into the plurality of sub-sample images as follows: first determine a division parameter, then divide the first sample image multiple times using that parameter to obtain the sub-sample images. The ratio of the width of each sub-sample image to the width of the first sample image is equal to the division parameter, which is greater than 0 and less than 1. That is, the first sample image is divided into sub-sample images whose width is the division parameter times the width of the first sample image.
In a specific implementation of the dividing operation, a dividing start position may be determined in the first sample image for each dividing operation, and the first sample image is divided according to that start position and the division parameter, so as to obtain the plurality of sub-sample images. Referring to fig. 3, fig. 3 is a schematic diagram of a dividing operation in an embodiment of the present application. In this scene, the width of the first sample image is width and the division parameter is K, that is, the width of each sub-sample image is K × width, where K ranges from 75% to 95%. For each dividing operation, a start position is determined in the first sample image, and a sub-sample image of width K × width is then cut out. As shown in fig. 3, the first sample image is divided into sub-sample image 1, sub-sample image 2, ..., sub-sample image n, each of which has the same height as the first sample image and a width of K × width.
It should be noted that the start positions of successive dividing operations may be spaced apart by a fixed width or may be randomly determined, which is not limited in the embodiments of the present application.
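As a concrete illustration of S201 and the dividing operation of fig. 3, the following is a minimal sketch, assuming PIL images; the function name and the random choice of start position are illustrative (the fixed-interval option would replace the `random.randint` call):

```python
import random
from PIL import Image

def divide_first_sample_image(image, k, num_subs):
    """Divide a first sample image into num_subs sub-sample images.

    Each sub-sample image keeps the full height; its width is k * width,
    where k is the division parameter (0 < k < 1). The start position of
    each dividing operation is chosen at random here."""
    width, height = image.size
    sub_width = int(k * width)
    subs = []
    for _ in range(num_subs):
        x0 = random.randint(0, width - sub_width)  # start position of this operation
        subs.append(image.crop((x0, 0, x0 + sub_width, height)))
    return subs

# Example with K = 85%:
# subs = divide_first_sample_image(Image.open("line.png").convert("L"), 0.85, 3)
```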
S202: and respectively inputting the first sample image and the plurality of sub-sample images into the initial network model to obtain a first feature vector set and a second feature vector set, wherein the dimension of the first feature vector set is the same as that of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of text information.
After a training sample including a first sample image and a plurality of sub-sample images divided from it is acquired, the initial network model can be trained. The first sample image is input into the initial network model to obtain a first feature vector set corresponding to the first sample image, and the plurality of sub-sample images obtained by division are input into the initial network model to obtain a second feature vector set. The dimension of the first feature vector set is the same as that of the second feature vector set, and both sets consist of feature vectors of the text information in the sample images.
It should be noted that the order in which the first sample image and the plurality of sub-sample images corresponding to it are input into the initial network model is not limited; that is, the first feature vector set corresponding to the first sample image may be obtained first, or the second feature vector set corresponding to the plurality of sub-sample images may be obtained first.
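The application does not fix an architecture for the initial network model, so the following sketch uses a small convolutional encoder purely as a hypothetical stand-in; the adaptive pooling layer is one way to let full-width first sample images and narrower sub-sample images map to feature vectors of the same dimension:

```python
import torch
import torch.nn as nn

class InitialEncoder(nn.Module):
    """Hypothetical stand-in for the initial network model's feature
    extractor; the application only requires that the first and second
    feature vector sets share the same dimension."""
    def __init__(self, dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # tolerates the narrower sub-sample widths
            nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return self.features(x)

encoder = InitialEncoder()
# first_sample: tensor (1, 1, H, W); sub_samples: tensor (n, 1, H, W'), W' < W
# first_set = encoder(first_sample)   # first feature vector set, shape (1, dim)
# second_set = encoder(sub_samples)   # second feature vector set, shape (n, dim)
```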
S203: a first loss amount is determined according to the first feature vector set and the second feature vector set.
After the first feature vector set and the second feature vector set are obtained based on the initial network model, a first loss amount may be determined from the two sets.
In one possible implementation, the first loss amount between the first feature vector set and the second feature vector set may be calculated using a contrastive loss function. Such a loss function is typically used to measure the degree of match between samples, and it can also be used to train the feature-extraction model. Normally, after two similar training samples pass through the model for feature extraction, the two resulting feature vectors remain similar in the feature space; after two dissimilar samples pass through the model, the resulting feature vectors remain dissimilar in the feature space. Accordingly, when features are extracted from two similar samples, the loss amount computed between the two feature vectors by the loss function should also be small. The loss function can determine the loss between two feature vectors by computing the distance between them, such as the Euclidean distance, and thereby judge how well the initial network model has been trained.
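A minimal sketch of such a first loss amount, using the Euclidean distance mentioned above; since every sub-sample image is cut from the first sample image, only this positive-pair term is shown, and whether the full contrastive formulation also uses negatives from other images is left open by the application, so this form is an assumption:

```python
import torch

def first_loss_amount(first_set, second_set):
    """Positive-pair contrastive term: each sub-sample image is cut from the
    first sample image, so its feature vector should stay close in Euclidean
    distance to the full image's feature vector.

    first_set: (1, dim); second_set: (n, dim)."""
    distances = torch.norm(second_set - first_set, dim=1)  # Euclidean distances, (n,)
    return distances.mean()
```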
S204: and adjusting parameters of the initial network model based on the first loss amount, re-executing the processes of inputting the first sample image and the plurality of sub-sample images into the initial network model and performing subsequent training respectively until the first loss amount is smaller than a first threshold value, and obtaining the character recognition network model.
In this embodiment, the sub-sample images are obtained by dividing the first sample image, so the plurality of sub-sample images have a high similarity to the first sample image. Therefore, after feature extraction by the initial network model, the obtained first feature vector set and second feature vector set should also be highly similar, that is, the first loss amount determined from the two sets should be small. Accordingly, when the first loss amount is greater than or equal to the first threshold, the similarity between the two feature vector sets obtained from the initial network model does not meet the requirement, the recognition effect of the trained initial network model does not meet the requirement, the parameters of the initial network model need to be adjusted, and training continues with the first sample image and its corresponding sub-sample images. The process of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training is executed again until the first loss amount determined from the first feature vector set and the second feature vector set is smaller than the first threshold, yielding the final character recognition network model.
It should be noted that within a single training pass of the initial network model, the value of the division parameter is fixed; that is, within the same training sample, every sub-sample image has the same width. When the initial network model is trained again with a first sample image and its corresponding sub-sample images, the division parameter may take a different value; that is, the sub-sample image width may differ between training samples. For example, if the value range of the division parameter is set to 75%-95% and the value interval to 5%, the possible values across training samples are 75%, 80%, 85%, 90%, and 95%. Training the initial network model with different training samples improves the accuracy of feature extraction of the character recognition network model and facilitates subsequent character recognition with the trained model.
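Putting S201-S204 together, the following training-loop sketch reuses the `divide_first_sample_image`, `InitialEncoder`/`encoder`, and `first_loss_amount` sketches above; `unlabeled_sample_images` (an iterable of grayscale PIL first sample images), the optimizer choice, and the concrete threshold are all illustrative assumptions:

```python
import random
import torch
from torchvision.transforms.functional import to_tensor

K_VALUES = [0.75, 0.80, 0.85, 0.90, 0.95]  # 75%-95% in 5% steps, per the example above
FIRST_THRESHOLD = 0.05                     # hypothetical first threshold

optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

for sample in unlabeled_sample_images:     # unlabeled first sample images (assumed)
    k = random.choice(K_VALUES)            # fixed within one training sample
    subs = divide_first_sample_image(sample, k, num_subs=3)
    first_set = encoder(to_tensor(sample).unsqueeze(0))           # (1, dim)
    second_set = encoder(torch.stack([to_tensor(s) for s in subs]))  # (3, dim)
    loss = first_loss_amount(first_set, second_set)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() < FIRST_THRESHOLD:      # stop once the first loss amount is small
        break
```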
In actual application, the samples available for training the initial network model may be insufficient, which can leave the resulting character recognition network model insufficiently accurate; this manifests as a large first loss amount between the first feature vector set and the second feature vector set determined based on the model. The embodiments of the present application therefore provide a possible implementation: data enhancement processing is performed on the first sample image to obtain a processed first sample image, and the processed first sample image is input into the initial network model to obtain the first feature vector set. Data enhancement methods include rotation, flipping transformation, noise disturbance, and the like; this embodiment does not limit the specific manner of data enhancement.
In addition, the first sample image subjected to data enhancement processing can also be divided into a plurality of sub-sample images, and the initial network model can be trained with the enhanced first sample image and its corresponding sub-sample images, further improving the accuracy of the character recognition network model.
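A sketch of such data enhancement with torchvision transforms; the operations follow the ones named above (rotation, flipping, noise disturbance), while the concrete magnitudes are assumptions:

```python
import torch
import torchvision.transforms as T

# Rotation and flipping act on the PIL image; the noise disturbance is added
# after conversion to a tensor. Parameter values are illustrative only.
augment = T.Compose([
    T.RandomRotation(degrees=5),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0.0, 1.0)),
])
# processed = augment(first_sample_image)   # processed first sample image
# first_set = encoder(processed.unsqueeze(0))
```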
When the character recognition network model is trained in this way, the first sample image is not manually labeled, and the model is trained by aligning local features (the sub-sample images) with global features (the first sample image). The model can therefore extract the complete features of the text information in an image and perform character recognition with those complete features, which reduces labeling cost, improves training efficiency, lays a foundation for subsequent text recognition, and improves recognition accuracy.
The training process provided in the above embodiment is mainly directed at the encoder in the character recognition network model. That is, when a text image to be processed is input into the model, the encoder first preprocesses the text image, including steps such as digitization, geometric transformation, normalization, and smoothing, then performs feature extraction on the preprocessed image, and outputs a feature vector corresponding to the text image through a fully connected layer. By training the encoder, the features it extracts from the text image, and thus the resulting feature vector, become more accurate.
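A sketch of those preprocessing steps, with concrete parameter choices (target height, 3x3 box-filter smoothing) as assumptions:

```python
import numpy as np
from PIL import Image

def preprocess(path, target_height=32):
    """Digitization, geometric transformation, normalization and smoothing,
    as named in the description; parameter values are assumptions."""
    img = Image.open(path).convert("L")                   # digitization to grayscale
    w, h = img.size
    img = img.resize((max(1, int(w * target_height / h)), target_height))  # geometric transformation
    arr = np.asarray(img, dtype=np.float32) / 255.0       # normalization to [0, 1]
    padded = np.pad(arr, 1, mode="edge")                  # simple 3x3 box-filter smoothing
    smoothed = sum(
        padded[i:i + arr.shape[0], j:j + arr.shape[1]]
        for i in (0, 1, 2) for j in (0, 1, 2)
    ) / 9.0
    return smoothed
```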
To further improve the accuracy with which the character recognition network model recognizes text information, the embodiments of the present application further provide a preferred implementation: training the decoder in the character recognition network model. The decoder's main function is to decode the feature vector output by the encoder and recognize the text information corresponding to that feature vector, so training it improves the accuracy of text information recognition. The process of training the character recognition network model to recognize text information is described below with reference to the accompanying drawings.
Referring to fig. 4, fig. 4 is a flowchart of another training character recognition network model provided in the embodiment of the present application.
The method mainly comprises the following steps:
s401: and acquiring a second sample image and an annotation corresponding to the second sample image, wherein the second sample image comprises text information, and the annotation is used for reflecting the text information.
In this embodiment, to further train the decoder of the character recognition network model, a second sample image with an annotation is obtained. The annotation reflects the text information of the second sample image and is used for comparison against the text information of the second sample image recognized by the character recognition network model; the model is then trained according to the comparison result.
S402: and inputting the second sample image into the character recognition network model to obtain a recognition result, wherein the recognition result comprises text information.
The character recognition network model obtains the text information corresponding to the second sample image by performing feature extraction and feature recognition on the second sample image.
S403: a second amount of loss is determined based on the recognition result and the annotation of the second sample image.
After the recognition result output by the character recognition network model is obtained, a second loss amount is determined based on the annotation of the second sample image and the recognition result; the second loss amount represents the difference between the annotated text information of the second sample image and the text information recognized by the character recognition network model.
S404: and adjusting parameters of the character recognition network model based on the second loss amount, and re-executing the character recognition network model input with the second sample image and the subsequent training process until the second loss amount is smaller than a second threshold value.
When the second loss amount is large, the text information in the second sample image recognized by the character recognition network model differs greatly from the annotated text information, so the parameters of the character recognition network model need to be adjusted and the model retrained; that is, the process of inputting the second sample image into the character recognition network model and the subsequent training is performed again until the second loss amount is smaller than the second threshold.
By training the decoder in the character recognition network model, the accuracy with which the model recognizes text information based on the extracted features can be improved.
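A sketch of this supervised fine-tuning stage (S401-S404), reusing the `encoder` sketch above; the stand-in decoder, the cross-entropy choice for the second loss amount, `labeled_pairs`, and the threshold are all assumptions, since the application only states that the recognition result is compared with the annotation:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 6000  # hypothetical character-set size
MAX_LEN = 32        # hypothetical maximum annotation length (padded with 0)

# Minimal stand-in decoder: maps one feature vector to MAX_LEN character logits.
# A real decoder (attention- or CTC-based) is not specified by the application.
decoder = nn.Sequential(
    nn.Linear(256, MAX_LEN * NUM_CLASSES),
    nn.Unflatten(1, (MAX_LEN, NUM_CLASSES)),
)

criterion = nn.CrossEntropyLoss(ignore_index=0)  # compares result vs. annotation
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-5)
SECOND_THRESHOLD = 0.01  # hypothetical second threshold

for image, label_ids in labeled_pairs:      # second sample images + annotations (assumed)
    features = encoder(image.unsqueeze(0))  # encoder pre-trained as in fig. 2, (1, 256)
    logits = decoder(features)[0]           # (MAX_LEN, NUM_CLASSES)
    loss = criterion(logits, label_ids)     # second loss amount; label_ids: (MAX_LEN,)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() < SECOND_THRESHOLD:      # S404 stopping condition
        break
```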
When the trained character recognition network model meets the requirements, it can be used to perform character recognition: the text image to be processed is input into the character recognition network model to obtain an output result, where the output result includes the text information to be recognized.
In this embodiment, since the character recognition network model has already been preliminarily trained by the method shown in fig. 2, it can already perform feature extraction and basic recognition. When the second sample images are used for further training, training can therefore be completed without acquiring a large number of annotated second sample images, which reduces training cost while improving recognition accuracy.
Based on the above method embodiments, the embodiments of the present application provide an apparatus and a device for implementing the above method, which will be described below with reference to the accompanying drawings.
Referring to fig. 5, fig. 5 is a structural diagram of a character recognition apparatus according to an embodiment of the present application. As shown in fig. 5, the apparatus 500 may include: an acquisition unit 501 and a processing unit 502.
An obtaining unit 501, configured to obtain a text image to be processed, where the text image to be processed includes text information to be recognized;
a processing unit 502, configured to input the text image to be processed into a character recognition network model, and obtain an output result, where the output result includes the text information to be recognized; the character recognition network model is generated by training with a training sample, the training sample includes a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the sub-sample images all have the same width, and that width is smaller than the width of the first sample image; the first sample image includes text information.
In a specific implementation manner, the processing unit 502 is specifically configured to: input the first sample image and the plurality of sub-sample images into an initial network model respectively to obtain a first feature vector set and a second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in both sets are feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set; and adjust parameters of the initial network model based on the first loss amount and re-execute the process of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training until the first loss amount is smaller than a first threshold, thereby obtaining the character recognition network model.
In a specific implementation manner, the processing unit 502 is specifically configured to determine a division parameter and divide the first sample image multiple times using the division parameter to obtain the plurality of sub-sample images, where the ratio of the width of each of the plurality of sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and smaller than 1.
In a specific implementation, the processing unit 502 is specifically configured to determine, for each dividing operation, a start position of the division in the first sample image, and divide the first sample image according to the start position and the division parameter to obtain the plurality of sub-sample images.
In a specific implementation manner, the value range of the division parameter is 75% to 95%.
In a specific implementation manner, the obtaining unit 501 is further configured to obtain a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information, and the annotation is used to reflect the text information;
the processing unit 502 is further configured to input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information; and to determine a second loss amount based on the recognition result and the annotation of the second sample image, adjust the parameters of the character recognition network model based on the second loss amount, and re-execute the process of inputting the second sample image into the character recognition network model and the subsequent training until the second loss amount is smaller than a second threshold.
In a specific implementation manner, the processing unit 502 is specifically configured to perform data enhancement processing on the first sample image to obtain a processed first sample image; and inputting the processed first sample image into the initial network model to obtain the first feature vector set.
In a specific implementation, the processing unit 502 is specifically configured to calculate the first loss amount between the first feature vector set and the second feature vector set by using a contrastive loss function.
It should be noted that, for implementation of each unit in this embodiment, reference may be made to relevant descriptions in the foregoing method embodiments, and details of this embodiment are not described herein again.
Referring to fig. 6, a schematic diagram of an electronic device 600 suitable for implementing embodiments of the present application is shown. The terminal device in the embodiments of the present application may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (Personal Digital Assistant), a PAD (Portable Android Device), a PMP (Portable Multimedia Player), and a vehicle-mounted terminal (e.g., a car navigation terminal), as well as fixed terminals such as a digital TV and a desktop computer. The electronic device shown in fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing means 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present application.
The electronic device provided by the embodiment of the present application and the character recognition method provided by the above embodiments belong to the same inventive concept. Technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
The embodiments of the present application provide a computer readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method according to any one of the above embodiments.
It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the method.
Computer program code for carrying out operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. Where the name of a unit/module does not in some cases constitute a limitation on the unit itself, for example, a voice data collection module may also be described as a "data collection module".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present application, there is provided a character recognition method, which may include:
acquiring a text image to be processed, wherein the text image to be processed comprises text information to be recognized;
inputting the text image to be processed into a character recognition network model to obtain an output result, wherein the output result comprises the text information to be recognized;
the character recognition network model is generated by training with a training sample; the training sample includes a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the sub-sample images all have the same width, and that width is smaller than the width of the first sample image; the first sample image includes text information.
According to one or more embodiments of the present application, the training process of the character recognition network model includes:
respectively inputting the first sample image and the plurality of sub-sample images into an initial network model to obtain a first feature vector set and a second feature vector set, wherein the dimension of the first feature vector set is the same as that of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are the feature vectors of the text information;
determining a first loss amount according to the first feature vector set and the second feature vector set, adjusting parameters of the initial network model based on the first loss amount, and re-executing the process of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training until the first loss amount is smaller than a first threshold, to obtain the character recognition network model.
According to one or more embodiments of the present application, the acquiring of the plurality of sub-sample images comprises:
determining a division parameter, and dividing the first sample image multiple times by using the division parameter to obtain the plurality of sub-sample images, wherein the ratio of the width of each sub-sample image in the plurality of sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
According to one or more embodiments of the present application, the performing multiple divisions on the first sample image by using the division parameter to obtain the multiple sub-sample images includes:
for each division operation, determining a start position of the division in the first sample image;
and dividing the first sample image according to the starting position and the dividing parameter to obtain the plurality of sub-sample images.
According to one or more embodiments of the present application, the value range of the division parameter is 75% to 95%.
According to one or more embodiments of the present application, the method further comprises:
acquiring a second sample image and a label corresponding to the second sample image, wherein the second sample image comprises text information, and the label is used for reflecting the text information;
inputting the second sample image into the character recognition network model to obtain a recognition result, wherein the recognition result comprises the text information;
and determining a second loss amount based on the recognition result and the annotation of the second sample image, adjusting the parameters of the character recognition network model based on the second loss amount, and re-executing the process of inputting the second sample image into the character recognition network model and the subsequent training until the second loss amount is smaller than a second threshold.
According to one or more embodiments of the present application, the inputting the first sample image into an initial network model to obtain a first feature vector set includes:
performing data enhancement processing on the first sample image to obtain a processed first sample image;
and inputting the processed first sample image into the initial network model to obtain the first feature vector set.
According to one or more embodiments of the present application, the determining a first loss amount according to the first feature vector set and the second feature vector set includes:
a first loss amount between the first feature vector set and the second feature vector set is calculated by using a contrastive loss function.
According to one or more embodiments of the present application, there is provided a character recognition apparatus, which may include:
the device comprises an acquisition unit and a processing unit, wherein the acquisition unit is used for acquiring a text image to be processed, and the text image to be processed comprises text information to be recognized;
the processing unit is used for inputting the text image to be processed into a character recognition network model to obtain an output result, and the output result comprises the text information to be recognized; the character recognition network model is generated by training with a training sample, the training sample includes a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the sub-sample images all have the same width, and that width is smaller than the width of the first sample image; the first sample image includes text information.
In one or more embodiments of the present application, the processing unit is specifically configured to: input the first sample image and the plurality of sub-sample images into an initial network model respectively to obtain a first feature vector set and a second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in both sets are feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set; and adjust parameters of the initial network model based on the first loss amount and re-execute the process of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training until the first loss amount is smaller than a first threshold, thereby obtaining the character recognition network model.
In one or more embodiments of the present application, the processing unit is specifically configured to determine a division parameter and divide the first sample image multiple times by using the division parameter to obtain the plurality of sub-sample images, where the ratio of the width of each of the plurality of sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
In one or more embodiments of the present application, the processing unit is specifically configured to determine, for each dividing operation, a start position of the division in the first sample image, and divide the first sample image according to the start position and the division parameter to obtain the plurality of sub-sample images.
In one or more embodiments of the present application, the value range of the partition parameter is 75% to 95%.
In one or more embodiments of the present application, the obtaining unit is further configured to obtain a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information, and the annotation is used to reflect the text information;
the processing unit is further configured to input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information; and to determine a second loss amount based on the recognition result and the annotation of the second sample image, adjust the parameters of the character recognition network model based on the second loss amount, and re-execute the process of inputting the second sample image into the character recognition network model and the subsequent training until the second loss amount is smaller than a second threshold.
In one or more embodiments of the present application, the processing unit is specifically configured to perform data enhancement processing on the first sample image, so as to obtain a processed first sample image; and inputting the processed first sample image into the initial network model to obtain the first feature vector set.
In one or more embodiments of the present application, the processing unit is specifically configured to calculate the first loss amount between the first feature vector set and the second feature vector set by using a contrastive loss function.
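The disclosure names only a contrastive loss function; an NT-Xent (InfoNCE-style) formulation is one common concrete choice, sketched below under the assumption that row i of each feature set comes from the same underlying image.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(first_feats, sub_feats, temperature=0.1):
    """Hypothetical contrastive loss between the first and second feature
    vector sets: matching row indices are positives, all others negatives.
    """
    a = F.normalize(first_feats, dim=-1)
    b = F.normalize(sub_feats, dim=-1)
    logits = a @ b.t() / temperature                    # cosine similarities
    labels = torch.arange(a.size(0), device=a.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)
```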
According to one or more embodiments of the present application, there is provided an electronic apparatus including: a processor and a memory;
the memory for storing instructions or computer programs;
the processor is configured to execute the instructions or the computer program in the memory, so that the electronic device executes the character recognition method.
According to one or more embodiments of the present application, there is provided a computer-readable storage medium having stored therein instructions that, when executed on a device, cause the device to perform the character recognition method.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be cross-referenced. Since the system or device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is kept brief; for relevant details, refer to the description of the method.
It should be understood that in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of one or more of them. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be singular or plural.
It is further noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A character recognition method, the method comprising:
acquiring a text image to be processed, wherein the text image to be processed comprises text information to be identified;
inputting the text image to be processed into a character recognition network model to obtain an output result, wherein the output result comprises the text information to be recognized;
the character recognition network model is generated by training with a training sample, the training sample comprises a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the widths of the sub-sample images in the plurality of sub-sample images are the same as one another, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image comprises text information.
2. The method of claim 1, wherein the training process of the character recognition network model comprises:
respectively inputting the first sample image and the plurality of sub-sample images into an initial network model to obtain a first feature vector set and a second feature vector set, wherein the dimension of the first feature vector set is the same as that of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are the feature vectors of the text information;
and determining a first loss amount according to the first feature vector set and the second feature vector set, adjusting parameters of the initial network model based on the first loss amount, and re-executing the step of respectively inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training steps until the first loss amount is smaller than a first threshold, to obtain the character recognition network model.
3. The method of claim 2, wherein the obtaining of the plurality of sub-sample images comprises:
determining a division parameter, and dividing the first sample image a plurality of times by using the division parameter to obtain the plurality of sub-sample images, wherein a ratio of the width of each sub-sample image in the plurality of sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
4. The method of claim 3, wherein the dividing the first sample image a plurality of times by using the division parameter to obtain the plurality of sub-sample images comprises:
for each division operation, determining a start position of the division in the first sample image;
and dividing the first sample image according to the start position and the division parameter to obtain the plurality of sub-sample images.
5. The method according to claim 3 or 4, wherein the value range of the division parameter is 75% to 95%.
6. The method of claim 2, further comprising:
acquiring a second sample image and a label corresponding to the second sample image, wherein the second sample image comprises text information, and the label is used for reflecting the text information;
inputting the second sample image into the character recognition network model to obtain a recognition result, wherein the recognition result comprises the text information;
and determining a second loss amount based on the recognition result and the label of the second sample image, adjusting parameters of the character recognition network model based on the second loss amount, and re-executing the step of inputting the second sample image into the character recognition network model and the subsequent training steps until the second loss amount is smaller than a second threshold.
7. The method of claim 2, wherein the inputting the first sample image into an initial network model to obtain a first feature vector set comprises:
performing data enhancement processing on the first sample image to obtain a processed first sample image;
and inputting the processed first sample image into the initial network model to obtain the first feature vector set.
8. The method of claim 2, wherein the determining a first loss amount according to the first feature vector set and the second feature vector set comprises:
calculating the first loss amount between the first feature vector set and the second feature vector set by using a contrastive loss function.
9. An apparatus for character recognition, the apparatus comprising:
an acquisition unit, configured to acquire a text image to be processed, where the text image to be processed comprises text information to be recognized;
a processing unit, configured to input the text image to be processed into a character recognition network model to obtain an output result, the output result comprising the text information to be recognized; wherein the character recognition network model is generated by training with a training sample, the training sample comprises a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the widths of the sub-sample images in the plurality of sub-sample images are the same as one another, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image comprises text information.
10. An electronic device, characterized in that the device comprises: a processor and a memory;
the memory for storing instructions or computer programs;
the processor, configured to execute the instructions or the computer program in the memory, so as to cause the electronic device to perform the character recognition method according to any one of claims 1 to 8.
11. A computer-readable storage medium having stored therein instructions that, when executed on a device, cause the device to perform the character recognition method of any one of claims 1-8.
CN202210114334.6A 2022-01-30 2022-01-30 Character recognition method, device, equipment and medium Pending CN114445812A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210114334.6A CN114445812A (en) 2022-01-30 2022-01-30 Character recognition method, device, equipment and medium
PCT/CN2023/072001 WO2023143107A1 (en) 2022-01-30 2023-01-13 Character recognition method and apparatus, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210114334.6A CN114445812A (en) 2022-01-30 2022-01-30 Character recognition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114445812A (en) 2022-05-06

Family

ID=81370879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210114334.6A Pending CN114445812A (en) 2022-01-30 2022-01-30 Character recognition method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN114445812A (en)
WO (1) WO2023143107A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023143107A1 (en) * 2022-01-30 2023-08-03 北京有竹居网络技术有限公司 Character recognition method and apparatus, device, and medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803090A (en) * 2016-12-05 2017-06-06 中国银联股份有限公司 A kind of image-recognizing method and device
CN108875722A (en) * 2017-12-27 2018-11-23 北京旷视科技有限公司 Character recognition and identification model training method, device and system and storage medium
CN111695385B (en) * 2019-03-15 2023-09-26 杭州海康威视数字技术股份有限公司 Text recognition method, device and equipment
WO2021081562A2 (en) * 2021-01-20 2021-04-29 Innopeak Technology, Inc. Multi-head text recognition model for multi-lingual optical character recognition
CN113111871B (en) * 2021-04-21 2024-04-19 北京金山数字娱乐科技有限公司 Training method and device of text recognition model, text recognition method and device
CN113887442A (en) * 2021-09-29 2022-01-04 招商银行股份有限公司 OCR training data generation method, device, equipment and medium
CN114445812A (en) * 2022-01-30 2022-05-06 北京有竹居网络技术有限公司 Character recognition method, device, equipment and medium

Also Published As

Publication number Publication date
WO2023143107A1 (en) 2023-08-03

Similar Documents

Publication Publication Date Title
US20230394671A1 (en) Image segmentation method and apparatus, and device, and storage medium
CN110826567B (en) Optical character recognition method, device, equipment and storage medium
EP3893125A1 (en) Method and apparatus for searching video segment, device, medium and computer program product
CN112364829B (en) Face recognition method, device, equipment and storage medium
CN112883968B (en) Image character recognition method, device, medium and electronic equipment
CN109934142B (en) Method and apparatus for generating feature vectors of video
CN112883967B (en) Image character recognition method, device, medium and electronic equipment
CN110659639A (en) Chinese character recognition method and device, computer readable medium and electronic equipment
CN114445813A (en) Character recognition method, device, equipment and medium
CN115294501A (en) Video identification method, video identification model training method, medium and electronic device
CN109919220B (en) Method and apparatus for generating feature vectors of video
CN114445812A (en) Character recognition method, device, equipment and medium
CN114049632A (en) Image character recognition model training method, image character recognition method and device
CN111128131B (en) Voice recognition method and device, electronic equipment and computer readable storage medium
CN110674813B (en) Chinese character recognition method and device, computer readable medium and electronic equipment
CN113051933B (en) Model training method, text semantic similarity determination method, device and equipment
CN114495080A (en) Font identification method and device, readable medium and electronic equipment
CN114612909A (en) Character recognition method and device, readable medium and electronic equipment
CN114595346A (en) Training method of content detection model, content detection method and device
CN116821327A (en) Text data processing method, apparatus, device, readable storage medium and product
CN114495081A (en) Text recognition method and device, readable medium and electronic equipment
CN111738311A (en) Multitask-oriented feature extraction method and device and electronic equipment
CN111340813A (en) Image instance segmentation method and device, electronic equipment and storage medium
CN114625876B (en) Method for generating author characteristic model, method and device for processing author information
CN111783858B (en) Method and device for generating category vector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination