CN115661846A - Data processing method and device, electronic equipment and storage medium


Info

Publication number
CN115661846A
Authority
CN
China
Prior art keywords
text
character
recognition model
image
recognition
Legal status
Pending
Application number
CN202110770563.9A
Other languages
Chinese (zh)
Inventor
程昌旭
郑琪
张诗禹
王永攀
Current Assignee
Alibaba Innovation Co
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Application filed by Alibaba Singapore Holdings Pte Ltd
Priority to CN202110770563.9A
Publication of CN115661846A


Abstract

The embodiments of the present application provide a data processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring text image data to be recognized; recognizing the text image data according to a pre-trained text recognition model and determining a recognition result, wherein the trained text recognition model shares character information with a single-character recognition model; and outputting each text in the recognition result for display. The recognition accuracy of text image data can thereby be improved.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, a data processing apparatus, an electronic device, and a storage medium.
Background
Optical Character Recognition (OCR) refers to the recognition of text characters in an image.
Existing character recognition is generally performed with a pre-trained recognition model, but such models often have low recognition accuracy.
Disclosure of Invention
The embodiments of the present application provide a data processing method for improving the recognition accuracy of a recognition model.
Correspondingly, the embodiments of the present application also provide an electronic device and a storage medium to ensure the implementation and application of the above method.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring text image data to be recognized; recognizing the text image data according to a pre-trained text recognition model and determining a recognition result, wherein the trained text recognition model shares character information with a single-character recognition model; and outputting each text in the recognition result for display.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: providing an interactive page, wherein the interactive page includes a data upload control; acquiring text image data to be recognized in response to triggering of the data upload control, and uploading the text image data to a server, so that the server recognizes the text image data according to a pre-trained text recognition model and determines a recognition result, wherein the trained text recognition model shares character information with a single-character recognition model; and receiving and displaying the recognition result.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring an ancient character image to be recognized; recognizing the ancient character image according to a pre-trained text recognition model and determining a recognition result, wherein the trained text recognition model shares character information with a single-character recognition model; and outputting the recognition result for display.
In order to solve the above problem, an embodiment of the present application discloses an electronic device, including: a processor; and a memory having executable code stored thereon, wherein the executable code, when executed, causes the processor to perform the method described in one or more of the above embodiments.
To address the above issues, embodiments of the present application disclose one or more machine-readable media having executable code stored thereon that, when executed, cause a processor to perform a method as described in one or more of the above embodiments.
Compared with the prior art, the embodiments of the present application have the following advantages:
In the embodiments of the present application, before text image data is recognized, single characters are extracted from the labeled training text image to form single-character images, which are then labeled. A text recognition model is trained on the labeled training text image, and a single-character recognition model is trained on the labeled single-character images; character information is shared between the text recognition model and the single-character recognition model. After the text recognition model is trained, the text image data to be recognized can be acquired and input into the pre-trained text recognition model, which extracts the features of each character in the text image data, associates each character's features with those of its context characters, performs character classification accordingly, determines a recognition result, and outputs the recognition result for display. Because the text recognition model can be trained through both the training text images and the single-character images, the character information of the single characters, which are evenly distributed across the single-character images, can weaken the influence of factors such as the unbalanced character distribution in the training text images on the recognition result, so the recognition accuracy of the text recognition model can be improved.
Drawings
FIG. 1A is a schematic flow chart diagram of a data processing method according to an embodiment of the present application;
FIG. 1B is a schematic block diagram of a context modeling module according to one embodiment of the present application;
FIG. 1C is a schematic diagram of a character enhancer according to an embodiment of the present application;
FIG. 1D is a flow chart illustrating the processing of the discriminator according to one embodiment of the present application;
FIG. 1E is a schematic flow chart diagram of a data processing method according to another embodiment of the present application;
FIG. 2A is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 2B is a schematic diagram of a character enhancer process flow, according to one embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 4A is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 4B is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 5 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 6 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 7 is a block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of a data processing apparatus according to another embodiment of the present application;
FIG. 9 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 10 is a block diagram of an exemplary apparatus provided in one embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
The embodiments of the present application can be applied to the field of character recognition. Optical Character Recognition (OCR) refers to the recognition of characters in an image. A text recognition model can be trained using labeled training text images and single-character images, where the single-character images can be determined in various ways, for example by extracting single characters from the training text images, or by synthesizing a single character with a background image into a single-character image; the single characters may also be rendered in different fonts, so that the single-character images are as balanced as possible in category distribution. After the training of the text recognition model is completed, a text image to be recognized can be input into the trained text recognition model to determine a recognition result. In the embodiments of the present application, the single-character images can weaken the influence of factors such as the unbalanced character distribution in the training text images on the recognition result, thereby improving the recognition accuracy of the text recognition model.
Specifically, as shown in FIG. 1A, the training process of the embodiments of the present application involves three branches: a text line learning branch for recognizing an image of a line of text (a text line) containing one or more characters, a single-character learning branch for recognizing an image of a single character, and an adversarial learning branch for determining which branch the features extracted by the text line learning branch and the single-character learning branch belong to, so as to adjust the other two branches. The text line learning branch performs recognition through a text recognition model, which recognizes image data containing multiple characters to determine a recognition result. During training, a training text image containing multiple characters can be input into the text recognition model to determine a first prediction result, and the text recognition model is adjusted according to the difference between the first prediction result and the label of the training text image. The text recognition model can include a first feature extractor, a context modeling module, and a first character classifier: the first feature extractor extracts the feature information of each character in the image as that character's first feature, the context modeling module integrates the features of the context (the characters before and after a character) into the character's first feature to determine first feature data, and the first character classifier performs character classification according to the first feature data to determine the first prediction result.
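Read as a pipeline, the text line branch can be sketched in a few lines of PyTorch-style Python. This is a minimal sketch under assumptions: the class and argument names are illustrative, and the patent prescribes neither tensor shapes nor module internals.

```python
import torch.nn as nn

class TextRecognitionModel(nn.Module):
    """Text line branch: first feature extractor -> context modeling
    module -> first character classifier. Names follow the description
    above; the composition is a sketch, not the patent's reference code."""
    def __init__(self, first_feature_extractor, context_modeling_module,
                 first_character_classifier):
        super().__init__()
        self.first_feature_extractor = first_feature_extractor
        self.context_modeling_module = context_modeling_module
        self.first_character_classifier = first_character_classifier

    def forward(self, text_line_image):
        # first features: one feature vector per character position, (B, T, C)
        first_features = self.first_feature_extractor(text_line_image)
        # fuse the features of the preceding/following characters into each
        # position to obtain the first feature data
        first_feature_data = self.context_modeling_module(first_features)
        # per-position class logits constitute the first prediction result
        return self.first_character_classifier(first_feature_data)
```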
It should be noted that, as shown in FIG. 1B, in addition to associating the features of a character's context, the context modeling module may also be configured with character enhancers that enhance the characters so as to reduce the interference of the image background with the characters. In the example shown in FIG. 1B, the context modeling module is configured with two character enhancers and an encoder, where the encoder fuses character context features. In the embodiments of the present application, a character's features can be enhanced by the first character enhancer to obtain a processed first feature, the character context features can be fused into the character features by the encoder to obtain first feature data, and the first feature data can be further enhanced by the second character enhancer. Specifically, as shown in FIG. 1C, a character enhancer (taking the first character enhancer as an example) may include a background separator and a style extractor: the background separator separates out the background features, the style extractor extracts the style features of the characters, and the enhancer enhances the style features and fuses them to form the processed first feature. The embodiments of the present application can enhance the style features of characters so as to reduce the interference of background features with the recognition result and improve the recognition accuracy of the text recognition model.
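A character enhancer of this shape could be sketched as follows. The sketch anticipates the weighted-average math of Equations 2 and 3 given later in the description; the layer widths, activations, and tensor shapes are assumptions.

```python
import torch.nn as nn

class CharacterEnhancer(nn.Module):
    """Background separator + style extractor, as in FIG. 1C. The two fully
    connected layers per sub-module follow the description later in this
    document; exact sizes and activations are assumptions."""
    def __init__(self, dim):
        super().__init__()
        # background separator: two fully connected layers + sigmoid, mapping
        # each position to a background weight in (0, 1); the concatenation of
        # these weights over T plays the role of the spliced background W_fb
        self.background_separator = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1), nn.Sigmoid())
        # style extractor: two fully connected layers producing the style
        # features S_L for each position
        self.style_extractor = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, features):                         # (B, T, C)
        w_fb = self.background_separator(features)       # (B, T, 1)
        s_l = self.style_extractor(features)             # (B, T, C)
        z = w_fb.sum(dim=1, keepdim=True).clamp_min(1e-6)  # normalizer Z
        s_a = (w_fb * s_l).sum(dim=1, keepdim=True) / z  # Eq. 2: average style
        s_g = w_fb * s_a                                 # Eq. 3: global feature
        # element-wise addition yields the processed (enhanced) features;
        # w_fb is also returned because the localization loss supervises it
        return features + s_g, w_fb
```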
The single-character recognition model of the single-character learning branch recognizes single-character images, each containing a single character, to determine a recognition result for the single-character image. The single-character recognition model includes a second feature extractor and a second character classifier, where the second feature extractor extracts features from the single-character image as second feature data, and the second character classifier performs character classification according to the second feature data to determine a recognition result (also called a prediction result). It should be noted that the first character classifier in the text recognition model and the second character classifier in the single-character recognition model are shared: the two classifiers have the same structure and parameters.
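Classifier sharing can be realized simply by letting both branches hold the same module instance. A minimal sketch, assuming a linear classifier (the description later characterizes it as a fully connected layer followed by softmax) and illustrative sizes:

```python
import torch.nn as nn

feat_dim, num_classes = 512, 6763  # illustrative sizes, not from the patent

# One instance serves as both the first and the second character classifier,
# so structure and parameters are identical by construction.
shared_classifier = nn.Linear(feat_dim, num_classes)

class SingleCharRecognitionModel(nn.Module):
    """Single-character branch: second feature extractor + the shared
    classifier (a sketch; module names are illustrative)."""
    def __init__(self, second_feature_extractor, character_classifier):
        super().__init__()
        self.second_feature_extractor = second_feature_extractor
        self.character_classifier = character_classifier  # == shared_classifier

    def forward(self, single_char_image):
        second_feature_data = self.second_feature_extractor(single_char_image)
        return self.character_classifier(second_feature_data)
```

With this arrangement, gradients from either branch flow into the same weight matrix, which is exactly the parameter-consistency property described above.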
Natural languages usually exhibit a long-tail effect, that is, an imbalance in sample distribution: a small number of characters are used frequently (called head characters), while a large number of characters are used rarely (called tail characters). For example, the number of Chinese characters commonly used in daily life is about 3,000, while the total number of Chinese characters runs into the tens of thousands, a severe imbalance in usage frequency. Therefore, for classification methods that use context-related features and depend entirely on feature data associated with those context features, the unbalanced character distribution leads to low recognition accuracy on rarely used characters.
In the embodiments of the present application, the first character classifier and the second character classifier are shared, so the first character classifier is also adjusted when the second character classifier is adjusted through the single-character images. This achieves the purpose of training the first character classifier of the text recognition model through the single-character images, weakens the influence of factors such as the unbalanced character distribution in the training text images on the recognition result, and further improves the recognition accuracy of the text recognition model.
In the embodiments of the present application, a labeled training text image can be acquired and input into the text recognition model; the first feature of each character is extracted by the first feature extractor, the processed first feature is determined by the context modeling module, and the processed first feature can then be input into a generator to determine character information and generate a single-character image. It should be noted that the embodiments of the present application are described using the example of generating single-character images from style-enhanced character features; in some optional examples, the single-character images may also be generated for training from character features without style enhancement, as configured according to requirements. After a single-character image is determined, a label can be configured for it. In the embodiments of the present application, the labels of the single-character images can be configured according to the labels of the training text images: the single-character images can be arranged according to the positions of the characters in the training text image to obtain a single-character image sequence, and the label of the training text image is used as the label of the single-character image sequence.
In this embodiment, the single-character images are generated from the character information of each character in the training text image, so the features of a character in a single-character image should be consistent with the features of the corresponding character in the training text image. In practice, however, when the text recognition model extracts features, it may extract features of part of the background in the training text image as character features, whereas a single-character image is a background-free image generated from character information, so the features extracted from it contain no background features. As a result, the features extracted by the text recognition model and the single-character recognition model may be inconsistent or even orthogonal (with no association between the representations). Therefore, in the embodiments of the present application, the difference information between the first feature data and the second feature data of the same character can be determined by the discriminator of the adversarial learning branch, and the first feature extractor of the text recognition model can be adjusted according to the difference information. Specifically, as shown in FIG. 1D, the first feature data of a target character in the training text image and the second feature data of the target character in the single-character image can be acquired and input into the discriminator to determine the difference information. The discriminator scores feature data according to the data type to which it belongs; for example, the data types may include a single-character type (data corresponding to single-character images) and a character line type (data corresponding to training text images). The difference information is determined according to the first score of the first feature data and the second score of the second feature data, and the first feature extractor of the text recognition model is adjusted according to the difference information.
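The discriminator itself is not specified beyond mapping feature data to scores by data type; a minimal sketch, assuming a small MLP with a sigmoid output, where a score near 1 denotes the character line type and near 0 the single-character type:

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Adversarial branch discriminator sketch: maps a feature vector to a
    score in (0, 1). The two-layer MLP is an assumption; the patent does not
    specify the discriminator's internals."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.LeakyReLU(0.2),
            nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, feature_data):  # (..., dim) -> (..., 1) score
        return self.net(feature_data)
```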
In addition, in order to further reduce the difference between the first feature data and the second feature data, the embodiments of the present application can generate a corresponding score sequence according to the first prediction result of the training text image and extract the corresponding score in the score sequence to adjust the first score (for example, by averaging the two scores), so as to reduce the influence of the background features in the first feature data on the first score, improve the consistency between the first feature data and the second feature data, and thereby improve the recognition accuracy of the text recognition model.
After the labeled training text image and the labeled single-character images are determined, on the one hand, the training text image can be input into the text recognition model, a first prediction result is determined by the text recognition model, a character model adjustment amount is then determined according to the difference between the first prediction result and the label of the training text image, and the text recognition model is adjusted. The character model adjustment amount includes an adjustment amount corresponding to the first feature extractor, an adjustment amount corresponding to the context modeling module, and a character classification adjustment amount corresponding to the first character classifier. Because the first character classifier of the text recognition model and the second character classifier of the single-character recognition model are shared, the second character classifier can be adjusted through the character classification adjustment of the first character classifier, keeping the parameters of the two classifiers consistent. Besides sharing the first and second character classifiers in the above manner, other manners can also be adopted; for example, a single character classifier shared by the text recognition model and the single-character recognition model can be configured to achieve the effect of classifier sharing.
On the other hand, a single-character image can be input into the single-character recognition model, a second prediction result is determined by the single-character recognition model, a single-character model adjustment amount is then determined according to the difference between the second prediction result and the label of the single-character image, and the single-character recognition model is adjusted. The single-character model adjustment amount may include an adjustment amount corresponding to the second feature extractor and a single-character classification adjustment amount corresponding to the second character classifier. The embodiments of the present application can adjust the first character classifier according to the single-character classification adjustment amount, so that the parameters of the first character classifier and the second character classifier remain consistent (shared).
It should be noted that, in the training process of the text recognition model, the influence of the unbalanced character distribution on the model is small in the early stage of training and large in the later stage. The embodiments of the present application can therefore configure corresponding weight information for the single-character images and continuously increase the influence of the single-character images on the first character classifier of the text recognition model as training progresses, thereby improving the recognition accuracy of the text recognition model. For example, the corresponding weight information can be determined according to the number of iterations of the text recognition model, so that the single-character classification adjustment amount is scaled by the weight information and its influence on the first character classifier continuously increases during training.
It should be noted that the text line learning branch, the single-character learning branch, and the adversarial learning branch may be trained on one device, or may be deployed on different devices for training, as configured according to requirements. Further, different components of the three branches may also be deployed on different computing devices; for example, the first feature extractor, the context modeling module, and the first character classifier may be configured on different computing devices, with connections established between the devices to complete the corresponding training.
After the training of the text recognition model is completed, the text image to be recognized can be recognized by the text recognition model. It should be noted that the single-character recognition model plays an auxiliary role for the first character classifier of the text recognition model in the training stage and plays no role in the model application stage. The text image data to be recognized can be acquired and recognized according to the pre-trained text recognition model to determine the recognition result.
The embodiments of the present application can be applied to various text image recognition scenarios, so the text image data to be recognized can be acquired in a manner corresponding to each scenario. In a big data processing scenario, target information can be set and text image data related to the target information can be acquired. In addition, as shown in FIG. 1E, the embodiments of the present application can also be applied to a server: the server can interact with a terminal, provide an interactive page for the terminal, acquire text image data through the interactive page, and feed back the recognition result after recognition. The embodiments of the present application can also be applied to an enterprise query scenario: a text image containing an enterprise name can be recognized, related data of the enterprise can be retrieved according to the enterprise name in the recognition result, and the related data of the enterprise can be output.
The embodiments of the present application can output the recognition result and display it to the user in the interactive page. The user can adjust the recognition result in the interactive page to form result adjustment information; the recognition result is adjusted according to the result adjustment information, and the text recognition model is adjusted according to the adjusted recognition result. In this way, the text recognition model can be adjusted according to the user's corrections of the recognition result, further improving its accuracy. The training text image and the arranged single-character images can also be displayed through the interactive page, where the single-character images are arranged according to the positions of their characters in the training text image. The user can adjust characters in the interactive page to produce a character adjustment instruction, and the training text image or single-character image is adjusted according to the instruction to obtain an adjusted image, so that the text recognition model can be trained on the adjusted image.
The user can also configure a personalized text recognition model for a corresponding scenario by uploading target training data for a target field, where the target training data includes target training text images and labeling data. Single characters in the target training text images are extracted to determine target single-character images, and labels are configured for the target single-character images according to the labeling data. A target text recognition model is then trained on the labeled target training text images and the labeled target single-character images, so that text image data of the target field can be recognized by the target text recognition model. The target field can be education, medical care, e-commerce, and the like, so that a personalized text recognition model can be trained on the training data of the corresponding field.
Since the embodiments of the present application optimize the training stage of the text recognition model, they can be applied to scenarios for recognizing various kinds of text images, for example text images containing ancient characters, education-related information, medical-related information, e-commerce-related information, enterprise information, and conference content.
For example, the embodiments of the present application can be applied to a scenario of recognizing text images containing ancient characters: an ancient character image can be acquired and recognized according to the pre-trained text recognition model to determine a recognition result. According to the corresponding recognition result, the ancient characters can be converted into modern characters.
For another example, the embodiments of the present application can also be applied to a scenario of recognizing education text images containing education-related information: an education text image can be acquired and recognized according to the pre-trained text recognition model to determine a recognition result, and corresponding processing can be performed according to the recognition result. For example, when the education-related information is a student's answer information, scoring can be performed according to the answers in the recognition result to determine the student's score.
Besides the above scenarios, the embodiments of the present application can also be applied to a scenario of searching based on a text image: a text image to be recognized uploaded by a user can be acquired and recognized according to the pre-trained text recognition model to determine a recognition result, a search is then performed according to the recognition result, and the search results are displayed to the user.
In addition, in terms of the texts contained in the text images, the embodiments of the present application can also be applied to recognizing text images containing different types of text. For different character types within the same language, taking Chinese as an example, this can include recognizing text in simplified characters, traditional characters, seal script, and other character types, as well as recognizing different fonts of Chinese characters, such as text in regular script, clerical script, cursive script, and Liu-style script.
The embodiments of the present application provide a data processing method corresponding to the training process of the text recognition model. The method can be applied to a processing end, where the processing end can be understood as a device for training the text recognition model. Specifically, as shown in FIG. 2A, the method includes:
step 202, obtaining the marked training text image. And step 204, determining a single-character image and labeling, wherein the single-character image is determined based on the training text image or determined by synthesis. On one hand, a single character in a training text image can be extracted to obtain a single character image, and on the other hand, the single character image can be determined in a synthesis mode in the embodiment of the present application. As for the method for determining a single-character image according to a training text image, in the embodiment of the present application, the training text image may be input into a text recognition model, and the training text image is processed by a first feature extractor and a context modeling module in the character model to determine a first feature after processing each character, and to determine the single-character image according to the processed first feature, specifically, as an optional embodiment, the determining the single-character image includes: the processed first feature is input into a generator to determine character information and to determine a single character image. The character information may include character trajectory information, character structure, position information of characters in the training text image, and the like. After the single-character images are determined, labels can be configured for the single-character images, and for the single-character images determined according to the training text images, the labels can be configured for the single-character images according to the labels of the training text images. According to the embodiment of the application, the matching characters corresponding to the single character images in the training text images can be determined according to the position information of the characters in the training text images, and labels are configured for the single character images according to the labels of the matching characters. For the synthesized single-character image, the label of the single-character image can be determined in a character recognition mode or a manual labeling mode in the embodiment of the present application, it should be noted that the embodiment of the present application can also adopt other modes to configure the label for the single-character image, and specifically can configure the label according to the requirement.
After the labeled training text image and the labeled single-character images are determined, then for the training text image, in step 206, a text recognition model can be trained according to the labeled training text image, where the text recognition model includes a first feature extractor, a context modeling module, and a first character classifier.
The training text image is input into the text recognition model; the first features of the characters in the image are extracted by the first feature extractor, the features of each character's context are associated by the context modeling module to determine first feature data, and character classification is then performed by the first character classifier according to the first feature data to determine a first prediction result. After the first prediction result is determined, the character model adjustment amount can be determined according to the deviation between the first prediction result and the label of the training text image, and the text recognition model is adjusted.
In order to improve the classification accuracy of the text recognition model, the embodiments of the present application can further enhance the character features to reduce the interference of background features with the recognition result. Specifically, as an optional embodiment, the context modeling module is configured to: acquire the first features of each character extracted by the first feature extractor, perform feature separation, and determine character features and background features; enhance the character features and fuse the background features to obtain the processed first features; and perform context association on the processed first features to obtain first feature data, so that a first prediction result is determined according to the first feature data and the first character classifier, and the text recognition model is adjusted according to the first prediction result and the label of the training text image. The context modeling module may be configured with a character enhancer and an encoder: the character enhancer classifies the character features and background features within the first features and enhances the character features, the enhanced character features are then fused with the background features to obtain the processed first features, and the encoder associates the features of the context characters to determine the first feature data. It should be noted that, in an optional example, another character enhancer may be configured after the feature association to further enhance the character features and obtain the first feature data.
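Combining the two character enhancers with the encoder gives the module below, reusing the CharacterEnhancer sketched earlier; standing in for "the encoder" with a transformer encoder layer is an assumption, as the patent does not fix its architecture.

```python
import torch.nn as nn

class ContextModelingModule(nn.Module):
    """Character enhancer -> encoder -> character enhancer, as described
    above (a sketch; the encoder choice is an assumption)."""
    def __init__(self, dim, nhead=8, num_layers=2):
        super().__init__()
        self.enhancer1 = CharacterEnhancer(dim)   # from the earlier sketch
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=nhead,
                                       batch_first=True),
            num_layers=num_layers)
        self.enhancer2 = CharacterEnhancer(dim)
        self.background_maps = None  # kept for the localization loss L_loc

    def forward(self, first_features):             # (B, T, C)
        x, w_b1 = self.enhancer1(first_features)   # enhance character style
        x = self.encoder(x)                        # fuse character context features
        x, w_b2 = self.enhancer2(x)                # further enhancement
        self.background_maps = (w_b1, w_b2)
        return x                                   # first feature data
```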
In addition, although the characters input into the text recognition model and the single-character recognition model are substantially the same, inaccurate feature extraction may cause a large difference between the features extracted by the two models. Therefore, the embodiments of the present application can adjust the models according to this difference. Specifically, as an optional embodiment, the method further includes: acquiring the first feature data of a target character in the training text image and the second feature data in the single-character image, inputting them into a discriminator, and determining difference information; and adjusting the first feature extractor of the text recognition model according to the difference information. In the embodiments of the present application, the discriminator may classify the input feature data (the first feature data or the second feature data) to determine whether it is of the single-character type or the character line type, and the first feature extractor may be adjusted according to the discrimination result.
In order to reduce the interference of the background features in the training text image with the discrimination result, the embodiments of the present application can further adjust the discrimination result of the first feature data according to the label of the training text image, so as to reduce the difference between the first feature data and the second feature data. Specifically, as an optional embodiment, determining the difference information includes: scoring the first feature data and the second feature data according to data type, where the data types include a single-character type and a character line type; adjusting the first score of the first feature data according to the label of the training text image, so as to reduce the influence of the background features in the first feature data on the first score; and determining the difference information according to the adjusted first score and the second score of the second feature data. The data type of the feature data may be determined by scoring; for example, scores may be set for the data types in advance (for example, 0 for the single-character type and 1 for the character line type), so that the type is determined according to the score. After the scores corresponding to the first and second feature data are determined, a score sequence may be generated according to the first prediction result of the training text image, the first score of the first feature data may be adjusted according to the score sequence (for example, by averaging the corresponding score in the sequence and the first score) to obtain the adjusted first score, and the difference information may then be determined according to the adjusted first score and the second score of the second feature data. In the embodiments of the present application, the smaller the difference between the first feature data and the second feature data (that is, the harder it is for the discriminator to judge the data type), the more consistent the two representations are, and the higher the recognition accuracy. Therefore, the embodiments of the present application can adopt adversarial learning and adjust the first feature extractor according to the discrimination result of the discriminator, which reduces the interference of the background in the training text image with the character features and further improves the accuracy of the text recognition model.
For the single-character images, in step 208, the first character classifier of the text recognition model is adjusted according to the labeled single-character images to obtain the trained text recognition model. The embodiments of the present application can input a single-character image into the single-character recognition model to adjust the first character classifier based on the adjustment of the single-character recognition model. Specifically, as an optional embodiment, adjusting the first character classifier of the text recognition model according to the labeled single-character images includes: adjusting the single-character recognition model according to the labeled single-character images, where the single-character recognition model includes a second feature extractor and a second character classifier; and determining a single-character classification adjustment amount according to the adjustment of the second character classifier by the character information of the single-character images, and adjusting the first character classifier according to the single-character classification adjustment amount.
The embodiments of the present application can input a single-character image into the single-character recognition model, extract the second feature data of the character in the single-character image with the second feature extractor, classify the character according to the second feature data with the second character classifier, and determine a second prediction result. A single-character model adjustment amount can then be determined according to the second prediction result and the label of the single-character image, so as to adjust the single-character recognition model. In addition, a single-character classification adjustment amount can be determined according to the adjustment of the second character classifier, and the first character classifier is adjusted according to the single-character classification adjustment amount. This achieves the purpose of sharing the first and second character classifiers, and thereby of sharing character information between them (the character information of characters in the training text images and of characters in the single-character images).
In addition, in order to control the influence of the training text images and the single-character images on the first character classifier at different stages, the embodiments of the present application can further configure weight information and adjust the first character classifier according to it. Specifically, as an optional embodiment, adjusting the first character classifier according to the single-character classification adjustment amount includes: determining the weight information corresponding to the single-character classification adjustment amount; and adjusting the first character classifier according to the weight information and the single-character classification adjustment amount. The weight information can be determined according to the training progress of the text recognition model and can be continuously increased as iterative training proceeds, so as to continuously increase the influence of the single-character images on the first character classifier. In this way, the first character classifier of the text recognition model is adjusted mainly through the training text images in the early stage of training and mainly through the single-character images in the later stage, improving the accuracy of the text recognition model.
In the embodiments of the present application, a labeled training text image can be acquired, single characters in the training text image can be extracted to determine single-character images (or the single-character images can be determined by synthesis), and labels are configured for the single-character images. Then, on the one hand, the training text image can be input into the text recognition model: the first features of the characters are extracted by the first feature extractor, the features of the character contexts are associated by the context modeling module to obtain the first feature data, the first character classifier classifies according to the first feature data to determine a prediction result, and the text recognition model is then adjusted according to the prediction result and the label of the training text image. On the other hand, the first character classifier can be adjusted through the labeled single-character images to obtain the trained text recognition model. In the embodiments of the present application, the single-character images, which carry no context features, can weaken the influence of factors such as the unbalanced character distribution in the training text images on the recognition result, thereby improving the recognition accuracy of the text recognition model.
In the embodiments of the present application, a residual network (ResNet) may be used as the first feature extractor of the text recognition model, for example ResNet-45. A residual network is a convolutional neural network whose internal residual blocks use skip connections, alleviating the vanishing-gradient problem caused by increasing depth in deep neural networks. The context modeling module of the text recognition model includes a character enhancer, an encoder, and another character enhancer connected in sequence, where each character enhancer can include a background separator and a style extractor. The background separator of the embodiments of the present application can include two fully connected layers and a sigmoid function. Each node of a fully connected layer is connected to all nodes of the previous layer and integrates the features extracted by that layer; this embodiment extracts features through the two fully connected layers and maps them to background features through the sigmoid function, which is often used as an activation function of a neural network to map variables to values between 0 and 1. The style extractor includes two fully connected layers to extract style features.
The first character classifier of the text recognition model can be understood as a fully connected layer followed by a normalized exponential function (softmax). In addition, after determining the first prediction result and the label of the training text image, the embodiments of the present application can determine the character model adjustment amount (also called the first loss function) according to the difference between them, as in Equation 1:

$L_{Rline} = \mathrm{Loss}(P_L, y)$ (Equation 1)

where $P_L$ is the first prediction result and $y$ is the label of the training text image.
Specifically, the style of the characters within a training text image is generally uniform, while the style of the background is not, so the background and the characters can be separated and processed according to their respective styles, thereby enhancing the character features and obtaining the processed first features. As shown in FIG. 2B, in the embodiments of the present application, the local background features can be separated by the background separator and the local style features extracted by the style extractor. The local background features are then concatenated (vector splicing) to form a spliced background feature. After the background features and style features are determined, the average style feature can be obtained by weighted averaging; the average style feature can include the average background style feature and the average character style feature (or foreground style feature). The embodiments of the present application can determine the average style feature $S_a$ by Equation 2:

$S_a = \frac{1}{Z} \sum W_{fb}\, S_L$ (Equation 2)

where $W_{fb}$ is the spliced background feature determined from the local background features $W_b$, $S_L$ is the style feature, and $Z$ is a normalization term computed over the spliced background feature $W_{fb}$.

After the average style feature is determined, it can be fused with the spliced background feature by weighted averaging to obtain the global feature; weighted averaging here can be understood as determining the background weight corresponding to each position in the spliced background feature and fusing the foreground style feature and the background style feature according to the background weights. The global feature $S_g$ can be determined by Equation 3:

$S_g = W_{fb}\, S_a$ (Equation 3)

After the global feature is determined, the global feature and the first feature can be added element by element to obtain the processed first feature; adding element by element means adding each element of the first feature (a matrix) to the corresponding element of the global feature (a matrix), giving the processed first feature (the summed matrix). The embodiments of the present application average the uniform character style and the non-uniform background features, so that the character features can be enhanced and the influence of the background features on the character features weakened.
In the embodiment of the present application, the category corresponding to the background feature may be set as a blank category (corresponding to a non-character feature), and the category identifier (category id) is set as 0, so that the embodiment of the present application may set the probability sequence P of the blank category in the prediction result of the first character classifier as a probability sequence P of the blank category 0 Background separation result W as background separator b To determine an enhancer adjustment amount for the character enhancer, wherein the enhancer adjustment amount L is the amount by which the character enhancer is adjusted loc Can be determined by the following equation 4.
Figure BDA0003152806170000102
In the embodiment of the application, the context modeling module is provided with 2 character intensifiers, and the processing results (the character characteristic sequences after processing) obtained by the two character classifiers are respectively W b (1) And W b (2) The overall localization loss function of the context modeling module can be determined by the following equation 5,
L loc =l loc (p 0 ,W b (1) )+l loc (p 0 ,W b (2) ) Equation 5
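Under the reading that $l_{loc}$ penalizes disagreement between the blank-category probability sequence and each background separation result, Equations 4 and 5 could be sketched as below; the binary cross-entropy form of $l_{loc}$ is an assumption, since the original Equation 4 is rendered only as an image.

```python
import torch.nn.functional as F

def localization_loss(p0, w_b1, w_b2):
    """Eq. 5: L_loc = l_loc(p0, W_b^(1)) + l_loc(p0, W_b^(2)).
    p0: blank-category probabilities from the first character classifier, (B, T).
    w_b1, w_b2: background separation results of the two enhancers, (B, T, 1).
    The BCE choice for l_loc (Eq. 4) is an assumption."""
    p0 = p0.detach()              # treat the blank probabilities as targets
    w_b1 = w_b1.squeeze(-1)       # (B, T, 1) -> (B, T), matching p0
    w_b2 = w_b2.squeeze(-1)
    return (F.binary_cross_entropy(w_b1, p0)
            + F.binary_cross_entropy(w_b2, p0))
```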
For the single-character recognition model, which classifies single-character images, the input consists of evenly distributed single-character images, and its second feature extractor can adopt 41 convolutional layers, each extracting different features. The second character classifier determines a second prediction result according to the second feature data extracted by the second feature extractor, and the single-character model adjustment amount (also called the second loss function) is determined according to the second prediction result and the label of the single-character image. The cross-entropy loss function $L_{Rchar}$ can be determined by Equation 6:

$L_{Rchar} = -\log P_{C, y_c}$ (Equation 6)

where $P_C$ is the second prediction result and $y_c$ is the single-character label.
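Equation 6 is the standard cross-entropy on the single-character branch; as a sketch:

```python
import torch.nn.functional as F

def single_char_loss(single_char_model, single_char_image, y_c):
    """Eq. 6: L_Rchar = -log P_{C, y_c}, with y_c the single-character label
    (class indices of shape (B,))."""
    logits = single_char_model(single_char_image)  # (B, num_classes)
    return F.cross_entropy(logits, y_c)            # = -log softmax(logits)[y_c]
```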
The embodiments of the present application share the first character classifier of the text recognition model and the second character classifier of the single-character recognition model. Sharing the two classifiers presumes that the features extracted by the two models are strongly correlated; without constraints, however, the extracted features may be poorly correlated or even orthogonal (unrelated). This embodiment labels the text line type and the single-character type as 1 and 0, respectively. Through the discriminator, the second feature data of the single-character recognition model is mapped to a score (the second score), and the first feature data (a sequence) of the text recognition model is mapped to a score sequence (a sequence of first scores). The features extracted by the single-character recognition model contain no background features, but the features extracted by the text recognition model may. Therefore, in order to weaken the influence of the background features in the text recognition model on the extracted features, the embodiments of the present application can generate a character mask from the first prediction result of the text recognition model and adjust the discrimination result of the discriminator according to the character mask. With 0 as the category id of the blank (non-character) category, the embodiments of the present application screen out the predictions of categories greater than 0 (character-containing categories) from the first prediction result $P_L$ and determine the corresponding character mask $M$ over the sequence (Equation 7). The character mask $M$ is then used as a weight to compute a weighted average of the discrimination results (the first scores), giving the adjusted first score of Equation 8:

$s_l = \frac{\sum_i m_i\, s_{l,i}}{\sum_i m_i}$ (Equation 8)

where $s_l$ is the adjusted first score, $s_{l,i}$ is the discriminator's score at position $i$, and $m$ is the character mask.
After determining the adjusted first score, difference information may be determined according to the adjusted first score and the second score to adjust the feature extractor (the first feature extractor and/or the second feature extractor) according to the difference information.
In the embodiments of the present application, the adjustment amount for the feature extractors in the models (also called the binary cross-entropy loss function $L_D$) can be determined by Equation 9. The discriminator has poor discrimination capability in the initial stage; therefore, in the embodiments of the present application, after the discriminator is adjusted with partial data, the adjusted discriminator can be used to discriminate the feature data. The gradients of the parameters can be obtained from Equation 9, and whether the discriminator is capable of discriminating can be judged from the corresponding gradients.

$L_D = -\log S_L - \log(1 - S_C)$ (Equation 9)

where $S_L$ is the sequence of first scores and $S_C$ is the second score.
The embodiment of the present application may input feature data (first feature data and second feature data) into a discriminator, determine scores (a first score and a second score) by the discriminator, and then determine an adjustment amount for a feature extractor (a first feature extractor and a second feature extractor) according to the first score and the second score, thereby adjusting the feature extractor, wherein the adjustment amount for the first feature extractor may be determined by the following formula 10, and the adjustment amount for the second feature extractor may be determined by the following formula 11.
L_Gline = -log(1 - S_L)   (Equation 10)
L_Gchar = -log S_C   (Equation 11)
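The three adversarial terms of formulas 9 to 11 then reduce to the short sketch below, assuming each score is a scalar tensor already squashed into (0, 1); for the text line, the adjusted first score of formula 8 stands in for S_L:

```python
import torch

def discriminator_loss(s_l: torch.Tensor, s_c: torch.Tensor) -> torch.Tensor:
    # Formula 9: push the text-line score towards 1 (its label) and the
    # single-character score towards 0 (its label).
    return -torch.log(s_l) - torch.log(1.0 - s_c)

def line_extractor_loss(s_l: torch.Tensor) -> torch.Tensor:
    # Formula 10: push the first feature extractor so that its features
    # look like single-character features to the discriminator.
    return -torch.log(1.0 - s_l)

def char_extractor_loss(s_c: torch.Tensor) -> torch.Tensor:
    # Formula 11: push the second feature extractor towards the
    # text-line side, in the opposite direction.
    return -torch.log(s_c)
```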
In summary, in the embodiment of the present application, the training text images are used to train the text recognition model, and the single character images are used to train the single character recognition model, wherein the model adjustment amount of the text recognition model can be represented by the following formula 12, and the single character model adjustment amount of the single character recognition model can be represented by the following formula 13.
L_line = L_Rline + α_L · L_Gline + γ · L_loc   (Equation 12)
L_char = L_Rchar + α_C · L_Gchar   (Equation 13)
where
α_L = min(|S_L - S_C| + α_L0, 1.0)
α_C = min(|S_L - S_C| + α_C0, 1.0)
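The adaptive weights and the two model adjustment amounts of formulas 12 and 13 can then be combined as in the sketch below; L_Rline, L_Rchar, and L_loc are assumed to be already-computed loss values, and the default coefficients are illustrative only:

```python
def model_adjustments(l_rline: float, l_gline: float, l_loc: float,
                      l_rchar: float, l_gchar: float,
                      s_l: float, s_c: float,
                      alpha_l0: float = 0.1, alpha_c0: float = 0.1,
                      gamma: float = 1.0) -> tuple:
    """Formulas 12 and 13 with the adaptive weights alpha_L and alpha_C."""
    # The adversarial weights grow with the gap between the two
    # discriminator scores and are clipped at 1.0.
    alpha_l = min(abs(s_l - s_c) + alpha_l0, 1.0)
    alpha_c = min(abs(s_l - s_c) + alpha_c0, 1.0)
    l_line = l_rline + alpha_l * l_gline + gamma * l_loc   # formula 12
    l_char = l_rchar + alpha_c * l_gchar                   # formula 13
    return l_line, l_char
```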
After the training is finished, the text image is recognized by the trained text recognition model. The embodiment of the application thus trains the text recognition model mainly by means of the training text images, the single character images, and the discriminator; correspondingly, the total adjustment amount L_obj of the text recognition model can be expressed by the following formula 14,
L_obj = L_line + β · L_char + L_D   (Equation 14)
In the early training stage, the training text images tend to adjust mainly the first feature extractor and the context modeling module of the text recognition model, while in the later training stage the single character images tend to adjust the first character classifier. Therefore, the embodiment of the present application sets the coefficient β as weight information, so as to control, based on the weight information, the influence of the training text images and the single character images on the text recognition model at different stages, where the weight information may be represented by the following formula 15.
β = min(max((T - T_0) / (T_1 - T_0), 0), 1) · (B_1 - B_0) + B_0   (Equation 15)
where T is the iteration step of the current training step, T_0 is the initial iteration step, T_1 is the stopping iteration step, and B_0 and B_1 are the lower and upper bounds of β, respectively. After the training of the text recognition model is completed, the text image to be recognized can be recognized through the text recognition model, so as to determine a recognition result and output it.
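The placeholder for formula 15 leaves its exact functional form open; a ramp clipped between the stated bounds is one natural reading, sketched below together with the total adjustment of formula 14:

```python
def beta_schedule(t: int, t0: int, t1: int, b0: float, b1: float) -> float:
    """Weight beta: stays at the lower bound B_0 before T_0, ramps up to the
    upper bound B_1 by T_1, then stays there. The linear ramp is an
    assumption; only the bounds and the two iteration steps are stated."""
    if t <= t0:
        return b0
    if t >= t1:
        return b1
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

def total_adjustment(l_line: float, l_char: float, l_d: float, beta: float) -> float:
    # Formula 14: total adjustment amount of the text recognition model.
    return l_line + beta * l_char + l_d
```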
On the basis of the foregoing embodiments, an embodiment of the present application further provides a data processing method, which can be applied to a processing end, as shown in fig. 3, where the method includes:
step 302, obtaining the labeled training text image, and extracting a single character in the training text image to obtain a single character image. As an alternative embodiment, the processed first feature is input to a generator to determine character information and to determine a single character image.
And step 304, configuring labels for the single character images according to the labels of the training text images. It should be noted that this embodiment is described by taking the determination of the single-character image based on the training text image as an example; the single-character image may also be determined in other manners, for example, it may be obtained by synthesis.
And step 306, training a text recognition model according to the marked training text image, wherein the text recognition model comprises a first feature extractor, a context modeling module, and a first character classifier. As an alternative embodiment, the context modeling module is configured to: acquire the first features of each character extracted by the first feature extractor, perform feature separation, and determine character features and background features; enhance the character features and fuse the background features to obtain processed first features; and perform context association on the processed first features to obtain first feature data, so that a prediction result is determined from the first feature data by the first character classifier and the text recognition model is adjusted according to the prediction result and the label of the training text image. As an optional embodiment, the method further includes an adjusting step that adjusts the first feature extractor according to the difference between the first feature data and the second feature data: acquiring the first feature data of a target character in the training text image and the second feature data in the single character image, inputting the first feature data and the second feature data into a discriminator, and scoring the first feature data and the second feature data according to data type, wherein the data types include a single character type and a character line type; adjusting the first score of the first feature data according to the label of the training text image so as to reduce the influence of background features in the first feature data on the first score; determining difference information according to the adjusted first score and the second score of the second feature data; and adjusting the first feature extractor of the text recognition model according to the difference information.
And 308, adjusting a single character recognition model according to the marked single character image, wherein the single character recognition model comprises a second feature extractor and a second character classifier.
Step 310, determining the single character classification adjustment amount according to the adjustment made by the character information of the single character image to the second character classifier.
And step 312, determining weight information corresponding to the single character classification adjustment amount.
And step 314, adjusting the first character classifier according to the weight information and the single character classification adjustment amount to obtain a trained text recognition model.
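Read together, steps 302 to 314 amount to the joint training loop sketched below; every callable here is a hypothetical placeholder for machinery the steps describe, not an interface defined by the application:

```python
from typing import Callable

def train_with_shared_classifier(
    train_line_step: Callable[[], None],         # step 306: one text-model update
    train_char_step: Callable[[], float],        # steps 308/310: one single-character
                                                 # update, returning its classifier adjustment
    beta_schedule: Callable[[int], float],       # step 312: weight information
    apply_to_first_classifier: Callable[[float, float], None],  # step 314
    num_steps: int,
) -> None:
    """Joint training loop: each iteration, the single-character classifier
    adjustment is weighted and applied to the first character classifier."""
    for t in range(num_steps):
        train_line_step()                         # adjust the text recognition model
        char_adjustment = train_char_step()       # adjust the single character model
        weight = beta_schedule(t)                 # weight for the shared update
        apply_to_first_classifier(char_adjustment, weight)
```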
In the embodiment of the application, the marked training text image can be obtained, a single character in the training text image is extracted, a single character image is determined, and labels are configured for the single-character images according to the labels of the training text images. On one hand, the training text image can be input into the text recognition model, first features of all characters are extracted through the first feature extractor, the features of the character contexts are associated through the context modeling module to obtain first feature data, the first character classifier is used for classifying according to the first feature data to determine a prediction result, and then the text recognition model is adjusted according to the prediction result and the labels of the training text image to train the text recognition model. On the other hand, the single-character image can be input into the single-character recognition model, the second feature extractor is used for extracting second feature data, the second character classifier is used for classifying according to the second feature data to determine a second prediction result, the single-character model adjustment amount is determined according to the second prediction result and the label of the single-character image, and the single-character recognition model is adjusted according to the single-character model adjustment amount. The single character classification adjustment amount can be determined according to the adjustment of the second character classifier, the corresponding weight information is determined, and then the first character classifier is adjusted according to the single character classification adjustment amount and the weight information to obtain the trained text recognition model, so that the text image to be recognized can be recognized according to the trained text recognition model.
On the basis of the foregoing embodiment, an embodiment of the present application further provides a data processing method, corresponding to a text image recognition stage, where the method may be applied to a processing end, and the processing end may be understood as a device that performs character recognition on text image data according to a pre-trained text recognition model, and the method may acquire text image data to be analyzed and perform recognition to determine a recognition result, specifically, as shown in fig. 4A, the method includes:
step 402, text image data to be recognized is obtained. The text image data may include one character or a plurality of characters, and after training of the text recognition model is completed, the embodiment of the present application may recognize the text image data to be recognized according to the text recognition model, and the embodiment of the present application may configure a corresponding text image data acquisition manner according to a specific scene, specifically, as an optional embodiment, the acquiring the image data to be recognized includes at least one of the following steps: acquiring text image data related to target information; providing a data uploading interface to receive text image data based on the data uploading interface; and providing the interactive page to acquire the text image data based on the interactive page. The method and the device for processing the text image data can set the target information to obtain the text image data related to the target information and identify the text image data, can also provide a data uploading interface to obtain the text image data, and can also provide an interactive page so that a user can upload the text image data based on the interactive page.
After the text image data to be analyzed is obtained, in step 404, the text image data may be recognized according to a pre-trained text recognition model, and a recognition result may be determined, where the trained text recognition model and the single character recognition model share character information. In step 406, each text in the recognition result is output for presentation. It should be noted that in the embodiment of the present application, the single-character image may be determined by a synthesis method, and the embodiment is described only by taking an example of determining the single-character image by training a text image. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single character recognition model, and the single character recognition model is trained according to the single character image.
According to the text recognition method and device, the text recognition model can be trained on the basis of the single character image and the training text image, the influence of factors such as unbalanced distribution of characters in the training text image on the recognition result can be weakened, and therefore the recognition accuracy of the text recognition model can be improved. Specifically, as an optional embodiment, the method further includes the step of training the text recognition model: acquiring a marked training text image, and extracting a single character in the training text image to obtain a single character image; configuring labels for the single character images according to the labels of the training text images; training a text recognition model according to the marked training text image; adjusting a single character recognition model according to the marked single character image, wherein the single character recognition model comprises a second feature extractor and a second character classifier; and determining single-character classification adjustment quantity according to the adjustment of the character information of the single-character image on the second character classifier, and adjusting the first character classifier according to the single-character classification adjustment quantity. As an alternative embodiment, the text recognition model includes: a first feature extractor for extracting a first feature of the text image data; the context modeling module is used for carrying out feature separation on the first features, determining character features and background features, enhancing the character features, and fusing the background features to obtain processed first features; performing context association on the processed first characteristic to obtain first characteristic data; and the first character classifier is used for classifying the characters according to the first characteristic data and determining a recognition result. The training process of the text recognition model in this embodiment is similar to the training process of the text recognition model in the above embodiments, and the specific implementation process may refer to the training process of the text recognition model in the above embodiments, and is not described here again.
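One way to realize the context modeling module just described is the following sketch; the sigmoid gate used for feature separation and the BiLSTM used for context association are illustrative assumptions, not the application's actual layers:

```python
import torch
import torch.nn as nn

class ContextModeling(nn.Module):
    """Separate first features into character and background parts, enhance
    the character part, fuse the background part back, then associate context."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())   # feature separation
        self.enhance = nn.Linear(dim, dim)                             # character enhancement
        self.context = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)

    def forward(self, first_features: torch.Tensor) -> torch.Tensor:
        # first_features: (batch, positions, dim), one vector per character position.
        g = self.gate(first_features)
        char_feat = g * first_features            # character features
        bg_feat = (1.0 - g) * first_features      # background features
        fused = self.enhance(char_feat) + bg_feat     # enhance, then fuse background
        first_feature_data, _ = self.context(fused)   # context association
        return first_feature_data
```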
In this embodiment of the present application, the single-character image may be adjusted in a manual adjustment manner to determine more accurate training data, and specifically, as an optional embodiment, the method further includes: providing an interactive page to show training text images and arranged single character images; and acquiring a character adjusting instruction based on the interactive page, and adjusting the training text image and/or the single character image to obtain an adjusted image. In the embodiment of the application, the single-character images are determined by extracting single characters in the training text images, and the single-character images and the training text images may have the problem of non-correspondence, so that the embodiment of the application can show the training text images and the arranged single-character images to a user, and receive corresponding character adjusting instructions to adjust the training text images or the single-character images.
The embodiment of the present application may also be applied to a scene in which a target object is searched based on a text image, and specifically, as an optional embodiment, the text image data includes a name of the target object, and the method further includes: and searching and outputting related data of the target object according to the name of the target object in the identification result. The embodiment of the application can be applied to a scene of searching a target object, the target object can be an enterprise name, a place name, a person name, an article name and the like, and related data can be searched according to corresponding text image data.
The method and the device for recognizing the text can adjust the recognition result in a manual adjustment mode, so that the text recognition model is adjusted, and the accuracy of the text recognition model is improved. Specifically, as an optional embodiment, the method further includes: receiving a result adjusting instruction of the recognition result, providing an editing control, and acquiring a text adjusting operation based on the editing control, wherein the text adjusting operation comprises at least one of a position adjusting operation, a text reprogramming operation, a text deleting operation, a text adding operation and a text error marking operation; adjusting the recognition result according to the text adjustment operation; and adjusting the text recognition model according to the adjusted recognition result. The embodiment of the present application may provide an interactive page, so as to display each text in the recognition result to the user in the interactive page, where the user may adjust each text in the recognition result in the interactive page, and when the user adjusts, the embodiment of the present application may configure an editing control in the interactive page, where the editing control may be understood as an editing area, where the recognition result may be displayed in the editing area, and the user may operate the text in the editing area to adjust the recognition result, specifically, as an optional embodiment, the obtaining of the text adjustment operation includes: determining position adjustment operation on the target text according to the operation of selecting the target text from the editing control and dragging the target text; determining a text reprogramming operation of the target text according to the operation of selecting the target text from the editing control and reprogramming the target text; determining a text deletion operation on the target text according to the operation of selecting the target text from the editing control and deleting the target text; determining a text adding operation of newly adding a target text according to the operation of selecting a target position and adding the target text in the editing control; and determining the text error marking operation on the target text according to the operation of selecting the target text from the editing control and adding the error mark. The method is applied to a processing terminal, and the processing terminal can interact with a terminal to receive operations of a terminal user in an interactive page, such as operations of receiving a target text selected by the user in the interactive page and dragging the target text, operations of selecting the target text and reprogramming the target text, operations of selecting the target text and deleting the target text, operations of selecting a target position and adding the target text at the target position, operations of selecting the target text and adding an error mark and the like, so as to form corresponding text adjustment operations, and further adjust a recognition result according to the text adjustment operations.
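The five adjustment operations can be reduced to a small edit model over the recognition result, as in the sketch below; the data model and the operation names are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TextItem:
    text: str
    position: int
    error_marked: bool = False

def apply_text_adjustment(items: List[TextItem], op: str, index: int,
                          new_text: str = "", new_position: int = 0) -> None:
    """Apply one editing-control operation to the recognition result."""
    if op == "move":             # position adjustment: select and drag
        items[index].position = new_position
    elif op == "rewrite":        # text reprogramming: select and retype
        items[index].text = new_text
    elif op == "delete":         # text deletion
        items.pop(index)
    elif op == "add":            # text addition at a selected position
        items.insert(index, TextItem(new_text, new_position))
    elif op == "mark_error":     # error marking
        items[index].error_marked = True
```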
The embodiment of the application performs particularly well on the recognition of rarely-used characters. Some rarely-used characters are so uncommon that users do not know them well; therefore, to help users understand them, the embodiment of the application can provide explanations for rarely-used characters. Specifically, as an optional embodiment, outputting each text in the recognition result includes: classifying each text in the recognition result, and determining rarely-used texts and common texts; acquiring rarely-used text description information of the rarely-used texts, wherein the rarely-used text description information comprises at least one of annotation information and phonetic notation information; and outputting the common texts, the rarely-used texts, and the rarely-used text description information. As shown in fig. 4B, a rarely-used text table containing the rarely-used texts can be preset at the server; the server acquires text image data from the terminal and performs recognition through the trained text recognition model, and after the recognition result is obtained, the texts in the recognition result can be classified according to the rarely-used text table and divided into rarely-used texts (such as cheng in fig. 4B) and common texts (such as hangzhong, kou, si and the like in fig. 4B). The rarely-used text description information of a rarely-used text, such as its phonetic notation information (for example, Chinese pinyin) or annotation information (for example, an explanation of the text), is then acquired and transmitted to the terminal, where the rarely-used text can be displayed, so that the user can understand it according to the corresponding description and conveniently adjust and confirm the recognition result.
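The rarely-used text lookup described above can be sketched as follows; the table contents (character, pinyin, annotation) are illustrative examples only, not the application's data:

```python
# Illustrative rarely-used text table; the entry below is an example only.
RARE_TEXT_TABLE = {
    "珵": {"phonetic": "chéng", "annotation": "a kind of fine jade"},
}

def annotate_result(recognized_text: str) -> list:
    """Classify each character as rarely-used or common and attach description
    information (phonetic notation and/or annotation) to the rarely-used ones."""
    annotated = []
    for ch in recognized_text:
        info = RARE_TEXT_TABLE.get(ch)
        annotated.append({"text": ch, "rare": info is not None, **(info or {})})
    return annotated
```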
In order to facilitate the user adjusting the recognition result, the embodiment of the present application may display recognition results with different recognition confidences in different display modes, so that the user can see the credibility of each character and modify the characters with low confidence. Specifically, as an optional embodiment, the method further includes: acquiring the recognition confidence of the recognition result of each character, and determining the credibility level corresponding to the recognition result of the character according to the recognition confidence; and determining a display mode corresponding to the recognition result of the character according to the credibility level, so that the recognition result of the character is displayed in the interactive page in that display mode, facilitating the user's adjustment of the recognition result. The embodiment of the application can preset credibility thresholds corresponding to the credibility levels, so that the credibility level corresponding to a recognition result is determined according to the credibility thresholds and the recognition confidence of the recognition result; a corresponding display mode is then determined and issued, so that the recognition result can be displayed in the interactive page in that display mode. For a recognition result with low confidence, the embodiment of the application can highlight the result, so that the user can conveniently adjust and confirm it.
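The mapping from recognition confidence to display mode can be sketched as follows; the thresholds and the display-mode names are illustrative assumptions:

```python
def display_mode(recognition_confidence: float,
                 high_threshold: float = 0.9,
                 low_threshold: float = 0.6) -> str:
    """Map a character's recognition confidence to a display mode."""
    if recognition_confidence >= high_threshold:
        return "normal"       # credible result, plain display
    if recognition_confidence >= low_threshold:
        return "underline"    # medium credibility, mark for review
    return "highlight"        # low credibility, highlight for adjustment
```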
In addition, for a recognition result with low recognition confidence, the embodiment of the application can also interact with the user in a questioning manner, thereby improving the accuracy of the recognition result. Specifically, as an optional embodiment, the recognition confidence of the recognition result of each character is obtained, and when the recognition confidence of the recognition result is lower than a preset confidence threshold, question information corresponding to the recognition result is output, so that the recognition result is determined according to the response information corresponding to the question information. The embodiment of the present application may compare the recognition confidence with a preset confidence threshold to determine whether the recognition result is reliable. For a recognition result with low confidence, the embodiment of the present application may generate question information; for example, it may extract the corresponding text image data and generate a question such as: is the text "Hangzhou weather 32 degrees"? It thereby obtains corresponding response information. The user can confirm the question, and can also adjust the recognition result in the question information to form the response information, so that the accuracy of the recognition result is improved.
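A matching sketch of the questioning interaction; the confidence threshold and the question template are assumptions patterned on the example above:

```python
def question_info(result_text: str, recognition_confidence: float,
                  confidence_threshold: float = 0.6):
    """Return question information for a low-confidence recognition result,
    or None when the result is credible enough."""
    if recognition_confidence >= confidence_threshold:
        return None
    return f'Is the text "{result_text}"?'

# Illustrative call, patterned on the example in the text.
print(question_info("Hangzhou weather 32 degrees", 0.4))
```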
In this embodiment of the present application, training data uploaded by a user may also be received, and a personalized text recognition model is trained, specifically, as an optional embodiment, the method further includes: acquiring target training data of a target field, wherein the target training data comprises target training text images and marking data; extracting single characters in the target training text image, determining a target single character image, and configuring labels for the target single character image according to the label data; and training a target text recognition model according to the labeled target training text image and the labeled target single character image so as to recognize text image data of the target field according to the target text recognition model. The user can upload target training text images related to a target field, the target field can be fields such as education, medical treatment, E-commerce, meetings and finance, and can also be fields of other fields or subdivision, the configuration can be specifically carried out according to requirements, the embodiment can receive target training data, determine a target single character image according to the target training data, configure a corresponding label, and then train a target text recognition model according to the labeled target training text image and the labeled target single character image so as to recognize text image data of the target field according to the target text recognition model.
The implementation of this embodiment is similar to the implementation of the foregoing embodiment, and for the specific implementation, reference may be made to the specific implementation of the foregoing embodiment, which is not described herein again.
In the embodiment of the present application, before the text image data is identified, the single character image and the corresponding label may be determined according to the labeled training text image, and the text recognition model may be trained according to the labeled training text image and the labeled single character image, after the text recognition model is trained, the text image data to be identified may be obtained, and the text image data to be identified may be input into the pre-trained text recognition model to extract the features of each character in the text image data and associate the features of the context character of each character, thereby performing character classification and determining the recognition result. According to the method and the device, the text recognition model can be trained through the training text images and the single character images, so that the influence of factors such as unbalanced distribution of characters in the training text images on the recognition result can be weakened through the single characters which are evenly distributed in the single character images, and the recognition accuracy of the text recognition model can be improved.
The embodiment of the present application further provides a data processing method, which may be applied to a terminal, where the terminal may be understood as a device that uploads a text image to be analyzed and receives a recognition result, the terminal may interact with a server to upload the text image to be analyzed to the server, the server may be understood as a device that recognizes the text image based on a pre-trained text recognition model, and after determining the recognition result of the text image, the server may feed back the recognition result to the terminal, specifically, as shown in fig. 5, the method includes:
step 502, providing an interactive page, wherein the interactive page comprises a data uploading control.
Step 504, acquiring text image data to be recognized according to triggering of the data uploading control, and uploading the text image data to a server, so that the server can recognize the text image data according to a pre-trained text recognition model and determine a recognition result, wherein the trained text recognition model and the single character recognition model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single-character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single-character recognition model, and the single-character recognition model is trained according to the single-character image.
And step 506, receiving and displaying the identification result.
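From the terminal side, steps 504 and 506 reduce to an upload-and-display round trip, as sketched below; the endpoint URL and the response format are hypothetical placeholders, not interfaces defined by the application:

```python
import requests

SERVER_URL = "https://example.com/ocr/recognize"   # hypothetical server endpoint

def upload_and_display(image_path: str) -> dict:
    """Upload the text image data collected by the data-upload control and
    receive the recognition result for display (steps 504 and 506)."""
    with open(image_path, "rb") as f:
        response = requests.post(SERVER_URL, files={"image": f})
    response.raise_for_status()
    recognition_result = response.json()   # result from the text recognition model
    return recognition_result
```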
The implementation manner of this embodiment is similar to that of the above embodiment, and the detailed implementation manner of the above embodiment may be referred to, and is not described herein again.
In the embodiment of the application, the server can provide an interactive page for the terminal to provide a text image recognition service for the terminal based on the interactive page, a user can trigger a data uploading control in the interactive page to upload text image data to be recognized to the server through the terminal, and the server can acquire the text image data to be recognized and input the text image data to be recognized into a pre-trained text recognition model to extract the characteristics of each character in the text image data and associate the characteristics of each character context character, so that character classification is performed and a recognition result is determined. After the server determines the recognition result, the recognition result can be fed back to the terminal, so that the recognition result can be displayed on the terminal.
The embodiment of the present application further provides a data processing method, which may be applied to a scene in which a text image containing ancient characters is recognized to determine a recognition result, specifically, as shown in fig. 6, the method includes:
step 602, obtaining an ancient character image to be identified.
And step 604, recognizing the ancient character image according to a pre-trained text recognition model, and determining a recognition result, wherein the trained text recognition model and the single character recognition model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single character recognition model, and the single character recognition model is trained according to the single character image.
And 606, outputting the identification result for display. As an alternative embodiment, the step 606 includes: and acquiring text description information of each text in the recognition result, wherein the text description information comprises at least one of annotation information and phonetic notation information. And outputting the text in the recognition result and the corresponding text description information for displaying.
The embodiment of the application performs well on the recognition of rarely-used characters and can therefore be applied to ancient character recognition scenes. Ancient characters are usually uncommon, however, and users do not know them well; therefore, to help users understand ancient characters, the embodiment of the application can provide explanations for them, such as the phonetic notation information or annotation information (for example, an explanation of the text, the modern text corresponding to the text, and the like) of an ancient character text, so that the user can understand the ancient character text. In addition, in an optional embodiment, the ancient character texts in the recognition result can be classified; for example, a rarely-used text table is configured in advance and used to divide the ancient character texts in the recognition result into rarely-used texts and common texts, and the text description information of only the rarely-used texts is acquired and output for display, which can reduce the number of texts that need to be described and improve data processing efficiency.
The implementation of this embodiment is similar to the implementation of the foregoing embodiment, and for the specific implementation, reference may be made to the specific implementation of the foregoing embodiment, which is not described herein again.
In the embodiment of the application, an ancient character image refers to an image containing ancient characters, and ancient characters refer to characters that predate the characters in current use, such as ancient Chinese characters. Ancient Chinese characters can be understood as the Chinese characters used in China before the appearance of simplified characters, where simplified characters are the simplified characters published by official authorities, generally referring to the characters published in 1956 in the general table of simplified characters. Ancient Chinese characters may include original ancient Chinese characters, traditional Chinese characters, oracle-bone characters, gold (bronze) characters, Warring States characters, Qin-series characters, and the ancient clerical script of the clerical-change process from the early Qin period to the Han dynasty. An image containing ancient Chinese characters may be an image related to ancient medicine, ancient education, or historical records. The embodiment of the application can determine an ancient character image by photographing a book containing ancient Chinese characters, and input the ancient character image into the pre-trained text recognition model to determine a recognition result. After the recognition result is determined, the ancient characters can be converted according to the recognition result, for example, converted into modern characters; it should be noted that, besides a one-to-one character conversion mode, the embodiment of the application can also translate according to semantics. In addition, the embodiment of the application can further translate the modern characters obtained by translation into texts of other languages, which can be configured according to requirements.
The embodiment of the present application further provides a data processing method, which can be applied to a scene searched based on a text image, identify the text image, search relevant information according to an identification result, and feed back the relevant information to a user, and specifically, the method includes:
and providing an interactive page to acquire text image data to be recognized based on the interactive page.
Recognizing the text image data according to a pre-trained text recognition model, and determining a recognition result, wherein the trained text recognition model and the single character recognition model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single character recognition model, and the single character recognition model is trained according to the single character image.
And searching according to the identification result, and feeding back the search result so as to display the search result on the interactive page.
The implementation manner of this embodiment is similar to that of the above embodiment, and the detailed implementation manner of the above embodiment may be referred to, and is not described herein again.
In the embodiment of the application, an interactive page can be provided, a user uploads text image data through the interactive page, after the text image data is received, the text image data can be identified according to a pre-trained text identification model, an identification result is determined, and then searching can be carried out according to the identification result, so that corresponding information is searched and displayed in the interactive page. The embodiment of the application can be applied to a scene of searching based on the image, for example, the embodiment of the application can be applied to a scene of searching based on the text image containing the enterprise name, and the embodiment of the application can search the enterprise related information according to the enterprise name in the identification result and display the enterprise related information to the user.
The embodiment of the present application further provides a data processing method, which may be applied in an education scene, and may recognize a text image including education-related information according to a trained text recognition model to determine a recognition result, specifically, the method includes:
and acquiring an educational text image to be identified.
And identifying the education text image according to a pre-trained text identification model, and determining an identification result, wherein the trained text identification model and the single character identification model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single character recognition model, and the single character recognition model is trained according to the single character image.
And outputting each text in the recognition result for displaying.
In the embodiment of the present application, the recognition results with different recognition confidence levels may be displayed in different display manners, so that a user can modify a character with a low confidence level conveniently, specifically, as an optional embodiment, the method further includes: acquiring the recognition confidence of the recognition result of each character, and determining the confidence level corresponding to the recognition result of the character according to the recognition confidence; and determining a display mode corresponding to the recognition result of the character according to the credibility level so as to display the recognition result of the character in the interactive page according to the display mode and facilitate the user to adjust the recognition result. According to the embodiment of the application, the credibility threshold corresponding to the credibility level can be preset, so that the credibility level corresponding to the recognition result is determined according to the credibility threshold and the recognition confidence degree of the recognition result, a corresponding display mode is further determined, and the display mode is issued, so that the recognition result can be displayed in the interactive page according to the display mode. For the recognition result with low confidence coefficient, the embodiment of the application can highlight the recognition result, so that the user can adjust and confirm the recognition result conveniently.
In addition, for the recognition result with low recognition confidence coefficient, the embodiment of the present application may further interact with a human in a questioning manner, so as to improve the accuracy of the recognition result, and specifically, as an optional embodiment, the recognition confidence coefficient of the recognition result of each character is obtained; and when the recognition confidence of the recognition result is lower than a preset confidence threshold, outputting question information corresponding to the recognition result so as to determine the recognition result according to response information corresponding to the question information. The embodiment of the application can compare the recognition confidence with the preset confidence threshold value so as to determine whether the recognition result is credible, and for the recognition result with low confidence, the embodiment of the application can generate question information, so that a user can confirm the question information and adjust the recognition result in the question information to form response information, thereby improving the accuracy of the recognition result.
The implementation manner of this embodiment is similar to that of the above embodiment, and the detailed implementation manner of the above embodiment may be referred to, and is not described herein again.
In the embodiment of the present application, the education text image may be understood as an image including education-related information, and the education-related information may be answer information of a student, student status information, information in a book, homework text of the student, and the like. The method and the device for recognizing the education text image can acquire the education text image, recognize the education text image according to the pre-trained text recognition model, determine the recognition result and output the recognition result for displaying. In addition, corresponding processing can be carried out according to the recognition result. For example, when the education-related information in the education text image is answer information of a student, the embodiment of the present application may perform discrimination according to the answer information in the recognition result, so as to determine a corresponding score of the student.
The embodiment of the present application further provides a data processing method, which may be applied in an e-commerce scene, and may identify a text image including commodity-related information according to a trained text identification model to determine an identification result, specifically, the method includes:
and acquiring a text image of the commodity to be identified.
And identifying the commodity text image according to a pre-trained text identification model, and determining an identification result, wherein the text identification model and the single character identification model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single-character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single-character recognition model, and the single-character recognition model is trained according to the single-character image.
And outputting each text in the recognition result for displaying.
The implementation of this embodiment is similar to the implementation of the foregoing embodiment, and for the specific implementation, reference may be made to the specific implementation of the foregoing embodiment, which is not described herein again.
In the embodiment of the present application, the product text image may be understood as an image including product-related information, and the product-related information may be information such as a product name, a production date, a product description, and a product publicity material. According to the embodiment of the application, the commodity text image can be obtained, the commodity text image is identified according to the pre-trained text identification model, the identification result is determined, and the identification result is output for displaying. In addition, corresponding processing can be carried out according to the identification result. For example, when the recognition result is a product name, a corresponding product may be searched for according to the product name, and the search result may be presented to the user.
The embodiment of the present application further provides a data processing method, which can be applied in a medical scenario, and can identify a text image containing medical-related information according to a trained text identification model to determine an identification result, specifically, the method includes:
and acquiring a medical text image to be identified.
And identifying the medical text image according to a pre-trained text identification model, and determining an identification result, wherein the text identification model and the single character identification model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single character recognition model, and the single character recognition model is trained according to the single character image.
And outputting each text in the recognition result for displaying.
The implementation of this embodiment is similar to the implementation of the foregoing embodiment, and for the specific implementation, reference may be made to the specific implementation of the foregoing embodiment, which is not described herein again.
In the embodiment of the application, the medical text image can be understood as an image containing medical related information, and the medical related information can be information such as case information, medical documents, electronic medical records, medical image texts and the like. According to the embodiment of the application, the medical text image can be obtained, the medical text image is identified according to the pre-trained text identification model, the identification result is determined, and the identification result is output for displaying. In addition, corresponding processing can be carried out according to the identification result. For example, when the identification result is a name of a disease, information related to the disease may be searched and the search result may be presented to the user.
The embodiment of the present application further provides a data processing method, which can be applied in a conference scene, and can identify a text image containing conference related information according to a trained text identification model to determine an identification result, specifically, the method includes:
acquiring a conference text image to be identified;
and recognizing the conference text image according to a pre-trained text recognition model, and determining a recognition result, wherein the text recognition model and the single character recognition model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single-character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single-character recognition model, and the single-character recognition model is trained according to the single-character image.
And outputting each text in the recognition result for displaying.
The implementation of this embodiment is similar to the implementation of the foregoing embodiment, and for the specific implementation, reference may be made to the specific implementation of the foregoing embodiment, which is not described herein again.
In the embodiment of the application, the conference text image can be understood as an image containing conference related information, and the conference related information can be information such as conference content. The method and the device can acquire the conference text image, recognize the conference text image according to the pre-trained text recognition model, determine the recognition result, and output the recognition result for display. In addition, corresponding processing can be carried out according to the recognition result; for example, the conference content in the recognition result can be recorded, and related conferences can be associated.
The embodiment of the present application further provides a data processing method, which can be applied in a traffic scene, and can identify a text image containing traffic-related information according to a trained text identification model to determine an identification result, and specifically, the method includes:
and acquiring a traffic text image to be identified.
And identifying the traffic text image according to a pre-trained text identification model, and determining an identification result, wherein the trained text identification model and the single character identification model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single character recognition model, and the single character recognition model is trained according to the single character image.
And outputting each text in the recognition result for displaying.
The implementation manner of this embodiment is similar to that of the above embodiment, and the detailed implementation manner of the above embodiment may be referred to, and is not described herein again.
In the embodiment of the application, the traffic text image can be understood as an image containing traffic related information, and the traffic related information can be license plate information, billboard information and the like. The traffic text image can be obtained, the traffic text image is recognized according to the pre-trained text recognition model, the recognition result is determined, and the recognition result is output to be displayed. In addition, corresponding processing can be carried out according to the recognition result. For example, the violation information can be recorded according to the license plate information in the recognition result.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.
On the basis of the foregoing embodiment, the present embodiment further provides a data processing apparatus, and with reference to fig. 7, the data processing apparatus may specifically include the following modules:
a text image obtaining module 702, configured to obtain text image data to be identified;
and the text image recognition module 704 is used for recognizing the text image data according to a pre-trained text recognition model and determining a recognition result, wherein the text recognition model and the single character recognition model share character information. It should be noted that in the embodiment of the present application, the single-character image may be determined by a synthesis method, and the embodiment is described only by taking an example of determining the single-character image by training a text image. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single-character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single-character recognition model, and the single-character recognition model is trained according to the single-character image.
And the recognition result output module 706 is configured to output each text in the recognition result for display.
In summary, in the embodiment of the present application, before the text image data is identified, the single character image and the corresponding label may be determined according to the labeled training text image, and the text recognition model may be trained according to the labeled training text image and the labeled single character image, after the text recognition model is trained, the text image data to be identified may be obtained, and the text image data to be identified may be input into the pre-trained text recognition model, so as to extract the features of each character in the text image data, and associate the features of the context character of each character, thereby performing character classification, and determining the recognition result. According to the method and the device, the text recognition model can be trained through the training text images and the single character images, so that the influence of factors such as unbalanced distribution of characters in the training text images on the recognition result can be weakened through the single characters which are evenly distributed in the single character images, and the recognition accuracy of the text recognition model can be improved.
On the basis of the foregoing embodiment, the present embodiment further provides a data processing apparatus, and with reference to fig. 8, the data processing apparatus may specifically include the following modules:
and a single character image determining module 802, configured to obtain the labeled training text image.
And a single-character image labeling module 804, configured to determine and label a single-character image, where the single-character image is determined based on the training text image or determined by synthesis.
And a training text image training module 806, configured to train a text recognition model according to the labeled training text image, where the text recognition model includes a first feature extractor, a context modeling module, and a first character classifier.
And the single-character image training module 808 is configured to adjust a first character classifier of the text recognition model according to the labeled single-character image, so as to obtain a trained text recognition model.
In conclusion, in the embodiment of the application, the marked training text images can be obtained, the single characters in the training text images can be extracted, the single character images can be determined, and the single character images can also be synthesized in a synthesis mode; and labels are configured for the single character images. Then, on one hand, the training text image can be input into the text recognition model, first features of all characters are extracted through the first feature extractor, the features of the character contexts are associated through the context modeling module to obtain first feature data, the first character classifier is used for classifying according to the first feature data to determine a prediction result, and then the text recognition model is adjusted according to the prediction result and the label of the training text image to train the text recognition model. On the other hand, the first character classifier can be adjusted through the marked single character image, and a trained text recognition model is obtained. According to the method and the device, the influence of factors such as unbalanced distribution of characters in the training text image on the recognition result can be weakened through the single character image without the context characteristics, and therefore the recognition accuracy of the text recognition model can be improved.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, which may specifically include the following modules:
and the single character image acquisition processing module is used for acquiring the marked training text image and extracting a single character in the training text image to obtain the single character image. As an alternative embodiment, the processed first feature is input into the generator to determine character information and to determine a single character image.
And the single character label obtaining and processing module is used for configuring labels for the single character images according to the labels of the training text images.
And the training text image training processing module is used for training a text recognition model according to the marked training text image, and the text recognition model comprises a first feature extractor, a context modeling module and a first character classifier. As an alternative embodiment, the context modeling module is configured to: acquiring first features of each character extracted by a first feature extractor, performing feature separation, and determining character features and background features; enhancing character features, and fusing background features to obtain a processed first feature; and performing context association on the processed first features to obtain first feature data, so as to determine a prediction result according to the first feature data and the first character classifier, and adjusting the text recognition model according to the prediction result and the labels of the training text images. As an optional embodiment, the apparatus may further adjust the first feature extractor according to a difference between the first feature data and the second feature data, and further include: the device comprises a discriminator analysis processing module, a classifier and a classifier model, wherein the discriminator analysis processing module is used for acquiring first characteristic data of a target character in a training text image and second characteristic data in a single character image, inputting the first characteristic data and the second characteristic data into the discriminator, and grading the first characteristic data and the second characteristic data according to data types, wherein the data types comprise a single character type and a character line type; the first score adjusting and processing module is used for adjusting the first score of the first feature data according to the label of the training text image so as to reduce the influence of the background feature in the first feature data on the first score; and the difference information acquisition processing module is used for determining difference information according to the adjusted first score and the second score of the second characteristic data. And adjusting a first feature extractor of the text recognition model according to the difference information.
The single character image training processing module is used for adjusting the single character recognition model according to the labeled single-character images, the single character recognition model comprising a second feature extractor and a second character classifier.
The single-classification adjustment amount acquisition processing module is used for determining a single-character classification adjustment amount according to the adjustment of the second character classifier.
The single-classification weight acquisition processing module is used for determining the weight information corresponding to the single-character classification adjustment amount.
The first character classifier adjusting and processing module is used for adjusting the first character classifier according to the weight information and the single-character classification adjustment amount, obtaining the trained text recognition model; a sketch of this transfer follows below.
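The transfer of the single-character classification adjustment amount into the first character classifier can be sketched as a weighted weight-delta update. The single scalar alpha below is an assumption standing in for the unspecified weight information, and the function names are invented for illustration.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def transfer_classifier_update(first_cls: nn.Linear,
                               second_cls_before: nn.Linear,
                               second_cls_after: nn.Linear,
                               alpha: float = 0.5) -> None:
    """Applies the single-character classification adjustment amount to the
    first character classifier, scaled by the weight information alpha."""
    # Adjustment amount: how much training on labeled single-character images
    # moved the second character classifier.
    delta_w = second_cls_after.weight - second_cls_before.weight
    delta_b = second_cls_after.bias - second_cls_before.bias
    # Weighted transfer into the first character classifier, sharing character
    # information without sharing context features.
    first_cls.weight.add_(alpha * delta_w)
    first_cls.bias.add_(alpha * delta_b)
```

In use, one would take a snapshot of the second character classifier (for example with copy.deepcopy) before adjusting the single character recognition model, train on the labeled single-character images, and then call transfer_classifier_update with the snapshot and the adjusted classifier.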
In the embodiments of the present application, a labeled training text image may be acquired, single characters extracted from it, and single-character images determined; labels are configured for the single-character images according to the labels of the training text image. On the one hand, the training text image may be input into the text recognition model: the first feature extractor extracts a first feature for each character, the context modeling module associates the features of each character's context to obtain first feature data, and the first character classifier classifies the first feature data to determine a prediction result; the text recognition model is then adjusted according to the prediction result and the label of the training text image. On the other hand, the single-character images may be input into the single character recognition model: the second feature extractor extracts second feature data, the second character classifier classifies the second feature data to determine a second prediction result, a single-character model adjustment amount is determined from the second prediction result and the labels of the single-character images, and the single character recognition model is adjusted accordingly. A single-character classification adjustment amount may then be determined from the adjustment of the second character classifier, corresponding weight information determined, and the first character classifier adjusted according to the single-character classification adjustment amount and the weight information, obtaining a trained text recognition model with which a text image to be recognized can be recognized.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, as shown in fig. 9, which specifically includes the following modules:
an interactive page providing module 902, configured to provide an interactive page, where the interactive page includes a data upload control.
The text image uploading module 904 is configured to acquire text image data to be recognized in response to triggering of the data upload control and upload the acquired text image data to the server, so that the server recognizes the text image data according to a pre-trained text recognition model and determines a recognition result, where the trained text recognition model and the single character recognition model share character information. The trained text recognition model is trained on training text images in the training stage and shares a character classifier with the single character recognition model in that stage, so that the single character recognition model assists in training the text recognition model; the single character recognition model itself is trained on single-character images.
The recognition result receiving module 906 is configured to receive the recognition result and display it.
In the embodiments of the present application, the server may provide an interactive page to the terminal so as to offer a text image recognition service based on that page. A user may trigger the data upload control in the interactive page to upload text image data to be recognized to the server through the terminal. The server acquires the text image data and inputs it into the pre-trained text recognition model, which extracts the features of each character and associates the features of each character's context, so that character classification is performed and a recognition result is determined. After the server determines the recognition result, it can feed the result back to the terminal for display. A minimal sketch of this upload-and-recognize flow follows.
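The following is a minimal sketch of such a service, assuming an HTTP endpoint built with Flask; the route name /recognize, the form field text_image, and the run_text_recognition_model wrapper are all illustrative assumptions rather than details from this application.

```python
from typing import List
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_text_recognition_model(image_bytes: bytes) -> List[str]:
    # Placeholder for the pre-trained text recognition model (see earlier sketches);
    # a real service would decode the image and run the model here.
    return ["example recognized text"]

@app.route("/recognize", methods=["POST"])
def recognize():
    image = request.files["text_image"]            # uploaded via the page's data upload control
    texts = run_text_recognition_model(image.read())
    return jsonify({"recognition_result": texts})  # fed back to the terminal for display
```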
On the basis of the above embodiments, the present embodiment further provides a data processing apparatus, which can be applied to a scene for recognizing an image of an ancient character, specifically,
the text image obtaining module 702 is specifically configured to obtain an ancient character image to be identified.
The text image recognition module 704 is specifically configured to recognize the ancient character image according to a pre-trained text recognition model and determine a recognition result, where the trained text recognition model and a single character recognition model share character information. The trained text recognition model is trained on training text images in the training stage and shares a character classifier with the single character recognition model in that stage, so that the single character recognition model assists in training the text recognition model; the single character recognition model is trained on single-character images. The text image recognition module 704 is further configured to acquire text description information of each text in the recognition result, where the text description information includes at least one of annotation information and phonetic notation information, and to output the text in the recognition result together with the corresponding text description information for display.
In the embodiments of the present application, an ancient character image refers to an image containing ancient characters, that is, characters predating those in current use, such as ancient Chinese characters. Ancient Chinese characters may be understood as the Chinese characters used in China before the appearance of simplified characters, where simplified characters are the officially published simplified forms, generally referring to the characters of the general table of simplified characters published in 1956. Ancient Chinese characters may include early ancient scripts, traditional characters, oracle bone script, bronze inscriptions, Warring States script, Qin-system script, the early clerical script of the Qin-to-Han clerical transition, and the like. An image containing ancient Chinese characters may relate to ancient medicine, ancient education, or historical records. In the embodiments of the present application, an ancient character image may be obtained by photographing a book containing ancient Chinese characters and input into the pre-trained text recognition model to determine a recognition result. After the recognition result is determined, the ancient characters can be converted accordingly, for example into modern characters; it should be noted that besides one-to-one character conversion, translation according to semantics is also possible. In addition, the modern text obtained by conversion may be further translated into other languages, configurable according to requirements. A toy sketch of the conversion and annotation step follows.
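Below is a toy sketch of converting recognized ancient characters to modern forms and attaching description information. The two-entry lookup table and all names are invented for illustration; a real system would rely on a full ancient-to-modern mapping or a semantic translation model.

```python
# Hypothetical lookup table: ancient character -> modern form plus description info.
ANCIENT_CHAR_INFO = {
    "爲": {"modern": "为", "annotation": "to do; for", "phonetic": "wei2"},
    "馬": {"modern": "马", "annotation": "horse", "phonetic": "ma3"},
}

def convert_and_annotate(recognized_text: str):
    """Maps each recognized ancient character to its modern form plus description
    information (annotation and phonetic notation), leaving unknown characters as-is."""
    results = []
    for ch in recognized_text:
        info = ANCIENT_CHAR_INFO.get(ch, {"modern": ch, "annotation": None, "phonetic": None})
        results.append({"char": ch, **info})
    return results
```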
On the basis of the above embodiments, the present embodiment further provides a data processing apparatus, which can be applied to scenes searched based on text images, specifically,
the text image obtaining module 702 is specifically configured to provide an interactive page, so as to obtain text image data to be identified based on the interactive page.
The text image recognition module 704 is specifically configured to recognize text image data according to a pre-trained text recognition model, and determine a recognition result, where the trained text recognition model and the single character recognition model share character information. And searching according to the identification result, and feeding back the search result so as to display the search result on the interactive page. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single character recognition model, and the single character recognition model is trained according to the single character image.
In the embodiments of the present application, an interactive page may be provided through which a user uploads text image data; after the text image data is received, it can be recognized according to the pre-trained text recognition model to determine a recognition result, and a search can then be performed according to the recognition result, so that the corresponding information is found and displayed in the interactive page. The embodiments of the present application may be applied to image-based search scenarios; for example, for a text image containing an enterprise name, enterprise-related information may be searched according to the name in the recognition result and displayed to the user.
On the basis of the above embodiments, the present embodiment further provides a data processing apparatus, which can be applied to educational scenes, specifically,
the text image obtaining module 702 is specifically configured to obtain an educational text image to be recognized.
The text image recognition module 704 is specifically configured to recognize an education text image according to a pre-trained text recognition model, and determine a recognition result, where the trained text recognition model and the single character recognition model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single character recognition model, and the single character recognition model is trained according to the single character image.
The recognition result output module 706 is specifically configured to output each text in the recognition result for display.
In the embodiments of the present application, an education text image may be understood as an image containing education-related information, such as a student's answers, student status information, or information in a book. The education text image may be acquired and recognized according to the pre-trained text recognition model to determine a recognition result, and corresponding processing may be performed according to the recognition result. For example, when the education-related information in the education text image is a student's answers, the answers in the recognition result may be judged to determine the student's score.
On the basis of the above embodiments, the present embodiment further provides a data processing apparatus, which can be applied in e-commerce scenes, specifically,
the text image obtaining module 702 is specifically configured to obtain a text image of a commodity to be identified.
The text image recognition module 704 is specifically configured to recognize the commodity text image according to a pre-trained text recognition model, and determine a recognition result, where the trained text recognition model and the single character recognition model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single-character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single-character recognition model, and the single-character recognition model is trained according to the single-character image.
The recognition result output module 706 is specifically configured to output each text in the recognition result for display.
In the embodiment of the present application, the product text image may be understood as an image including product related information, and the product related information may be information such as a product name, a date of manufacture, and a product description. According to the embodiment of the application, the commodity text image can be obtained, the commodity text image is identified according to the pre-trained text identification model, the identification result is determined, and corresponding processing is carried out according to the identification result. For example, when the recognition result is a product name, a corresponding product may be searched for according to the product name, and the search result may be presented to the user.
On the basis of the above embodiments, the present embodiment further provides a data processing apparatus, which can be applied in a medical scene, specifically,
the text image obtaining module 702 is specifically configured to obtain a medical text image to be identified.
The text image recognition module 704 is specifically configured to recognize the medical text image according to a pre-trained text recognition model, and determine a recognition result, where the trained text recognition model and the single character recognition model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single-character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single-character recognition model, and the single-character recognition model is trained according to the single-character image.
The recognition result output module 706 is specifically configured to output each text in the recognition result for display.
In the embodiment of the application, the medical text image can be understood as an image containing medical related information, and the medical related information can be information such as case information and medical documents. According to the embodiment of the application, the medical text image can be obtained, the medical text image is identified according to the pre-trained text identification model, the identification result is determined, and corresponding processing is carried out according to the identification result. For example, when the identification result is a name of a disease, information related to the disease may be searched and the search result may be presented to the user.
On the basis of the above embodiments, the present embodiment further provides a data processing apparatus, which can be applied in a conference scene, specifically,
the text image obtaining module 702 is specifically configured to obtain a conference text image to be identified.
The text image recognition module 704 is specifically configured to recognize a conference text image according to a pre-trained text recognition model, and determine a recognition result, where the trained text recognition model and a single character recognition model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single character recognition model, and the single character recognition model is trained according to the single character image.
The recognition result output module 706 is specifically configured to output each text in the recognition result for display.
In the embodiments of the present application, a conference text image may be understood as an image containing conference-related information, such as conference content. The conference text image may be acquired and recognized according to the pre-trained text recognition model to determine a recognition result, and corresponding processing may be performed according to the recognition result. For example, the conference content in the recognition result may be recorded, and related conferences may be linked.
On the basis of the above embodiments, the present embodiment further provides a data processing apparatus, which can be applied in traffic scenes, specifically,
the text image obtaining module 702 is specifically configured to obtain a traffic text image to be identified.
The text image recognition module 704 is specifically configured to recognize a traffic text image according to a pre-trained text recognition model, and determine a recognition result, where the trained text recognition model and a single character recognition model share character information. The trained text recognition model is trained according to a training text image in a training stage, and shares a character classifier with a single-character recognition model in the training stage so as to perform auxiliary training on the text recognition model according to the single-character recognition model, and the single-character recognition model is trained according to the single-character image.
The recognition result output module 706 is specifically configured to output each text in the recognition result for display.
In the embodiment of the application, the traffic text image can be understood as an image containing traffic related information, and the traffic related information can be license plate information, billboard information and the like. The traffic text image can be obtained, the traffic text image is recognized according to the pre-trained text recognition model, the recognition result is determined, and corresponding processing is carried out according to the recognition result. For example, the violation information can be recorded according to the license plate information in the recognition result.
The present application further provides a non-transitory readable storage medium storing one or more modules (programs); when the one or more modules are applied to a device, the device can be caused to execute instructions for the method steps described in the present application.
Embodiments of the present application provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an electronic device to perform the methods as described in one or more of the above embodiments. In the embodiment of the application, the electronic device includes a server, a terminal device and other devices.
Embodiments of the present disclosure may be implemented, using any suitable hardware, firmware, software, or any combination thereof in a desired configuration, as an apparatus, which may include electronic devices such as servers (clusters) and terminals. Fig. 10 schematically illustrates an example apparatus 1000 that may be used to implement various embodiments described herein.
For one embodiment, fig. 10 illustrates an example apparatus 1000 having one or more processors 1002, a control module (chipset) 1004 coupled to at least one of the processor(s) 1002, memory 1006 coupled to the control module 1004, non-volatile memory (NVM)/storage 1008 coupled to the control module 1004, one or more input/output devices 1010 coupled to the control module 1004, and a network interface 1012 coupled to the control module 1004.
The processor 1002 may include one or more single-core or multi-core processors, and the processor 1002 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the apparatus 1000 can be used as a server, a terminal, or the like in the embodiments of the present application.
In some embodiments, the apparatus 1000 may include one or more computer-readable media (e.g., the memory 1006 or the NVM/storage 1008) having instructions 1014 and one or more processors 1002 that, in conjunction with the one or more computer-readable media, are configured to execute the instructions 1014 to implement modules to perform the actions described in this disclosure.
For one embodiment, control module 1004 may include any suitable interface controllers to provide any suitable interface to at least one of the processor(s) 1002 and/or any suitable device or component in communication with control module 1004.
The control module 1004 may include a memory controller module to provide an interface to the memory 1006. The memory controller module may be a hardware module, a software module, and/or a firmware module.
Memory 1006 may be used, for example, to load and store data and/or instructions 1014 for device 1000. For one embodiment, memory 1006 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, the memory 1006 may comprise a double data rate type four synchronous dynamic random access memory (DDR 4 SDRAM).
For one embodiment, the control module 1004 may include one or more input/output controllers to provide an interface to the NVM/storage 1008 and input/output device(s) 1010.
For example, NVM/storage 1008 may be used to store data and/or instructions 1014. NVM/storage 1008 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more hard disk drive(s) (HDD (s)), one or more Compact Disc (CD) drive(s), and/or one or more Digital Versatile Disc (DVD) drive (s)).
NVM/storage 1008 may include storage resources that are part of the device on which apparatus 1000 is installed or may be accessible by the device and need not be part of the device. For example, NVM/storage 1008 may be accessed over a network via input/output device(s) 1010.
Input/output device(s) 1010 may provide an interface for apparatus 1000 to communicate with any other suitable device; input/output devices 1010 may include communication components, audio components, sensor components, and so forth. Network interface 1012 may provide an interface for device 1000 to communicate over one or more networks; device 1000 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, for example to access a communication-standard-based wireless network such as WiFi, 2G, 3G, 4G, 5G, etc., or a combination thereof.
For one embodiment, at least one of the processor(s) 1002 may be packaged together with logic for one or more controllers of control module 1004 (e.g., memory controller module). For one embodiment, at least one of the processor(s) 1002 may be packaged together with logic for one or more controller(s) of control module 1004 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1002 may be integrated on the same die with the logic of one or more controllers of the control module 1004. For one embodiment, at least one of the processor(s) 1002 may be integrated on the same die with logic for one or more controller(s) of control module 1004 to form a system on chip (SoC).
In various embodiments, the apparatus 1000 may be, but is not limited to: a server, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.), among other terminal devices. In various embodiments, the apparatus 1000 may have more or fewer components and/or different architectures. For example, in some embodiments, device 1000 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
The detection device may adopt a main control chip as the processor or control module; sensor data, position information, and the like may be stored in the memory or the NVM/storage device; a sensor group may serve as the input/output device; and the communication interface may comprise the network interface.
An embodiment of the present application further provides an electronic device, including: a processor; and a memory having executable code stored thereon, which when executed, causes the processor to perform a method as described in one or more of the embodiments of the application.
Embodiments of the present application also provide one or more machine-readable media having executable code stored thereon that, when executed, cause a processor to perform a method as described in one or more of the embodiments of the present application.
For the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The embodiments in the present specification are all described in a progressive manner, and each embodiment focuses on differences from other embodiments, and portions that are the same and similar between the embodiments may be referred to each other.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or terminal device that comprises the element.
The foregoing detailed description has provided a data processing method, a data processing apparatus, an electronic device, and a storage medium, and the principles and embodiments of the present application are described herein using specific examples, which are only used to help understand the method and the core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. A method of data processing, the method comprising:
acquiring text image data to be identified;
recognizing the text image data according to a pre-trained text recognition model, and determining a recognition result, wherein the trained text recognition model and a single character recognition model share character information; and
outputting each text in the recognition result for display.
2. The method of claim 1, wherein the text image data contains a name of a target object, the method further comprising:
and searching and outputting related data of the target object according to the name of the target object in the identification result.
3. The method of claim 1, further comprising:
receiving a result adjusting instruction for the recognition result, providing an editing control, and acquiring a text adjusting operation based on the editing control, wherein the text adjusting operation comprises at least one of a position adjusting operation, a text modification operation, a text deleting operation, a text adding operation and a text error marking operation;
adjusting the recognition result according to the text adjustment operation;
and adjusting the text recognition model according to the adjusted recognition result.
4. The method of claim 1, wherein outputting each text in the recognition result comprises:
classifying all texts in the recognition result, and determining uncommon texts and common texts;
acquiring uncommon text description information of the uncommon text, wherein the uncommon text description information comprises at least one of annotation information and phonetic notation information;
and outputting the common text, the uncommon text and the uncommon text description information.
5. The method of claim 1, further comprising:
acquiring the recognition confidence of the recognition result of each character;
and when the recognition confidence of the recognition result is lower than a preset confidence threshold, outputting question information corresponding to the recognition result so as to determine the recognition result according to response information corresponding to the question information.
6. The method of claim 1, further comprising:
acquiring the recognition confidence of the recognition result of each character, and determining the confidence level corresponding to the recognition result of the character according to the recognition confidence;
and determining a display mode corresponding to the recognition result of the character according to the credibility level so as to display the recognition result of the character in the interactive page according to the display mode, and facilitating the adjustment of the recognition result by the user.
7. The method of claim 1, wherein the text recognition model comprises:
a first feature extractor for extracting a first feature of the text image data;
the context modeling module is used for carrying out feature separation on the first features, determining character features and background features, enhancing the character features, and fusing the background features to obtain processed first features; performing context association on the processed first characteristic to obtain first characteristic data;
and the first character classifier is used for classifying the characters according to the first characteristic data and determining the recognition result.
8. The method of claim 7, further comprising the step of training a text recognition model:
acquiring a marked training text image;
determining a single-character image and labeling the single-character image, wherein the single-character image is extracted from the training text image or produced by synthesis;
training a text recognition model according to the marked training text image;
adjusting a single character recognition model according to the marked single character image, wherein the single character recognition model comprises a second feature extractor and a second character classifier;
and determining single-character classification adjustment quantity according to the adjustment of the character information of the single-character image on the second character classifier, and adjusting the first character classifier according to the single-character classification adjustment quantity.
9. A method of data processing, the method comprising:
providing an interactive page, wherein the interactive page comprises a data uploading control;
acquiring text image data to be recognized in response to triggering of the data uploading control, and uploading the text image data to a server, so that the server recognizes the text image data according to a pre-trained text recognition model and determines a recognition result, wherein the trained text recognition model and a single character recognition model share character information;
and receiving and displaying the identification result.
10. A method of data processing, the method comprising:
acquiring an ancient character image to be identified;
recognizing the ancient character image according to a pre-trained text recognition model, and determining a recognition result, wherein the trained text recognition model and a single character recognition model share character information;
and outputting the recognition result for display.
11. An electronic device, comprising: a processor; and
a memory having executable code stored thereon that, when executed, causes the processor to perform the method of one or more of claims 1-10.
12. One or more machine-readable media having executable code stored thereon that, when executed, causes a processor to perform the method of one or more of claims 1-10.
Country or region before: Singapore