CN114419636A - Text recognition method, device, equipment and storage medium

Text recognition method, device, equipment and storage medium

Info

Publication number
CN114419636A
CN114419636A
Authority
CN
China
Prior art keywords
character, image, sub-image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210023777.4A
Other languages
Chinese (zh)
Inventor
乔美娜
刘珊珊
吴亮
吕鹏原
章成全
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210023777.4A priority Critical patent/CN114419636A/en
Publication of CN114419636A publication Critical patent/CN114419636A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/32 Indexing scheme for image data processing or generation, in general, involving image mosaicing

Abstract

The disclosure provides a text recognition method, apparatus, device, and storage medium, relating to the technical field of artificial intelligence, in particular deep learning and computer vision, and applicable to scenarios such as OCR (optical character recognition). The specific implementation scheme is as follows: acquiring a first image, wherein the first image comprises N types of characters; determining a plurality of sub-images in the first image, and determining the character type of each sub-image, wherein each sub-image comprises one type of character; for each sub-image, performing character recognition processing on the sub-image through a character recognition model corresponding to its character type to obtain a character set corresponding to the sub-image, wherein the character set comprises at least one character; and determining text information corresponding to the first image according to the character sets corresponding to the sub-images. This scheme can improve the accuracy of text recognition results.

Description

Text recognition method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to the field of deep learning and computer vision technologies, and in particular, to a text recognition method, apparatus, device, and storage medium, which can be applied to scenes such as OCR.
Background
Optical Character Recognition (OCR) recognizes characters in an image by optical techniques to obtain the text information corresponding to the image. Currently, OCR is widely used in a variety of fields such as medical care, retail, and education.
In some scenarios, different types of characters may be present in the image, for example, both printed and handwritten characters, as well as, for example, characters of multiple fonts, and also, for example, characters of different languages.
In scenarios where an image includes multiple types of characters, how to perform text recognition accurately is an urgent technical problem to be solved.
Disclosure of Invention
The present disclosure provides a text recognition method, apparatus, device, and storage medium.
According to a first aspect of the present disclosure, there is provided a text recognition method including:
acquiring a first image, wherein the first image comprises N types of characters, and N is an integer greater than 1;
determining a plurality of sub-images in the first image, and determining the character type of each sub-image, wherein each sub-image comprises a type of character;
for each sub-image, performing character recognition processing on the sub-image through a character recognition model corresponding to the character type to obtain a character set corresponding to the sub-image, wherein the character set comprises at least one character;
and determining text information corresponding to the first image according to the character sets corresponding to the sub-images.
According to a second aspect of the present disclosure, there is provided a text recognition apparatus including:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a first image, the first image comprises N types of characters, and N is an integer greater than 1;
the first determining module is used for determining a plurality of sub-images in the first image and determining the character types of the sub-images, wherein each sub-image comprises a type of character;
the processing module is configured to, for each sub-image, perform character recognition processing on the sub-image through the character recognition model corresponding to its character type to obtain a character set corresponding to the sub-image, wherein the character set comprises at least one character;
and the second determining module is used for determining the text information corresponding to the first image according to the character sets corresponding to the sub-images respectively.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of the first aspect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic diagram of a text recognition scenario provided by an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a text recognition method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an image segmentation process provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an image segmentation and stitching process provided by an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of another text recognition method provided in the embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a text recognition process provided by an embodiment of the present disclosure;
fig. 7 is a schematic flowchart of another text recognition method provided in the embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a text recognition apparatus according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a text recognition method, apparatus, device, and storage medium, applied to the field of artificial intelligence, specifically the technical fields of deep learning and computer vision, and applicable to scenarios such as Optical Character Recognition (OCR); the solution can improve the accuracy of text recognition results.
In order to facilitate understanding of the technical solution of the present disclosure, a text recognition scenario is first described with reference to fig. 1.
Fig. 1 is a schematic diagram of a text recognition scenario provided in an embodiment of the present disclosure. As shown in fig. 1, the scenario relates to a text recognition apparatus. The text recognition device has a function of recognizing characters in an image. Referring to fig. 1, the input of the text recognition device is a first image and the output of the text recognition device is characters in the first image.
In the text recognition scenario of fig. 1, OCR technology is applied. OCR technology refers to a process in which an electronic device scans characters on paper, determines the shape of the characters by detecting dark and light patterns, and then translates the shape into computer text using character recognition methods.
In the embodiment of the present disclosure, the text recognition device may be in the form of software and/or hardware. The text recognition device may have different product forms according to different practical application scenarios. In practical application, the OCR technology can be applied to various application scenarios such as medical treatment, education, translation and the like. For ease of understanding, several possible application scenarios are exemplified below.
Illustratively, when applied to the field of education, the text recognition apparatus in FIG. 1 may be an intelligent grading device, or an apparatus integrated into an intelligent grading device. The intelligent grading device scans a paper test paper to obtain a test paper image, and recognizes the characters in the test paper image to obtain the text information corresponding to the test paper image. Further, whether the answers on the test paper are correct can be determined based on the recognized text information.
For example, when applied to the translation field, the text recognition apparatus in fig. 1 may be a translation device, or an apparatus integrated into a translation device. The translation device scans the text lines on paper to obtain an image, and recognizes the characters in the image to obtain the text information corresponding to the image. Further, translation processing may be performed based on the recognized text information, and the translation result may be displayed or played back as speech.
For example, when applied to the medical field, the text recognition apparatus in fig. 1 may be an electronic medical record generation device, or an apparatus integrated into an electronic medical record generation device. The electronic medical record generation device scans a paper medical record to obtain a medical record image, and recognizes the characters in the medical record image to obtain the text information corresponding to the medical record image. An electronic medical record is then generated based on the recognized text information.
It should be noted that the above several application scenarios are only examples, and the embodiment of the present disclosure is not limited to the application scenarios.
With the development of the deep learning technology, a character recognition model can be obtained by training in advance based on the deep learning technology, and the character recognition model is deployed in a text recognition device. In this way, the character recognition model can be used for recognizing characters in the image to obtain text information.
In some complex scenes, different types of characters may be present in the image. For example, both print and handwritten characters may be present in the image. For another example, characters of multiple fonts (e.g., Song, regular script, clerical script, etc.) may be present in the image at the same time. For yet another example, characters of multiple languages (e.g., Chinese, Japanese, English, etc.) may be present in the image at the same time.
For such complex scenes, when one character recognition model is used to recognize characters of different types, the recognition accuracy is not high. In the related art, the character recognition model can be trained in a targeted manner by adding sample data according to the recognition requirements of the actual scene, so as to improve the accuracy of the text recognition result.
For example, assume that a character recognition model is trained with Chinese print character samples, so that it can recognize Chinese print characters. When an application scenario may involve both Chinese print and Chinese handwriting, Chinese handwriting character samples need to be added to the original training data, and the character recognition model is trained again with the supplemented training data set so that it can recognize both Chinese print characters and Chinese handwritten characters. Similarly, when an application scenario may involve Chinese print, Chinese handwriting, English print, and English handwriting at the same time, English print and English handwriting character samples need to be added to the training data set, and the character recognition model is then retrained with the supplemented training data set so that it can recognize Chinese print, Chinese handwritten, English print, and English handwritten characters at the same time.
In the above process, the training data set needs to be supplemented and the character recognition model retrained for every new application scenario. Moreover, as the complexity of the scene increases, the character recognition model has to balance the recognition accuracy of multiple types of characters, so its recognition accuracy for each individual type is not high, and the accuracy of the text recognition result is low.
In order to solve the above technical problem, in the technical solution of the present disclosure, when the first image includes multiple types of characters, multiple sub-images may be determined in the first image, and the character type of each sub-image may be determined, where each sub-image includes one type of character. And aiming at each sub-image, carrying out character recognition processing on the sub-image through a character recognition model corresponding to the character type of the sub-image to obtain a character set corresponding to the sub-image. And then, obtaining text information corresponding to the first image according to the character set corresponding to the plurality of sub-images.
In the process, the recognition process of the first image is converted into the recognition process of the plurality of sub-images, and each sub-image only comprises one type of character, so that the character recognition model used in the method only needs to have the recognition capability of a single type of character, and therefore the method and the device can improve the accuracy of the text recognition result.
The technical solution of the present disclosure is described below with reference to specific embodiments. The following embodiments may be combined with each other, and identical or similar concepts or processes may not be described again in some embodiments.
Fig. 2 is a schematic flowchart of a text recognition method according to an embodiment of the present disclosure. As shown in fig. 2, the method of the present embodiment includes:
s201: acquiring a first image, wherein the first image comprises N types of characters, and N is an integer greater than 1.
The first image is the image to be recognized, obtained by scanning or photographing the text currently to be recognized. It should be noted that the first image may include one or more lines of characters. In some scenes, recognition is performed in units of one line of characters, and the first image is then the image corresponding to the line of characters currently to be recognized. In other scenes, recognition is performed in units of multiple lines of characters, and the first image is the image corresponding to the multiple lines of characters currently to be recognized.
In the disclosed embodiment, the characters include but are not limited to: words, letters, numbers, operator symbols, punctuation marks, other symbols, and the like.
In this embodiment, the first image includes N types of characters, where N is an integer greater than 1. It should be understood that character types may be divided in a variety of ways. For example, characters may be divided by print and handwriting into a print type and a handwriting type. For another example, characters may be divided by language into a Chinese type, an English type, a Japanese type, and the like. For yet another example, characters may be divided by font into a regular script type, a clerical script type, and a Song type. The embodiment of the present disclosure does not limit the classification manner of the characters, and the above classification manners can also be combined with each other.
Optionally, the first image includes two types of characters, namely a print type and a handwriting type. That is, a portion of the characters in the first image are of the print type and another portion are of the handwriting type. For example, in the first image shown in FIG. 1, "The name of your memorabilia album is:" is of the print type, "I love youth" is of the handwriting type, "divided into" is of the print type, "home page" is of the handwriting type, "flyleaf" is of the print type, "front page" is of the handwriting type, and "three parts." is of the print type.
S202: a plurality of sub-images are determined in the first image, and the character type of each sub-image is determined, wherein each sub-image comprises one type of characters.
In the present embodiment, only one type of character is included in each sub-image. The character type corresponding to each sub-image is the type of the character included in the sub-image. For example, if a character of a print type is included in a certain sub-image, the character type of the sub-image is the print type. If the character in the handwriting type is included in a certain sub-image, the character type of the sub-image is the handwriting type.
In a possible implementation manner, the first image may be segmented to obtain a plurality of sub-images, and each sub-image includes one type of characters.
As an example, fig. 3 is a schematic diagram of an image segmentation process provided by the embodiment of the present disclosure. As shown in fig. 3, adjacent characters of the same type in the first image are segmented into the same sub-image. Thus, the first image is divided into 7 sub-images, wherein:
the character in the sub-image 1 is "the name of your memorabilia album: ", all are print types;
the characters in the subimage 2 are 'I love youth', and are all in handwriting type;
the characters in the subimages 3 are divided into all types of printing forms;
characters in the sub-image 4 are 'home pages', and all characters are in handwriting type;
the characters in the sub-image 5 are "facing page", "all are print type";
the characters in the sub-image 6 are 'front page', and all characters are in handwriting type;
the characters in the sub-image 7 are "three parts". All are of the print type.
In this example, the character types of sub-images 1, 3, 5, and 7 are the print type, and the character types of sub-images 2, 4, and 6 are the handwriting type.
In another possible implementation, the first image may be segmented and stitched to obtain N sub-images, each sub-image includes one type of character, and the types of characters included in different sub-images are different.
As an example, fig. 4 is a schematic diagram of an image segmentation and stitching process provided by the embodiment of the present disclosure. As shown in fig. 4, on the basis of the example shown in fig. 3, all print-type sub-images can be stitched together and all handwriting-type sub-images can be stitched together. Thus, after processing the first image, 2 sub-images are obtained, wherein:
the character in the sub-image 1 is "the name of your memorabilia album: the method is divided into three parts, namely a flyleaf and a flyleaf. ", all are print types;
the characters in the subimage 2 are ' I love youth ' front page ', and are all in handwriting type.
In this example, the character type of sub-image 1 is a print type, and the character type of sub-image 2 is a handwriting type.
It should be noted that, the embodiment of the present disclosure does not limit the manner of determining the plurality of sub-images in the first image. Fig. 3 and 4 illustrate only two possible processing results.
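As a concrete illustration of the fig. 3 style segmentation, the following is a minimal Python sketch that groups adjacent characters of the same type into runs, one run per sub-image. The character boxes and type labels are assumed inputs in reading order (in the disclosure they come from a detection step), and the box representation is a placeholder, not part of the disclosure.

```python
# Minimal sketch: split a reading-ordered sequence of (box, type) pairs into
# runs of adjacent same-type characters; each run becomes one sub-image.
from itertools import groupby

def split_into_runs(chars):
    """chars: list of (character_box, char_type) in reading order."""
    runs = []
    for char_type, group in groupby(chars, key=lambda c: c[1]):
        runs.append((char_type, [box for box, _ in group]))
    return runs

# Example with print (0) and handwriting (1) characters, as in fig. 3:
print(split_into_runs([("b1", 0), ("b2", 0), ("b3", 1), ("b4", 0)]))
# -> [(0, ['b1', 'b2']), (1, ['b3']), (0, ['b4'])]
```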
S203: and for each sub-image, performing character recognition processing on the sub-image through a character recognition model corresponding to the character type to obtain a character set corresponding to the sub-image, wherein the character set comprises at least one character.
In this embodiment, each character type corresponds to one character recognition model, and the character recognition model is used for recognizing the character of the corresponding type. Assuming that the first image includes characters of both print and handwriting types, two character recognition models may be employed in the present embodiment. The character recognition model A is used for recognizing characters of print types, and the character recognition model B is used for recognizing characters of handwriting types.
It should be noted that the character recognition model in the embodiment of the present disclosure is trained in advance. Illustratively, the character recognition model A is trained using character samples of print type. The character recognition model B is obtained by training with character samples of handwriting types. The structure and training mode of the character recognition model are not limited in this embodiment. It should be understood that, since each character recognition model in the present embodiment only needs to recognize one type of character, and does not need to consider other types of characters, it is possible to ensure higher recognition accuracy.
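The per-type dispatch can be pictured with the sketch below. The model objects and their recognize() method are hypothetical stand-ins; any recognition model trained on a single character type would fill the same role.

```python
# Sketch of dispatching each sub-image to the model for its character type.
# DummyModel is a stand-in for a trained single-type recognition model.
class DummyModel:
    def __init__(self, name):
        self.name = name

    def recognize(self, image):
        return f"<characters recognized by {self.name}>"

MODELS = {
    "print": DummyModel("character recognition model A"),
    "handwriting": DummyModel("character recognition model B"),
}

def recognize_sub_images(sub_images):
    """sub_images: list of (char_type, image); returns one character set each."""
    return [MODELS[char_type].recognize(image) for char_type, image in sub_images]
```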
For example, in the example shown in fig. 3, character recognition processing may be performed on the sub-image 1 through the character recognition model A to obtain a character set corresponding to the sub-image 1; carrying out character recognition processing on the sub-image 2 through the character recognition model B to obtain a character set corresponding to the sub-image 2; carrying out character recognition processing on the sub-image 3 through the character recognition model A to obtain a character set corresponding to the sub-image 3; carrying out character recognition processing on the sub-image 4 through the character recognition model B to obtain a character set corresponding to the sub-image 4; carrying out character recognition processing on the sub-image 5 through the character recognition model A to obtain a character set corresponding to the sub-image 5; carrying out character recognition processing on the sub-image 6 through the character recognition model B to obtain a character set corresponding to the sub-image 6; and performing character recognition processing on the sub-image 7 through the character recognition model A to obtain a character set corresponding to the sub-image 7.
For example, in the example shown in fig. 4, character recognition processing may be performed on the sub-image 1 through the character recognition model A to obtain a character set corresponding to the sub-image 1; and performing character recognition processing on the sub-image 2 through the character recognition model B to obtain a character set corresponding to the sub-image 2.
It is understood that the number of sub-images in the example shown in fig. 4 is smaller than that in the example shown in fig. 3, and therefore, when the sub-images are subjected to the recognition processing in S203, the number of times of the sub-image recognition processing can be reduced. For example, based on the example shown in fig. 4, it is necessary to perform recognition processing on two sub-images, that is, the number of times of sub-image recognition processing is 2. Based on the example shown in fig. 3, it is necessary to perform the recognition processing on 7 sub-images, that is, the number of times of the sub-image recognition processing is 7. Therefore, compared to the image processing method shown in fig. 3, the image processing method shown in fig. 4 can reduce the number of times of sub-image recognition processing and improve the recognition processing efficiency.
S204: and determining text information corresponding to the first image according to the character sets corresponding to the sub-images.
In this embodiment, the character set corresponding to each sub-image includes characters recognized by the character recognition model from the sub-image. In S203, after the character set corresponding to each sub-image is obtained, the text information corresponding to the first image can be determined according to the character set corresponding to each sub-image.
The text recognition method provided by the embodiment comprises the following steps: acquiring a first image, wherein the first image comprises N types of characters, determining a plurality of sub-images in the first image, determining the character type of each sub-image, each sub-image comprises one type of character, and performing character recognition processing on each sub-image through a character recognition model corresponding to the character type of the sub-image to obtain a character set corresponding to the sub-image, wherein the character set comprises at least one character; and determining text information corresponding to the first image according to the character sets corresponding to the sub-images. In the process, the recognition process of the first image is converted into the recognition process of the plurality of sub-images, and each sub-image only comprises one type of character, so that the character recognition model used in the embodiment only needs to have the recognition capability of a single type of character, and the accuracy of the text recognition result can be improved.
On the basis of the above embodiments, the following describes the technical solution of the present disclosure in more detail with reference to a specific example.
Fig. 5 is a schematic flowchart of another text recognition method according to an embodiment of the present disclosure. As shown in fig. 5, the method of the present embodiment includes:
s501: acquiring a first image, wherein the first image comprises N types of characters, and N is an integer greater than 1.
S502: and performing character detection processing on the first image to obtain the area occupied by each character in the first image and the type of each character.
In a possible implementation manner, a character detection model may be obtained by training in advance, and the character detection model is used to detect character regions in an image and the type of a character in each character region. Thus, the first image can be input into the character detection model, and the area occupied by each character and the type of each character in the first image can be obtained.
S503: and according to the area occupied by each character in the first image, carrying out segmentation processing on the first image to obtain a plurality of character images.
That is, the area occupied by each character in the first image is cut out to obtain a plurality of character images. Each character image includes a character. If the first image includes P characters, the segmentation process is performed to obtain P character images.
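A sketch of this cropping step is given below, assuming the detected areas are axis-aligned boxes in (x1, y1, x2, y2) pixel coordinates; the disclosure does not fix a particular box format.

```python
# Sketch of S503: crop one character image per detected character area.
import numpy as np

def crop_characters(first_image, boxes):
    """first_image: HxWxC array; boxes: list of (x1, y1, x2, y2)."""
    return [first_image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

# Example: a blank 100x300 image with two character boxes.
image = np.zeros((100, 300, 3), dtype=np.uint8)
char_images = crop_characters(image, [(0, 10, 30, 90), (35, 10, 70, 90)])
print([c.shape for c in char_images])  # [(80, 30, 3), (80, 35, 3)]
```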
Further, after the first image is segmented to obtain a plurality of character images, the plurality of character images may be subjected to stitching processing according to the type of each character in the first image to obtain N sub-images. A possible way of stitching can be seen in S504 to S505.
S504: and according to the type of the characters in each character image, grouping the character images to obtain N groups of character images, wherein the types of the characters in one group of character images are the same.
It should be understood that since each character image includes one character, the type of the character in the character image may be determined as the type of the character image. For example, if the character in the character image is a print type, the type of the character image is a print type; if the character in the character image is of the handwriting type, the type of the character image is of the handwriting type. Thus, the character images of the same type can be divided into one group, and N groups of character images can be obtained.
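A minimal sketch of this grouping step follows, with type labels such as "print" and "handwriting" assumed to come from the detection step:

```python
# Sketch of S504: group the cropped character images by character type,
# keeping each group's images in reading order.
from collections import defaultdict

def group_by_type(char_images, char_types):
    groups = defaultdict(list)
    for image, char_type in zip(char_images, char_types):
        groups[char_type].append(image)
    return groups  # N groups for N character types

print(dict(group_by_type(["c1", "c2", "c3"], ["print", "handwriting", "print"])))
# -> {'print': ['c1', 'c3'], 'handwriting': ['c2']}
```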
S505: and respectively splicing the N groups of character images to obtain N sub-images.
Illustratively, the character images of the print type are divided into a first group of character images, and the character images of the handwriting type are divided into a second group of character images, thereby obtaining two groups of character images. And splicing the first group of character images to obtain a sub-image 1, and splicing the second group of character images to obtain a sub-image 2. Thus, two sub-images are obtained through the segmentation and splicing processes.
In a possible implementation manner, for any group of character images in the N groups of character images, the following manner may be adopted for stitching: and acquiring the position of each character image in the first image in the group of character images, and splicing the group of character images according to the sequence of the positions from front to back to obtain sub-images corresponding to the group of character images.
Since each character image is obtained by cutting out the area occupied by each character in the first image, in some implementations, after the character detection processing is performed on the first image to obtain the area occupied by each character, the position of the area occupied by each character can be used as the position of each character image in the first image.
It should be noted that the front-to-back sequence can be understood as the order in which the character images are read in the usual reading manner. Illustratively, within each line of characters, the left-to-right order is the precedence order; across lines, the characters of the ith line precede those of the (i + 1)th line.
In a possible implementation manner, if the heights of the character images in a group of character images differ, scaling processing is performed on the character images in the group, so that the scaled character images have the same height; then, the scaled character images in the group are stitched in the front-to-back order of their positions to obtain the sub-image corresponding to the group of character images.
It should be understood that by performing scaling processing on each character image, the heights of each character image in the same group of character images are made the same, so that the subsequent stitching processing can be facilitated.
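The scale-then-stitch step might look like the following sketch, assuming OpenCV-style HxWxC images; the target height of 48 pixels is an arbitrary choice for illustration, not a value from the disclosure.

```python
# Sketch of S505: resize each character image in a group to a common height
# (preserving aspect ratio), then concatenate left to right into one sub-image.
import cv2
import numpy as np

def stitch_group(char_images, target_height=48):
    resized = []
    for img in char_images:
        h, w = img.shape[:2]
        new_w = max(1, round(w * target_height / h))
        resized.append(cv2.resize(img, (new_w, target_height)))
    return np.concatenate(resized, axis=1)  # horizontal stitch
```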
S506: and for each sub-image, performing character recognition processing on the sub-image through a character recognition model corresponding to the character type to obtain a character set corresponding to the sub-image, wherein the character set comprises at least one character.
It should be understood that the specific implementation manner of S506 is similar to S203 in fig. 2, and is not described herein again.
S507: and determining text information corresponding to the first image according to the character sets corresponding to the N sub-images.
Illustratively, the positions of the characters in the first image in the character sets corresponding to the N sub-images are obtained, and the characters in the character sets corresponding to the N sub-images are sorted according to the sequence of the positions from front to back to obtain the text information corresponding to the first image.
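One way to realize this front-to-back sort is sketched below. The (x, y) positions are assumed to be the top-left corners of the character areas in the first image, and the line_tolerance used to group characters into lines is an assumed heuristic, not part of the disclosure.

```python
# Sketch of S507: sort characters from all character sets into reading order
# (top-to-bottom by line, then left-to-right within a line) and join them.
def merge_character_sets(char_entries, line_tolerance=10):
    """char_entries: list of (char, x, y) positions in the first image."""
    ordered = sorted(char_entries,
                     key=lambda e: (round(e[2] / line_tolerance), e[1]))
    return "".join(ch for ch, _, _ in ordered)

print(merge_character_sets([("B", 50, 0), ("A", 0, 2), ("C", 0, 40)]))  # "ABC"
```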
In this embodiment, when the first image includes N types of characters, the recognition process of the first image is converted into a recognition process of N sub-images, where each sub-image includes only one type of character and different sub-images include different types of characters. In this way, N character recognition models can be used to perform character recognition processing on the N sub-images, and each character recognition model only needs the recognition capability for a single type of character, so the accuracy of the text recognition result can be improved. In addition, character images of the same character type are stitched into one sub-image, so that the number of sub-images of the first image is kept as small as possible, the number of sub-image recognition passes is minimized, and the text recognition efficiency is improved.
On the basis of any of the above embodiments, the following describes the technical solution of the present disclosure with reference to a specific example.
Fig. 6 is a schematic diagram of a text recognition process provided in an embodiment of the present disclosure. In this embodiment, a text recognition process of the first image in fig. 1 is taken as an example. The first image includes two types of characters, a print type and a handwriting type.
As shown in fig. 6, the text recognition process of the present embodiment includes:
(1) and carrying out character detection processing on the first image, marking the area occupied by each character in the first image by adopting a rectangular frame, and identifying to obtain the type of each character. Where 0 represents a print type and 1 represents a handwriting type.
(2) And according to the area occupied by each character, intercepting a plurality of character images from the first image, wherein each character image comprises one character.
(3) Character images of the print type are stitched into a sub-image 1, and character images of the handwriting type are stitched into a sub-image 2.
For example, referring to fig. 6, the character images corresponding to the print characters of "The name of your memorabilia album is:", "divided into", "flyleaf", and "three parts." are stitched into sub-image 1.
Continuing with fig. 6, the character images corresponding to the handwritten characters of "I love youth", "home page", and "front page" are stitched into sub-image 2.
(4) And inputting the sub-image 1 into a character recognition model corresponding to the print type to obtain a character set 1 corresponding to the sub-image 1.
For example, referring to fig. 6, character set 1 consists of the print characters of "The name of your memorabilia album is:", "divided into", "flyleaf", and "three parts.".
(5) And inputting the sub-image 2 into the character recognition model corresponding to the handwriting type to obtain a character set 2 corresponding to the sub-image 2.
For example, referring to fig. 6, character set 2 consists of the handwritten characters of "I love youth", "home page", and "front page".
(6) And sequencing the characters in the character set 1 and the character set 2 to obtain text information corresponding to the first image.
For example, sorting the characters in character set 1 and character set 2 according to the position of each character in the first image yields the final text information:
"The name of your memorabilia album is: 'I love youth', divided into three parts: home page, flyleaf, and front page."
In practical application, errors are inevitable when the character recognition model recognizes the sub-images. In the technical scheme of the disclosure, after the character recognition processing is performed on each sub-image by using the character recognition model, the result of the character recognition processing can be analyzed, so as to further improve the accuracy of the text recognition result. This is explained in detail below with reference to fig. 7.
Fig. 7 is a schematic flowchart of another text recognition method according to an embodiment of the present disclosure. The present embodiment describes a process of performing character recognition processing on a sub-image by using a character recognition model to obtain a character set corresponding to the sub-image. This embodiment may be used as a possible implementation manner of S203 and S506.
As shown in fig. 7, the method of this embodiment includes:
s701: and performing character recognition processing on the sub-images through a character recognition model to obtain a character set to be selected.
The sub-image in this embodiment may be any one of the plurality of sub-images obtained in S202. Illustratively, the sub-image is input into the character recognition model, which performs recognition processing on the characters in the sub-image and outputs the recognized character set together with the recognition confidence of each character in the set. In this embodiment, the character set output by the character recognition model is used as the character set to be selected.
S702: and acquiring the first character number of the characters in the character set to be selected and the second character number of the characters included in the sub-image.
Assume the first character number is K, i.e., the character recognition model recognized K characters; and the second character number is M, i.e., the sub-image includes M characters. In this embodiment, whether the output result of the character recognition model is accurate can be determined according to the relationship between K and M. See the detailed descriptions of S703 to S705.
S703: and if the first character quantity is the same as the second character quantity, determining the character set to be selected as the character set corresponding to the sub-image.
That is, when the number of characters recognized by the character recognition model is equal to the number of characters included in the sub-image, the recognition result of the character recognition model may be considered to be accurate. Therefore, the character set to be selected is taken as the character set corresponding to the sub-image.
S704: and if the number of the first characters is smaller than the number of the second characters, performing quality enhancement processing on the sub-images to obtain enhanced images, and performing character recognition processing on the enhanced images through the character recognition model to obtain character sets corresponding to the sub-images.
That is, K < M: the number of characters recognized by the character recognition model is smaller than the number of characters included in the sub-image, which indicates that the character recognition model has missed some characters. Therefore, in this embodiment, quality enhancement processing may be performed on the sub-image to obtain an enhanced image, and character recognition processing may then be performed on the enhanced image through the character recognition model to obtain the character set corresponding to the sub-image. In this way, the accuracy of the character set corresponding to the sub-image can be improved.
Optionally, the quality enhancement processing includes, but is not limited to: binarization processing, contrast adjustment processing, filtering processing, and the like.
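One plausible enhancement chain, using standard OpenCV operations for the three options named above, is sketched here; the disclosure does not prescribe a specific combination or parameters.

```python
# Sketch of S704's quality enhancement: contrast adjustment, filtering,
# and binarization applied to a BGR sub-image.
import cv2

def enhance(sub_image):
    gray = cv2.cvtColor(sub_image, cv2.COLOR_BGR2GRAY)
    gray = cv2.convertScaleAbs(gray, alpha=1.5, beta=0)   # contrast adjustment
    gray = cv2.GaussianBlur(gray, (3, 3), 0)              # filtering (denoise)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    return binary
```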
S705: if the number of the first characters is larger than the number of the second characters, acquiring the recognition confidence of each character in the character set to be selected; and determining M characters with the highest recognition confidence degrees in the character set to be selected, and determining the M characters as the character set corresponding to the sub-image.
And M is the number of the second characters, and M is an integer greater than 1.
That is, K > M: the number of characters recognized by the character recognition model is larger than the number of characters included in the sub-image, which indicates that the character recognition model has misrecognized some noise as characters. Typically, such misrecognized characters have a low recognition confidence. Therefore, the M characters with the highest recognition confidence in the character set to be selected can be used as the character set corresponding to the sub-image.
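The full S703 to S705 decision logic can be summarized in one sketch. Here `model` is a placeholder whose recognize() method returns the characters and their confidences, `enhance` is the quality-enhancement step sketched above, and m is the known character count of the sub-image.

```python
# Sketch of S703-S705: compare recognized count K with expected count M and
# apply the matching strategy.
def recognize_with_checks(model, sub_image, m):
    chars, confs = model.recognize(sub_image)
    k = len(chars)
    if k == m:        # S703: counts match, accept the candidate set
        return chars
    if k < m:         # S704: missed characters, retry on the enhanced image
        chars, _ = model.recognize(enhance(sub_image))
        return chars
    # S705: k > m, noise was misrecognized; keep the M most confident
    # characters, preserving their original order.
    top_m = sorted(sorted(range(k), key=lambda i: confs[i], reverse=True)[:m])
    return [chars[i] for i in top_m]
```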
In this embodiment, after the character recognition model performs character recognition processing on the sub-image to obtain the character set to be selected, the number of characters in the character set to be selected is compared with the number of characters in the sub-image to determine whether missed recognition or false recognition has occurred, and different processing strategies are adopted for the two cases, which improves the accuracy of the character set corresponding to the sub-image.
Fig. 8 is a schematic structural diagram of a text recognition apparatus according to an embodiment of the present disclosure. As shown in fig. 8, the text recognition apparatus 800 provided in this embodiment includes: an acquisition module 801, a first determination module 802, a processing module 803, and a second determination module 804.
The acquiring module 801 is configured to acquire a first image, where the first image includes N types of characters, and N is an integer greater than 1;
a first determining module 802, configured to determine a plurality of sub-images in a first image, and determine character types of the sub-images, where each sub-image includes a type of character;
the processing module 803 is configured to perform character recognition processing on each sub-image through a character recognition model corresponding to the character type to obtain a character set corresponding to the sub-image, where the character set includes at least one character;
a second determining module 804, configured to determine text information corresponding to the first image according to the character sets corresponding to the multiple sub-images.
In one possible implementation manner, the first determining module 802 includes:
and the first determining unit is used for carrying out segmentation and splicing processing on the first image to obtain N sub-images, wherein the types of characters in different sub-images are different.
In a possible implementation manner, the first determining unit includes:
the detection subunit is used for carrying out character detection processing on the first image to obtain the area and the type occupied by each character in the first image;
the segmentation subunit is used for carrying out segmentation processing on the first image according to the area occupied by each character in the first image to obtain a plurality of character images;
and the splicing subunit is used for splicing the plurality of character images according to the type of each character in the first image to obtain the N sub-images.
In a possible implementation manner, the splicing subunit is specifically configured to:
grouping the character images according to the types of characters in the character images to obtain N groups of character images, wherein the types of the characters in the group of character images are the same;
and respectively splicing the N groups of character images to obtain the N sub-images.
In one possible implementation, for any one of the N groups of character images, the splicing subunit is specifically configured to:
acquiring the position of each character image in the group of character images in the first image;
and splicing the group of character images according to the sequence of the positions from front to back to obtain sub-images corresponding to the group of character images.
In a possible implementation manner, the splicing subunit is specifically configured to:
if the heights of the character images in the group of character images are different, carrying out scaling processing on the character images in the group of character images, wherein the heights of the scaled character images are the same;
and splicing the group of character images after the scaling processing according to the sequence of the positions from front to back to obtain sub-images corresponding to the group of character images.
In a possible implementation manner, the processing module 803 includes:
the first identification unit is used for carrying out character identification processing on the sub-image through the character identification model to obtain a character set to be selected;
the first acquisition unit is used for acquiring a first character number of characters in the character set to be selected and a second character number of characters included in the sub-image;
and the second determining unit is used for determining the character set to be selected as the character set corresponding to the sub-image if the first character number is the same as the second character number.
In a possible implementation manner, the processing module 803 further includes: a second recognition unit;
the second identification unit is used for: and if the number of the first characters is smaller than the number of the second characters, performing quality enhancement processing on the sub-images to obtain enhanced images, and performing character recognition processing on the enhanced images through the character recognition model to obtain character sets corresponding to the sub-images.
In a possible implementation manner, the processing module 803 further includes: a third determination unit; the third determination unit is configured to:
if the number of the first characters is larger than the number of the second characters, acquiring the recognition confidence of each character in the character set to be selected;
determining M characters with the highest recognition confidence coefficient in the character set to be selected, wherein M is the number of the second characters, and M is an integer greater than 1;
and determining the M characters as a character set corresponding to the sub-image.
In a possible implementation manner, the second determining module 804 includes:
a second obtaining unit, configured to obtain positions of characters in a character set corresponding to each of the plurality of sub-images in the first image;
and the sorting unit is used for sorting the characters in the character sets corresponding to the sub-images according to the sequence of the positions from front to back to obtain the text information corresponding to the first image.
In one possible implementation, the N types include a print type and a handwriting type.
The text recognition apparatus provided in this embodiment may be configured to execute the text recognition method provided in any of the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the text recognition method. For example, in some embodiments, the text recognition method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 900 via ROM 902 and/or communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the text recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the text recognition method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in the cloud computing service system that addresses the drawbacks of difficult management and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (25)

1. A text recognition method, comprising:
acquiring a first image, wherein the first image comprises N types of characters, and N is an integer greater than 1;
determining a plurality of sub-images in the first image, and determining the character type of each sub-image, wherein each sub-image comprises a type of character;
for each sub-image, performing character recognition processing on the sub-image through a character recognition model corresponding to the character type to obtain a character set corresponding to the sub-image, wherein the character set comprises at least one character;
and determining text information corresponding to the first image according to the character sets corresponding to the sub-images.
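For readers implementing the method, a minimal Python sketch of the flow recited in claim 1 follows. It is an illustration only: the detector and the per-type recognizers are hypothetical callables (the claim fixes no concrete model), and character positions are assumed to be (x, y) coordinates in the first image.

from typing import Callable, Dict, List, Tuple

def recognize_text(
    first_image,
    detect_sub_images: Callable,       # assumed: returns [(sub_image, char_type, positions)]
    recognizers: Dict[str, Callable],  # assumed: one model per type, e.g. {"print": ..., "handwriting": ...}
) -> str:
    pooled: List[Tuple[Tuple[int, int], str]] = []
    for sub_image, char_type, positions in detect_sub_images(first_image):
        # Recognize each sub-image with the model matching its character type.
        chars = recognizers[char_type](sub_image)
        pooled.extend(zip(positions, chars))
    # Merge all character sets by their position in the first image.
    pooled.sort(key=lambda item: item[0])
    return "".join(char for _, char in pooled)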
2. The method of claim 1, wherein determining a plurality of sub-images in the first image comprises:
and carrying out segmentation and splicing processing on the first image to obtain N sub-images, wherein the types of characters in different sub-images are different.
3. The method of claim 2, wherein segmenting and stitching the first image to obtain N sub-images comprises:
carrying out character detection processing on the first image to obtain the area occupied by each character in the first image and the type of each character;
according to the area occupied by each character in the first image, carrying out segmentation processing on the first image to obtain a plurality of character images;
and according to the type of each character in the first image, splicing the character images to obtain the N sub-images.
4. The method of claim 3, wherein the obtaining the N sub-images by stitching the plurality of character images according to the type of each character in the first image comprises:
grouping the character images according to the types of characters in the character images to obtain N groups of character images, wherein the characters in each group of character images are of the same type;
and respectively splicing the N groups of character images to obtain the N sub-images.
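One possible shape for the segmentation and grouping of claims 3 and 4, sketched in Python; the detector and its output format (an (x0, y0, x1, y1) box plus a type label per character) are assumptions for illustration, not part of the claims.

from collections import defaultdict

def segment_and_group(first_image, detector):
    # detector(first_image) -> [((x0, y0, x1, y1), char_type), ...]  (assumed shape)
    groups = defaultdict(list)
    for (x0, y0, x1, y1), char_type in detector(first_image):
        crop = first_image[y0:y1, x0:x1]            # numpy-style crop of one character
        groups[char_type].append((crop, (x0, y0)))  # keep the position for later ordering
    return groups                                   # N entries for the N character types present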
5. The method of claim 4, wherein, for any one of the N groups of character images, splicing the group of character images to obtain the sub-image corresponding to the group of character images comprises:
acquiring the position of each character image in the group of character images in the first image;
and splicing the group of character images according to the sequence of the positions from front to back to obtain sub-images corresponding to the group of character images.
6. The method of claim 5, wherein the splicing the group of character images according to the order of the positions from front to back to obtain sub-images corresponding to the group of character images comprises:
if the heights of the character images in the group of character images are different, carrying out scaling processing on the character images in the group of character images, wherein the heights of the scaled character images are the same;
and splicing the group of character images after the zooming processing according to the sequence of the positions from front to back to obtain sub-images corresponding to the group of character images.
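The splicing of claims 5 and 6 might look like the sketch below, assuming OpenCV and NumPy are available and that front-to-back order can be approximated by sorting on each crop's (x, y) position; neither assumption comes from the claims themselves.

import cv2           # assumed available; any image-resize routine would do
import numpy as np

def stitch_group(group, target_height=32):
    # group: [(crop, (x, y)), ...] for a single character type
    ordered = sorted(group, key=lambda item: item[1])    # front-to-back by position
    scaled = []
    for crop, _ in ordered:
        h, w = crop.shape[:2]
        new_w = max(1, round(w * target_height / h))     # preserve aspect ratio
        scaled.append(cv2.resize(crop, (new_w, target_height)))
    return np.concatenate(scaled, axis=1)                # one strip of uniform height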
7. The method according to any one of claims 1 to 6, wherein performing character recognition processing on the sub-image through the character recognition model corresponding to the character type to obtain the character set corresponding to the sub-image, the character set comprising at least one character, comprises:
performing character recognition processing on the sub-image through the character recognition model to obtain a candidate character set;
acquiring a first character number, being the number of characters in the candidate character set, and a second character number, being the number of characters included in the sub-image;
and if the first character number is equal to the second character number, determining the candidate character set as the character set corresponding to the sub-image.
8. The method of claim 7, further comprising:
and if the first character number is smaller than the second character number, performing quality enhancement processing on the sub-image to obtain an enhanced image, and performing character recognition processing on the enhanced image through the character recognition model to obtain the character set corresponding to the sub-image.
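Claims 7 and 8 together describe a count-based acceptance check with an enhancement retry. A sketch follows; the recognizer and the quality enhancer are hypothetical callables (the claims do not prescribe a particular enhancement, so denoising or super-resolution are only examples).

def recognize_with_count_check(sub_image, recognize, expected_count, enhance):
    candidates = recognize(sub_image)
    if len(candidates) == expected_count:      # claim 7: counts match, accept as-is
        return candidates
    if len(candidates) < expected_count:       # claim 8: characters were missed,
        return recognize(enhance(sub_image))   # retry on a quality-enhanced image
    return candidates                          # surplus candidates: see claim 9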
9. The method of claim 7 or 8, further comprising:
if the first character number is larger than the second character number, acquiring the recognition confidence of each character in the candidate character set;
determining the M characters with the highest recognition confidence in the candidate character set, wherein M equals the second character number and is an integer greater than 1;
and determining the M characters as a character set corresponding to the sub-image.
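Claim 9 resolves the surplus case by keeping the M highest-confidence candidates. A sketch, assuming candidates arrive as (character, confidence) pairs; restoring the original reading order afterwards is a convenience added here, not a requirement of the claim.

def keep_top_m(candidates, m):
    # candidates: [(char, confidence), ...] in reading order (assumed shape)
    best = sorted(range(len(candidates)),
                  key=lambda i: candidates[i][1], reverse=True)[:m]
    return [candidates[i][0] for i in sorted(best)]  # restore reading order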
10. The method according to any one of claims 1 to 9, wherein determining the text information corresponding to the first image according to the character set corresponding to each of the plurality of sub-images comprises:
acquiring the position, in the first image, of each character in the character sets corresponding to the plurality of sub-images;
and sorting the characters in the character sets corresponding to the sub-images according to the sequence of the positions from front to back to obtain the text information corresponding to the first image.
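The merge step of claim 10, sketched with a toy usage example; positions are again assumed to be (x, y) coordinates so that front-to-back ordering reduces to an ordinary sort.

def merge_by_position(chars_with_positions):
    # chars_with_positions: [((x, y), char), ...] pooled from all sub-images
    ordered = sorted(chars_with_positions, key=lambda item: item[0])
    return "".join(char for _, char in ordered)

# Toy usage: a print crop yielded "A" (x=0) and "C" (x=40), and a handwriting
# crop yielded "b" (x=20); front-to-back ordering restores "AbC".
print(merge_by_position([((0, 0), "A"), ((40, 0), "C"), ((20, 0), "b")]))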
11. The method of any one of claims 1 to 10, wherein the N types include a print type and a handwriting type.
12. A text recognition apparatus comprising:
an acquisition module, used for acquiring a first image, wherein the first image comprises N types of characters, and N is an integer greater than 1;
a first determining module, used for determining a plurality of sub-images in the first image and determining the character type of each sub-image, wherein each sub-image comprises one type of character;
a processing module, used for performing, for each sub-image, character recognition processing on the sub-image through the character recognition model corresponding to the character type to obtain a character set corresponding to the sub-image, wherein the character set comprises at least one character;
and a second determining module, used for determining the text information corresponding to the first image according to the character sets corresponding to the sub-images respectively.
13. The apparatus of claim 12, wherein the first determining module comprises:
and the first determining unit is used for carrying out segmentation and splicing processing on the first image to obtain N sub-images, wherein the types of characters in different sub-images are different.
14. The apparatus of claim 13, wherein the first determining unit comprises:
the detection subunit is used for performing character detection processing on the first image to obtain the area occupied by each character in the first image and the type of each character;
the segmentation subunit is used for carrying out segmentation processing on the first image according to the area occupied by each character in the first image to obtain a plurality of character images;
and the splicing subunit is used for splicing the plurality of character images according to the type of each character in the first image to obtain the N sub-images.
15. The apparatus of claim 14, wherein the splicing subunit is specifically configured to:
grouping the character images according to the types of characters in the character images to obtain N groups of character images, wherein the characters in each group of character images are of the same type;
and respectively splicing the N groups of character images to obtain the N sub-images.
16. The apparatus of claim 15, wherein, for any one of the N groups of character images, the splicing subunit is specifically configured to:
acquiring the position of each character image in the group of character images in the first image;
and splicing the group of character images according to the sequence of the positions from front to back to obtain sub-images corresponding to the group of character images.
17. The apparatus of claim 16, wherein the splicing subunit is specifically configured to:
if the heights of the character images in the group of character images are different, carrying out scaling processing on the character images in the group of character images, wherein the heights of the scaled character images are the same;
and splicing the group of character images after the zooming processing according to the sequence of the positions from front to back to obtain sub-images corresponding to the group of character images.
18. The apparatus of any one of claims 12 to 17, wherein the processing module comprises:
a first recognition unit, used for performing character recognition processing on the sub-image through the character recognition model to obtain a candidate character set;
a first acquisition unit, used for acquiring a first character number, being the number of characters in the candidate character set, and a second character number, being the number of characters included in the sub-image;
and a second determining unit, used for determining the candidate character set as the character set corresponding to the sub-image if the first character number is equal to the second character number.
19. The apparatus of claim 18, wherein the processing module further comprises a second recognition unit;
the second recognition unit is used for: if the first character number is smaller than the second character number, performing quality enhancement processing on the sub-image to obtain an enhanced image, and performing character recognition processing on the enhanced image through the character recognition model to obtain the character set corresponding to the sub-image.
20. The apparatus of claim 18 or 19, wherein the processing module further comprises a third determining unit, used for:
if the first character number is larger than the second character number, acquiring the recognition confidence of each character in the candidate character set;
determining the M characters with the highest recognition confidence in the candidate character set, wherein M equals the second character number and is an integer greater than 1;
and determining the M characters as a character set corresponding to the sub-image.
21. The apparatus of any one of claims 12 to 20, wherein the second determining module comprises:
a second acquisition unit, used for acquiring the positions, in the first image, of the characters in the character sets corresponding to the plurality of sub-images;
and the sorting unit is used for sorting the characters in the character sets corresponding to the sub-images according to the sequence of the positions from front to back to obtain the text information corresponding to the first image.
22. The apparatus of any one of claims 12 to 21, wherein the N types include a print type and a handwriting type.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 11.
24. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 11.
25. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 11.
CN202210023777.4A 2022-01-10 2022-01-10 Text recognition method, device, equipment and storage medium Pending CN114419636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210023777.4A CN114419636A (en) 2022-01-10 2022-01-10 Text recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210023777.4A CN114419636A (en) 2022-01-10 2022-01-10 Text recognition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114419636A (en) 2022-04-29

Family

ID=81271620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210023777.4A Pending CN114419636A (en) 2022-01-10 2022-01-10 Text recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114419636A (en)

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102208039A (en) * 2011-06-01 2011-10-05 汉王科技股份有限公司 Method and device for recognizing multi-language mixed handwriting text lines
CN105844283A (en) * 2015-01-16 2016-08-10 阿里巴巴集团控股有限公司 Method for identifying category of image, image search method and image search device
CN108090484A (en) * 2016-11-23 2018-05-29 杭州海康威视数字技术股份有限公司 A kind of licence plate recognition method and device
CN110135416A (en) * 2018-02-09 2019-08-16 杭州海康威视数字技术股份有限公司 A kind of licence plate recognition method and device
CN108898137A (en) * 2018-05-25 2018-11-27 黄凯 A kind of natural image character identifying method and system based on deep neural network
CN109635806A (en) * 2018-12-12 2019-04-16 国网重庆市电力公司信息通信分公司 Ammeter technique for partitioning based on residual error network
CN111144191A (en) * 2019-08-14 2020-05-12 广东小天才科技有限公司 Font identification method and device, electronic equipment and storage medium
CN110795997A (en) * 2019-09-19 2020-02-14 平安科技(深圳)有限公司 Teaching method and device based on long-term and short-term memory and computer equipment
CN112749606A (en) * 2020-03-05 2021-05-04 腾讯科技(深圳)有限公司 Text positioning method and device
CN111582273A (en) * 2020-05-09 2020-08-25 中国工商银行股份有限公司 Image text recognition method and device
CN113673295A (en) * 2020-05-14 2021-11-19 佳能株式会社 Image processing apparatus, image processing method, and storage medium
CN111666868A (en) * 2020-06-03 2020-09-15 阳光保险集团股份有限公司 Insurance policy identification method and device and computer equipment
CN111754525A (en) * 2020-06-23 2020-10-09 苏州中科全象智能科技有限公司 Industrial character detection process based on non-precise segmentation
CN111814778A (en) * 2020-07-06 2020-10-23 北京中安未来科技有限公司 Text line region positioning method, layout analysis method and character recognition method
CN111814779A (en) * 2020-07-08 2020-10-23 重庆农村商业银行股份有限公司 Bill text recognition method, device, equipment and storage medium
CN112149663A (en) * 2020-08-28 2020-12-29 北京来也网络科技有限公司 RPA and AI combined image character extraction method and device and electronic equipment
CN112101367A (en) * 2020-09-15 2020-12-18 杭州睿琪软件有限公司 Text recognition method, image recognition and classification method and document recognition processing method
CN112307749A (en) * 2020-10-29 2021-02-02 腾讯科技(深圳)有限公司 Text error detection method and device, computer equipment and storage medium
CN112464781A (en) * 2020-11-24 2021-03-09 厦门理工学院 Document image key information extraction and matching method based on graph neural network
CN112613513A (en) * 2020-12-31 2021-04-06 北京市商汤科技开发有限公司 Image recognition method, device and system
CN112766255A (en) * 2021-01-19 2021-05-07 上海微盟企业发展有限公司 Optical character recognition method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Wei et al. (eds.): "Object Feature Models and Visual Attention Models in Computer Vision", pages 122-123 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023236614A1 (en) * 2022-06-07 2023-12-14 华为云计算技术有限公司 Cloud computing technology-based image recognition method, apparatus, and related device

Similar Documents

Publication Publication Date Title
US10846553B2 (en) Recognizing typewritten and handwritten characters using end-to-end deep learning
US8965126B2 (en) Character recognition device, character recognition method, character recognition system, and character recognition program
US10915788B2 (en) Optical character recognition using end-to-end deep learning
CN114821622B (en) Text extraction method, text extraction model training method, device and equipment
JP5211334B2 (en) Handwritten symbol recognition method and apparatus
RU2757713C1 (en) Handwriting recognition using neural networks
CN109670494B (en) Text detection method and system with recognition confidence
CN113313111B (en) Text recognition method, device, equipment and medium
CN110942074A (en) Character segmentation recognition method and device, electronic equipment and storage medium
CN113657274B (en) Table generation method and device, electronic equipment and storage medium
US20150235097A1 (en) Segmentation of an Input by Cut Point Classification
CN112580308A (en) Document comparison method and device, electronic equipment and readable storage medium
CN113657395B (en) Text recognition method, training method and device for visual feature extraction model
EP3979129A1 (en) Object recognition method and apparatus, and electronic device and storage medium
CN113642584A (en) Character recognition method, device, equipment, storage medium and intelligent dictionary pen
CN114429633A (en) Text recognition method, model training method, device, electronic equipment and medium
CN114419636A (en) Text recognition method, device, equipment and storage medium
US9418281B2 (en) Segmentation of overwritten online handwriting input
CN116110066A (en) Information extraction method, device and equipment of bill text and storage medium
CN113486171B (en) Image processing method and device and electronic equipment
Kumar et al. Line based robust script identification for Indian languages
US20230036812A1 (en) Text Line Detection
CN114120305A (en) Training method of text classification model, and recognition method and device of text content
CN111611986A (en) Focus text extraction and identification method and system based on finger interaction
CN111832551A (en) Text image processing method and device, electronic scanning equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination