CN113610081A - Character recognition method and related equipment thereof - Google Patents

Character recognition method and related equipment thereof

Info

Publication number
CN113610081A
CN113610081A
Authority
CN
China
Prior art keywords
recognized
images
text
image
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110924295.1A
Other languages
Chinese (zh)
Inventor
蔡悦
卢永晨
黄灿
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110924295.1A
Publication of CN113610081A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a character recognition method and related equipment thereof, wherein the method comprises the following steps: after a plurality of images to be recognized including the same character information are obtained, coding features corresponding to a text to be recognized are first determined according to the plurality of images to be recognized, so that these coding features represent the character information carried by all of the images; the coding features corresponding to the text to be recognized are then decoded to obtain a character recognition result of the text to be recognized. Because the coding features are determined from all of the images to be recognized, they can accurately represent the character information carried by every image, and therefore each character in the text to be recognized; as a result, the character recognition result determined based on these coding features is more accurate, which improves the character recognition accuracy of multi-frame text line recognition.

Description

Character recognition method and related equipment thereof
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a character recognition method and a related device.
Background
With the development of character recognition technology, its range of application has grown ever wider. Character recognition technology is used to recognize the characters that appear in an image.
However, some character recognition technologies (e.g., Optical Character Recognition (OCR)) have shortcomings that lead to low recognition accuracy in certain application scenarios (e.g., multi-frame text line recognition). Here, "multi-frame text line recognition" refers to recognizing the same text line as it appears in a plurality of images (especially a plurality of consecutive video frames of one video).
Disclosure of Invention
In order to solve the technical problem, the application provides a character recognition method and related equipment thereof, which can improve the character recognition accuracy of multi-frame text line recognition.
In order to achieve the above purpose, the technical solutions provided in the embodiments of the present application are as follows:
the embodiment of the application provides a character recognition method, which comprises the following steps:
acquiring a plurality of images to be recognized; wherein the plurality of images to be recognized include the same character information;
determining the coding features corresponding to the text to be recognized according to the images to be recognized, so that the coding features corresponding to the text to be recognized are used for representing character information carried by the images to be recognized;
and decoding the coding features corresponding to the text to be recognized to obtain a character recognition result of the text to be recognized.
In a possible implementation manner, the determining, according to the plurality of images to be recognized, the coding feature corresponding to the text to be recognized includes:
respectively determining the coding features of the plurality of images to be recognized;
and splicing the coding features of the plurality of images to be recognized to obtain the coding features corresponding to the text to be recognized.
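As an illustrative sketch of this splicing step (a hypothetical helper, not the patent's actual implementation), the per-image coding features can be concatenated into one feature sequence covering all of the images to be recognized:

```python
def splice_encodings(per_image_features):
    """Concatenate the coding features of the N images to be recognized
    into one feature sequence for the text to be recognized."""
    spliced = []
    for features in per_image_features:  # features: list of feature vectors
        spliced.extend(features)
    return spliced

# Toy example: two images, each encoded as two 3-dimensional feature vectors.
img1 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
img2 = [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
text_features = splice_encodings([img1, img2])  # 4 feature vectors in total
```

The decoder then operates on the spliced sequence rather than on any single image's features.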
In a possible implementation manner, if the number of the images to be recognized is N, the determining process of the coding feature of the nth image to be recognized includes:
performing feature extraction on the nth image to be recognized to obtain visual features of the nth image to be recognized;
coding the visual feature of the nth image to be recognized to obtain the coding feature of the nth image to be recognized; wherein n is a positive integer, n is less than or equal to N, and N is a positive integer.
In a possible implementation manner, the encoding the visual feature of the nth image to be recognized to obtain the encoding feature of the nth image to be recognized includes:
carrying out position coding on the visual features of the nth image to be recognized to obtain the position features of the nth image to be recognized;
performing feature fusion on the position feature of the nth image to be recognized and the visual feature of the nth image to be recognized to obtain a fusion feature of the nth image to be recognized;
and inputting the fusion feature of the nth image to be recognized into a pre-constructed coding network to obtain the coding feature of the nth image to be recognized output by the coding network.
In a possible implementation, the process of acquiring the plurality of images to be recognized includes:
clustering a plurality of candidate images to obtain at least one candidate image set, so that all the candidate images in each candidate image set include the same character information;
and determining the plurality of images to be recognized according to an image set to be recognized in the at least one candidate image set.
In a possible implementation, the plurality of candidate images are determined as follows:
performing text detection on a plurality of frames of video images in a video to be processed to obtain text detection results of the plurality of frames of video images;
and respectively performing image cutting on the plurality of frames of video images according to the text detection results of the plurality of frames of video images to obtain the plurality of candidate images.
An embodiment of the present application further provides a character recognition apparatus, including:
the image acquisition unit is used for acquiring a plurality of images to be recognized; wherein the plurality of images to be recognized include the same character information;
the feature determining unit is used for determining, according to the plurality of images to be recognized, the coding features corresponding to the text to be recognized, so that the coding features corresponding to the text to be recognized are used for representing the character information carried by the plurality of images to be recognized;
and the character recognition unit is used for decoding the coding features corresponding to the text to be recognized to obtain a character recognition result of the text to be recognized.
An embodiment of the present application further provides an apparatus, where the apparatus includes a processor and a memory:
the memory is used for storing a computer program;
the processor is used for executing any implementation mode of the character recognition method provided by the embodiment of the application according to the computer program.
The embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used for storing a computer program, and the computer program is used for executing any implementation manner of the character recognition method provided in the embodiment of the present application.
The embodiment of the present application further provides a computer program product, and when the computer program product runs on a terminal device, the terminal device is enabled to execute any implementation manner of the character recognition method provided by the embodiment of the present application.
Compared with the prior art, the embodiment of the application has at least the following advantages:
in the character recognition method provided by the embodiment of the application, after a plurality of images to be recognized including the same character information are acquired, according to the plurality of images to be recognized, the coding features corresponding to the text to be recognized are determined, so that the coding features corresponding to the text to be recognized are used for representing the character information carried by the plurality of images to be recognized; and then decoding the coding features corresponding to the text to be recognized to obtain a character recognition result of the text to be recognized.
Because the coding features corresponding to the text to be recognized are determined according to all of the images to be recognized, they can accurately represent the character information carried by all of the images, and therefore each character in the text to be recognized. As a result, the character recognition result of the text to be recognized determined based on these coding features is more accurate, which is favorable for improving the character recognition accuracy of multi-frame text line recognition.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present application; other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a flowchart of a character recognition method according to an embodiment of the present application;
fig. 2 is a schematic diagram of N images to be recognized according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of another N images to be recognized according to an embodiment of the present disclosure;
fig. 4 is a schematic process diagram for determining a coding feature corresponding to a text to be recognized according to an embodiment of the present application;
fig. 5 is a schematic diagram of a first coding network according to an embodiment of the present application;
fig. 6 is a schematic diagram of a second coding network according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present application.
Detailed Description
The inventor found in research on character recognition technology that, for multi-frame text line recognition, after a plurality of images including the same text line are acquired, character recognition can be performed on each image by using OCR to obtain a character recognition result for each image; the character recognition results of all the images are then combined according to a preset rule to obtain the character recognition result of the text line. However, because the text line may suffer different defects in different images (e.g., occlusion, displacement, distortion, missing characters, etc.), the character recognition results of the individual images are inaccurate, so the character recognition result determined by combining them is also inaccurate, and the character recognition accuracy of multi-frame text line recognition is therefore low.
Based on the above findings, in order to solve the technical problems in the background art section, an embodiment of the present application provides a character recognition method, including: acquiring a plurality of images to be recognized, wherein the images to be recognized comprise the same character information; determining the coding features corresponding to the text to be recognized according to the images to be recognized, so that the coding features corresponding to the text to be recognized are used for representing character information carried by the images to be recognized; and decoding the coding features corresponding to the text to be recognized to obtain a character recognition result of the text to be recognized.
Therefore, the coding features corresponding to the text to be recognized are determined according to all the images to be recognized, so that the coding features corresponding to the text to be recognized can accurately represent character information carried by all the images to be recognized, the coding features corresponding to the text to be recognized can accurately represent all characters in the text to be recognized, the character recognition result of the text to be recognized determined based on the coding features corresponding to the text to be recognized is accurate, and the character recognition accuracy of multi-frame text line recognition is improved.
In addition, the embodiment of the present application does not limit the execution subject of the character recognition method, and for example, the character recognition method provided by the embodiment of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, or the like. The server may be a stand-alone server, a cluster server, or a cloud server.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application are clearly and completely described below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
For convenience of understanding and explaining the technical solution of the present application, the character recognition method provided by the embodiment of the present application is described below by taking a multi-frame text line recognition process for N images to be recognized as an example.
Method embodiment
Referring to fig. 1, the figure is a flowchart of a character recognition method according to an embodiment of the present application.
The character recognition method provided by the embodiment of the application comprises the following steps of S1-S3:
s1: and acquiring N images to be identified. Wherein N is a positive integer.
The N images to be recognized are used for representing images needing multi-frame text line recognition, and carry the same character information.
Note that the "same character information" may cover the following cases. In one case, all characters appearing in the nth image to be recognized are exactly the same as all characters appearing in any other of the N images to be recognized; wherein n is a positive integer, n is less than or equal to N, and N is a positive integer. In another case, among the "N images to be recognized", all characters appearing in some images are exactly the same, while other images exhibit missing characters compared with "all characters appearing in those images", so that the "other images" include only most of the characters appearing in the former. In yet another case, although all characters appearing in all of the "N images to be recognized" are identical, the positions at which the characters appear (for example, there is a displacement) or their presentation effects (for example, different degrees of distortion, or different colors) differ between images.
In addition, the embodiment of the present application does not limit the N images to be recognized, for example, the N images to be recognized may be the 1 st frame video image to the N th frame video image shown in fig. 2. For another example, the N images to be recognized may be text images corresponding to the 1 st frame of video image to text images corresponding to the nth frame of video image shown in fig. 3.
The text image corresponding to the nth frame of video image is obtained by performing image segmentation on the nth frame of video image according to the text detection result of the nth frame of video image. n is a positive integer, n is not more than N, and N is a positive integer. It should be noted that the text detection result of the nth frame of video image may be obtained by using any existing text detection method, which is not specifically limited in this embodiment of the present application.
In addition, the embodiment of the present application does not limit the 1 st frame video image to the N th frame video image, for example, the 1 st frame video image to the N th frame video image may refer to consecutive N frames of video images in one video data (e.g., hereinafter, "video to be processed").
The text to be recognized is used to represent character information appearing in the N images to be recognized. For example, if the N images to be recognized are the N images shown in fig. 2 or fig. 3, the text to be recognized may be "this is a text with the same content in one line".
In addition, the present application does not limit the acquiring process of the N images to be identified (i.e., the implementation manner of S1), for example, in a possible implementation manner, the S1 may specifically include S11 to S13:
s11: a plurality of candidate images are acquired.
The candidate images are the image data used when screening out the N images to be recognized; and the number of candidate images is not less than the number of images to be recognized (i.e., the number of candidate images is not less than N).
In addition, the embodiment of the present application also does not limit the candidate image, for example, the candidate image may be a video image of one frame in one video data, or may be a text image corresponding to a video image of one frame in one video data.
The embodiment of the present application does not limit the implementation of S11; for ease of understanding, the following description covers two cases.
In case 1, if the candidate image is a frame of video image, S11 may specifically include: after the video to be processed is acquired, extracting multiple frames of video images from the video to be processed to serve as the multiple candidate images. The video to be processed refers to video data needing multi-frame text line identification.
In some cases, after the video to be processed is acquired, multiple frames of video images in the video to be processed may be directly determined as multiple candidate images (for example, each frame of video image in the video to be processed may be determined as a candidate image), so that each candidate image is a video image in the video to be processed.
In case 2, if the candidate image is a text image corresponding to one frame of video image, S11 may specifically include S111-S112:
s111: and performing text detection on the multi-frame video image in the video to be processed to obtain a text detection result of the multi-frame video image.
The text detection result of one frame of video image is used for indicating the position of the text in the frame of video image.
In addition, the embodiment of the present application is not limited to the implementation of "text detection" in S111, and may be implemented by any existing or future text detection method.
Based on the related content of S111, after the to-be-processed video is obtained, text detection may be performed on the multiple frames of video images in the to-be-processed video to obtain a text detection result of the multiple frames of video images (for example, text detection is performed on each frame of video image in the to-be-processed video to obtain a text detection result of each frame of video image in the to-be-processed video), so that a text image corresponding to each frame of video image can be determined based on the text detection result of each frame of video image in the following process.
S112: and respectively carrying out image cutting on the multi-frame video images according to the text detection results of the multi-frame video images to obtain a plurality of candidate images.
In the embodiment of the application, if the "multi-frame video image" includes T frames of video images, after the text detection result of the t-th frame of video image is obtained, image segmentation may be performed on the t-th frame of video image according to its text detection result to obtain the text image corresponding to the t-th frame of video image, so that this text image can accurately represent the character information carried by the t-th frame of video image; the text image corresponding to the t-th frame of video image is then determined as the t-th candidate image. The size of the text image corresponding to the t-th frame of video image is smaller than that of the t-th frame of video image itself, so the text image carries less non-character information than the frame does, and character recognition based on the text image is therefore more accurate. t is a positive integer, t is less than or equal to T, and T is a positive integer.
Based on the related contents of S111 to S112, in some cases, after the video to be processed is obtained, text detection may be performed on multiple frames of video images in the video to be processed first, so as to obtain a text detection result of the multiple frames of video images; respectively cutting out a text image corresponding to each frame of video image from each frame of video image according to the text detection result of each frame of video image; and finally, determining the text images corresponding to the multiple frames of video images as candidate images.
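A minimal sketch of the cutting step in S112 (the function name and the (x, y, w, h) box format are assumptions for illustration, not the patent's API): given a detected text box, the text image is cut out of the video frame, yielding a candidate image smaller than the frame itself:

```python
def crop_text_region(frame, box):
    """Cut the text image out of a video frame.

    frame: 2-D grid of pixels (list of rows); box: (x, y, w, h) from text detection.
    """
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]

# Toy 6x8 "frame" whose pixels record their own (row, col) position.
frame = [[(r, c) for c in range(8)] for r in range(6)]
text_image = crop_text_region(frame, (2, 1, 4, 3))  # 4 wide, 3 tall, at (x=2, y=1)
```

The cropped text image contains only the detected text region, matching the observation that it carries less non-character information than the full frame.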
Based on the above-mentioned related content of S11, in some application scenarios, multiple candidate images may be determined according to multiple frames of video images in one video data (e.g., a video to be processed), so that multiple images with the same text content can be subsequently screened from the multiple candidate images for performing multiple frames of text line recognition.
S12: and clustering the candidate images to obtain at least one candidate image set, so that all the candidate images in each candidate image set comprise the same character information.
Wherein the y-th candidate image set refers to the set of candidate images that include the y-th text. y is a positive integer, y is less than or equal to Y, and Y represents the number of candidate image sets.
In addition, the embodiment of the present application is not limited to the implementation of "clustering" in S12, and may be implemented by using any existing or future clustering method.
Based on the related content of S12, after T candidate images are obtained, clustering may be performed on the T candidate images, so that candidate images carrying the same character information are divided into the same class and candidate images carrying different character information are divided into different classes, obtaining Y classes of candidate images, where all candidate images in the y-th class include the y-th text; the set of all candidate images in the y-th class is then determined as the y-th candidate image set. Wherein y is a positive integer and y is less than or equal to Y.
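The clustering in S12 can be sketched as grouping candidate images by the character information they carry. Here a caller-supplied `text_key` function stands in for whatever similarity measure the actual clustering uses; this helper is an illustration under that assumption, not the patent's clustering method:

```python
from collections import defaultdict

def cluster_candidates(candidate_images, text_key):
    """Group candidate images so that each set carries the same character information."""
    groups = defaultdict(list)
    for image in candidate_images:
        groups[text_key(image)].append(image)
    return list(groups.values())

# Toy candidates tagged with the text they carry; the key simply reads the tag.
candidates = [("img1", "hello"), ("img2", "world"), ("img3", "hello")]
sets = cluster_candidates(candidates, text_key=lambda img: img[1])
```

Each resulting set corresponds to one candidate image set in S12, ready to be taken as an image set to be recognized in S13.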
S13: and determining N images to be identified according to the image set to be identified in the at least one candidate image set.
Wherein the image set to be identified is used to represent any one of the candidate image sets.
It can be seen that after the y-th candidate image set is obtained, each candidate image in the y-th candidate image set may be determined as an image to be recognized, so that a character recognition result corresponding to the y-th candidate image set (i.e., the character recognition result of the y-th text) can subsequently be determined using the following S2-S3. Wherein y is a positive integer, y is less than or equal to Y, and Y is a positive integer.
Based on the related content of S1, for some application scenarios, after the to-be-processed video is acquired, N to-be-recognized images may be determined according to multiple frames of video images in the to-be-processed video (for example, the 1 st frame of video image to the nth frame of video image in fig. 2), so that the N to-be-recognized images can represent character information carried by the multiple frames of video images in the to-be-processed video (for example, "this is a line of text with the same content"), so that multiple frames of text line recognition can be performed on the N to-be-recognized images subsequently.
S2: and determining the coding features corresponding to the text to be recognized according to the N images to be recognized, so that the coding features corresponding to the text to be recognized are used for representing character information carried by the N images to be recognized.
The coding features corresponding to the text to be recognized are determined according to the text content in the N images to be recognized, so that the coding features corresponding to the text to be recognized are used for representing character information carried by the N images to be recognized.
In addition, the embodiments of the present application are not limited to the implementation of S2, for example, in one possible implementation, as shown in fig. 4, S2 may specifically include:
s21: and determining the coding characteristics of the nth image to be identified. Wherein N is a positive integer and is less than or equal to N.
The coding characteristics of the nth image to be recognized are used for representing character information carried by the nth image to be recognized.
In addition, the embodiment of the present application does not limit the implementation of S21; for example, in one possible implementation, S21 may specifically include S211-S212:
s211: and performing feature extraction on the nth image to be recognized to obtain the visual feature of the nth image to be recognized.
The visual characteristics of the nth image to be recognized are used for representing the image information carried by the nth image to be recognized.
In addition, the embodiment of the present application is not limited to the implementation of "feature extraction" in S211, and may be implemented by any existing or future method capable of performing feature extraction on image data.
For example, in one possible implementation, S211 may specifically include: and inputting the nth image to be recognized into a pre-constructed convolutional neural network to obtain the visual characteristics of the nth image to be recognized output by the convolutional neural network.
The convolutional neural network is used for performing visual feature extraction on input data of the convolutional neural network, and the convolutional neural network can be constructed in advance according to the first sample image and the actual visual features of the first sample image.
Based on the related content of S211, after the nth to-be-recognized image is acquired, feature extraction may be performed on the nth to-be-recognized image to obtain the visual feature of the nth to-be-recognized image, so that the visual feature of the nth to-be-recognized image can accurately represent the image information carried by the nth to-be-recognized image.
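The patent performs this step with a pre-constructed convolutional neural network. As a hedged, dependency-free illustration of the underlying operation only, here is a single "valid" 2-D convolution over a toy image; a real visual-feature extractor would stack many such layers with learned kernels:

```python
def conv2d_valid(image, kernel):
    """One 2-D convolution with no padding: the building block of a CNN feature extractor."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
feature_map = conv2d_valid(image, [[1, 1], [1, 1]])  # 2x2 all-ones kernel sums each window
```

Sliding the 2x2 kernel over the 3x3 image yields a 2x2 feature map, one response per window position.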
S212: and coding the visual characteristics of the nth image to be recognized to obtain the coding characteristics of the nth image to be recognized.
The embodiment of the application is not limited to the implementation of S212, for example, in a possible implementation, S212 may specifically include S2121-S2123:
S2121: performing position encoding on the visual features of the nth image to be recognized to obtain the position features of the nth image to be recognized.
The position feature of the nth image to be recognized is used for representing the position information carried in the nth image to be recognized.
The embodiment of the present application does not limit the implementation of the "position encoding" in S2121; for example, it may be implemented by using any existing or future position encoding method (for example, by the Positional Encoding module in a Transformer model).
S2122: and performing feature fusion on the position feature of the nth image to be recognized and the visual feature of the nth image to be recognized to obtain the fusion feature of the nth image to be recognized.
In this embodiment of the application, after the position feature of the nth image to be recognized is obtained, feature fusion (e.g., splicing or adding) may be performed on the position feature of the nth image to be recognized and the visual feature of the nth image to be recognized, so as to obtain a fusion feature of the nth image to be recognized, so that the fusion feature of the nth image to be recognized can more accurately represent character-related information (e.g., information such as each character and the arrangement order of each character) carried by the nth image to be recognized.
S2123: and inputting the fusion characteristics of the nth image to be recognized into a pre-constructed coding network to obtain the coding characteristics of the nth image to be recognized output by the coding network.
The coding network is used for coding input data of the coding network. In addition, the embodiment of the present application is not limited to the encoding network, and may be implemented by using any existing or future encoding network (e.g., the first encoding network shown in fig. 5 or the second encoding network shown in fig. 6).
For the first encoding network shown in fig. 5, the first encoding network may include M1 coding layers connected in sequence, and each coding layer comprises a multi-head self-attention module (Multi-Head Self Attention Module), a feed-forward neural network (Feed Forward Module) and two summation-and-normalization modules (Add & Norm). In addition, in the first encoding network, the input data of the 1st coding layer is the input data of the first encoding network (for example, the fusion features of the nth image to be recognized), and the input data of the m1-th coding layer is the output data of the (m1-1)-th coding layer, where m1 is a positive integer, 2 ≤ m1 ≤ M1, and M1 is a positive integer (for example, M1 = 6).
The embodiment of the present application does not limit the implementation of the first encoding network; for example, the first encoding network may be implemented by using the Encoder network in a Transformer model.
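Such a first encoding network can be sketched with PyTorch's built-in Transformer encoder, whose layers each contain exactly the listed pieces (multi-head self-attention, a feed-forward network, and two add-and-normalize steps). M1 = 6 follows the example above; the model width, head count, and feed-forward size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# One coding layer: multi-head self-attention + feed-forward + two Add&Norm.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=256, nhead=8, dim_feedforward=1024, batch_first=True)

# M1 = 6 coding layers connected in sequence, as in the example above.
first_encoding_network = nn.TransformerEncoder(encoder_layer, num_layers=6)

fused = torch.randn(1, 32, 256)           # fusion features of the nth image
encoded = first_encoding_network(fused)   # coding features of the nth image
print(encoded.shape)                      # torch.Size([1, 32, 256])
```

The output keeps the input's shape, so the per-image coding features can later be spliced across the N images without any reshaping.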
For the second encoding network shown in fig. 6, the second encoding network may include M2 coding layers connected in sequence, and each coding layer comprises a multi-head self-attention module (Multi-Head Self Attention Module), two feed-forward neural networks (Feed Forward Modules), a convolution module (Convolution Module) and a normalization module (LayerNorm). In addition, in the second encoding network, the input data of the 1st coding layer is the input data of the second encoding network (for example, the fusion features of the nth image to be recognized), and the input data of the m2-th coding layer is the output data of the (m2-1)-th coding layer, where m2 is a positive integer, 2 ≤ m2 ≤ M2, and M2 is a positive integer (for example, M2 = 7).
The embodiment of the present application does not limit the implementation of the second encoding network; for example, it may be implemented by using a Conformer network. In addition, in order to save computation, the convolution module in the second encoding network may be implemented by combining a channel convolution (Pointwise Conv) and a spatial convolution (Depthwise Conv).
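The computation-saving combination described above can be sketched as a pointwise (1×1, channel-mixing) convolution followed by a depthwise (per-channel, spatial) convolution, the latter obtained via the `groups` argument of `Conv1d`. The channel count and kernel size are illustrative assumptions; the parameter comparison at the end shows why the factorization is cheaper than one full convolution of the same extent.

```python
import torch
import torch.nn as nn

channels = 256
pointwise = nn.Conv1d(channels, channels, kernel_size=1)          # channel mixing
depthwise = nn.Conv1d(channels, channels, kernel_size=31,
                      padding=15, groups=channels)                # per-channel spatial conv

x = torch.randn(1, channels, 32)          # (batch, channels, positions)
y = depthwise(pointwise(x))
print(y.shape)                            # torch.Size([1, 256, 32])

# Compare parameter counts against a single full convolution of the same extent.
full = nn.Conv1d(channels, channels, kernel_size=31, padding=15)

def count(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

print(count(pointwise) + count(depthwise) < count(full))  # True
```

With these sizes, the separable pair needs roughly 74k parameters against about 2M for the full convolution, which is where the computational saving comes from.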
Based on the related content of the above S2123, after the fusion feature of the nth image to be recognized is obtained, the fusion feature of the nth image to be recognized may be input into a pre-constructed coding network, so that the coding network performs coding processing on the fusion feature of the nth image to be recognized, and obtains and outputs the coding feature of the nth image to be recognized, so that the coding feature of the nth image to be recognized can accurately represent character information carried by the nth image to be recognized.
Based on the related content of S21, after the N images to be recognized are obtained, encoding processing may be performed on each image to be recognized, so as to obtain the coding features of the 1st to the Nth image to be recognized, so that the coding features of the N images to be recognized may subsequently be subjected to feature fusion to obtain the coding features corresponding to the text to be recognized.
S22: and splicing the coding features of the 1 st image to be recognized to the coding features of the Nth image to be recognized to obtain the coding features corresponding to the text to be recognized.
In the embodiment of the application, after the coding features of the 1st to the Nth image to be recognized are acquired, the coding features of the N images to be recognized can be spliced to obtain the coding features corresponding to the text to be recognized.
Based on the related contents of S21 to S22, after the N images to be recognized are obtained, image encoding processing may be performed on the 1st to the Nth image to be recognized, so as to obtain the coding features of the 1st to the Nth image to be recognized; the coding features of the N images to be recognized are then spliced to obtain the coding features corresponding to the text to be recognized, so that the coding features corresponding to the text to be recognized can accurately represent the character information shared by the N images to be recognized, and thus more accurately represent the character information carried by the text to be recognized.
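The splicing of S22 reduces to a concatenation of the per-image coding features; N, the position count, and the feature width below are illustrative assumptions.

```python
import numpy as np

# Coding features of N = 3 images to be recognized, each with 32 positions
# of width 256 (toy values; the kth image is filled with the constant k).
N = 3
per_image = [np.full((32, 256), float(k)) for k in range(N)]

# Splice along the position axis to form the text-level coding features.
text_features = np.concatenate(per_image, axis=0)
print(text_features.shape)  # (96, 256)
```

The decoder then attends over all 96 positions at once, which is how evidence from every frame contributes to each recognized character.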
S3: and decoding the coding features corresponding to the text to be recognized to obtain a character recognition result of the text to be recognized.
The character recognition result of the text to be recognized is used for representing the shared character information in the N images to be recognized.
In addition, the embodiment of the present application does not limit the implementation of the "decoding process" in S3; for example, it may be implemented by the Decoder network in a Transformer model.
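One decoding step with PyTorch's built-in Transformer decoder can be sketched as follows; the vocabulary size, the widths, and the single-step greedy decoding shown here are illustrative assumptions rather than details fixed by the embodiment (at inference the decoder would be run autoregressively until an end token).

```python
import torch
import torch.nn as nn

vocab_size, d_model = 5000, 256  # illustrative assumptions
decoder_layer = nn.TransformerDecoderLayer(
    d_model=d_model, nhead=8, dim_feedforward=1024, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
to_vocab = nn.Linear(d_model, vocab_size)

memory = torch.randn(1, 96, d_model)   # coding features for the text to be recognized
tgt = torch.randn(1, 1, d_model)       # embedding of the start token
logits = to_vocab(decoder(tgt, memory))
print(logits.shape)                    # torch.Size([1, 1, 5000])
next_char = logits.argmax(-1)          # index of the predicted character
```

Feeding the predicted character back in as the next `tgt` position and repeating yields the full character recognition result.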
Based on the related contents of S1 to S3, it can be known that, in the character recognition method provided in the embodiment of the present application, after a plurality of images to be recognized, which all include the same character information, are acquired, according to the plurality of images to be recognized, the encoding features corresponding to the text to be recognized are determined, so that the encoding features corresponding to the text to be recognized are used for representing the character information carried by the plurality of images to be recognized; and then decoding the coding features corresponding to the text to be recognized to obtain a character recognition result of the text to be recognized.
Therefore, the coding features corresponding to the text to be recognized are determined according to all the images to be recognized, so that the coding features corresponding to the text to be recognized can accurately represent character information carried by all the images to be recognized, the coding features corresponding to the text to be recognized can accurately represent all characters in the text to be recognized, the character recognition result of the text to be recognized determined based on the coding features corresponding to the text to be recognized is accurate, and the character recognition accuracy of multi-frame text line recognition is improved.
Based on the character recognition method provided by the above method embodiment, the embodiment of the present application further provides a character recognition apparatus, which is explained and explained below with reference to the accompanying drawings.
Device embodiment
Please refer to the above method embodiment for the technical details of the character recognition apparatus provided by the apparatus embodiment.
Referring to fig. 7, the figure is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present application.
The character recognition apparatus 700 provided in the embodiment of the present application includes:
an image acquisition unit 701 configured to acquire a plurality of images to be recognized; wherein the plurality of images to be recognized include the same character information;
a feature determining unit 702, configured to determine, according to the multiple images to be recognized, coding features corresponding to the texts to be recognized, so that the coding features corresponding to the texts to be recognized are used to represent character information carried by the multiple images to be recognized;
the character recognition unit 703 is configured to decode the coding features corresponding to the text to be recognized, so as to obtain a character recognition result of the text to be recognized.
In a possible implementation, the feature determining unit 702 includes:
the first determining subunit is used for respectively determining the coding features of the images to be identified;
and the feature splicing subunit is used for splicing the coding features of the multiple images to be recognized to obtain the coding features corresponding to the texts to be recognized.
In one possible embodiment, the first determining subunit includes:
the feature extraction subunit is configured to, if the number of the images to be recognized is N, perform feature extraction on the nth image to be recognized to obtain the visual features of the nth image to be recognized; wherein n is a positive integer, n is not greater than N, and N is a positive integer;
the coding processing subunit is configured to perform coding processing on the visual features of the nth image to be recognized to obtain the coding features of the nth image to be recognized; wherein n is a positive integer, n is less than or equal to N, and N is a positive integer.
In a possible implementation manner, the encoding processing subunit is specifically configured to:
carrying out position coding on the visual features of the nth image to be recognized to obtain the position features of the nth image to be recognized;
performing feature fusion on the position feature of the nth image to be recognized and the visual feature of the nth image to be recognized to obtain a fusion feature of the nth image to be recognized;
and inputting the fusion characteristics of the nth image to be identified into a pre-constructed coding network to obtain the coding characteristics of the nth image to be identified output by the coding network.
In a possible implementation manner, the image obtaining unit 701 is specifically configured to:
clustering at least one candidate image to obtain at least one candidate image set, so that all candidate images in the candidate image set comprise the same character information;
and determining the plurality of images to be identified according to the image set to be identified in the at least one candidate image set.
In a possible implementation, the determining of the at least one candidate image includes:
performing text detection on at least one frame of video image in a video to be processed to obtain a text detection result of the at least one frame of video image;
and respectively carrying out image cutting on the at least one frame of video image according to the text detection result of the at least one frame of video image to obtain the at least one candidate image.
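A toy sketch of the grouping performed by the image acquisition unit: candidate images carrying the same character information end up in one candidate image set. Grouping by an exactly matching detected string is an assumed stand-in for whatever clustering criterion an implementation actually uses (real systems would typically cluster on visual or feature similarity).

```python
from collections import defaultdict

# Hypothetical candidate images cut from video frames, each tagged with
# the text found by a detector (frame numbers and strings are invented).
candidates = [
    {"frame": 10, "text": "hello"},
    {"frame": 11, "text": "hello"},
    {"frame": 12, "text": "world"},
]

# Cluster candidates that carry the same character information.
clusters = defaultdict(list)
for c in candidates:
    clusters[c["text"]].append(c["frame"])

print(dict(clusters))  # {'hello': [10, 11], 'world': [12]}
```

Each resulting cluster is one candidate image set; the images in a chosen set become the N images to be recognized.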
Based on the related content of the character recognition apparatus 700, for the character recognition apparatus 700, after acquiring a plurality of images to be recognized, which all include the same character information, according to the plurality of images to be recognized, determining a coding feature corresponding to a text to be recognized, so that the coding feature corresponding to the text to be recognized is used for representing character information carried by the plurality of images to be recognized; and then decoding the coding features corresponding to the text to be recognized to obtain a character recognition result of the text to be recognized.
In this way, since the encoding features corresponding to the text to be recognized are determined according to all of the images to be recognized, the encoding features corresponding to the text to be recognized can accurately represent the character information carried by all the images to be recognized, so that they can accurately represent each character in the text to be recognized; therefore, the character recognition result of the text to be recognized determined based on these encoding features is more accurate, which is beneficial to improving the character recognition accuracy of multi-frame text line recognition.
Further, an embodiment of the present application further provides an apparatus, where the apparatus includes a processor and a memory:
the memory is used for storing a computer program;
the processor is used for executing any implementation mode of the character recognition method provided by the embodiment of the application according to the computer program.
Further, an embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, where the computer program is used to execute any implementation manner of the character recognition method provided in the embodiment of the present application.
Further, an embodiment of the present application also provides a computer program product, which when running on a terminal device, causes the terminal device to execute any implementation of the character recognition method provided in the embodiment of the present application.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been described with reference to the preferred embodiments, these are not intended to be limiting. Using the methods and technical contents disclosed above, those skilled in the art can, without departing from the scope of the technical solution of the present invention, make many possible variations and modifications to the technical solution, or amend it into equivalent embodiments of equivalent change. Therefore, any simple modification, equivalent change or modification made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A method of character recognition, the method comprising:
acquiring a plurality of images to be identified; wherein the plurality of images to be recognized include the same character information;
determining coding features corresponding to texts to be recognized according to the images to be recognized, so that the coding features corresponding to the texts to be recognized are used for representing character information carried by the images to be recognized;
and decoding the coding features corresponding to the text to be recognized to obtain a character recognition result of the text to be recognized.
2. The method according to claim 1, wherein the determining, according to the plurality of images to be recognized, the coding feature corresponding to the text to be recognized comprises:
respectively determining the coding characteristics of the images to be identified;
and splicing the coding features of the plurality of images to be recognized to obtain the coding features corresponding to the texts to be recognized.
3. The method according to claim 2, wherein if the number of the images to be recognized is N, the process of determining the coding features of the nth image to be recognized comprises:
performing feature extraction on the nth image to be recognized to obtain visual features of the nth image to be recognized;
coding the visual features of the nth image to be recognized to obtain the coding features of the nth image to be recognized; wherein n is a positive integer, n is less than or equal to N, and N is a positive integer.
4. The method according to claim 3, wherein the encoding the visual feature of the nth image to be recognized to obtain the encoding feature of the nth image to be recognized comprises:
carrying out position coding on the visual features of the nth image to be recognized to obtain the position features of the nth image to be recognized;
performing feature fusion on the position feature of the nth image to be recognized and the visual feature of the nth image to be recognized to obtain a fusion feature of the nth image to be recognized;
and inputting the fusion characteristics of the nth image to be identified into a pre-constructed coding network to obtain the coding characteristics of the nth image to be identified output by the coding network.
5. The method according to claim 1, wherein the process of acquiring the plurality of images to be identified comprises:
clustering a plurality of candidate images to obtain at least one candidate image set, so that all the candidate images in the candidate image set comprise the same character information;
and determining the plurality of images to be identified according to the image set to be identified in the at least one candidate image set.
6. The method of claim 5, wherein the determining of the at least one candidate image comprises:
performing text detection on a plurality of frames of video images in a video to be processed to obtain a text detection result of the plurality of frames of video images;
and respectively carrying out image cutting on the multi-frame video images according to the text detection results of the multi-frame video images to obtain the plurality of candidate images.
7. A character recognition apparatus, comprising:
the image acquisition unit is used for acquiring a plurality of images to be identified; wherein the plurality of images to be recognized include the same character information;
the characteristic determining unit is used for determining the coding characteristics corresponding to the text to be recognized according to the images to be recognized, so that the coding characteristics corresponding to the text to be recognized are used for representing character information carried by the images to be recognized;
and the character recognition unit is used for decoding the coding features corresponding to the text to be recognized to obtain a character recognition result of the text to be recognized.
8. An apparatus, comprising a processor and a memory:
the memory is used for storing a computer program;
the processor is configured to perform the method of any of claims 1-6 in accordance with the computer program.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for performing the method of any of claims 1-6.
10. A computer program product, characterized in that the computer program product, when run on a terminal device, causes the terminal device to perform the method of any of claims 1-6.
CN202110924295.1A 2021-08-12 2021-08-12 Character recognition method and related equipment thereof Pending CN113610081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110924295.1A CN113610081A (en) 2021-08-12 2021-08-12 Character recognition method and related equipment thereof


Publications (1)

Publication Number Publication Date
CN113610081A true CN113610081A (en) 2021-11-05

Family

ID=78340465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110924295.1A Pending CN113610081A (en) 2021-08-12 2021-08-12 Character recognition method and related equipment thereof

Country Status (1)

Country Link
CN (1) CN113610081A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569846A (en) * 2019-09-16 2019-12-13 北京百度网讯科技有限公司 Image character recognition method, device, equipment and storage medium
CN111539410A (en) * 2020-04-16 2020-08-14 深圳市商汤科技有限公司 Character recognition method and device, electronic equipment and storage medium
CN111626293A (en) * 2020-05-21 2020-09-04 咪咕文化科技有限公司 Image text recognition method and device, electronic equipment and storage medium
CN111783459A (en) * 2020-05-08 2020-10-16 昆明理工大学 Laos named entity recognition method based on improved transform + CRF
CN112016543A (en) * 2020-07-24 2020-12-01 华为技术有限公司 Text recognition network, neural network training method and related equipment
CN112381057A (en) * 2020-12-03 2021-02-19 上海芯翌智能科技有限公司 Handwritten character recognition method and device, storage medium and terminal
CN112633290A (en) * 2021-03-04 2021-04-09 北京世纪好未来教育科技有限公司 Text recognition method, electronic device and computer readable medium
CN112801228A (en) * 2021-04-06 2021-05-14 北京世纪好未来教育科技有限公司 Text recognition method, electronic equipment and storage medium thereof
CN113011408A (en) * 2021-02-09 2021-06-22 中国银行股份有限公司苏州分行 Method and system for recognizing characters and vehicle identification codes of multi-frame picture sequence
US20210209356A1 (en) * 2020-01-06 2021-07-08 Samsung Electronics Co., Ltd. Method for keyword extraction and electronic device implementing the same


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHU, Liqian: "Character Recognition of Digital Display Instruments Based on Deep Learning", Computer Technology and Development, no. 06 *

Similar Documents

Publication Publication Date Title
US20200364478A1 (en) Method and apparatus for liveness detection, device, and storage medium
WO2022105125A1 (en) Image segmentation method and apparatus, computer device, and storage medium
US9996760B2 (en) Optical character recognition of series of images
US10043092B2 (en) Optical character recognition of series of images
CN110942061A (en) Character recognition method, device, equipment and computer readable medium
CN113496208B (en) Video scene classification method and device, storage medium and terminal
CN112381071A (en) Behavior analysis method of target in video stream, terminal device and medium
CN113255501A (en) Method, apparatus, medium, and program product for generating form recognition model
CN113657370B (en) Character recognition method and related equipment thereof
CN113657369B (en) Character recognition method and related equipment thereof
CN116311276A (en) Document image correction method, device, electronic equipment and readable medium
CN113610081A (en) Character recognition method and related equipment thereof
CN108287817B (en) Information processing method and device
CN114724144A (en) Text recognition method, model training method, device, equipment and medium
CN114694209A (en) Video processing method and device, electronic equipment and computer storage medium
CN114429628A (en) Image processing method and device, readable storage medium and electronic equipment
CN110781345B (en) Video description generation model obtaining method, video description generation method and device
CN110287943B (en) Image object recognition method and device, electronic equipment and storage medium
CN114049502A (en) Neural network training, feature extraction and data processing method and device
CN112733670A (en) Fingerprint feature extraction method and device, electronic equipment and storage medium
CN112084874A (en) Object detection method and device and terminal equipment
CN113610082A (en) Character recognition method and related equipment thereof
CN113610082B (en) Character recognition method and related equipment thereof
CN117435739B (en) Image text classification method and device
CN113887441B (en) Table data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination