CN113283432A - Image recognition and character sorting method and equipment - Google Patents

Image recognition and character sorting method and equipment

Info

Publication number: CN113283432A
Authority: CN (China)
Prior art keywords: sorted, character information, text messages, text, pieces
Legal status: Pending (the status listed is an assumption and is not a legal conclusion)
Application number: CN202010106180.7A
Other languages: Chinese (zh)
Inventors: 郑琪, 于智, 李亮城, 高飞宇, 王永攀, 张建锋
Current assignee: Alibaba Group Holding Ltd
Original assignee: Alibaba Group Holding Ltd
Application filed by: Alibaba Group Holding Ltd
Priority to: CN202010106180.7A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments of the present application provide an image recognition method, a character sorting method, and corresponding devices. The method comprises: recognizing a plurality of pieces of text information to be sorted contained in an image to be recognized; determining the reading order of the pieces of text information to be sorted according to their respective corresponding features, wherein the features carry semantic features; and sorting the pieces of text information to be sorted according to the reading order to obtain a text information sequence. The sorting method provided by the embodiments of the present application is applicable to images with any text layout, and therefore has a wide application range and good applicability.

Description

Image recognition and character sorting method and equipment
Technical Field
The present application relates to the field of computer technology, and in particular to an image recognition method, a character sorting method, and corresponding devices.
Background
With the development of computer technology, image text recognition technology has emerged; with this technology, a device can automatically recognize the characters in an image.
In the prior art, the characters recognized from an image are generally read and ordered by default from left to right and from top to bottom. This simple reading order is only suitable for images with a simple layout; it fails for images with a complex layout (for example, multi-column or circular layouts), because the simple reading order destroys the original semantic coherence.
Therefore, the reading-order sorting methods in the prior art have poor applicability or universality.
Disclosure of Invention
In view of the above, the present application provides an image recognition method, a text sorting method, and corresponding devices that solve the above problems, or at least partially solve them.
Accordingly, in one embodiment of the present application, an image recognition method is provided. The method comprises:
recognizing a plurality of pieces of text information to be sorted contained in an image to be recognized;
determining the reading order of the plurality of pieces of text information to be sorted according to their respective corresponding features, wherein the features carry semantic features;
and sorting the plurality of pieces of text information to be sorted according to the reading order to obtain a text information sequence.
In yet another embodiment of the present application, a text sorting method is provided. The method comprises:
acquiring a plurality of pieces of text information to be sorted;
determining the reading order of the plurality of pieces of text information to be sorted by combining their respective corresponding features and the adjacency relations among them;
and sorting the plurality of pieces of text information to be sorted according to the reading order to obtain a text information sequence.
In a further embodiment of the present application, an image recognition method is provided. The method comprises:
recognizing a plurality of pieces of text information to be sorted contained in an image to be recognized;
determining the text type of the plurality of pieces of text information to be sorted;
acquiring the arrangement rule corresponding to the text type;
and sorting the plurality of pieces of text information to be sorted according to the arrangement rule to obtain a text information sequence.
In another embodiment of the present application, an electronic device is provided. The device comprises a memory and a processor, wherein
the memory is used for storing a program;
the processor, coupled with the memory, is used for executing the program stored in the memory to:
recognize a plurality of pieces of text information to be sorted contained in an image to be recognized;
determine the reading order of the plurality of pieces of text information to be sorted according to their respective corresponding features, wherein the features carry semantic features;
and sort the plurality of pieces of text information to be sorted according to the reading order to obtain a text information sequence.
In another embodiment of the present application, an electronic device is provided. The device comprises a memory and a processor, wherein
the memory is used for storing a program;
the processor, coupled with the memory, is used for executing the program stored in the memory to:
acquire a plurality of pieces of text information to be sorted;
determine the reading order of the plurality of pieces of text information to be sorted by combining their respective corresponding features and the adjacency relations among them;
and sort the plurality of pieces of text information to be sorted according to the reading order to obtain a text information sequence.
In another embodiment of the present application, an electronic device is provided. The device comprises a memory and a processor, wherein
the memory is used for storing a program;
the processor, coupled with the memory, is used for executing the program stored in the memory to:
recognize a plurality of pieces of text information to be sorted contained in an image to be recognized;
determine the text type of the plurality of pieces of text information to be sorted;
acquire the arrangement rule corresponding to the text type;
and sort the plurality of pieces of text information to be sorted according to the arrangement rule to obtain a text information sequence.
According to the technical solution provided by the embodiments of the present application, after a plurality of pieces of text information to be sorted contained in an image to be recognized are recognized, the pieces of text information are read and sorted in combination with their corresponding semantics. The sorting method provided by the embodiments of the present application is applicable to images with any text layout, and therefore has a wide application range and good applicability.
According to the technical solution provided by the embodiments of the present application, when the plurality of pieces of text information to be sorted are read and sorted, both the individual features of each piece of text information and the adjacency relations among the pieces are taken into account, which effectively improves the sorting accuracy and the semantic coherence of the resulting text information sequence.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and other drawings can be derived from them by those skilled in the art without creative effort.
Fig. 1a is a diagram illustrating an example of an image recognition method according to an embodiment of the present application;
Fig. 1b is a diagram illustrating an example of an image recognition method according to another embodiment of the present application;
fig. 1c is a schematic flowchart of an image recognition method according to an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating a text sorting method according to an embodiment of the present application;
fig. 3 is a block diagram of an image recognition apparatus according to an embodiment of the present application;
fig. 4 is a block diagram of a text sorting apparatus according to another embodiment of the present application;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
At present, existing image text recognition products by default provide a simple reading order from left to right and from top to bottom, and this simple sorting scheme fails for images with complicated layouts.
In the prior art, there are two types of methods: layout analysis methods and structured template methods.
The layout analysis method has two implementation modes, bottom-up and top-down. The bottom-up mode uses visual information of text blocks, such as distance, size, and color, to compose paragraphs by rules; after a paragraph is composed, the text inside the paragraph is still read from left to right and from top to bottom. The top-down mode directly segments the image into paragraphs with an image segmentation method; after the paragraph segmentation is completed, the text inside each paragraph is read in order from left to right and from top to bottom.
The layout analysis method can process most document images, that is, images consisting of large blocks of regular text. But for complex mixed text-and-image layouts, such as online advertisement images and e-commerce description images, its effect is poor.
The structured template method outputs a text structure according to configured template rules. It can handle more complex layouts, but it is only suitable for a single, fixed layout format, such as invoices, certificates, and bank cards, and cannot generate a semantically coherent sequence in the general case.
In order to improve the applicability and universality of reading-order sorting, the embodiments of the present application provide a method for reading and sorting characters based on semantics.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Further, in some flows described in the specification, claims, and above-described figures of the present application, a number of operations are included that occur in a particular order, which operations may be performed out of order or in parallel as they occur herein. The sequence numbers of the operations, e.g., 101, 102, etc., are used merely to distinguish between the various operations, and do not represent any order of execution per se. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
Fig. 1c is a schematic flowchart illustrating an image recognition method according to an embodiment of the present application. The execution subject of the method can be a client or a server. The client may be hardware with an embedded program integrated on a terminal, application software installed in the terminal, or tool software embedded in the operating system of the terminal, which is not limited in the embodiments of the present application. The terminal can be any terminal device, including a mobile phone, a tablet computer, a smart speaker, and the like. The server may be an ordinary server, a cloud, a virtual server, or the like, which is not specifically limited in the embodiments of the present application.
As shown in fig. 1c, the method comprises:
101. and identifying a plurality of pieces of text information to be sorted contained in the image to be recognized.
102. And determining the reading order of the plurality of pieces of text information to be sorted according to their respective corresponding features.
103. And sorting the plurality of pieces of text information to be sorted according to the reading order to obtain a text information sequence.
In the foregoing 101, the image to be recognized refers to an image containing text information, for example: an online advertisement image, an e-commerce description image, an invoice image, a certificate image, a bank card image, and the like.
An image text recognition algorithm, for example Optical Character Recognition (OCR), can be used to recognize the image to be recognized and obtain the plurality of pieces of text information to be sorted contained in the image. For the specific implementation of the image text recognition algorithm, reference can be made to the prior art, and details are not repeated here. Generally, recognizing the image to be recognized with an image text recognition algorithm yields not only the plurality of pieces of text information to be sorted contained in the image, but also the position of each piece of text information in the image.
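As a non-limiting sketch of step 101, the recognition step can be expressed roughly as follows; the patent does not prescribe a particular OCR engine, so pytesseract is used here only as a stand-in, and the CharInfo record is a hypothetical container introduced for illustration.

```python
# Sketch of step 101: recognize the pieces of text information to be sorted
# together with their positions in the image. The OCR engine (pytesseract) and
# the CharInfo container are illustrative stand-ins.
from dataclasses import dataclass

import pytesseract
from PIL import Image


@dataclass
class CharInfo:
    text: str      # the recognized character or word
    box: tuple     # (left, top, width, height) in image pixels


def recognize_chars(image_path):
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, lang="chi_sim",
                                     output_type=pytesseract.Output.DICT)
    chars = []
    for text, left, top, w, h in zip(data["text"], data["left"], data["top"],
                                     data["width"], data["height"]):
        if text.strip():                     # skip empty detections
            chars.append(CharInfo(text=text, box=(left, top, w, h)))
    return chars
```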
A piece of text information to be sorted refers to a single character or a word. For example: '我' ('I') is a single character, while '我们' ('we') is a word. In practical applications, a piece of text information to be sorted usually refers to a single character.
In 102, the features carry semantic features. A natural language processing algorithm can be adopted to extract the semantic features corresponding to each of the recognized pieces of text information to be sorted. For the semantic feature extraction process, reference can be made to the prior art, and details are not repeated here. In one example, the semantic feature corresponding to each piece of text information to be sorted can be used directly as the feature corresponding to that piece.
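A minimal sketch of one way to obtain per-character semantic features, assuming a pretrained character-embedding table; the table, its dimensionality, and the zero-vector fallback are hypothetical, since the description only requires "a natural language processing algorithm".

```python
# Hypothetical per-character semantic features via a pretrained embedding table.
import numpy as np

EMB_DIM = 128
char_embeddings = {}  # hypothetical pretrained table: character -> (EMB_DIM,) vector


def semantic_feature(char):
    # unknown characters fall back to a zero vector in this sketch
    return char_embeddings.get(char, np.zeros(EMB_DIM, dtype=np.float32))
```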
In another example, the features may also carry visual features. Combining the semantic features and the visual features can effectively improve the sorting accuracy. For example, suppose the text recognized in the image to be recognized includes the characters '我' (I), '们' (plural marker), and '你' (you). From the semantic aspect alone, either '我' or '你' could be placed in front of '们', forming '我们' (we) or '你们' (you, plural). Combining the visual aspect, if it is found that the fonts of '我' and '们' are inconsistent while the fonts of '你' and '们' are consistent, it can be determined that '你' comes before '们'.
It should be noted that the features referred to in the embodiments of the present application may be in the form of vectors.
In an implementation scheme, the reading order of the plurality of text messages to be sorted may be determined specifically according to the semantic features and the preset syntax corresponding to the plurality of text messages to be sorted, respectively.
In 103, the text information sequence obtained by sorting the pieces of text information to be sorted according to the reading order is semantically coherent.
According to the technical solution provided by the embodiments of the present application, after a plurality of pieces of text information to be sorted contained in an image to be recognized are recognized, the pieces of text information are read and sorted in combination with their corresponding semantics. The sorting method provided by the embodiments of the present application is applicable to images with any text layout, and therefore has a wide application range and good applicability.
In the image to be recognized, some association relationship is hidden among a plurality of pieces of text information to be sorted, for example: semantically, visually, and/or positionally related. These associations are very important information, and if they can be utilized, the sorting accuracy can be effectively improved. Therefore, in an example, in the above 102, "determining the reading order of the text messages to be sorted according to the respective corresponding features of the text messages to be sorted" may specifically be implemented by the following steps:
1021. and determining the adjacency relation among the plurality of text messages to be sorted.
1022. And determining the reading sequence of the plurality of text messages to be sorted by integrating the characteristics corresponding to the plurality of text messages to be sorted and the adjacency relation among the plurality of text messages to be sorted.
At 1021, the adjacency relation between the text messages to be sorted can indicate the association relation between each text message to be sorted and other text messages to be sorted.
The adjacency relation among the plurality of text messages to be sorted can be determined by adopting one or more of the following methods:
the method comprises the following steps: aiming at each character information to be sorted, searching the character information to be sorted in a set range taking the position of the character information to be sorted in the image to be recognized as the center in the image to be recognized; and determining that the character information to be sorted is adjacent to the character information to be sorted in the set range, and also determining that the character information to be sorted is not adjacent to the character information to be sorted outside the set range.
In the first method, the adjacency relation is specifically the adjacency relation of the relevant positions.
The second method comprises the following steps: and determining the adjacency relation among the plurality of character information to be sorted according to the respective corresponding characteristics of the plurality of character information to be sorted.
In the second method, the following steps can be specifically adopted to implement:
and S11, calculating the correlation between every two pieces of character information to be sorted according to the corresponding characteristics of every two pieces of character information to be sorted in the plurality of pieces of character information to be sorted.
S12, determining whether every two pieces of character information to be sorted are adjacent or not according to the correlation between every two pieces of character information to be sorted.
In the above S11, in an implementation scheme, the similarity between the features corresponding to every two pieces of text information to be sorted may be directly calculated, and the similarity between the features corresponding to every two pieces of text information to be sorted is used as the correlation between every two pieces of text information to be sorted. In practical application, the features may be in a vector form; the inner product between the characteristics corresponding to every two pieces of character information to be sorted can be used as the similarity.
In the above step S12, a correlation threshold may be set in advance, and if the correlation between every two pieces of text information to be sorted is greater than or equal to the correlation threshold, it is determined that the two pieces of text information to be sorted are adjacent to each other.
Since the above features carry semantic features, the adjacency relation is specifically an adjacency relation in terms of semantics.
In another example, the features may also carry visual features, in which case the adjacency relation relates to both semantics and vision.
When the characteristics also carry visual characteristics, in order to improve the accuracy of the correlation calculation, in the above S11, "calculate the correlation between every two pieces of text information to be sorted according to the characteristics corresponding to every two pieces of text information to be sorted in the text information to be sorted", specifically, the following steps may be adopted to implement:
A. and calculating a first similarity between the semantic features corresponding to every two pieces of character information to be sorted.
B. And calculating a second similarity between the visual features corresponding to every two pieces of character information to be sorted.
C. And integrating the first similarity and the second similarity to determine the correlation between every two pieces of character information to be sorted.
In the step a, the semantic features are specifically in a vector form, and an inner product between the semantic features can be used as the first similarity.
In the step B, the visual features are specifically in a vector form, and an inner product between the visual features can be used as the second similarity.
In the above step C, in an implementation scheme, a sum of the first similarity and the second similarity may be used as the correlation between each two text messages to be sorted.
In another implementation scheme, the first similarity and the second similarity may be summed in a weighted manner to obtain the correlation between each two pieces of text information to be sorted. The weight corresponding to the first similarity and the weight of the second similarity may be set according to actual needs, for example: the determination may be performed in combination with a priori experience, and the embodiment of the present application is not particularly limited thereto.
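The pairwise correlation and adjacency decision of S11–S12 and steps A–C above can be sketched as follows; the weights w_sem and w_vis and the correlation threshold are illustrative assumptions rather than values given in the description.

```python
# Sketch of steps A-C and S12: combine the semantic and visual similarities into
# a pairwise correlation and threshold it into an adjacency matrix.
import numpy as np


def adjacency_matrix(sem_feats, vis_feats, w_sem=0.5, w_vis=0.5, threshold=0.5):
    """sem_feats: (n, d_sem) semantic features; vis_feats: (n, d_vis) visual features."""
    sim_sem = sem_feats @ sem_feats.T             # first similarity: semantic inner products
    sim_vis = vis_feats @ vis_feats.T             # second similarity: visual inner products
    corr = w_sem * sim_sem + w_vis * sim_vis      # weighted sum gives the correlation
    adj = (corr >= threshold).astype(np.float32)  # adjacent if correlation reaches threshold
    np.fill_diagonal(adj, 0.0)                    # a piece is not treated as adjacent to itself
    return adj
```

The same matrix can also serve as the adjacency-matrix representation of the graph structure described in S21 below.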
In 1021, the adjacency relation between the text messages to be sorted indicates whether every two text messages to be sorted in the text messages to be sorted are adjacent.
In 1022, the feature corresponding to each piece of text information to be sorted initially contains only information about that piece itself, and not information about its surroundings (for example, information about adjacent pieces of text information); that is, the feature expression is not comprehensive enough. To improve the feature expression, the features corresponding to the pieces of text information to be sorted can be updated by combining them with the adjacency relations among the pieces, yielding an updated feature for each piece, and the reading order of the pieces is then determined according to these updated features. The updated feature corresponding to each piece of text information contains not only the information of the piece itself but also the information of the adjacent pieces around it; the feature expression is richer and more abstract, which helps improve the sorting accuracy.
In an example, in the above 1022, "determining a reading order of the text messages to be sorted by synthesizing the features corresponding to the text messages to be sorted and the adjacency relationship between the text messages to be sorted" may specifically be implemented by adopting the following steps:
s21, constructing a graph structure with nodes and edges according to the adjacency relation among the plurality of character information to be sorted.
Wherein, the nodes in the graph structure are used for representing the text information to be sorted; edges in the graph structure are used to indicate whether or not there is an adjacency between nodes.
And S22, taking the characteristics corresponding to the plurality of pieces of character information to be sorted and the graph structure as the input of the trained graph convolution neural network model, and executing the graph convolution neural network model to obtain the reading sequence of the plurality of pieces of character information to be sorted.
In the above S21, the nodes in the graph structure are used to represent the text information to be sorted; edges in the graph structure are used to indicate whether or not there is an adjacency between nodes. An edge exists between two nodes, indicating that the two nodes are adjacent. The graph structure may be represented by an adjacency matrix. The graph structure described above may also be referred to as a topology.
In the above S22, the graph convolution neural network model can extract features well, and can effectively improve the sorting accuracy.
Wherein the graph convolution neural network model is specifically configured to:
and S31, obtaining updated characteristics corresponding to the plurality of text messages to be sorted through graph convolution operation according to the characteristics corresponding to the text messages to be sorted and the graph structure.
S32, determining the reading sequence of the plurality of text messages to be sorted according to the updated characteristics corresponding to the plurality of text messages to be sorted respectively.
In S31, the graph convolution operation may embed the graph structure information into the features corresponding to the text information to be sorted, so as to obtain updated features corresponding to the text information to be sorted. To obtain higher dimensional features, multiple graph convolution operations may be performed to obtain the features.
Specifically, the graph convolution neural network model may include a feature extraction sub-network; the feature extraction sub-network may include a plurality of graph convolution network layers, each of which performs one graph convolution operation. The features corresponding to the pieces of text information to be sorted and the adjacency matrix representing the graph structure serve as the input of the first graph convolution network layer; the features output by a previous graph convolution network layer serve as the input of the next graph convolution network layer; and the features output by the last graph convolution network layer are taken as the updated features corresponding to the pieces of text information to be sorted. It should be noted that the features output by each graph convolution network layer differ from the features input to that layer; they are more abstract and of higher dimensionality.
It should also be noted that each graph convolution network layer includes a trained feature extraction parameter matrix, and each layer uses this parameter matrix when performing its graph convolution operation. The specific implementation of the graph convolution operation can be designed according to actual needs, and this embodiment does not specifically limit it.
In an implementation, the pieces of text information to be sorted include a first piece of text information to be sorted, where the first piece refers to any one of the pieces. Each graph convolution network layer is specifically configured to: for the first piece of text information to be sorted, determine, with the help of the adjacency matrix, at least one second piece of text information to be sorted that is adjacent to it; concatenate the input feature of the first piece with the input feature of each of the at least one second piece to obtain at least one first concatenated feature; combine the at least one first concatenated feature into a concatenated feature matrix; multiply the concatenated feature matrix by the trained feature extraction parameter matrix to obtain a first matrix; and pool the first matrix to obtain the output feature of the first piece of text information to be sorted. The pooling may be average pooling or max pooling.
For example: if the input feature of the first piece of text information to be sorted is an h-dimensional vector and the feature of a second piece of text information to be sorted is a j-dimensional vector, the first concatenated feature is an (h + j)-dimensional vector.
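Under the assumption that all node features share the same dimension h (so each first concatenated feature is 2h-dimensional), one graph convolution network layer as described above can be sketched as follows; the choice of max pooling is also an assumption, since average pooling is equally allowed.

```python
# Sketch of one graph convolution network layer: for each node, concatenate its
# feature with each neighbor's feature, stack the concatenations, multiply by
# the trained parameter matrix W, and pool over the neighbors.
import numpy as np


def graph_conv_layer(feats, adj, W):
    """feats: (n, h) input node features; adj: (n, n) adjacency matrix; W: (2*h, h_out)."""
    n, h = feats.shape
    out = np.zeros((n, W.shape[1]), dtype=feats.dtype)
    for i in range(n):
        neighbors = np.nonzero(adj[i])[0]
        if len(neighbors) == 0:
            continue                        # isolated node keeps a zero feature in this sketch
        stacked = np.stack([np.concatenate([feats[i], feats[j]]) for j in neighbors])
        projected = stacked @ W             # multiply by the trained parameter matrix
        out[i] = projected.max(axis=0)      # max pooling over the neighbors
    return out
```

Stacking several such layers, each with its own parameter matrix, yields the updated features described in S31.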
In an implementation scheme, in the above S32, "determining the reading order of the text messages to be sorted according to the updated features corresponding to the text messages to be sorted", specifically, the following steps may be adopted:
s321, integrating the updated features corresponding to the plurality of character information to be sorted, and calculating the global character information features to serve as initial reference features.
And S322, calculating attention weights corresponding to the at least one character information to be sorted which is not output in the plurality of character information to be sorted according to the reference features and the updated features corresponding to the at least one character information to be sorted which is not output in the plurality of character information to be sorted.
And S323, outputting the character information to be sorted corresponding to the maximum attention weight, taking the updated feature corresponding to the character information to be sorted corresponding to the maximum attention weight as a new reference feature, and continuing to execute the attention weight calculation step until all the character information to be sorted is output.
S324, determining the output sequence of the plurality of text messages to be sorted as the reading sequence of the plurality of text messages to be sorted.
In practical applications, the graph convolution neural network model may further include an attention subnetwork. The above steps S321 and S322 are performed by the attention subnetwork.
In the above S321, in one example, the updated features corresponding to the pieces of text information to be sorted can be pooled to obtain the global text information feature, where the pooling may be average pooling or max pooling. The global text information feature fuses the features of all the pieces of text information to be sorted.
Determining the global text information feature makes it convenient to subsequently find the piece of text information that should come first in the text information sequence.
In another example, the graph convolution operation can further be applied to the updated features corresponding to the pieces of text information to be sorted to obtain further-updated features, which are then pooled to obtain the global text information feature. That is, the attention sub-network includes a graph convolution network layer and a pooling layer, and the graph convolution network layer may be a fully connected network layer.
In the above S322, the at least one piece of text information to be sorted that has not yet been output includes a third piece of text information to be sorted, where the third piece refers to any one of the at least one piece. Taking features in vector form as an example, the reference feature and the updated feature corresponding to the third piece can be concatenated to obtain a second concatenated feature; the dot product of the second concatenated feature and the attention parameter vector then gives the attention weight corresponding to the third piece of text information to be sorted.
For example: if the reference feature is an n-dimensional vector and the updated feature corresponding to the third piece of text information to be sorted is an m-dimensional vector, the second concatenated feature is an (n + m)-dimensional vector.
In the above S323, the text information to be sorted corresponding to the maximum attention weight is output, and the updated feature corresponding to the text information to be sorted corresponding to the maximum attention weight is used as a new reference feature.
If the plurality of the character information to be sorted are not completely output, the attention weight of the character information to be sorted which is not output currently is calculated based on the new reference characteristics.
And if the plurality of character information to be sorted are all output, stopping the attention weight calculation step.
In the above S324, the output sequence of the text messages to be sorted is also the reading sequence of the text messages to be sorted.
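The decoding of S321–S324 can be sketched as a pointer-style selection loop; max pooling for the initial global reference feature is an assumption here (the description also allows average pooling), and attn_vec stands for the trained attention parameter vector.

```python
# Sketch of S321-S324: pool the updated features into a global reference, score
# the not-yet-output pieces with the attention parameter vector, output the
# highest-scoring piece, and use its updated feature as the new reference.
import numpy as np


def decode_reading_order(updated_feats, attn_vec):
    """updated_feats: (n, m) updated node features; attn_vec: (2*m,) attention parameters."""
    n = updated_feats.shape[0]
    reference = updated_feats.max(axis=0)     # S321: global text information feature
    remaining = set(range(n))
    order = []
    while remaining:
        # S322: attention weight of every piece not yet output
        weights = {i: float(np.concatenate([reference, updated_feats[i]]) @ attn_vec)
                   for i in remaining}
        best = max(weights, key=weights.get)  # S323: output the maximum-weight piece
        order.append(best)
        reference = updated_feats[best]       # its updated feature becomes the new reference
        remaining.remove(best)
    return order                              # S324: the output order is the reading order
```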
In the above embodiment, it is assumed by default that a single piece of text information corresponds to the maximum attention weight. When several pieces share the maximum attention weight, several reading orders may arise; that is, the graph convolution neural network model may determine a plurality of reading orders for the pieces of text information to be sorted. The pieces can then be sorted according to each reading order to obtain a plurality of text information sequences. The method may further include: displaying the plurality of text information sequences on a user interface for the user to select from. In addition, the model can be optimized according to the target text information sequence selected by the user; specifically, one round of model training can be performed using the image to be recognized and the selected target text information sequence, thereby optimizing the model.
The training process of the graph convolution neural network model is as follows:
104. Obtaining a sample image and the expected text information sequence corresponding to a plurality of pieces of sample text information contained in the sample image.
105. Optimizing the graph convolution neural network model according to the sample features corresponding to the pieces of sample text information, the graph structure corresponding to the pieces of sample text information, and the expected text information sequence.
In 105, the sample features corresponding to the pieces of sample text information and the corresponding graph structure may be input into the graph convolution neural network model to determine the predicted reading order of the pieces of sample text information; the pieces of sample text information are sorted according to the predicted reading order to obtain a predicted text information sequence; and the parameters of the graph convolution neural network model are optimized according to the difference between the predicted text information sequence and the expected text information sequence. For the specific parameter optimization process, reference can be made to the prior art, and details are not repeated here.
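Assuming the decoder exposes, at each decoding step, a score vector over all nodes, the difference between the predicted and expected sequences can be turned into a per-step cross-entropy loss as sketched below; the cross-entropy form is an assumption, since the description only requires optimizing on that difference.

```python
# Sketch of the loss used in 105: per-step cross-entropy between the decoder's
# scores and the expected reading order (the cross-entropy form is assumed).
import numpy as np


def ordering_loss(step_scores, expected_order):
    """step_scores: (n, n) decoder scores, one row per decoding step."""
    loss = 0.0
    for step, target in enumerate(expected_order):
        logits = step_scores[step] - step_scores[step].max()   # stabilized logits
        log_probs = logits - np.log(np.exp(logits).sum())      # log-softmax
        loss -= log_probs[target]
    return loss / len(expected_order)
```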
In practical application, the method may further include:
106. and extracting semantic features corresponding to the plurality of recognized character information to be sorted respectively.
107. And extracting visual features corresponding to the plurality of text messages to be sorted respectively.
108. And fusing the semantic features and the visual features corresponding to the plurality of pieces of character information to be sorted to obtain the features corresponding to the plurality of pieces of character information to be sorted.
In the above 106, a natural language processing algorithm may be adopted to extract semantic features corresponding to the identified text information to be sorted.
In an implementation scheme, in the above 107, "extracting visual features corresponding to the text messages to be sorted", may specifically be implemented by the following steps:
1071. and determining the sub-image areas of the plurality of pieces of text information to be sorted from the image to be recognized according to the positions of the plurality of pieces of text information to be sorted in the image to be recognized.
1072. And respectively extracting visual features corresponding to the plurality of text messages to be sorted according to the sub-image areas where the plurality of text messages to be sorted are respectively located.
1071, the text information recognition technology is used to recognize the text information to be sorted and the position of each text information to be sorted in the image to be processed from the image to be processed.
In an example, the sub-image region where the text information to be sorted is located may be a compact rectangular frame region surrounding the text information to be sorted.
1072, the visual characteristics may include information such as font, font color, background texture, etc.
In practical applications, some conventional feature extraction algorithms can be utilized, such as: SIFT (Scale-invariant feature transform), to extract visual features.
The visual features extracted by the traditional feature extraction algorithm are low-dimensional information and not high-dimensional information, namely, the feature expressiveness is poor. To improve the expressiveness of the visual features, in one example, the trained convolutional neural network can be used to extract the visual features, for example: the sub-image areas where the plurality of pieces of text information to be sorted are respectively located can be respectively input into the trained convolutional neural network, so that the visual features corresponding to the plurality of pieces of text information to be sorted are obtained. The detailed implementation and training process of the convolutional neural network can be referred to in the prior art, and will not be described in detail herein.
In 108, the pieces of text information to be sorted include a first piece of text information to be sorted; the semantic feature and the visual feature corresponding to the first piece can be concatenated to obtain the feature corresponding to that piece.
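A sketch of 107–108 under stated assumptions: each sub-image region is resized to a fixed 32x32 crop and passed through a small convolutional network, and the resulting visual feature is concatenated with the semantic feature. The network architecture, crop size, and feature dimensions are illustrative; the description only requires a trained convolutional neural network.

```python
# Sketch of 107-108: visual feature from a sub-image crop via a small CNN,
# followed by concatenation with the semantic feature.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualEncoder(nn.Module):
    def __init__(self, vis_dim=64):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32 * 8 * 8, vis_dim)

    def forward(self, crop):                            # crop: (B, 3, 32, 32)
        x = F.max_pool2d(F.relu(self.conv1(crop)), 2)   # -> (B, 16, 16, 16)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)      # -> (B, 32, 8, 8)
        return self.fc(x.flatten(1))                    # -> (B, vis_dim)


def fuse(semantic_feat, visual_feat):
    # 108: the feature of a piece of text information is the concatenation of
    # its semantic feature and its visual feature
    return torch.cat([semantic_feat, visual_feat], dim=-1)
```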
In practical applications, an image to be recognized usually contains multiple text areas that are far apart from each other. In this case, the image to be recognized can first be divided into text areas, and the characters in each text area can then be sorted separately; this reduces the difficulty of the subsequent sorting and improves the sorting accuracy. Therefore, in an example, the method may further include:
109. and identifying a plurality of character information contained in the image to be identified and the position of each character information in the image to be identified.
110. And dividing the plurality of character information by utilizing a clustering algorithm according to the position of each character information in the image to be identified to obtain at least one character information cluster.
111. And selecting a plurality of text messages in one text message cluster from the at least one text message cluster as the plurality of text messages to be sequenced.
In the above step 109, an OCR algorithm may be specifically used for implementation, and the specific implementation may refer to corresponding contents in the above embodiments, which is not described herein again.
In the above description 110, the clustering algorithm may specifically adopt a hierarchical clustering algorithm. The distance between the character information in the same character information cluster is smaller than the distance between the character information in different character information clusters. The specific implementation process of the clustering algorithm can be referred to in the prior art, and is not described herein again.
In the above 111, a plurality of text messages in one text message cluster are selected from the at least one text message cluster as the plurality of text messages to be sorted.
In actual application, the plurality of character information in each character information cluster can be sorted respectively.
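A sketch of 110, assuming each recognized piece of text information is reduced to the center point of its bounding box; the single-linkage method and the distance threshold are illustrative assumptions.

```python
# Sketch of 110: hierarchical clustering of character positions into text
# information clusters. Linkage method and distance threshold are assumptions.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_by_position(centers, max_gap=50.0):
    """centers: (n, 2) array of bounding-box center points."""
    Z = linkage(centers, method="single")
    # characters closer than max_gap end up in the same text information cluster
    return fcluster(Z, t=max_gap, criterion="distance")   # (n,) cluster ids
```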
An image recognition method provided by an embodiment of the present application will be described below with reference to Fig. 1a:
Step 1: perform character recognition on the image to be recognized; it is recognized that the image contains the three characters '年' (year), '节' (festival), and '货' (goods).
Step 2: use a natural language processing algorithm to extract the semantic features corresponding to the three characters '年', '节', and '货' respectively.
Step 3: sort the characters according to their corresponding semantic features to obtain the character sequence '年货节'.
Step 4: output the character sequence to an interface for display.
An image recognition method according to another embodiment of the present application will be described below with reference to Fig. 1b:
Step a: perform character recognition in the image to be recognized; it is recognized that the image contains the three characters '年', '节', and '货', together with the positions of the three characters in the image. According to the position of each character in the image to be recognized, the sub-image (that is, the sub-image region) where each character is located is extracted.
Step b: use a natural language processing algorithm to extract the semantic features corresponding to the three characters '年', '节', and '货' respectively; and extract visual features from the sub-image of each character through a convolutional neural network (CNN) to obtain the visual feature corresponding to each character.
Step c: calculate the correlation between any two characters according to the similarity of their visual features and the similarity of their semantic features, and construct a graph structure based on the correlations.
Step d: concatenate the semantic feature and the visual feature corresponding to each character to obtain the feature corresponding to each character; input the graph structure and the feature of each character into the trained graph convolution neural network model, and run the model to obtain the reading order of the three characters.
Step e: sort the three characters according to their reading order to obtain the character sequence '年货节', and output the character sequence to an interface for display.
The method does not rely on templates and can generate a definite sequence under any layout, so it has a wide application range. On the test set, the evaluation metrics for both ordinary horizontal text and multi-column text exceed 80%.
Yet another embodiment of the present application provides an image recognition method, including:
501. and identifying a plurality of pieces of text information to be sorted contained in the image to be recognized.
502. And determining the text type of the plurality of pieces of text information to be sorted.
503. And acquiring the arrangement rule corresponding to the text type.
504. And sorting the plurality of pieces of text information to be sorted according to the arrangement rule to obtain a text information sequence.
For the specific implementation of the above-mentioned step 501, reference may be made to corresponding contents in the above-mentioned embodiments, and details are not described herein again.
In the above 502, the arrangement rules corresponding to different character types are usually different. For example: the ancient texts in the image to be processed are generally arranged from top to bottom and from right to left; modern texts in the image to be processed are usually arranged in the order from left to right and from top to bottom.
In one example, the text types may include an ancient text type and a modern text type.
In 503, the arrangement rule corresponding to the character type may be obtained according to a correspondence relationship between the character type and the arrangement rule established in advance. Arranging rules from top to bottom and from right to left can be configured in advance for ancient types; the arrangement rules are configured from left to right and top to bottom for the type of script.
In step 504, the pieces of text information to be sorted may be sorted according to the arrangement rule and their positions in the image to be processed, so as to obtain the text information sequence.
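A sketch of 503–504 under the assumption that each piece of text information carries the top-left corner of its bounding box; the row/column grouping tolerance is an illustrative assumption.

```python
# Sketch of 503-504: pick the arrangement rule for the text type and sort the
# characters by position accordingly. The tolerance used to group rows/columns
# is an illustrative assumption.


def sort_by_rule(chars, text_type, tol=10):
    """chars: list of (character, (x, y)) with (x, y) the top-left corner of its box."""
    if text_type == "ancient":
        # ancient text: columns read right to left, top to bottom within a column
        def key(item):
            _, (x, y) = item
            return (-round(x / tol), y)
    else:
        # modern text: rows read top to bottom, left to right within a row
        def key(item):
            _, (x, y) = item
            return (round(y / tol), x)
    return [ch for ch, _ in sorted(chars, key=key)]
```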
In the embodiment, the text information is sorted according to different arrangement rules aiming at different text types, so that the applicability and the accuracy of the sorting scheme can be effectively improved.
Here, it should be noted that: the content of each step in the method provided by the embodiment of the present application, which is not described in detail in the foregoing embodiment, may refer to the corresponding content in the foregoing embodiment, and is not described herein again. In addition, the method provided in the embodiment of the present application may further include, in addition to the above steps, other parts or all of the steps in the above embodiments, and specific reference may be made to corresponding contents in the above embodiments, which is not described herein again.
Fig. 2 is a flowchart illustrating a text sorting method according to another embodiment of the present application. The execution subject of the method can be a client or a server. The client may be hardware with an embedded program integrated on a terminal, application software installed in the terminal, or tool software embedded in the operating system of the terminal, which is not limited in the embodiments of the present application. The terminal can be any terminal device, including a mobile phone, a tablet computer, a smart speaker, and the like. The server may be an ordinary server, a cloud, a virtual server, or the like, which is not specifically limited in the embodiments of the present application.
As shown in fig. 2, the method includes:
201. and acquiring a plurality of pieces of text information to be sorted.
202. And determining the reading order of the plurality of pieces of text information to be sorted by combining their respective corresponding features and the adjacency relations among them.
203. And sorting the plurality of pieces of text information to be sorted according to the reading order to obtain a text information sequence.
In 201, the pieces of text information to be sorted may be recognized from an image to be recognized, or may be input by a user.
For example: a home tutoring machine for primary school students can have a text sorting function built in. When a student encounters a 'form a sentence from the given words' exercise while doing homework, the student can input the several words from the exercise into the tutoring machine; these words are then the plurality of pieces of text information to be sorted.
In the above 202, the adjacent relationship among the plurality of pieces of text information to be sorted may be a semantic adjacent relationship, or may be an adjacent relationship in other aspects, which is not specifically limited in this embodiment of the application.
For the specific implementation processes of 202 and 203, reference may be made to corresponding contents in the foregoing embodiments, and details are not described herein.
According to the technical solution provided by the embodiments of the present application, when the plurality of pieces of text information to be sorted are read and sorted, both the individual features of each piece of text information and the adjacency relations among the pieces are taken into account, which effectively improves the sorting accuracy and the semantic coherence of the resulting text information sequence.
Optionally, the method may further include:
204. and determining the adjacency relations among the pieces of text information to be sorted according to their respective corresponding features.
For the specific implementation of the above 204, reference may be made to corresponding contents in the above embodiments, which are not described herein again.
Optionally, in the above 202, "determining a reading order of the text information to be sorted by synthesizing features corresponding to the text information to be sorted and an adjacency relation between the text information to be sorted", specifically, the reading order may be determined by adopting the following steps:
2021. and constructing a graph structure with nodes and edges according to the adjacency relation among the plurality of character information to be sorted.
Wherein, the nodes in the graph structure are used for representing the text information to be sorted; edges in the graph structure are used to indicate whether or not there is an adjacency between nodes.
2022. And taking the characteristics corresponding to the plurality of pieces of character information to be sorted and the graph structure as the input of a trained graph convolution neural network model, and executing the graph convolution neural network model to obtain the reading sequence of the plurality of pieces of character information to be sorted.
For the specific implementation process of the 2021 and the 2022, reference may be made to corresponding contents in the foregoing embodiments, and details are not described herein again.
Here, it should be noted that: the content of each step in the method provided by the embodiment of the present application, which is not described in detail in the foregoing embodiment, may refer to the corresponding content in the foregoing embodiment, and is not described herein again. In addition, the method provided in the embodiment of the present application may further include, in addition to the above steps, other parts or all of the steps in the above embodiments, and specific reference may be made to corresponding contents in the above embodiments, which is not described herein again.
Fig. 3 shows a block diagram of an image recognition apparatus according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:
the first identification module 301 is configured to identify a plurality of text messages to be sorted included in an image to be identified.
The first determining module 302 is configured to determine a reading order of the text messages to be sorted according to respective corresponding features of the text messages to be sorted.
Wherein the features carry semantic features.
The first sorting module 303 is configured to sort the pieces of text information to be sorted according to the reading order to obtain a text information sequence.
Optionally, the apparatus may further include:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample image and an expected text information sequence corresponding to a plurality of sample text information contained in the sample image;
and the first optimization module is used for optimizing the graph convolution neural network model according to the sample features corresponding to the pieces of sample text information, the graph structure corresponding to the pieces of sample text information, and the expected text information sequence.
Optionally, the apparatus may further include:
the first extraction module is used for extracting the semantic features corresponding to the plurality of recognized character information to be sorted respectively; extracting visual features corresponding to the plurality of character information to be sorted respectively;
and the first fusion module is used for fusing the semantic features and the visual features corresponding to the plurality of text messages to be sorted to obtain the features corresponding to the plurality of text messages to be sorted.
Here, it should be noted that: the image recognition apparatus provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the above method embodiments, and is not described herein again.
Fig. 4 is a block diagram illustrating a structure of a text sorting apparatus according to an embodiment of the present application. As shown in fig. 4, the apparatus includes:
a second acquisition module 401, configured to acquire a plurality of pieces of first text information to be sorted;
a second determining module 402, configured to determine a reading order of the pieces of first text information by integrating features respectively corresponding to the pieces of first text information and an adjacency relationship among the pieces of first text information;
and a second sorting module 403, configured to sort the plurality of pieces of first text information according to the reading order to obtain a first text information sequence.
Optionally, the apparatus may further include:
and the third determining module is configured to determine the adjacency relationship among the pieces of first text information according to the features respectively corresponding to the pieces of first text information.
Here, it should be noted that: the text sorting device provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the above method embodiments, and is not described herein again.
Yet another embodiment of the present application provides an image recognition apparatus, including:
and the second identification module is used for identifying a plurality of character information to be sorted contained in the image to be identified.
And the fourth determining module is used for determining the character types of the plurality of character information to be sequenced.
And the third acquisition module is used for acquiring the arrangement rule corresponding to the character type.
And the third sequencing module is used for sequencing the plurality of character information to be sequenced according to the sequencing rule to obtain a character information sequence to be sequenced.
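A minimal sketch of rule-based sorting by character type follows; the two character types and their arrangement rules (rows left-to-right then top-to-bottom, or columns right-to-left) are illustrative assumptions rather than rules prescribed here.

    def sort_by_rule(items, char_type):
        # items: list of (text, (x, y)) with top-left coordinates in the image.
        if char_type == "horizontal":          # e.g. ordinary left-to-right text
            key = lambda it: (it[1][1], it[1][0])    # rows first, then columns
        elif char_type == "vertical_rtl":      # e.g. vertical right-to-left layouts
            key = lambda it: (-it[1][0], it[1][1])   # right-most column first, top down
        else:
            raise ValueError("unknown character type: %s" % char_type)
        return [text for text, _ in sorted(items, key=key)]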
Here, it should be noted that: the image recognition apparatus provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the above method embodiments, and is not described herein again.
Fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device includes a memory 1101 and a processor 1102. The memory 1101 may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device. The memory 1101 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.
The memory is used for storing programs;
the processor 1102 is coupled to the memory 1101, and configured to execute the program stored in the memory 1101, so as to implement the image recognition method or the text sorting method in the foregoing embodiments.
Further, as shown in fig. 5, the electronic device further includes: communication components 1103, display 1104, power components 1105, audio components 1106, and the like. Only some of the components are schematically shown in fig. 5, and it is not meant that the electronic device comprises only the components shown in fig. 5.
Accordingly, embodiments of the present application further provide a computer-readable storage medium storing a computer program, where the computer program can implement the steps or functions of the image recognition method and the text sorting method provided in the foregoing embodiments when executed by a computer.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (18)

1. An image recognition method, comprising:
identifying a plurality of pieces of text information to be sorted contained in an image to be recognized;
determining a reading order of the plurality of pieces of text information to be sorted according to features respectively corresponding to the pieces of text information to be sorted, wherein the features carry semantic features; and
sorting the plurality of pieces of text information to be sorted according to the reading order to obtain a sequence of the text information to be sorted.
2. The method according to claim 1, wherein determining the reading order of the plurality of pieces of text information to be sorted according to the features respectively corresponding to the pieces of text information to be sorted comprises:
determining an adjacency relationship among the plurality of pieces of text information to be sorted; and
determining the reading order of the plurality of pieces of text information to be sorted by integrating the features respectively corresponding to the pieces of text information to be sorted and the adjacency relationship among the pieces of text information to be sorted.
3. The method according to claim 2, wherein determining the reading order of the plurality of pieces of text information to be sorted by integrating the features respectively corresponding to the pieces of text information to be sorted and the adjacency relationship among the pieces of text information to be sorted comprises:
constructing a graph structure having nodes and edges according to the adjacency relationship among the plurality of pieces of text information to be sorted, wherein a node in the graph structure represents a piece of text information to be sorted, and an edge in the graph structure indicates whether two nodes are adjacent; and
taking the features respectively corresponding to the pieces of text information to be sorted and the graph structure as inputs of a trained graph convolution neural network model, and executing the graph convolution neural network model to obtain the reading order of the plurality of pieces of text information to be sorted.
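As an illustration of one way to represent such a graph structure, the adjacency relationship can be stored in a symmetric matrix whose entry (i, j) is 1 when two pieces of text information are adjacent; the trained-model call in the final comment is a hypothetical interface.

    import numpy as np

    def build_adjacency(pairs_adjacent, n):
        """pairs_adjacent: iterable of (i, j) index pairs judged adjacent; n: node count."""
        A = np.zeros((n, n), dtype=np.float32)
        for i, j in pairs_adjacent:
            A[i, j] = A[j, i] = 1.0      # undirected edge between nodes i and j
        return A

    # reading_order = trained_gcn_model(features, A)   # hypothetical trained model call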
4. The method according to claim 3, wherein the graph convolution neural network model is configured to:
obtain, by a graph convolution operation, updated features respectively corresponding to the plurality of pieces of text information to be sorted according to the features respectively corresponding to the pieces of text information to be sorted and the graph structure; and
determine the reading order of the plurality of pieces of text information to be sorted according to the updated features respectively corresponding to the pieces of text information to be sorted.
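A common form of the graph convolution operation updates node features as H' = σ(D̂^(-1/2) Â D̂^(-1/2) H W) with Â = A + I; the NumPy sketch below shows this standard variant as one example of how the updated features could be obtained.

    import numpy as np

    def graph_conv_layer(H, A, W):
        """One graph convolution layer: H' = relu(D^-1/2 (A+I) D^-1/2 H W)."""
        A_hat = A + np.eye(A.shape[0], dtype=A.dtype)     # add self-loops
        d = A_hat.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        H_new = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W   # propagate and project
        return np.maximum(H_new, 0.0)                     # ReLU non-linearity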
5. The method according to claim 4, wherein determining the reading order of the plurality of pieces of text information to be sorted according to the updated features respectively corresponding to the pieces of text information to be sorted comprises:
integrating the updated features respectively corresponding to the plurality of pieces of text information to be sorted, and calculating a global text information feature as an initial reference feature;
calculating, according to the reference feature and the updated features corresponding to at least one piece of text information to be sorted that has not been output among the plurality of pieces of text information to be sorted, attention weights corresponding to the at least one piece of text information to be sorted that has not been output;
outputting the piece of text information to be sorted corresponding to the maximum attention weight, taking the updated feature corresponding to that piece of text information as a new reference feature, and continuing to perform the attention weight calculation step until all the pieces of text information to be sorted have been output; and
determining the output order of the plurality of pieces of text information to be sorted as the reading order of the plurality of pieces of text information to be sorted.
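A greedy decoding sketch of this attention-based ordering is shown below; the dot-product score against the reference feature stands in for whatever learned attention function the model actually uses.

    import numpy as np

    def decode_reading_order(H):
        """Greedy attention decoding over updated node features H of shape (N, D)."""
        n = H.shape[0]
        reference = H.mean(axis=0)              # global feature as initial reference
        remaining = list(range(n))
        order = []
        while remaining:
            scores = np.array([H[i] @ reference for i in remaining])
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()            # attention weights over remaining items
            best = remaining[int(np.argmax(weights))]
            order.append(best)                  # output the item with maximum weight
            reference = H[best]                 # its updated feature is the new reference
            remaining.remove(best)
        return order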
6. The method according to any one of claims 3 to 5, further comprising:
acquiring a sample image and an expected text information sequence corresponding to a plurality of pieces of sample text information contained in the sample image; and
optimizing the graph convolution neural network model according to sample features respectively corresponding to the pieces of sample text information, a graph structure corresponding to the pieces of sample text information, and the expected text information sequence.
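One conventional way to perform this optimization, among others, is to treat each output position as a classification over the pieces of text information and minimize cross-entropy against the expected sequence; in the PyTorch-style sketch below, the model object and its output shape are assumptions.

    import torch

    def train_step(model, optimizer, features, adjacency, expected_order):
        """One optimization step; model(features, adjacency) is assumed to
        return per-step logits of shape (len(expected_order), N)."""
        model.train()
        optimizer.zero_grad()
        logits = model(features, adjacency)
        target = torch.as_tensor(expected_order, dtype=torch.long)
        loss = torch.nn.functional.cross_entropy(logits, target)
        loss.backward()
        optimizer.step()
        return loss.item()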
7. The method according to any one of claims 2 to 5, wherein determining the adjacency relationship among the plurality of pieces of text information to be sorted comprises:
determining the adjacency relationship among the plurality of pieces of text information to be sorted according to the features respectively corresponding to the pieces of text information to be sorted.
8. The method according to claim 7, wherein determining the adjacency relationship among the plurality of pieces of text information to be sorted according to the features respectively corresponding to the pieces of text information to be sorted comprises:
calculating a correlation between every two pieces of text information to be sorted according to the features corresponding to the two pieces of text information to be sorted among the plurality of pieces of text information to be sorted; and
determining whether every two pieces of text information to be sorted are adjacent according to the correlation between the two pieces of text information to be sorted.
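The sketch below shows one possible correlation measure (cosine similarity of the features) together with a threshold for deciding adjacency; both the measure and the threshold value are illustrative assumptions.

    import numpy as np

    def adjacency_from_correlation(features, threshold=0.5):
        """Decide pairwise adjacency from a correlation score between items.

        features: (N, D) array, one row per piece of text information.
        Returns an (N, N) 0/1 adjacency matrix with an empty diagonal.
        """
        norm = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
        sim = norm @ norm.T                      # pairwise correlation matrix
        A = (sim >= threshold).astype(np.float32)
        np.fill_diagonal(A, 0.0)                 # no self-adjacency
        return A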
9. The method according to claim 8, wherein the features further carry visual features, and calculating the correlation between every two pieces of text information to be sorted according to the features corresponding to the two pieces of text information to be sorted comprises:
calculating a first similarity between the semantic features corresponding to the two pieces of text information to be sorted;
calculating a second similarity between the visual features corresponding to the two pieces of text information to be sorted; and
integrating the first similarity and the second similarity to determine the correlation between the two pieces of text information to be sorted.
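A weighted sum is one straightforward way to integrate the first and second similarities; the weight alpha in the sketch below is an illustrative assumption.

    import numpy as np

    def combined_correlation(sem_i, sem_j, vis_i, vis_j, alpha=0.5):
        """Combine a semantic similarity and a visual similarity for one pair of items."""
        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        first_similarity = cos(sem_i, sem_j)     # between semantic features
        second_similarity = cos(vis_i, vis_j)    # between visual features
        return alpha * first_similarity + (1.0 - alpha) * second_similarity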
10. The method according to any one of claims 1 to 5, further comprising:
extracting semantic features respectively corresponding to the plurality of recognized pieces of text information to be sorted;
extracting visual features respectively corresponding to the pieces of text information to be sorted; and
fusing the semantic features and the visual features corresponding to the pieces of text information to be sorted to obtain the features respectively corresponding to the pieces of text information to be sorted.
11. The method according to claim 10, wherein extracting the visual features respectively corresponding to the pieces of text information to be sorted comprises:
determining, from the image to be recognized, sub-image areas in which the plurality of pieces of text information to be sorted are respectively located according to positions of the pieces of text information to be sorted in the image to be recognized; and
extracting the visual features respectively corresponding to the pieces of text information to be sorted according to the sub-image areas in which the pieces of text information to be sorted are respectively located.
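A minimal sketch of determining the sub-image areas from box positions is given below; feeding each crop to a visual feature extractor (for example a small CNN) is left out, since the choice of extractor is not fixed here.

    import numpy as np

    def crop_regions(image: np.ndarray, boxes):
        """Cut out the sub-image area of each piece of text information.

        image: H x W x C array; boxes: list of (x1, y1, x2, y2) pixel boxes.
        """
        crops = []
        for x1, y1, x2, y2 in boxes:
            crops.append(image[y1:y2, x1:x2].copy())
        return crops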
12. The method according to any one of claims 1 to 5, wherein identifying the plurality of pieces of text information to be sorted contained in the image to be recognized comprises:
identifying, from the image to be recognized, a plurality of pieces of text information contained in the image to be recognized and a position of each piece of text information in the image to be recognized;
dividing the plurality of pieces of text information by using a clustering algorithm according to the position of each piece of text information in the image to be recognized, to obtain at least one text information cluster; and
selecting, from the at least one text information cluster, the pieces of text information in one text information cluster as the plurality of pieces of text information to be sorted.
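The clustering algorithm is not limited here; as one concrete possibility, the sketch below clusters box centre positions with DBSCAN from scikit-learn, where the eps and min_samples values are illustrative.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def cluster_by_position(boxes, eps=50.0, min_samples=1):
        """boxes: list of (x1, y1, x2, y2); returns one cluster label per box."""
        centres = np.array([[(x1 + x2) / 2.0, (y1 + y2) / 2.0]
                            for x1, y1, x2, y2 in boxes])
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(centres)
        return labels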
13. A text sorting method, comprising:
acquiring a plurality of pieces of first text information to be sorted;
determining a reading order of the pieces of first text information by integrating features respectively corresponding to the pieces of first text information and an adjacency relationship among the pieces of first text information; and
sorting the plurality of pieces of first text information according to the reading order to obtain a first text information sequence.
14. The method according to claim 13, further comprising:
determining the adjacency relationship among the pieces of first text information according to the features respectively corresponding to the pieces of first text information.
15. The method according to claim 13, wherein determining the reading order of the pieces of first text information by integrating the features respectively corresponding to the pieces of first text information and the adjacency relationship among the pieces of first text information comprises:
constructing a graph structure having nodes and edges according to the adjacency relationship among the pieces of first text information, wherein a node in the graph structure represents a piece of first text information, and an edge in the graph structure indicates whether two nodes are adjacent; and
taking the features respectively corresponding to the pieces of first text information and the graph structure as inputs of a trained graph convolution neural network model, and executing the graph convolution neural network model to obtain the reading order of the pieces of first text information.
16. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled to the memory, is configured to execute the program stored in the memory to:
identify a plurality of pieces of text information to be sorted contained in an image to be recognized;
determine a reading order of the plurality of pieces of text information to be sorted according to features respectively corresponding to the pieces of text information to be sorted, wherein the features carry semantic features; and
sort the plurality of pieces of text information to be sorted according to the reading order to obtain a sequence of the text information to be sorted.
17. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled to the memory, is configured to execute the program stored in the memory to:
acquire a plurality of pieces of first text information to be sorted;
determine a reading order of the pieces of first text information by integrating features respectively corresponding to the pieces of first text information and an adjacency relationship among the pieces of first text information; and
sort the plurality of pieces of first text information according to the reading order to obtain a first text information sequence.
18. An image recognition method, comprising:
identifying a plurality of pieces of text information to be sorted contained in an image to be recognized;
determining a character type of the plurality of pieces of text information to be sorted;
acquiring an arrangement rule corresponding to the character type; and
sorting the plurality of pieces of text information to be sorted according to the arrangement rule to obtain a sequence of the text information to be sorted.
CN202010106180.7A 2020-02-20 2020-02-20 Image recognition and character sorting method and equipment Pending CN113283432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010106180.7A CN113283432A (en) 2020-02-20 2020-02-20 Image recognition and character sorting method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010106180.7A CN113283432A (en) 2020-02-20 2020-02-20 Image recognition and character sorting method and equipment

Publications (1)

Publication Number Publication Date
CN113283432A true CN113283432A (en) 2021-08-20

Family

ID=77275325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010106180.7A Pending CN113283432A (en) 2020-02-20 2020-02-20 Image recognition and character sorting method and equipment

Country Status (1)

Country Link
CN (1) CN113283432A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114495147A (en) * 2022-01-25 2022-05-13 北京百度网讯科技有限公司 Identification method, device, equipment and storage medium
CN116071740A (en) * 2023-03-06 2023-05-05 深圳前海环融联易信息科技服务有限公司 Invoice identification method, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114155543B (en) Neural network training method, document image understanding method, device and equipment
WO2022116537A1 (en) News recommendation method and apparatus, and electronic device and storage medium
US10055391B2 (en) Method and apparatus for forming a structured document from unstructured information
CN107193962B (en) Intelligent map matching method and device for Internet promotion information
JP2022541199A (en) A system and method for inserting data into a structured database based on image representations of data tables.
JP5134628B2 (en) Media material analysis of consecutive articles
CN111476284A (en) Image recognition model training method, image recognition model training device, image recognition method, image recognition device and electronic equipment
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
CN113762309B (en) Object matching method, device and equipment
CN108319888B (en) Video type identification method and device and computer terminal
CN113221882B (en) Image text aggregation method and system for curriculum field
CN112818995B (en) Image classification method, device, electronic equipment and storage medium
CN117058271A (en) Method and computing device for generating commodity main graph background
CN113283432A (en) Image recognition and character sorting method and equipment
Huang et al. ORDNet: Capturing omni-range dependencies for scene parsing
CN115131811A (en) Target recognition and model training method, device, equipment and storage medium
CN113762257A (en) Identification method and device for marks in makeup brand images
US11423206B2 (en) Text style and emphasis suggestions
CN113407696A (en) Collection table processing method, device, equipment and storage medium
CN113313066A (en) Image recognition method, image recognition device, storage medium and terminal
CN111400524B (en) Variable-scale geological text vectorization method and system based on AI
CN109299378B (en) Search result display method and device, terminal and storage medium
CN116361502A (en) Image retrieval method, device, computer equipment and storage medium
CN112766269B (en) Picture text retrieval method, intelligent terminal and storage medium
CN114067343A (en) Data set construction method, model training method and corresponding device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination