CN115294578A - Text information extraction method, device, equipment and medium based on artificial intelligence - Google Patents

Text information extraction method, device, equipment and medium based on artificial intelligence

Info

Publication number
CN115294578A
Authority
CN
China
Prior art keywords
character
recognition
boundary
text
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210958496.8A
Other languages
Chinese (zh)
Inventor
刘东煜
周坤胜
张蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202210958496.8A
Publication of CN115294578A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/1452 Selective acquisition, locating or processing of specific regions based on positionally close symbols, e.g. amount sign or URL-specific characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/186 Extraction of features or characteristics of the image by deriving mathematical or geometrical properties from the whole image
    • G06V30/188 Computation of moments

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a text information extraction method, device, equipment and medium based on artificial intelligence. The method includes: inputting an image to be processed into a character recognition model to obtain recognition characters and their corresponding center points; determining, for each recognition character, its associated characters according to the center points; determining the boundary information of each recognition character according to the center points of its associated characters; determining the boundary feature value of each recognition character according to the boundary information; and inputting the boundary feature vector, as an embedding vector, together with the recognition characters into a language model to obtain a text information extraction result. Predicting a center point for each recognition character enables the character recognition model to segment characters more accurately, improving the accuracy of character recognition; meanwhile, constructing the boundary feature vector from the boundary information of the recognition characters provides effective position information for the language model, improving the accuracy of semantic analysis and thus the accuracy of text information extraction.

Description

Text information extraction method, device, equipment and medium based on artificial intelligence
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a text information extraction method, a text information extraction device, text information extraction equipment and a text information extraction medium based on artificial intelligence.
Background
At present, with the development of artificial intelligence technology, manually entering text information into a computer is gradually giving way to machine recognition and extraction of text information. Machine recognition usually adopts an Optical Character Recognition (OCR) technology based on bounding boxes. OCR refers to the process of translating the shapes of printed characters into computer characters. In the OCR process, printed characters that are close together are grouped into the same bounding box, and the printed characters within one bounding box are treated as one word or one sentence, which can improve the accuracy of text recognition.
However, the spacing between printed characters in a printed text is usually uncertain, so a divided bounding box may contain several words or sentences. Because context information is considered during character recognition, such a situation scrambles the context, introduces irrelevant information into the recognition, and lowers the accuracy of text information extraction. How to improve the accuracy of text information extraction has therefore become a problem to be solved urgently.
Disclosure of Invention
In view of this, embodiments of the present invention provide a text information extraction method, apparatus, device and medium based on artificial intelligence, so as to solve the problem of low accuracy of text information extraction.
In a first aspect, an embodiment of the present invention provides a text information extraction method based on artificial intelligence, where the text information extraction method includes:
inputting the acquired image to be processed into a trained character recognition model to obtain a recognition result, wherein the recognition result comprises at least one recognition character and a central point corresponding to the recognition character;
aiming at any recognition character, determining the recognition character adjacent to the recognition character as an associated character according to the central point of the recognition character;
determining boundary information of the recognition character according to the central point of the associated character;
when the boundary information is detected to meet the preset condition, determining the boundary characteristic value of the recognition character as a first characteristic value, otherwise, determining the boundary characteristic value of the recognition character as a second characteristic value to obtain the boundary characteristic value of each recognition character;
and taking a boundary characteristic vector consisting of the boundary characteristic values of all the recognition characters as an embedded vector, and inputting a character sequence consisting of the embedded vector and the recognition characters into a trained language model to obtain a text information extraction result.
In a second aspect, an embodiment of the present invention provides an artificial intelligence-based text information extraction apparatus, where the text information extraction apparatus includes:
the character recognition module is used for inputting the acquired image to be processed into a trained character recognition model to obtain a recognition result, and the recognition result comprises at least one recognition character and a central point corresponding to the recognition character;
the character association module is used for determining the identification character adjacent to the identification character as an association character according to the central point of the identification character aiming at any identification character;
the boundary determining module is used for determining the boundary information of the identification character according to the central point of the associated character;
the characteristic value determining module is used for determining the boundary characteristic value of the recognition character as a first characteristic value when the boundary information is detected to meet the preset condition, otherwise, determining the boundary characteristic value of the recognition character as a second characteristic value to obtain the boundary characteristic value of each recognition character;
and the information extraction module is used for taking a boundary characteristic vector consisting of the boundary characteristic values of all the recognition characters as an embedded vector, and inputting a character sequence consisting of the embedded vector and the recognition characters into a trained language model to obtain a text information extraction result.
In a third aspect, an embodiment of the present invention provides a computer device, where the computer device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor implements the text information extraction method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, and when being executed by a processor, the computer program implements the text information extraction method according to the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
An acquired image to be processed is input into a trained character recognition model to obtain a recognition result comprising at least one recognition character and the center point corresponding to each recognition character. For any recognition character, the recognition characters adjacent to it are determined as associated characters according to its center point, and the boundary information of the recognition character is determined according to the center points of its associated characters. When the boundary information is detected to satisfy a preset condition, the boundary feature value of the recognition character is determined to be a first feature value; otherwise, it is determined to be a second feature value, yielding the boundary feature value of each recognition character. The boundary feature vector composed of the boundary feature values of all recognition characters is used as an embedding vector, and the character sequence composed of the recognition characters is input, together with the embedding vector, into the trained language model to obtain a text information extraction result. Predicting the center point of each recognition character enables the character recognition model to segment the characters more accurately, improving the accuracy of character recognition; meanwhile, constructing the boundary feature vector from the boundary information of the recognition characters provides effective position information for the language model, improving the accuracy of semantic analysis and thereby the accuracy of text information extraction.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings based on them without inventive effort.
Fig. 1 is a schematic diagram of an application environment of a text information extraction method based on artificial intelligence according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a text information extraction method based on artificial intelligence according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a text information extraction method based on artificial intelligence according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of an artificial intelligence-based text information extraction apparatus according to a third embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Furthermore, in the description of the present specification and the appended claims, the terms "first," "second," "third," and the like are used only to distinguish between descriptions and are not to be understood as indicating or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The embodiments of the invention can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
It should be understood that, the sequence numbers of the steps in the following embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
The text information extraction method based on artificial intelligence provided by the embodiment of the invention can be applied to the application environment shown in fig. 1, wherein a client communicates with a server. The client includes, but is not limited to, a palm top computer, a desktop computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cloud terminal device, a Personal Digital Assistant (PDA), and other computer devices. The server can be implemented by an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 2, which is a schematic flow chart of a text information extraction method based on artificial intelligence according to an embodiment of the present invention, the text information extraction method may be applied to the client in fig. 1. The computer device corresponding to the client is connected to a server to obtain an image to be processed from which text information is to be extracted. A trained character recognition model and a trained language model are deployed in the computer device corresponding to the client; the trained character recognition model may be used to recognize printed text as computer characters, and the trained language model may be used to obtain semantic information by analyzing the computer characters. As shown in fig. 2, the text information extraction method may include the following steps:
step S201, inputting the acquired image to be processed into the trained character recognition model to obtain a recognition result.
The image to be processed can be a printed text image needing semantic extraction, the character recognition model can be an optical character recognition model, the optical character recognition model can comprise a text detection model and a text recognition model, and the recognition result comprises at least one recognition character and a central point corresponding to the recognition character.
Specifically, before the image to be processed is input into the trained character recognition model, the image to be processed is subjected to size normalization operation, that is, the size of the image to be processed is scaled and converted into a fixed size, so as to adapt to the input parameters of the trained character recognition model.
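A minimal sketch of this normalization step, assuming OpenCV is used and taking 512x512 as a placeholder input size (the patent does not fix a concrete size or interpolation mode):

```python
import cv2
import numpy as np

def normalize_size(image: np.ndarray, target=(512, 512)) -> np.ndarray:
    # Scale the to-be-processed image to a fixed size so that it matches
    # the input parameters of the trained character recognition model.
    # The 512x512 target is an illustrative assumption, not from the patent.
    return cv2.resize(image, target, interpolation=cv2.INTER_LINEAR)
```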
Optionally, the trained character recognition model includes a trained text detection model and a trained text recognition model;
inputting the acquired image to be processed into a trained character recognition model, and obtaining a recognition result comprises the following steps:
inputting the image to be processed into a trained text detection model to obtain a bounding box positioning point;
determining a bounding box according to the bounding box locating point, and clipping the image to be processed according to the bounding box to obtain an image of a region to be processed;
and inputting the image of the area to be processed into the trained text recognition model to obtain at least one recognition character and a central point corresponding to the recognition character.
The text detection model may be configured to determine a position of a text to be recognized in an image, the bounding box locating point may include an upper left corner of the bounding box and a lower right corner of the bounding box, the bounding box may be a rectangular box in the image to be processed, and the image of the region to be processed may be an image that only includes the text to be recognized.
The text recognition model may be used to determine a computer character corresponding to the text to be recognized, the recognition character may refer to a computer character corresponding to the text to be recognized, and the central point of the recognition character may refer to a central point of a corresponding region of the recognition character in the image to be processed.
Specifically, the text detection model may adopt a target detection model, such as a Faster R-CNN model, an SSD (Single Shot MultiBox Detector) model, an FPN (Feature Pyramid Network) model, and the like, and the text detection model may obtain the bounding box locating points. For example, if the obtained upper-left corner point is (x1, y1) and the lower-right corner point is (x2, y2), then the lower-left and upper-right corner points can be determined from them, i.e., the lower-left corner point is (x1, y2) and the upper-right corner point is (x2, y1). Any two corner points sharing the same abscissa or the same ordinate are connected; for example, the lower-left corner point is connected with the upper-left corner point because their abscissas are the same, while the lower-left corner point is not connected with the upper-right corner point. The results obtained by these connections are the edges of the bounding box.
In this embodiment, the bounding boxes are all rectangular boxes, that is, the sides of the bounding boxes are all parallel to the X-axis or Y-axis of the image coordinate system.
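A short sketch of how the remaining corners and edges follow from the two locating points, under the axis-aligned assumption above:

```python
def bounding_box_from_locating_points(x1, y1, x2, y2):
    # Upper-left (x1, y1) and lower-right (x2, y2) are predicted by the
    # text detection model; the other two corners are derived from them.
    upper_left, lower_right = (x1, y1), (x2, y2)
    lower_left, upper_right = (x1, y2), (x2, y1)
    # Corner pairs sharing an abscissa or an ordinate are connected, so
    # every edge of the bounding box is parallel to the X or Y axis.
    edges = [(upper_left, upper_right), (upper_right, lower_right),
             (lower_right, lower_left), (lower_left, upper_left)]
    return edges
```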
In this embodiment, in order to keep the size of the clipped images consistent, clipping may be implemented by masking: a mask image is generated from the bounding box, in which pixels inside the bounding box region have the value 1 and all other pixels have the value 0. The mask image is multiplied point by point with the image to be processed, so that pixel values inside the bounding box region are retained and all other pixel values are set to 0, achieving the clipping effect while preserving the image size.
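A sketch of this masking-based clipping, assuming NumPy arrays and image coordinates with the origin at the top-left (the patent fixes only the 0/1 mask and the point-by-point multiplication):

```python
import numpy as np

def mask_clip(image: np.ndarray, x1: int, y1: int, x2: int, y2: int) -> np.ndarray:
    # Generate a mask image: 1 inside the bounding box region, 0 elsewhere.
    mask = np.zeros(image.shape[:2], dtype=image.dtype)
    mask[y1:y2, x1:x2] = 1
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over color channels
    # Point-by-point multiplication keeps pixels inside the bounding box
    # and zeroes the rest, so the clipped image keeps the original size.
    return image * mask
```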
In this embodiment, the bounding box is used to locate the text, and the image to be processed is clipped accordingly so that character recognition is performed only on the region containing text information; irrelevant information in the image to be processed is thus isolated and the accuracy of character recognition is improved.
Optionally, the sample image of the region to be processed is used as a training sample during training of the text recognition model, the actual character is used as a training label during training of the text recognition model, and the cross entropy loss is used as a loss function during training of the text recognition model;
the training process of the text recognition model comprises the following steps:
dividing a sample image of a region to be processed into M sub-sample images according to a preset step length;
inputting the sub-sample image into a text recognition model aiming at any sub-sample image to obtain an initial sample character;
and calculating cross entropy loss according to the initial sample characters and the actual characters, updating parameters of the text recognition model by adopting a gradient descent method according to the cross entropy loss until the cross entropy loss is converged, and obtaining the preliminarily trained text recognition model.
The to-be-processed region sample image may be an image obtained by processing a historical printed text, taken as a sample, through the text detection model and cropping it; the sub-sample image may be an image obtained by dividing the to-be-processed region sample image according to the preset step length; and the actual characters may be the real characters corresponding to the text, that is, the characters corresponding to the text are known.
The preset step size may refer to a segmentation step size of a character, the sub-sample image may refer to an image including partial information of a single character, the initial sample character may refer to a character recognition result of the corresponding sub-sample image, and the gradient descent method may refer to a random gradient descent method.
Specifically, the size of the to-be-processed region sample image is kept consistent with that of the to-be-processed region image used at inference, and the characters in the to-be-processed region sample image may be complete characters or incomplete characters.
The embodiment performs the preliminary training of the text recognition model by using the sample image of the area to be processed, and can effectively recognize incomplete characters, thereby improving the accuracy of character recognition.
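A sketch of the division into M sub-sample images, assuming the preset step length runs along the horizontal writing direction and the windows do not overlap (the patent does not state the windowing details):

```python
import numpy as np

def split_into_subsamples(region_sample: np.ndarray, step: int) -> list:
    # Divide the to-be-processed region sample image into sub-sample images
    # of width `step`; each may hold only partial information of a character.
    height, width = region_sample.shape[:2]
    # The last window may be narrower than `step` if `width` is not a multiple.
    return [region_sample[:, x:x + step] for x in range(0, width, step)]
```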
Optionally, after obtaining the preliminarily trained text recognition model, the method further includes:
aiming at any sub-sample image, inputting the sub-sample image into a preliminarily trained text recognition model to obtain an updated sample character and a central point corresponding to the updated sample character;
combining the sub-sample images belonging to the same updated sample character to obtain N updated sub-sample images, and extracting the central point of each updated sub-sample image as a central point label of the identification character;
and calculating cross entropy loss according to the central point of the corresponding updated sample character and the central point label of the identification character, and updating the parameters of the text identification model by adopting a gradient descent method according to the cross entropy loss until the cross entropy loss is converged to obtain the trained text identification model.
The updated sample character may refer to a preliminary recognition result corresponding to the sub-sample image, and the central point of the updated sample character may refer to a prediction result of the central position of the updated sample character range.
Merging may refer to treating the sub-sample images to be merged as one image; that is, an updated sub-sample image may refer to an image containing complete character information, and its center point may be taken as the mean of the center points of all the sub-sample images being merged. In this embodiment, the mean refers to the mean of the abscissas.
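A sketch of deriving the center-point label of a recognition character from the sub-sample images being merged, following the mean-abscissa convention of this embodiment:

```python
def center_point_label(sub_sample_centers):
    # `sub_sample_centers` holds the (x, y) center points of the sub-sample
    # images belonging to one updated sample character. The label abscissa
    # is their mean abscissa; the ordinate is shared by all sub-samples.
    xs = [x for x, _ in sub_sample_centers]
    return sum(xs) / len(xs), sub_sample_centers[0][1]
```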
Specifically, in this embodiment a center-point prediction branch is added: the text recognition model includes a recognition branch and a center-point prediction branch that share one feature extraction encoder. The encoder produces a text feature vector, which is input into the recognition fully connected layer of the recognition branch and the prediction fully connected layer of the center-point prediction branch, respectively.
It should be noted that random position disturbance can be performed on the sub-sample images. In this embodiment, the abscissa disturbance range is set to [0, 2] in units of pixels, that is, each sub-sample image is translated by 0 to 2 pixels. This achieves data expansion and increases the robustness and accuracy of the trained character recognition model.
In this embodiment, the center-point label of a recognition character can be obtained by computing the center points of the sub-sample images to be merged, so no manual annotation is needed; this enriches what the text recognition model predicts and improves its training efficiency.
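A sketch of the two-branch structure, written here in PyTorch with illustrative layer sizes; the patent specifies only the shared feature extraction encoder and the two fully connected heads:

```python
import torch.nn as nn

class TwoBranchTextRecognizer(nn.Module):
    def __init__(self, feat_dim=256, num_classes=6000):
        super().__init__()
        # Shared feature extraction encoder producing the text feature vector.
        # Layer sizes and the grayscale input are assumptions of this sketch.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.recognition_fc = nn.Linear(feat_dim, num_classes)  # recognition branch
        self.center_fc = nn.Linear(feat_dim, 1)  # center-point prediction branch

    def forward(self, sub_sample_image):
        feature = self.encoder(sub_sample_image)
        return self.recognition_fc(feature), self.center_fc(feature)
```

The random position disturbance of [0, 2] pixels described above would be applied to the sub-sample images before they enter such a model.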
The step of inputting the acquired image to be processed into the trained character recognition model to obtain the recognition result converts the image to be processed of the printed text into the computer character, thereby facilitating semantic extraction of the printed text in the form of the computer character, effectively reducing the analysis difficulty of the subsequent language model and improving the accuracy of text information extraction.
Step S202, aiming at any recognition character, determining the recognition character adjacent to the recognition character as an associated character according to the central point of the recognition character.
Wherein the associated character may refer to a character adjacent to the recognized character.
Specifically, after the center point of each recognition character is obtained, for any recognition character, let its center point be (x_a, y_a). According to the abscissa x_a of its center point, the associated abscissa with the smallest absolute difference from x_a is searched for, and the recognition character corresponding to that associated abscissa is the associated character.
It should be noted that, when searching for the associated abscissa with the smallest absolute difference, the two associated abscissas with the smallest absolute differences are retained, and it is detected whether both of them are larger than the abscissa x_a. If both are larger, the recognition character corresponding to the smaller of the two associated abscissas is determined as the associated character; if only one of the two is larger, the recognition characters corresponding to both associated abscissas are determined as associated characters.
In this step, for any recognition character, the recognition characters adjacent to it are determined as associated characters according to its center point. Determining associated characters through center points makes it convenient to provide position information for the subsequent semantic extraction by the language model, improving the accuracy of text information extraction.
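A sketch of the association rule as literally described above; `center_xs` lists the center-point abscissas of all recognition characters:

```python
def find_associated_characters(index: int, center_xs: list) -> list:
    # Keep the two recognition characters whose center abscissas have the
    # smallest absolute difference from that of the character at `index`.
    x_a = center_xs[index]
    candidates = sorted(
        (i for i in range(len(center_xs)) if i != index),
        key=lambda i: abs(center_xs[i] - x_a),
    )[:2]
    if all(center_xs[i] > x_a for i in candidates):
        # Both neighbors lie to the right: only the nearer one is associated.
        return [min(candidates, key=lambda i: center_xs[i])]
    # Neighbors on both sides: both are associated characters.
    return candidates
```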
Step S203, determining the boundary information of the recognition character according to the central point of the associated character.
The boundary information may refer to a segmentation line between the recognition character and a recognition character with irrelevant semantics, and in this embodiment, the segmentation line may refer to a straight line parallel to the Y axis.
Optionally, determining boundary information of the recognition character according to the center point of the associated character includes:
comparing the horizontal coordinate of the central point of the associated character with the horizontal coordinate of the central point of the identification character, if the horizontal coordinate of the central point of the associated character is smaller than the horizontal coordinate of the central point of the identification character, determining that the associated character is a left associated character, otherwise, determining that the associated character is a right associated character;
calculating a first mean value of the horizontal coordinate of the center point of the corresponding left associated character and the horizontal coordinate of the center point of the corresponding recognized character, and determining the first mean value as the horizontal coordinate of the left boundary;
calculating a second average value of the abscissa of the center point of the corresponding right associated character and the abscissa of the center point of the corresponding recognition character, and determining the second average value as the abscissa of the right boundary;
and determining the left boundary abscissa and the right boundary abscissa as boundary information of the recognition character.
Wherein the left associated character may refer to the associated character to the left of the recognition character and the right associated character may refer to the associated character to the right of the recognition character.
The left boundary abscissa may refer to an abscissa used to determine a left boundary straight line, and the right boundary abscissa may refer to an abscissa used to determine a right boundary straight line.
Specifically, let the abscissa of the center point of the recognition character be x_a and the abscissa of the center point of the associated character be x_b. For convenience of calculation, the image coordinate system takes the lower-left corner of the image as the origin, the ray from the lower-left corner toward the upper-left corner as the Y axis, and the ray from the lower-left corner toward the lower-right corner as the X axis, so that the center-point abscissas of all recognition characters are greater than 0. If x_a is less than x_b, the associated character is a right associated character; if x_a is greater than x_b, it is a left associated character. The mean abscissa

x = (x_a + x_b) / 2

is the boundary abscissa corresponding to the associated character. Since the boundary straight lines in this embodiment are all parallel to the Y axis, the boundary straight line is expressed as x = (x_a + x_b) / 2.

Note that, since a recognition character may have only one associated character, in that case only one of the left and right boundary abscissas is used as the boundary information.
In this embodiment, the associated characters are divided into categories and the left and right boundary abscissas are calculated, further refining the boundary information; this improves the ability of the boundary information to represent character position information and thus the accuracy of text information extraction.
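A sketch of the boundary-information computation; a recognition character missing a left or right associated character gets only the one available boundary abscissa:

```python
def boundary_info(x_center, x_left_assoc=None, x_right_assoc=None):
    # Left/right boundary abscissa = mean of the character's center abscissa
    # and that of its left/right associated character; each boundary line is
    # the vertical line x = value (parallel to the Y axis).
    left = (x_center + x_left_assoc) / 2 if x_left_assoc is not None else None
    right = (x_center + x_right_assoc) / 2 if x_right_assoc is not None else None
    return left, right
```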
In this step, boundary information is determined from the center points of the associated characters, so that the characters are segmented more accurately according to position information. This avoids the situation in which the language model extracts text information across semantically unrelated characters, causing mutual interference and low extraction accuracy.
And step S204, when the boundary information is detected to meet the preset condition, determining the boundary characteristic value of the recognition character as a first characteristic value, otherwise, determining the boundary characteristic value of the recognition character as a second characteristic value to obtain the boundary characteristic value of each recognition character.
The boundary feature value may refer to an encoding value used to represent boundary information of the recognition character, for example, in this embodiment, the first feature value is set to 1, and the second feature value is set to 0. The preset condition may refer to a condition for judging the boundary information as a division boundary between character sequences.
Optionally, the process of detecting whether the boundary information satisfies the preset condition includes:
when the left associated character of the identification character is detected, calculating a difference value between a left boundary abscissa of the identification character and a right boundary abscissa of the left associated character to obtain a first difference value;
when the recognition character is detected to have a right associated character, calculating a difference value between the abscissa of the right boundary of the recognition character and the abscissa of the left boundary of the right associated character to obtain a second difference value;
and calculating the ratio of the first difference value to the second difference value, comparing the calculation result with a preset threshold value, and if the calculation result is greater than the preset threshold value, determining the right boundary abscissa of the left associated character and the left boundary abscissa of the right associated character as segmentation boundary information.
The first difference may be used to represent the distance between the recognition character and its left associated character in the image to be processed, and the second difference the distance between the recognition character and its right associated character. The preset threshold may be used to judge whether the recognition character and its associated character belong to the same character string; for example, in this embodiment, the preset threshold is set to 0.6.
When the calculation result is greater than the preset threshold, the distance between the two recognition characters is relatively large. According to the typesetting conventions of printed text, characters belonging to the same character string are spaced closely, so it can be concluded that a split should be made between the two recognition characters, dividing the character sequence into two strings. Repeating this split decision for each pair of adjacent recognition characters finally yields the several character strings into which the character sequence is split.
In this embodiment, whether the boundary information is a segmentation boundary is judged by threshold comparison. The calculation is fast and simple, and the threshold can be adjusted flexibly according to the actual situation, improving both the efficiency and the accuracy of text information extraction.
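A sketch of the preset-condition check following the literal description above, with the embodiment's threshold of 0.6; the geometry of the two differences is taken as given by the preceding steps:

```python
def boundary_feature_value(left_x, left_assoc_right_x,
                           right_x, right_assoc_left_x,
                           threshold=0.6):
    # First difference: gap between this character's left boundary abscissa
    # and the right boundary abscissa of its left associated character.
    first_diff = left_x - left_assoc_right_x
    # Second difference: gap between this character's right boundary abscissa
    # and the left boundary abscissa of its right associated character.
    second_diff = right_x - right_assoc_left_x
    # A ratio above the preset threshold marks a segmentation boundary:
    # first feature value 1; otherwise second feature value 0.
    return 1 if first_diff / second_diff > threshold else 0
```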
In the step in which the boundary feature value of the recognition character is determined to be the first feature value when the boundary information satisfies the preset condition, and the second feature value otherwise, whether the boundary information is a segmentation boundary between characters is judged according to the preset condition and the character sequence is segmented accordingly, thereby providing character segmentation information for the subsequent language model and further improving the accuracy of text information extraction.
Step S205, using the boundary feature vector composed of the boundary feature values of all the recognition characters as an embedded vector, and inputting the character sequence composed of the embedded vector and the recognition characters into the trained language model to obtain the text information extraction result.
The boundary feature vector may refer to a 1 × K-dimensional vector composed of the boundary feature values, that is, a vector with one row and K columns, where K is the number of recognition characters.
Specifically, the embedding vector is input into a first encoder for feature extraction to obtain an embedding feature vector, and the character sequence composed of the recognition characters is input into a second encoder for feature extraction to obtain a character feature vector. After feature fusion of the embedding feature vector and the character feature vector, the fusion result is input into a recurrent network model for semantic extraction, yielding the semantic representation, i.e., the text information extraction result.
In this step, the boundary feature vector composed of the boundary feature values of all recognition characters is used as an embedding vector, and the character sequence composed of the recognition characters is input together with it into the trained language model to obtain the text information extraction result. Extracting text information through multiple cooperating feature dimensions enriches the features of the recognition characters, provides more feature information for semantic extraction, and can effectively improve the accuracy of text information extraction.
This embodiment predicts the center point of each recognition character so that the character recognition model segments characters more accurately, improving the accuracy of character recognition. Meanwhile, the boundary feature vector constructed from the boundary information of the recognition characters provides effective position information for the language model, improving the accuracy of semantic analysis and thus the accuracy of text information extraction.
Referring to fig. 3, which is a schematic flow chart of a text information extraction method based on artificial intelligence according to a second embodiment of the present invention: when the character sequence composed of the embedding vector and the recognition characters is input into the trained language model, the character sequence may be input directly as the input quantity, or features may first be constructed from the character sequence and the constructed features input as the input quantity.
The process of directly inputting the character sequence as the input quantity into the trained language model is described in the first embodiment, and is not described herein again.
The process of constructing the character sequence and inputting the constructed character sequence as the input quantity into the trained language model comprises the following steps:
step S301, carrying out term matching on all recognition characters in the character sequence through presetting a title term, and distributing corresponding identifications for the recognition characters which are successfully matched according to the matched title term to obtain identification vectors;
step S302, converting the character sequence into a text word vector through word vector embedding;
and step S303, performing feature fusion on the text word vector, the identification vector and the embedded vector, and inputting a feature fusion result into the trained language model to obtain a text information extraction result.
The preset title terms may refer to a pre-stored common title database, and the common titles may include basic titles such as "date", "number" and "name", and specific application titles such as "amount" and "time limit". The term matching may refer to regular matching, and the identifier may refer to coding information, and in this embodiment, coding information in a digital form is used, that is, different numbers correspond to different terms.
Word vector embedding may refer to the process of converting computer characters into word vectors; for example, a language representation model such as BERT (Bidirectional Encoder Representations from Transformers) or a word vector conversion model such as Word2Vec may be adopted.
In this embodiment, the feature fusion may refer to feature concatenation, that is, different features are spliced end to end according to dimensions, the feature fusion result may be a feature vector after the text word vector, the identification vector, and the embedded vector are concatenated, and the text information extraction result may refer to semantic information.
In one embodiment, feature fusion may refer to feature point-by-point multiplication or feature point-by-point accumulation.
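A sketch of steps S301 to S303 put together; the title-term database, the numeric identifications, and the `embed` stand-in for a BERT/Word2Vec embedding are all illustrative assumptions, and fusion is shown as the concatenation variant:

```python
import re

TITLE_TERMS = {"date": 1, "number": 2, "name": 3, "amount": 4}  # illustrative

def build_language_model_input(recognized_chars, boundary_values, embed):
    text = "".join(recognized_chars)
    # S301: term matching via regular expressions; matched characters get
    # the numeric identification of the matched title term, others get 0.
    id_vector = [0] * len(text)
    for term, term_id in TITLE_TERMS.items():
        for m in re.finditer(re.escape(term), text):
            for pos in range(m.start(), m.end()):
                id_vector[pos] = term_id
    # S302: convert the character sequence into text word vectors.
    word_vectors = [embed(ch) for ch in recognized_chars]
    # S303: feature fusion by concatenating, per character, the word vector
    # with its identification and boundary feature value.
    return [list(wv) + [id_vector[i], boundary_values[i]]
            for i, wv in enumerate(word_vectors)]
```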
In this embodiment, features are constructed from the character sequence and the constructed features are input into the trained language model as the input quantity. This improves the representational power of the features corresponding to the character sequence, largely avoids the situation in which the language model cannot map information-rich character-sequence features to semantic information, and improves the accuracy of text information extraction.
Fig. 4 shows a structural block diagram of a text information extraction apparatus based on artificial intelligence according to a third embodiment of the present invention. The text information extraction apparatus is applied to a client; the computer device corresponding to the client is connected to a server to obtain an image to be processed from which text information is to be extracted. A trained character recognition model and a trained language model are deployed in the computer device corresponding to the client; the trained character recognition model may be used to recognize printed text as computer characters, and the trained language model may be used to obtain semantic information by analyzing the computer characters. For ease of illustration, only the portions relevant to the embodiments of the present invention are shown.
Referring to fig. 4, the text information extracting apparatus includes:
the character recognition module 41 is configured to input the acquired image to be processed into a trained character recognition model to obtain a recognition result, where the recognition result includes at least one recognition character and a central point corresponding to the recognition character;
the character association module 42 is configured to determine, for any recognition character, a recognition character adjacent to the recognition character as an associated character according to a center point of the recognition character;
a boundary determining module 43, configured to determine boundary information of the recognition character according to a center point of the associated character;
the feature value determining module 44 is configured to determine, when it is detected that the boundary information satisfies the preset condition, that the boundary feature value of the identification character is a first feature value, and otherwise, determine, as a second feature value, that the boundary feature value of the identification character is a second feature value, to obtain a boundary feature value of each identification character;
and the information extraction module 45 is configured to use a boundary feature vector formed by the boundary feature values of all the recognition characters as an embedded vector, and input a character sequence formed by the embedded vector and the recognition characters into the trained language model to obtain a text information extraction result.
Optionally, the trained character recognition model includes a trained text detection model and a trained text recognition model;
the character recognition module 41 includes:
the text positioning unit is used for inputting the image to be processed into the trained text detection model to obtain the bounding box positioning point;
the image clipping unit determines the surrounding frame according to the surrounding frame positioning point and clips the surrounding frame from the image to be processed to obtain an image of the area to be processed;
and the text recognition unit is used for inputting the image of the area to be processed into the trained text recognition model to obtain at least one recognition character and a central point corresponding to the recognition character.
Optionally, the sample image of the region to be processed is used as a training sample during training of the text recognition model, the actual character is used as a training label during training of the text recognition model, and the cross entropy loss is used as a loss function during training of the text recognition model;
the text information extraction device further includes:
the sample dividing module is used for dividing the sample image of the area to be processed into M sub-sample images according to a preset step length;
the first sample identification module is used for inputting the subsample images into the text identification model aiming at any subsample image to obtain initial sample characters;
and the first training module is used for calculating cross entropy loss according to the initial sample characters and the actual characters, updating parameters of the text recognition model by adopting a gradient descent method according to the cross entropy loss until the cross entropy loss is converged, and obtaining the preliminarily trained text recognition model.
Optionally, the text information extracting apparatus further includes:
the second sample recognition module is used for inputting the sub-sample images into the preliminarily trained text recognition model aiming at any sub-sample image to obtain updated sample characters and central points corresponding to the updated sample characters;
the sample merging module is used for merging the sub-sample images belonging to the same updated sample character to obtain N updated sub-sample images, and extracting the central point of each updated sub-sample image as a central point label of the identification character;
and the second training module is used for calculating cross entropy loss according to the central point of the corresponding updated sample character and the central point label of the identification character, updating the parameters of the text identification model by adopting a gradient descent method based on the cross entropy loss until the cross entropy loss is converged, and obtaining the trained text identification model.
Optionally, the boundary determining module 43 includes:
the association determining unit is used for comparing the central point abscissa of the associated character with the central point abscissa of the identification character, if the central point abscissa of the associated character is smaller than the central point abscissa of the identification character, the associated character is determined to be a left associated character, and if not, the associated character is determined to be a right associated character;
the first mean value calculating unit is used for calculating a first mean value corresponding to the abscissa of the center point of the left associated character and the abscissa of the center point of the corresponding identified character, and determining the first mean value as the abscissa of the left boundary;
the second mean value calculating unit is used for calculating a second mean value corresponding to the abscissa of the center point of the right associated character and the abscissa of the center point of the corresponding recognized character, and determining the second mean value as the abscissa of the right boundary;
and the boundary information acquisition unit is used for determining the left boundary abscissa and the right boundary abscissa as the boundary information of the recognition character.
Optionally, the characteristic value determining module 44 includes:
the first difference value calculating unit is used for calculating the difference value between the left boundary abscissa of the recognition character and the right boundary abscissa of the left associated character when the recognition character is detected to have the left associated character, so as to obtain a first difference value;
the second difference value calculating unit is used for calculating the difference value between the abscissa of the right boundary of the recognition character and the abscissa of the left boundary of the right correlation character when the recognition character is detected to have the right correlation character, so as to obtain a second difference value;
and the threshold value comparison unit is used for calculating the ratio of the first difference value to the second difference value, comparing the calculation result with a preset threshold value, and if the calculation result is greater than the preset threshold value, determining the right boundary abscissa of the left associated character and the left boundary abscissa of the right associated character as the segmentation boundary information.
Optionally, the information extracting module 45 includes:
the identification vector determining unit is used for carrying out lexical item matching on all the identification characters in the character sequence through presetting the title lexical items, and distributing corresponding identifications for the identification characters which are successfully matched according to the matched title lexical items to obtain identification vectors;
the word vector conversion unit is used for converting the character sequence into a text word vector through word vector embedding;
and the feature fusion unit is used for performing feature fusion on the text word vector, the identification vector and the embedded vector, and inputting a feature fusion result into the trained language model to obtain a text information extraction result.
It should be noted that, because the above-mentioned modules and units are based on the same concept, and their specific functions and technical effects are brought about by the method embodiment of the present invention, reference may be made to the method embodiment part specifically, and details are not described here again.
Fig. 5 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. As shown in fig. 5, the computer apparatus of this embodiment includes: at least one processor (only one shown in fig. 5), a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various text information extraction method embodiments described above when executing the computer program.
The computer device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 5 is merely an example of a computer device and is not intended to be limiting, and that a computer device may include more or fewer components than those shown, or some components may be combined, or different components may be included, such as a network interface, a display screen, and input devices, etc.
The Processor may be a CPU, or other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory includes readable storage medium, internal memory, etc., where the internal memory may be a memory of the computer device, and the internal memory provides an environment for the operating system and the execution of computer-readable instructions in the readable storage medium. The readable storage medium may be a hard disk of the computer device, and in other embodiments may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device. Further, the memory may also include both internal storage units and external storage devices of the computer device. The memory is used for storing an operating system, application programs, a BootLoader (BootLoader), data, and other programs, such as program codes of a computer program, and the like. The memory may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of the functional units and modules is illustrated. In practical applications, the above functions may be distributed among different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from each other and are not intended to limit the protection scope of the present invention. For the specific working processes of the units and modules in the above apparatus, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program that instructs the relevant hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, a computer-readable medium may not be an electrical carrier signal or a telecommunications signal.
The present invention may also be implemented by a computer program product which, when executed on a computer device, causes the computer device to implement all or part of the processes in the methods of the above embodiments.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and in actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, or through indirect coupling or communication connection between devices or units, and may be in electrical, mechanical, or other form.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A text information extraction method based on artificial intelligence is characterized by comprising the following steps:
inputting the acquired image to be processed into a trained character recognition model to obtain a recognition result, wherein the recognition result comprises at least one recognition character and a central point corresponding to the recognition character;
for any recognition character, determining a recognition character adjacent to the recognition character as an associated character according to the central point of the recognition character;
determining boundary information of the recognition character according to the central point of the associated character;
when it is detected that the boundary information meets a preset condition, determining the boundary characteristic value of the recognition character as a first characteristic value; otherwise, determining the boundary characteristic value of the recognition character as a second characteristic value, so as to obtain the boundary characteristic value of each recognition character;
and taking a boundary characteristic vector consisting of boundary characteristic values of all the recognition characters as an embedded vector, and inputting a character sequence consisting of the embedded vector and the recognition characters into a trained language model to obtain a text information extraction result.
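By way of a non-limiting illustration, the following minimal Python sketch traces the overall flow of claim 1; every callable passed in is a hypothetical stand-in for the trained models and checks named in the claim:

```python
# Hypothetical end-to-end sketch of claim 1; all callables are assumptions.
def extract_text_info(image, recognize, find_neighbors, boundary_info,
                      meets_condition, language_model):
    chars, centers = recognize(image)                  # recognition result
    embedded_vector = []
    for i in range(len(chars)):
        left, right = find_neighbors(i, centers)       # associated characters
        info = boundary_info(centers[i], left, right)  # boundary information
        # First characteristic value (here 1) if the preset condition holds,
        # otherwise the second characteristic value (here 0).
        embedded_vector.append(1 if meets_condition(info) else 0)
    # Character sequence plus embedded vector into the trained language model.
    return language_model(chars, embedded_vector)
```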
2. The method of extracting text information according to claim 1, wherein the trained character recognition model includes a trained text detection model and a trained text recognition model;
the step of inputting the acquired image to be processed into the trained character recognition model to obtain a recognition result comprises the following steps:
inputting the image to be processed into a trained text detection model to obtain a bounding box positioning point;
determining a bounding box according to the bounding box positioning point, and cutting the image to be processed according to the bounding box to obtain an image of the area to be processed;
and inputting the image of the area to be processed into the trained text recognition model to obtain at least one recognition character and a central point of the corresponding recognition character.
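As a non-limiting illustration, a minimal Python sketch of this detect-crop-recognize pipeline, assuming the detector returns a single axis-aligned bounding box as (x1, y1, x2, y2) over a NumPy-style image array; the callables are hypothetical stand-ins for the trained models:

```python
# Hypothetical sketch of claim 2; detector and recognizer are stand-ins.
def recognize_with_detection(image, text_detector, text_recognizer):
    x1, y1, x2, y2 = text_detector(image)      # bounding box positioning points
    region = image[y1:y2, x1:x2]               # image of the area to be processed
    chars, centers = text_recognizer(region)   # characters and central points
    return chars, centers
```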
3. The method for extracting text information according to claim 2, wherein, during training of the text recognition model, a sample image of an area to be processed is taken as a training sample, an actual character is taken as a training label, and cross entropy loss is taken as the loss function;
the training process of the text recognition model comprises the following steps:
dividing the sample image of the area to be processed into M sub-sample images according to a preset step size;
for any sub-sample image, inputting the sub-sample image into the text recognition model to obtain an initial sample character;
and calculating the cross entropy loss according to the initial sample characters and the actual characters, and updating parameters of the text recognition model by adopting a gradient descent method according to the cross entropy loss until the cross entropy loss is converged to obtain a preliminarily trained text recognition model.
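As a non-limiting illustration, a minimal PyTorch-style sketch of such a training loop; the model, batch layout, and hyperparameters are assumptions, and plain SGD stands in for "the gradient descent method":

```python
import torch
import torch.nn as nn

# Hypothetical training sketch for claim 3: cross entropy loss plus SGD.
def train_text_recognition(model, sub_sample_batches, epochs=10, lr=1e-3):
    criterion = nn.CrossEntropyLoss()                       # loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient descent
    for _ in range(epochs):                                 # until convergence
        for sub_images, actual_chars in sub_sample_batches:
            logits = model(sub_images)        # initial sample characters
            loss = criterion(logits, actual_chars)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                  # update model parameters
    return model
```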
4. The method of claim 3, further comprising, after obtaining the preliminarily trained text recognition model:
for any sub-sample image, inputting the sub-sample image into the preliminarily trained text recognition model to obtain an updated sample character and a central point corresponding to the updated sample character;
combining the sub-sample images belonging to the same updated sample character to obtain N updated sub-sample images, and extracting the central point of each updated sub-sample image as a central point label of the recognition character;
and calculating the cross entropy loss according to the central point of the corresponding updated sample character and the recognition character central point label, and updating the parameters of the text recognition model by the gradient descent method based on the cross entropy loss until the cross entropy loss converges, so as to obtain the trained text recognition model.
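As a non-limiting illustration, a minimal one-dimensional Python sketch of the label construction in this claim: consecutive sub-sample images predicted as the same updated sample character are merged, and the abscissa of the merged image's central point becomes the label (the span layout is an assumption; a full implementation would carry both coordinates):

```python
# Hypothetical sketch: merge sub-sample spans by predicted character and take
# the central abscissa of each merged span as the central point label.
def center_point_labels(sub_spans, predicted_chars):
    # sub_spans: (x_start, x_end) of each sub-sample image in reading order.
    # predicted_chars: updated sample character predicted for each span.
    labels, start, end, prev = [], None, None, None
    for (x0, x1), ch in zip(sub_spans, predicted_chars):
        if prev is not None and ch != prev:
            labels.append((start + end) / 2)   # central point of merged image
            start = x0
        elif prev is None:
            start = x0
        end, prev = x1, ch
    if prev is not None:
        labels.append((start + end) / 2)
    return labels                              # one label per updated image
```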
5. The method according to any one of claims 1 to 4, wherein the determining the boundary information of the recognition character according to the central point of the associated character includes:
comparing the abscissa of the central point of the associated character with the abscissa of the central point of the recognition character; if the abscissa of the central point of the associated character is smaller than the abscissa of the central point of the recognition character, determining that the associated character is a left associated character; otherwise, determining that the associated character is a right associated character;
calculating a first mean value of the abscissa of the central point of the left associated character and the abscissa of the central point of the recognition character, and determining the first mean value as a left boundary abscissa;
calculating a second mean value of the abscissa of the central point of the right associated character and the abscissa of the central point of the recognition character, and determining the second mean value as a right boundary abscissa;
and determining the left boundary abscissa and the right boundary abscissa as the boundary information of the recognition character.
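As a non-limiting illustration, a minimal Python sketch of this boundary computation in one dimension; the argument layout is an assumption:

```python
# Hypothetical sketch of claim 5: boundary abscissas as means of central-point
# abscissas; None means the character has no neighbor on that side.
def boundary_abscissas(center_x, left_center_x=None, right_center_x=None):
    left_boundary_x = ((left_center_x + center_x) / 2      # first mean value
                       if left_center_x is not None else None)
    right_boundary_x = ((right_center_x + center_x) / 2    # second mean value
                        if right_center_x is not None else None)
    return left_boundary_x, right_boundary_x               # boundary information
```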
6. The method according to claim 5, wherein the process of detecting whether the boundary information satisfies a preset condition includes:
when it is detected that the recognition character has a left associated character, calculating the difference between the left boundary abscissa of the recognition character and the right boundary abscissa of the left associated character to obtain a first difference value;
when it is detected that the recognition character has a right associated character, calculating the difference between the right boundary abscissa of the recognition character and the left boundary abscissa of the right associated character to obtain a second difference value;
and calculating the ratio of the first difference value to the second difference value, comparing the calculation result with a preset threshold value, and if the calculation result is greater than the preset threshold value, determining the right boundary abscissa of the left associated character and the left boundary abscissa of the right associated character as segmentation boundary information.
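As a non-limiting illustration, a minimal Python sketch of this check; the record layout and the threshold value are assumptions, and the boundary abscissas are taken as given:

```python
# Hypothetical sketch of claim 6: a gap ratio above the preset threshold marks
# the two abscissas as segmentation boundary information.
def segmentation_boundary(char, left_char, right_char, threshold=1.5):
    # First difference: left boundary of the character minus the right
    # boundary of its left associated character.
    first_diff = char["left_x"] - left_char["right_x"]
    # Second difference: right boundary of the character minus the left
    # boundary of its right associated character.
    second_diff = char["right_x"] - right_char["left_x"]
    if second_diff == 0:
        return None                            # avoid division by zero
    if first_diff / second_diff > threshold:
        return left_char["right_x"], right_char["left_x"]
    return None
```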
7. The method of claim 6, wherein the inputting the character sequence composed of the embedded vector and the recognition character into a trained language model to obtain the text information extraction result comprises:
performing term matching on all the recognition characters in the character sequence against preset title terms, and assigning corresponding identifications to the successfully matched recognition characters according to the matched title terms to obtain an identification vector;
converting the character sequence into a text word vector by word vector embedding;
and performing feature fusion on the text word vector, the identification vector and the embedded vector, and inputting a feature fusion result into the trained language model to obtain a text information extraction result.
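As a non-limiting illustration, a minimal Python sketch of the identification vector step: each character covered by a matched preset title term receives that term's identification, and unmatched characters receive 0 (the term list and id scheme are assumptions):

```python
# Hypothetical sketch of the term matching in claim 7.
def identification_vector(characters, title_terms):
    text = "".join(characters)
    ids = [0] * len(text)                      # 0 = no matched title term
    for term_id, term in enumerate(title_terms, start=1):
        pos = text.find(term)
        while pos != -1:
            for k in range(pos, pos + len(term)):
                ids[k] = term_id               # per-character identification
            pos = text.find(term, pos + len(term))
    return ids
```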
8. An artificial intelligence-based text information extraction device, characterized by comprising:
the character recognition module is used for inputting the acquired image to be processed into a trained character recognition model to obtain a recognition result, and the recognition result comprises at least one recognition character and a central point corresponding to the recognition character;
the character association module is used for, for any recognition character, determining the recognition character adjacent to the recognition character as an associated character according to the central point of the recognition character;
the boundary determining module is used for determining the boundary information of the recognition character according to the central point of the associated character;
the characteristic value determining module is used for determining the boundary characteristic value of the recognition character as a first characteristic value when it is detected that the boundary information meets a preset condition, and otherwise determining the boundary characteristic value of the recognition character as a second characteristic value, so as to obtain the boundary characteristic value of each recognition character;
and the information extraction module is used for taking a boundary characteristic vector consisting of the boundary characteristic values of all the recognition characters as an embedded vector, and inputting a character sequence consisting of the embedded vector and the recognition characters into a trained language model to obtain a text information extraction result.
9. A computer device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor implementing the text information extraction method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the text information extraction method according to any one of claims 1 to 7.
CN202210958496.8A 2022-08-09 2022-08-09 Text information extraction method, device, equipment and medium based on artificial intelligence Pending CN115294578A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210958496.8A CN115294578A (en) 2022-08-09 2022-08-09 Text information extraction method, device, equipment and medium based on artificial intelligence


Publications (1)

Publication Number Publication Date
CN115294578A (en)

Family

ID=83828219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210958496.8A Pending CN115294578A (en) 2022-08-09 2022-08-09 Text information extraction method, device, equipment and medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN115294578A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination