CN115730043A - Text recognition method and device and computer readable storage medium - Google Patents


Info

Publication number: CN115730043A
Authority: CN (China)
Prior art keywords: target, language model, matrix, result, prediction
Legal status: Pending (an assumption, not a legal conclusion)
Application number: CN202211580848.7A
Other languages: Chinese (zh)
Inventors: 范艳, 吴伟华, 禹世杰, 叶桔
Assignee (current and original): SHENZHEN HARZONE TECHNOLOGY CO LTD
Application filed by SHENZHEN HARZONE TECHNOLOGY CO LTD

Landscapes

  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a text recognition method, a text recognition device, and a computer-readable storage medium, wherein the method comprises the following steps: acquiring a target unstructured text, wherein the target unstructured text comprises L tokens and K predefined relations, and L and K are positive integers; inputting the target unstructured text into a target language model for prediction to obtain a prediction result, wherein the target language model is a language model obtained by perturbing a pre-trained language model with added noise, and the prediction result comprises at least one triple between an entity and a relation; decoding the prediction result according to a preset decoding strategy based on entity and relation corner markers to obtain a target triple; and determining a target structured text corresponding to the target unstructured text according to the target triple. By adopting the embodiments of the present application, the precision of text structuring can be improved.

Description

Text recognition method and device and computer readable storage medium
Technical Field
The present application relates to the field of natural language processing technology or computer technology, and in particular, to a text recognition method, apparatus, and computer-readable storage medium.
Background
Information extraction aims to extract structured information from large-scale unstructured natural language text. Text structuring is one of its important subtasks; its main purpose is to identify entities of interest in the text and to extract the semantic relations between them. For example, the sentence "The author of the novel Fortress Besieged is Qian Zhongshu" includes an entity pair (Fortress Besieged, Qian Zhongshu), and the relation between the two entities is "author". At present, however, the precision of text structuring is not high, so the problem of how to improve the precision of text structuring urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a text recognition method, a text recognition device and a computer-readable storage medium, which can improve the text structuring precision.
In a first aspect, an embodiment of the present application provides a text recognition method, where the method includes:
acquiring a target unstructured text, wherein the target unstructured text comprises L tokens and K predefined relations, and L and K are positive integers;
inputting the target unstructured text into a target language model for prediction to obtain a prediction result, wherein the target language model is a language model obtained by perturbing a pre-trained language model with added noise, and the prediction result comprises at least one triple between an entity and a relation;
decoding the prediction result according to a preset decoding strategy based on entity and relation corner markers to obtain a target triple;
and determining a target structured text corresponding to the target unstructured text according to the target triple.
In a second aspect, an embodiment of the present application provides a text recognition apparatus, where the apparatus includes: an acquisition unit, a prediction unit, a decoding unit, and a determination unit, wherein,
the acquisition unit is used for acquiring a target unstructured text, wherein the target unstructured text comprises L tokens and K predefined relations, and L and K are positive integers;
the prediction unit is used for inputting the target unstructured text into a target language model for prediction to obtain a prediction result, wherein the target language model is a language model obtained by perturbing a pre-trained language model with added noise, and the prediction result comprises at least one triple between an entity and a relation;
the decoding unit is used for decoding the prediction result according to a preset decoding strategy based on entity and relation corner markers to obtain a target triple;
the determining unit is configured to determine a target structured text corresponding to the target unstructured text according to the target triple.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps in the first aspect of the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program makes a computer perform some or all of the steps described in the first aspect of the embodiment of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
It can be seen that, with the text recognition method, apparatus, and computer-readable storage medium described in the embodiments of the present application, a target unstructured text is obtained, where the target unstructured text includes L tokens and K predefined relations, L and K both being positive integers. The target unstructured text is input into a target language model for prediction to obtain a prediction result, where the target language model is a language model obtained by perturbing a pre-trained language model with added noise, and the prediction result includes at least one triple between an entity and a relation. The prediction result is decoded according to a preset decoding strategy based on entity and relation corner markers to obtain a target triple, and a target structured text corresponding to the target unstructured text is determined according to the target triple.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1A is a schematic flowchart of a text recognition method according to an embodiment of the present application;
FIG. 1B is a schematic diagram illustrating noise-based perturbation of a language model according to an embodiment of the present application;
fig. 1C is a schematic diagram illustrating an exemplary effect of a classifier according to an embodiment of the present disclosure;
fig. 1D is a schematic diagram illustrating a text recognition method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another text recognition method provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 4 is a block diagram illustrating functional units of a text recognition apparatus according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
The electronic devices described in the embodiments of the present application may include a smartphone (e.g., an Android phone, an iOS phone, a Windows phone, etc.), a tablet computer, a palmtop computer, a dashboard camera, a server, a notebook computer, a mobile Internet device (MID), or a wearable device (e.g., a smart watch or a Bluetooth headset). These are merely examples; the electronic devices include but are not limited to the above.
In the embodiment of the present application, unstructured text refers to text data without a predefined data model, such as characters, numbers, punctuation, and various printable symbols. It is typically represented by documents in library databases, which may contain structured fields such as title, author, publication date, length, and category, as well as large unstructured text components such as the abstract and body content.
The following describes embodiments of the present application in detail.
Referring to fig. 1A, fig. 1A is a schematic flowchart of a text recognition method according to an embodiment of the present application, and as shown in the figure, the text recognition method includes:
101. and acquiring a target unstructured text, wherein the target unstructured text comprises L tokens and K predefined relations, and L and K are positive integers.
In this embodiment of the present application, the K predefined relationships may be preset, or the system defaults to store a predefined relationship library in advance, which may include a plurality of predefined relationships.
In specific implementation, the target unstructured text can be obtained by character recognition through a camera or input by a user, the target unstructured text can comprise L tokens and K predefined relations, and L and K are positive integers.
By way of example, the target unstructured text may be "The author of the novel Fortress Besieged is Qian Zhongshu", where "Fortress Besieged" and "Qian Zhongshu" are entities and "author" is a relation.
The method described in the embodiment of the application can be widely applied to the construction of a relation graph and a knowledge graph or other applications depending on entities and relation extraction tasks.
102. Inputting the target unstructured text into a target language model for prediction to obtain a prediction result, wherein the target language model is a language model obtained by perturbing a pre-trained language model with added noise, and the prediction result comprises at least one triple between an entity and a relation.
In the embodiment of the application, the entity and the relation can be predicted simultaneously, the entity and relation triple can be directly predicted, and the interdependency among the triples can be better captured.
Specifically, the target unstructured text may be input to a target language model for prediction to obtain a prediction result, where the target language model is a language model obtained by adding noise to a pre-trained language model and performing perturbation. Wherein the prediction result may include at least one triple between the entity and the relationship. The predicted result may be understood as an intermediate result that may include the identification of all possible triples in the input text.
In a specific implementation, the embodiment of the present application may adopt a single-module, one-pass joint entity-relation extraction method. As shown in FIG. 1C, for the example sentence "The author of the novel Fortress Besieged is Qian Zhongshu", "Fortress Besieged" and "Qian Zhongshu" are two entities in the sentence, "author" is a predefined relation, and the triple (Fortress Besieged, author, Qian Zhongshu) can be identified directly by judging its correctness. First, this method aims to fully capture the dependencies among the three elements by inputting the head entity, the relation, and the tail entity into one classification module simultaneously, thereby reducing redundant information. Second, the one-step decoding mode does not depend on other outputs, which effectively avoids cascading errors. Third, the simple single-module, single-step decoding structure makes the network intuitive and easy to train.
In a specific implementation, as shown in FIG. 1C, taking the same sentence as an example, all possible triples, including partial-span candidates such as (Fortress, author, Qian) and the full (Fortress Besieged, author, Qian Zhongshu), are classified and predicted by the classifier, and the correct results are retained and finally output.
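As a minimal sketch of the candidate space described above (the function name is illustrative, not from the patent), enumerating every token-level candidate triple over L tokens and K relations can be written as:

```python
def enumerate_candidate_triples(tokens, relations):
    """Enumerate all L*K*L candidate (head_token, relation, tail_token)
    triples in a sentence; a classifier then judges each candidate."""
    return [(w_i, r, w_j)
            for w_i in tokens
            for r in relations
            for w_j in tokens]
```

For a sentence with L tokens and K predefined relations this yields L·K·L candidates, which is exactly the space the classifier scores.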
Optionally, before the step 101, the method may further include the following steps:
a1, acquiring a parameter matrix of the pre-training language model;
a2, determining a standard deviation according to the parameter matrix;
a3, acquiring a preset noise matrix;
a4, carrying out disturbance processing on the parameter matrix according to the standard deviation and the preset noise matrix to obtain a target parameter matrix;
and A5, optimizing the pre-training language model according to the target parameter matrix to obtain the target language model.
In the embodiment of the present application, the pre-trained language model may be preset or a system default. The pre-trained language model may include at least one of the following; examples include, but are not limited to: neural network models, first-generation (non-contextual) word-embedding models, second-generation (contextual) pre-trained language models, knowledge-enhanced pre-trained models, multilingual/cross-lingual pre-trained models, language-specific pre-trained models, multi-modal (video-text, image-text, speech-text, etc.) pre-trained models, domain-specific pre-trained models, task-specific pre-trained models, and the like.
The preset noise matrix may also be preset or default to the system.
In a specific implementation, a parameter matrix of the pre-trained language model may be obtained, a standard deviation may be determined according to the parameter matrix, and a preset noise matrix may be obtained, where the preset noise matrix is related to the probability density function of the noise model. The parameter matrix may then be perturbed according to the standard deviation and the preset noise matrix to obtain a target parameter matrix. Finally, the pre-trained language model may be optimized according to the target parameter matrix to obtain the target language model: for example, the parameter matrix of the pre-trained language model may be replaced by the target parameter matrix to obtain the target language model, or the parameter matrix may be replaced by the target parameter matrix and the replaced pre-trained language model then trained to obtain the target language model.
In the embodiment of the application, a small amount of noise is added to the pre-training language model in a targeted manner, some disturbance is added to the pre-training language model, and then the effect of subsequent tasks is improved.
In specific implementation, in consideration of the differentiation characteristics of different types of parameters in the model, the exponential distribution noises with different intensities are added to different parameter matrixes according to the standard deviation of the different parameter matrixes.
The parameter matrices of the pre-trained language model may be expressed as [W_1, W_2, ..., W_n], where n is the number of parameter matrices of the pre-trained language model. The perturbation is applied as follows:

W_i' = W_i + E(a) * std(W_i)

where E(a) is a noise matrix of the same dimensions as W_i generated by a two-sided exponential noise model with parameter a, and std(W_i) is the standard deviation of W_i. The probability density function of the noise model is:

f(x) = (a/2) * e^(-a|x|)

where a is a real number greater than 0.
In the embodiment of the application, the method for adding a small amount of noise to the pre-training language model can help the model to explore more potential feature spaces, so that the over-fitting problem of pre-training tasks and data is relieved.
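As an illustrative sketch (function and parameter names are assumptions of this edit, not from the patent), the perturbation W_i' = W_i + E(a)·std(W_i) with two-sided exponential (Laplace) noise might look like:

```python
import numpy as np

def perturb_parameter_matrices(weights, a=10.0, rng=None):
    """Apply W_i' = W_i + E(a) * std(W_i) to each parameter matrix.

    E(a) is drawn from the two-sided exponential (Laplace) distribution
    with density (a/2) * exp(-a|x|), i.e. Laplace with scale 1/a, and has
    the same shape as W_i; std(W_i) scales the noise per matrix.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    perturbed = []
    for W in weights:
        E = rng.laplace(loc=0.0, scale=1.0 / a, size=W.shape)
        perturbed.append(W + E * W.std())
    return perturbed
```

Larger values of a concentrate the density around 0, so the perturbation shrinks; a matrix with zero standard deviation is left unchanged.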
Optionally, in the step 102, inputting the target unstructured text into the target language model for prediction to obtain a prediction result, the method may include the following steps:
21. embedding the target unstructured text by taking the target language model as an encoder to obtain an embedded result;
22. inputting the embedding result into a preset classifier to obtain the at least one triple;
23. determining the prediction outcome from the at least one triplet.
Wherein the preset classifier can be preset or default by the system.
In specific implementation, a target language model can be used as an encoder to embed the target unstructured text to obtain an embedded result, the embedded result can be a vector, the embedded result is input to a preset classifier to obtain at least one triple, and a prediction result is determined according to the at least one triple.
In a specific implementation, the embodiment of the present application may convert the joint entity-relation extraction task into a triple classification problem. Given a sentence S = {w_1, w_2, ..., w_L} with L tokens and K predefined relations R = {r_1, r_2, ..., r_K}, the embodiment of the present application identifies all possible triples in the sentence S:

(w_i, r_k, w_j), where w_i, w_j ∈ S and r_k ∈ R.

Here w represents a token in the sentence, equivalent to a word in English.
Specifically, for an input sentence, the input sentence may be embedded using the trained language model as an encoder:
{e_1, e_2, ..., e_i, ..., e_L} = LM({x_1, x_2, ..., x_i, ..., x_L})

where x_i is the input token together with its position information, etc., e_i is the corresponding embedding vector representation, and i denotes the i-th position.

The embeddings of the input sentence can then be input into the designed classifier, which assigns a label to each possible triple (w_i, r_k, w_j). Here i and j are token subscripts in the sentence; i may correspond to a row in FIG. 1D and j to a column in FIG. 1D.
Optionally, the embedding result includes a plurality of embedding vectors, and each character corresponds to one embedding vector; the step 22 of inputting the embedding result into a preset classifier to obtain the at least one triple may include the following steps:
221. determining the characteristic features between any two embedded vectors in the plurality of embedded vectors to obtain a plurality of characteristic features;
222. inputting the plurality of characterization features into a preset score function to obtain a plurality of score vectors;
223. and inputting the plurality of score vectors into a normalization function to obtain a plurality of triples, wherein each triplet corresponds to one label.
Wherein the preset score function may be preset or system default.
In the embodiment of the application, the characterization features between any two of the multiple embedded vectors can be determined to obtain multiple characterization features, then the multiple characterization features are input to the preset score function to obtain multiple score vectors, the multiple score vectors are input to the normalization function to obtain multiple triples, and each triple corresponds to one tag.
In a specific implementation, the classifier designed in the embodiment of the present application draws on the HOLE idea. Specifically, the character-pair representation e_{i,j} may be obtained according to the following formula:

e_{i,j} = tanh(W_e · [e_i; e_j]^T + b_e)

where [;] is the concatenation operation, W_e and b_e are trainable weights and biases, and tanh() is the tanh activation function. The character-pair representation e_{i,j} is then input into the score function to obtain the score vector of the character-pair-and-relation triple.
The score function is defined as:

v = R^T · drop(φ(W · e_{i,j} + b))

where v is the score vector of the character-pair-and-relation triple; R ∈ ℝ^{d×4K} is the matrix of all relation representations and is a trainable weight (d is the dimension of the entity-pair representation, K is the total number of predefined relations, and 4 is the total number of class labels), and R^T denotes the transpose of R; φ() is the ReLU activation function; drop() is a dropout strategy; and W and b are the trainable weights and bias of a fully connected layer.
Next, the score vector v may be input into the softmax function, thereby predicting the corresponding label:

P((w_i, r_k, w_j) | S) = softmax(v)

where v represents the score vector of the character-pair-and-relation triple, S represents the input sentence, and P((w_i, r_k, w_j) | S) denotes the output probability of the triple formed by the character pair w_i, w_j and the relation r_k when the input is S.
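Under the definitions above, a minimal numerical sketch of the classifier can be given (a NumPy stand-in for the trainable layers; all array names are illustrative, and dropout is omitted as at inference time):

```python
import numpy as np

def softmax(v):
    # numerically stable softmax over the score vector
    z = v - v.max()
    e = np.exp(z)
    return e / e.sum()

def classify_pair(e_i, e_j, W_e, b_e, W, b, R):
    """Score one (w_i, w_j) character pair against K relations x 4 labels.

    e_i, e_j : token embeddings of dimension h
    W_e, b_e : pair-representation layer mapping 2h -> d
    W, b     : fully connected layer, d -> d
    R        : relation representations of shape (d, 4K)
    """
    e_ij = np.tanh(W_e @ np.concatenate([e_i, e_j]) + b_e)  # e_ij = tanh(W_e·[e_i; e_j] + b_e)
    v = R.T @ np.maximum(W @ e_ij + b, 0.0)                 # v = R^T · ReLU(W·e_ij + b)
    return softmax(v)                                        # probabilities over 4K labels
```

The output has one probability per (relation, class-label) pair, matching the 4K-dimensional score vector in the score-function definition.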
103. Decoding the prediction result according to a preset decoding strategy based on entity and relation corner markers to obtain the target triple.
In the embodiment of the application, the preset decoding strategy based on the entity and the relationship of the corner mark may be preset or default by the system.
In the embodiment of the present application, the corner markers may be understood as follows: for each predefined relationship, the head entity can be decoded from the matrix columns, with the head entity's span connected from "HB-TE" to "HE-TE"; the tail entity can be decoded from the matrix rows, with the tail entity's span connected from "HB-TB" to "HB-TE"; and the head and tail entities share the same label "HB-TE". As shown in FIG. 1D, the three labels HB-TB, HB-TE, and HE-TE appear as a triangle.
In the embodiment of the application, the single-module, one-pass joint entity-relation extraction method classifies all possible triples, and the classification result is stored in a three-dimensional matrix M^{L×4K×L}, where L is the number of tokens in the input sequence, K is the number of relations, and 4 is the number of class labels. The task in the inference phase is therefore to decode the entities and relations from the result matrix M.
Optionally, in the step 103, the decoding of the prediction result according to the preset decoding strategy based on entity and relation corner markers to obtain the target triple may include the following steps:
31. determining a first result matrix according to the triples and the labels, and preprocessing the first result matrix to obtain a second result matrix, wherein the second result matrix is a three-dimensional matrix;
32. converting the second result matrix into a two-dimensional matrix to obtain a third result matrix;
33. finding out indexes of elements larger than 0 from the third result matrix to obtain at least one index;
34. determining a label meeting a preset requirement according to the at least one index to obtain at least one label;
35. determining the index ranges of the head entity and the tail entity according to the at least one label;
36. and determining the target triple according to the index range and the K tokens.
Wherein, the preset requirement can be preset or default to the system.
In this embodiment of the present application, a first result matrix may be determined according to the plurality of triples and the plurality of labels, and the first result matrix may be preprocessed: specifically, an argmax operation may be performed over every 4 channels of the second dimension to obtain a second result matrix. Both the first and second result matrices are three-dimensional, and each channel of the second result matrix corresponds to the prediction result of one relation. Assuming the first result matrix is M^{L×4K×L}, the second result matrix has shape L×K×L.
Then, for a given relation, the second result matrix may be converted into a two-dimensional matrix to obtain a third result matrix. The indices of elements greater than 0 are found in the third result matrix to obtain at least one index; labels meeting a preset requirement are determined according to the at least one index to obtain at least one label; the index ranges of the head and tail entities are determined according to the at least one label; and the target triple is determined according to the index ranges and the L tokens.
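The argmax preprocessing over each group of 4 label channels can be sketched as follows (shapes follow the L×4K×L description in this section; the function name is illustrative):

```python
import numpy as np

def collapse_label_channels(M_first):
    """Collapse a first result matrix of label scores, shape (L, 4K, L),
    into per-relation label matrices of shape (L, K, L) by taking argmax
    over each group of 4 class-label channels (labels 0..3)."""
    L, four_k, L2 = M_first.shape
    assert L == L2 and four_k % 4 == 0
    K = four_k // 4
    # view the second dimension as (K relations x 4 labels), argmax over labels
    return M_first.reshape(L, K, 4, L).argmax(axis=2)
```

Each (L, L) slice of the result is then the label matrix for one relation, ready for the corner-marker decoding below.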
In a specific implementation, since the range of an entity can be determined by its boundary, at least 4 labels are required to link head and tail entities in a token-pair manner, where a row represents the head entity and a column represents the tail entity. The 4 label classes are defined as follows. Label 1: HB-TB, denoting the begin token of the head entity paired with the begin token of the tail entity. Label 2: HB-TE, denoting the begin token of the head entity paired with the end token of the tail entity. Label 3: HE-TE, denoting the end token of the head entity paired with the end token of the tail entity. Label 0: all other cases. A specific example is shown in FIG. 1D.
In the embodiment of the application, the entity and relation triples can be decoded simply and directly from the result matrix M^{L×4K×L}: that is, for each predefined relation, the head entity can be decoded from the matrix columns, with the head entity's span connected from "HB-TE" to "HE-TE"; the tail entity can be decoded from the matrix rows, with the tail entity's span connected from "HB-TB" to "HB-TE"; and the head and tail entities share the same label "HB-TE".
For example, the following steps are specifically performed:
1. Process the result matrix M^{L×4K×L} by performing an argmax operation over every 4 channels of the second dimension; each channel of the resulting matrix corresponds to the prediction result of one relation.
2. Take a single relation r as an example; the matrix M then degenerates into a two-dimensional matrix M^{L×L}. Find the indices of elements greater than 0 in the matrix M^{L×L}.
3. Among these indices, search for the index labeled HB-TB; from each index satisfying the HB-TB requirement, search for the index labeled HB-TE; finally, from the index labeled HB-TE, search for the index labeled HE-TE.
4. Determine the ranges of the head and tail entities from the HB-TB, HB-TE, and HE-TE indices found in step 3 (the head entity's span runs from the row index of HB-TE to the row index of HE-TE, and the tail entity's span runs from the column index of HB-TB to the column index of HB-TE).
5. Obtain the relation triple (h, r, t) from the index ranges of the head and tail entities found in step 4 and the token sequence of the input sentence.
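The five decoding steps above can be sketched for a single relation as a brute-force search (assuming a per-relation L×L label matrix with 0 = other, 1 = HB-TB, 2 = HB-TE, 3 = HE-TE; rows index the head entity and columns the tail entity; names are illustrative):

```python
def decode_single_relation(M, tokens, relation):
    """Decode (head, relation, tail) triples from an L x L label matrix M."""
    L = len(tokens)
    triples = []
    for i in range(L):                        # row shared by HB-TB and HB-TE (head begin)
        for j in range(L):                    # column of HB-TB (tail begin)
            if M[i][j] != 1:                  # step 3: find HB-TB
                continue
            for j2 in range(j, L):            # column of HB-TE (tail end)
                if M[i][j2] != 2:             # find HB-TE on the same row
                    continue
                for i2 in range(i, L):        # row of HE-TE (head end)
                    if M[i2][j2] == 3:        # find HE-TE in the same column
                        head = " ".join(tokens[i:i2 + 1])   # step 4: head span = rows i..i2
                        tail = " ".join(tokens[j:j2 + 1])   # tail span = cols j..j2
                        triples.append((head, relation, tail))  # step 5
    return triples
```

Running this over each relation channel of the collapsed result matrix yields all (h, r, t) triples for the sentence.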
In practical applications, in general, if the head entity is located in front of the tail entity, the label "HB-TE" is located in the upper right corner of the label matrix, and if the tail entity is located in front of the head entity, the label "HB-TE" is located in the lower left corner of the label matrix.
As for entity types, the embodiment of the present application does not indicate them explicitly; they are implied by the relation. In the example of the figure, the triple (Fortress Besieged, author, Qian Zhongshu) is identified: the entity type of the head entity "Fortress Besieged" can be regarded as a novel, and the entity type of the tail entity "Qian Zhongshu" is a person name.
104. And determining a target structured text corresponding to the target unstructured text according to the target triple.
In the embodiment of the application, before fine-tuning, some perturbation is added to the original model according to the differentiated characteristics of the different types of parameters in the pre-trained model, which helps the model explore more potential feature spaces and reduces over-fitting to the pre-training tasks and data. Meanwhile, the method combines entity recognition and relation extraction in information extraction and converts them into a fine-grained triple classification problem, which can fully capture the dependencies among the three elements (head entity, relation, and tail entity), reduce redundant information, and effectively avoid cascading errors; the network is intuitive and easy to train.
In specific implementation, the target structured text corresponding to the target unstructured text can be determined according to the target triples; that is, the target structured text can be generated based on the semantic relationships carried by the target triples, and then displayed.
In the embodiment of the application, information extraction can be performed on unstructured text to recognize the entities and relations it contains. First, a small amount of noise is added to a widely used pre-training language model as a perturbation, so that more potential feature spaces can be explored. Second, the perturbed pre-training language model is fine-tuned with the designed joint entity-relation extraction method and labeled data. Then, the unstructured text is input to the fine-tuned model for prediction, and the output is decoded according to the proposed decoding strategy to obtain structured text information. This alleviates the problem that the model may fall into a local optimum during fine-tuning, simplifies the entity and relation extraction process, makes the model more intuitive and easy to train and reason with, and finally improves the accuracy and efficiency of text structuring.
It can be seen that, according to the text recognition method described in the embodiment of the present application, a target unstructured text is obtained, where the target unstructured text includes L tokens and K predefined relations, L and K being positive integers. The target unstructured text is input into a target language model for prediction to obtain a prediction result, where the target language model is a language model obtained by adding noise to a pre-trained language model as a perturbation, and the prediction result includes at least one triple between an entity and a relation. The prediction result is decoded according to a preset decoding strategy based on the entity and relation of the angle marker to obtain a target triple, and a target structured text corresponding to the target unstructured text is determined according to the target triple.
Consistent with the embodiment shown in fig. 1A, please refer to fig. 2, which is a schematic flowchart of another text recognition method provided in the embodiment of the present application. The method is applied to an electronic device and, as shown in the figure, includes:
201. Acquire a parameter matrix of the pre-training language model.
202. Determine a standard deviation according to the parameter matrix.
203. Acquire a preset noise matrix.
204. Perform perturbation processing on the parameter matrix according to the standard deviation and the preset noise matrix to obtain a target parameter matrix.
205. Optimize the pre-training language model according to the target parameter matrix to obtain a target language model.
206. Acquire a target unstructured text, wherein the target unstructured text comprises L tokens and K predefined relations, and L and K are positive integers.
207. Input the target unstructured text into the target language model for prediction to obtain a prediction result, wherein the prediction result comprises at least one triple between an entity and a relation.
208. Decode the prediction result according to a preset decoding strategy based on the entity and relation of the angle marker to obtain the target triple.
209. Determine a target structured text corresponding to the target unstructured text according to the target triple.
For a specific description of steps 201 to 209, reference may be made to the corresponding steps of the text recognition method described in fig. 1A above, which are not repeated herein.
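Steps 201 to 204 can be sketched as follows. This is a minimal illustration under stated assumptions: the patent does not fix the noise distribution or a scale hyperparameter, so Gaussian noise and the `noise_scale` parameter here are assumptions, and the matrix-wise standard deviation is one plausible reading of step 202.

```python
import numpy as np

def perturb_parameters(weight, noise_scale=0.15, rng=None):
    """Perturb one parameter matrix before fine-tuning (steps 201-204).

    noise_scale is a hypothetical hyperparameter, not from the patent.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    std = weight.std()                         # step 202: std from the matrix itself
    noise = rng.standard_normal(weight.shape)  # step 203: a preset noise matrix
    return weight + noise * std * noise_scale  # step 204: target parameter matrix
```

Applied matrix by matrix, this scales the perturbation to each parameter matrix's own spread, so uniform (low-variance) matrices are barely disturbed while high-variance ones receive proportionally larger noise; the perturbed matrices then replace the originals before fine-tuning (step 205).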
In the embodiment of the application, information extraction can be performed on unstructured text to recognize the entities and relations it contains. First, a small amount of noise is added to a widely used pre-training language model as a perturbation, so that more potential feature spaces can be explored. Second, the perturbed pre-training language model is fine-tuned with the designed joint entity-relation extraction method and labeled data. Then, the unstructured text is input to the fine-tuned model for prediction, and the output is decoded according to the proposed decoding strategy to obtain structured text information. This alleviates the problem that the model may fall into a local optimum during fine-tuning, simplifies the entity and relation extraction process, makes the model more intuitive and easy to train and reason with, and finally improves the accuracy and efficiency of text structuring.
Consistent with the foregoing embodiments, please refer to fig. 3, which is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown, the electronic device includes a processor, a memory, a communication interface, and one or more programs applied to the electronic device; the one or more programs are stored in the memory and configured to be executed by the processor. In the embodiment of the present application, the programs include instructions for performing the following steps:
acquiring a target unstructured text, wherein the target unstructured text comprises L tokens and K predefined relations, and L and K are positive integers;
inputting the target unstructured text into a target language model for prediction to obtain a prediction result, wherein the target language model is a language model obtained by adding noise to a pre-training language model and performing disturbance, and the prediction result comprises at least one triple between an entity and a relation;
decoding the prediction result according to a preset decoding strategy based on the entity and relation of the angle marker to obtain a target triple;
and determining a target structured text corresponding to the target unstructured text according to the target triple.
Optionally, the program further includes instructions for performing the following steps:
acquiring a parameter matrix of the pre-training language model;
determining a standard deviation according to the parameter matrix;
acquiring a preset noise matrix;
carrying out disturbance processing on the parameter matrix according to the standard deviation and the preset noise matrix to obtain a target parameter matrix;
and optimizing the pre-training language model according to the target parameter matrix to obtain the target language model.
Optionally, in the aspect that the target unstructured text is input to a target language model for prediction to obtain a prediction result, the program includes instructions for executing the following steps:
embedding the target unstructured text by taking the target language model as an encoder to obtain an embedding result;
inputting the embedding result into a preset classifier to obtain the at least one triple;
determining the prediction outcome from the at least one triplet.
Optionally, the embedding result includes a plurality of embedding vectors, and each character corresponds to one embedding vector;
in the aspect of inputting the embedding result into a preset classifier to obtain the at least one triple, the program includes instructions for performing the following steps:
determining the characterization features between any two embedded vectors in the plurality of embedded vectors to obtain a plurality of characterization features;
inputting the plurality of characterization features into a preset score function to obtain a plurality of score vectors;
and inputting the plurality of score vectors into a normalization function to obtain the plurality of triples, wherein each triplet corresponds to one label.
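The classifier steps above can be sketched as follows. This is a hedged illustration, not the patent's classifier: the concatenation-based pair feature, the linear score function `W @ pair`, and the label inventory are all assumptions, since the patent only states that pair features are scored and normalized to pick one label per token pair.

```python
import numpy as np

def classify_token_pairs(embeddings, W, labels):
    """Score every (i, j) token pair and pick a label via softmax.

    embeddings: (L, dim) array, one vector per character/token.
    W: hypothetical (num_labels, 2 * dim) weight matrix (an assumption).
    """
    L, dim = embeddings.shape
    out = {}
    for i in range(L):
        for j in range(L):
            pair = np.concatenate([embeddings[i], embeddings[j]])  # characterization feature
            scores = W @ pair                                      # score vector
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()                                   # normalization function
            out[(i, j)] = labels[int(probs.argmax())]
    return out
```

Each token pair thus receives exactly one label, which fills the L x L label matrix that the subsequent decoding strategy operates on.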
Optionally, in the aspect that the prediction result is decoded according to a preset decoding policy based on an entity and a relationship of an angle marker to obtain a target triple, the program includes instructions for executing the following steps:
determining a first result matrix according to the triples and the labels, and preprocessing the first result matrix to obtain a second result matrix, wherein the second result matrix is a three-dimensional matrix;
converting the second result matrix into a two-dimensional matrix to obtain a third result matrix;
finding out indexes of elements larger than 0 from the third result matrix to obtain at least one index;
determining a label meeting a preset requirement according to the at least one index to obtain at least one label;
determining the index ranges of the head entity and the tail entity according to the at least one label;
and determining the target triple according to the index range and the K tokens.
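The matrix-conversion and index-search steps above can be sketched as follows. This is a minimal sketch under assumptions: the (K, L, L) shape of the result tensor and the convention that label id 0 means "no tag" are inferred from the description, not stated as an API.

```python
import numpy as np

def extract_tag_indices(result, num_relations, seq_len):
    """Flatten a (K, L, L) label-id tensor to (K*L, L) and collect
    the indices of elements greater than 0 (decoding steps above)."""
    matrix2d = result.reshape(num_relations * seq_len, seq_len)  # 3D -> 2D
    rows, cols = np.nonzero(matrix2d > 0)                        # indexes of elements > 0
    tags = []
    for r, c in zip(rows, cols):
        relation = r // seq_len   # which of the K relations this row belongs to
        head_pos = r % seq_len    # row inside that relation's L x L block
        tags.append((relation, head_pos, int(c), int(matrix2d[r, c])))
    return tags
```

The returned (relation, row, column, label-id) tuples are exactly what the subsequent span-recovery step needs to locate the HB-TB, HB-TE and HE-TE cells per relation.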
It can be seen that the electronic device described in the embodiment of the present application obtains a target unstructured text, where the target unstructured text includes L tokens and K predefined relations, L and K being positive integers; inputs the target unstructured text into a target language model for prediction to obtain a prediction result, where the target language model is a language model obtained by adding noise to a pre-trained language model as a perturbation, and the prediction result includes at least one triple between an entity and a relation; decodes the prediction result according to a preset decoding strategy based on the entity and relation of the angle marker to obtain a target triple; and determines a target structured text corresponding to the target unstructured text according to the target triple. On one hand, a small amount of noise is added to a large-scale pre-trained language model as a perturbation, so that more potential feature spaces can be explored; on the other hand, the unstructured text is input to the fine-tuned model for prediction, and the output is decoded according to the proposed decoding strategy to obtain structured text information. This alleviates the problem that the model may fall into a local optimum during fine-tuning, simplifies the entity and relation extraction process, and finally improves the accuracy and efficiency of text structuring.
Fig. 4 is a block diagram showing functional units of a text recognition apparatus 400 according to an embodiment of the present application. The text recognition apparatus 400 is applied to an electronic device, and the text recognition apparatus 400 may include: an acquisition unit 401, a prediction unit 402, a decoding unit 403, and a determination unit 404, wherein,
the obtaining unit 401 is configured to obtain a target unstructured text, where the target unstructured text includes L tokens and K predefined relationships, where L and K are positive integers;
the prediction unit 402 is configured to input the target unstructured text into a target language model for prediction to obtain a prediction result, where the target language model is a language model obtained by adding noise to a pre-trained language model and performing perturbation, and the prediction result includes at least one triplet between an entity and a relationship;
the decoding unit 403 is configured to decode the prediction result according to a preset decoding strategy based on an entity and a relationship of an angle marker, so as to obtain a target triple;
the determining unit 404 is configured to determine, according to the target triple, a target structured text corresponding to the target unstructured text.
Optionally, the apparatus 400 is further specifically configured to:
acquiring a parameter matrix of the pre-training language model;
determining a standard deviation according to the parameter matrix;
acquiring a preset noise matrix;
carrying out disturbance processing on the parameter matrix according to the standard deviation and the preset noise matrix to obtain a target parameter matrix;
and optimizing the pre-training language model according to the target parameter matrix to obtain the target language model.
Optionally, in terms of inputting the target unstructured text into a target language model for prediction to obtain a prediction result, the prediction unit 402 is specifically configured to:
embedding the target unstructured text by taking the target language model as an encoder to obtain an embedding result;
inputting the embedding result into a preset classifier to obtain the at least one triple;
determining the prediction outcome from the at least one triplet.
Optionally, the embedding result includes a plurality of embedding vectors, and each character corresponds to one embedding vector;
in the aspect that the embedding result is input to a preset classifier to obtain the at least one triple, the prediction unit 402 is specifically configured to:
determining the characterization features between any two embedded vectors in the plurality of embedded vectors to obtain a plurality of characterization features;
inputting the plurality of characterization features into a preset score function to obtain a plurality of score vectors;
and inputting the plurality of score vectors into a normalization function to obtain a plurality of triples, wherein each triplet corresponds to one label.
Optionally, in terms of decoding the prediction result according to a preset decoding policy based on the entity and the relationship of the angle marker to obtain the target triple, the decoding unit 403 is specifically configured to:
determining a first result matrix according to the triples and the labels, and preprocessing the first result matrix to obtain a second result matrix, wherein the second result matrix is a three-dimensional matrix;
converting the second result matrix into a two-dimensional matrix to obtain a third result matrix;
finding out indexes of elements larger than 0 from the third result matrix to obtain at least one index;
determining a label meeting a preset requirement according to the at least one index to obtain at least one label;
determining the index ranges of the head entity and the tail entity according to the at least one label;
and determining the target triple according to the index range and the K tokens.
It can be seen that the text recognition apparatus described in the embodiment of the present application obtains a target unstructured text, where the target unstructured text includes L tokens and K predefined relations, L and K being positive integers; inputs the target unstructured text into a target language model for prediction to obtain a prediction result, where the target language model is a language model obtained by adding noise to a pre-trained language model as a perturbation, and the prediction result includes at least one triple between an entity and a relation; decodes the prediction result according to a preset decoding strategy based on the entity and relation of the angle marker to obtain a target triple; and determines a target structured text corresponding to the target unstructured text according to the target triple.
It can be understood that the functions of each program module of the text recognition apparatus in this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; the division of the above-described units is only one type of logical functional division, and other divisions may be used in practice. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the above methods of the embodiments of the present application. The aforementioned memory includes: a USB flash drive, Read-Only Memory (ROM), Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
Those skilled in the art will appreciate that all or part of the steps of the methods of the above embodiments may be implemented by a program, which is stored in a computer-readable memory, the memory including: flash memory disks, Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method of text recognition, the method comprising:
acquiring a target unstructured text, wherein the target unstructured text comprises L tokens and K predefined relations, and L and K are positive integers;
inputting the target unstructured text into a target language model for prediction to obtain a prediction result, wherein the target language model is a language model obtained by adding noise to a pre-training language model and performing disturbance, and the prediction result comprises at least one triple between an entity and a relation;
decoding the prediction result according to a preset decoding strategy based on the entity and relation of the angle marker to obtain a target triple;
and determining a target structured text corresponding to the target unstructured text according to the target triple.
2. The method of claim 1, further comprising:
acquiring a parameter matrix of the pre-training language model;
determining a standard deviation according to the parameter matrix;
acquiring a preset noise matrix;
carrying out disturbance processing on the parameter matrix according to the standard deviation and the preset noise matrix to obtain a target parameter matrix;
and optimizing the pre-training language model according to the target parameter matrix to obtain the target language model.
3. The method according to claim 1 or 2, wherein the inputting the target unstructured text into a target language model for prediction to obtain a prediction result comprises:
embedding the target unstructured text by taking the target language model as an encoder to obtain an embedded result;
inputting the embedding result into a preset classifier to obtain the at least one triple;
determining the prediction outcome from the at least one triplet.
4. The method of claim 3, wherein the embedding result comprises a plurality of embedding vectors, one for each character;
inputting the embedding result into a preset classifier to obtain the at least one triple, including:
determining the characterization features between any two embedded vectors in the plurality of embedded vectors to obtain a plurality of characterization features;
inputting the plurality of characterization features into a preset score function to obtain a plurality of score vectors;
and inputting the plurality of score vectors into a normalization function to obtain the plurality of triples, wherein each triplet corresponds to one label.
5. The method according to claim 4, wherein the decoding the prediction result according to a preset decoding policy based on the entity and relationship of the corner mark to obtain a target triple comprises:
determining a first result matrix according to the triples and the labels, and preprocessing the first result matrix to obtain a second result matrix, wherein the second result matrix is a three-dimensional matrix;
converting the second result matrix into a two-dimensional matrix to obtain a third result matrix;
finding out indexes of elements larger than 0 from the third result matrix to obtain at least one index;
determining a label meeting a preset requirement according to the at least one index to obtain at least one label;
determining the index ranges of the head entity and the tail entity according to the at least one label;
and determining the target triple according to the index range and the K tokens.
6. A text recognition apparatus, characterized in that the apparatus comprises: an acquisition unit, a prediction unit, a decoding unit, and a determination unit, wherein,
the acquisition unit is used for acquiring a target unstructured text, wherein the target unstructured text comprises L tokens and K predefined relations, and L and K are positive integers;
the prediction unit is used for inputting the target unstructured text into a target language model for prediction to obtain a prediction result, the target language model is a language model obtained by adding noise to a pre-training language model and then disturbing, and the prediction result comprises at least one triple between an entity and a relation;
the decoding unit is used for decoding the prediction result according to a preset decoding strategy based on the entity and the relation of the angle marker to obtain a target triple;
the determining unit is configured to determine, according to the target triple, a target structured text corresponding to the target unstructured text.
7. The apparatus of claim 6, wherein the apparatus is further specifically configured to:
acquiring a parameter matrix of the pre-training language model;
determining a standard deviation according to the parameter matrix;
acquiring a preset noise matrix;
carrying out disturbance processing on the parameter matrix according to the standard deviation and the preset noise matrix to obtain a target parameter matrix;
and optimizing the pre-training language model according to the target parameter matrix to obtain the target language model.
8. The apparatus according to claim 6 or 7, wherein, in the aspect of inputting the target unstructured text into the target language model for prediction to obtain a prediction result, the prediction unit is specifically configured to:
embedding the target unstructured text by taking the target language model as an encoder to obtain an embedded result;
inputting the embedding result into a preset classifier to obtain the at least one triple;
determining the prediction outcome from the at least one triplet.
9. An electronic device comprising a processor, a memory for storing one or more programs and configured for execution by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-5.
10. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the method according to any of the claims 1-5.
CN202211580848.7A 2022-12-09 2022-12-09 Text recognition method and device and computer readable storage medium Pending CN115730043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211580848.7A CN115730043A (en) 2022-12-09 2022-12-09 Text recognition method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211580848.7A CN115730043A (en) 2022-12-09 2022-12-09 Text recognition method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115730043A true CN115730043A (en) 2023-03-03

Family

ID=85301049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211580848.7A Pending CN115730043A (en) 2022-12-09 2022-12-09 Text recognition method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115730043A (en)

Similar Documents

Publication Publication Date Title
Sohangir et al. Big Data: Deep Learning for financial sentiment analysis
CN111753081B (en) System and method for text classification based on deep SKIP-GRAM network
CN112131350B (en) Text label determining method, device, terminal and readable storage medium
EP3926531B1 (en) Method and system for visio-linguistic understanding using contextual language model reasoners
Alhumoud et al. Arabic sentiment analysis using recurrent neural networks: a review
Rani et al. An efficient CNN-LSTM model for sentiment detection in# BlackLivesMatter
CN112231569B (en) News recommendation method, device, computer equipment and storage medium
Arumugam et al. Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications
US11003950B2 (en) System and method to identify entity of data
US10915756B2 (en) Method and apparatus for determining (raw) video materials for news
KR102379660B1 (en) Method for utilizing deep learning based semantic role analysis
Biswas et al. Scope of sentiment analysis on news articles regarding stock market and GDP in struggling economic condition
CN112800184B (en) Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
Salur et al. A soft voting ensemble learning-based approach for multimodal sentiment analysis
CN114416995A (en) Information recommendation method, device and equipment
Kumar et al. ATE-SPD: simultaneous extraction of aspect-term and aspect sentiment polarity using Bi-LSTM-CRF neural network
Tüselmann et al. Recognition-free question answering on handwritten document collections
CN113627151A (en) Cross-modal data matching method, device, equipment and medium
CN112445862A (en) Internet of things equipment data set construction method and device, electronic equipment and storage medium
CN112100364A (en) Text semantic understanding method and model training method, device, equipment and medium
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN114662496A (en) Information identification method, device, equipment, storage medium and product
CN115730043A (en) Text recognition method and device and computer readable storage medium
Zhang et al. Big data-assisted urban governance: A comprehensive system for business documents classification of the government hotline

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination