CN114817564A - Attribute extraction method and device and storage medium - Google Patents
Attribute extraction method and device and storage medium Download PDFInfo
- Publication number
- CN114817564A CN114817564A CN202210458635.0A CN202210458635A CN114817564A CN 114817564 A CN114817564 A CN 114817564A CN 202210458635 A CN202210458635 A CN 202210458635A CN 114817564 A CN114817564 A CN 114817564A
- Authority
- CN
- China
- Prior art keywords
- text
- vector representation
- word
- attribute
- global vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention reformulates the attribute extraction task as a span-extraction machine reading comprehension task and adopts a multi-task model that jointly trains attribute extraction and text-attribute classification. The model uses BERT-Bi-LSTM as its encoding module, encodes the input text and the question separately, and uses structured information as the question to strengthen the model's generalization ability. A word-boundary feature enhancement method then helps the model capture the boundary features of attribute values, fusing word-level features into the global vector features through a multi-head attention mechanism. In addition, a text feature interaction method judges whether the text contains an attribute value corresponding to the question; this serves as an auxiliary task trained jointly with the attribute-value boundary prediction task.
Description
Technical Field
The present invention relates to the field of natural language processing, and in particular to an attribute extraction method, apparatus, device, and computer storage medium.
Background
Fields such as e-commerce, film and television, and medicine seek to construct high-quality domain knowledge graphs, and the attribute extraction task is one of the key steps in knowledge graph construction: given unstructured text in a vertical domain, it aims to extract the attributes and attribute values related to an entity. Taking e-commerce data as an example, given the product category "jacket" and the description text "foreign trade man autumn hood-linked excellent splicing lappet hip-hop jacket big code jacket", the goal is to extract attributes and attribute values related to the "jacket" from the description text, such as "material - lappet" and "style - hip-hop": "material" and "style" are attributes of the "jacket", and "lappet" and "hip-hop" are the corresponding attribute values. The attribute extraction task improves the completeness of entity-node representation in the knowledge graph and enhances the user's interactive experience with it.
Existing attribute extraction methods fall mainly into rule-based methods, traditional machine-learning methods, and deep-learning methods. Rule-based methods require manually constructing domain-specific rule templates, which are then matched against natural language text to find the attributes and attribute values corresponding to an entity. Because such rules are formulated for a single domain, they transfer poorly to others. As the rule set grows it becomes hard to maintain, and whenever text appears that the existing rules cannot cover, additional rules must be designed, a time-consuming and labor-intensive process. Traditional machine-learning methods generally use a supervised learning strategy and need a large labeled corpus to train models so that they can fully learn the attribute characteristics contained in the data.
In recent years, deep-learning methods have been widely used in information extraction tasks in natural language processing and have achieved good results in named entity recognition, event extraction, relation extraction, and joint entity-relation extraction. Recurrent architectures such as the Recurrent Neural Network (RNN), the Long Short-Term Memory (LSTM) network, and the Gated Recurrent Unit (GRU) network have proven convenient for extracting information from natural text. Some researchers have combined an attention mechanism with BiLSTM-CRF to capture the inherent semantic relations of product titles, allowing the model to better extract the attributes and attribute values of a title. Currently, pre-trained language models such as BERT, ALBERT, RoBERTa, ELECTRA, and XLNet have become the mainstream encoders for information extraction tasks such as attribute extraction, by virtue of their excellent encoding capability.
The prior art related to attribute extraction has the following disadvantages:
1. Extraction methods based on manual templates require time-consuming and labor-intensive manual data filtering, and extraction quality is hard to guarantee with fuzzy matching. Templates built from expert knowledge are costly, have limited coverage, and cannot be applied flexibly; this approach cannot extract new attribute values that never appear in the data.
2. Extraction methods based on the bidirectional long short-term memory network struggle with long-distance dependencies and easily lose information.
3. Methods based on pre-trained language models do not fully exploit lexical information, so the model has difficulty judging entity boundaries and generalizes poorly.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the problems of long-distance dependence and insufficient generalization capability in the prior art.
To solve the above technical problem, the present invention provides an attribute extraction method, apparatus, device, and computer storage medium, comprising:
inputting the preprocessed question and text into a pre-trained attribute extraction model, where the question is the triple with its head and tail entities replaced by MASK tokens, i.e., the structured information;
computing a question global vector representation and a first text global vector representation with a BERT model, and encoding the first text global vector representation with a bidirectional long short-term memory layer (Bi-LSTM) to obtain a second text global vector representation;
interacting the second text global vector representation with the question global vector representation via a multi-head attention mechanism to obtain a text global vector representation with question structured-information generalization features;
feeding the text to an automatic word segmentation tool to obtain a segmentation result and a segmentation vector representation of the text;
adding the segmentation vector representation at the corresponding positions of the text global vector representation with question structured-information generalization features, indexed by the absolute positions of each word's head and tail characters in the segmentation result, to obtain the final text vector representation;
and predicting the boundaries of the attribute value to be extracted in the final text vector representation to obtain the target attribute value.
Preferably, computing the question global vector representation and the first text global vector representation with the BERT model, and encoding the first text global vector representation with the bidirectional long short-term memory layer Bi-LSTM to obtain the second text global vector representation, comprises:
segmenting the question Q and the text S into characters, each character w_i being represented by the sum of a token embedding TE(w_i), a segment embedding SE(w_i) distinguishing the two sentences, and a position embedding PE(w_i), to obtain vector representations of the question and the text;
inputting the vector representations of the question and the text into the BERT model to obtain the encoded question global vector representation X^q = {x^q_1, ..., x^q_n} and the first text global vector representation X^s = {x^s_1, ..., x^s_m}, where x^q_i is the vector representation of each character of the BERT-encoded question and x^s_i is the vector representation of each character of the BERT-encoded text;
encoding the first text global vector representation X^s with the bidirectional long short-term memory layer Bi-LSTM to obtain the second text global vector representation O = {o_1, ..., o_m}, where o_i is the vector representation of each character of the Bi-LSTM-encoded text.
Preferably, encoding the first text global vector representation X^s with the bidirectional long short-term memory layer Bi-LSTM to obtain the second text global vector representation O comprises:
computing forward and backward passes over the first text global vector X^s to obtain the encoded second text global vector representation O;
the hidden state o_i at each time step i is the concatenation of the forward LSTM hidden state h_i^forward and the backward LSTM hidden state h_i^backward: o_i = [h_i^forward ; h_i^backward].
preferably, the adding the word segmentation vector representation to the corresponding position of the text global vector representation having the problem structured information generalization feature according to the absolute position index of the word head-tail label in the word segmentation result to obtain the final text vector representation includes:
the word position in the text word segmentation result is represented as:
P[a i ,t i ]={p 1 [a 1 ,t 1 ],p 2 [a 2 ,t 2 ]…p n [a n ,t n ]in which a is i 、t i An absolute position index, p, representing the head and tail labels of each word in the text, respectively n Represents the nth word;
in the text global vector representation with problem structured information generalization characteristics Adding the word segmentation vector representation V containing word time sequence characteristics after Bi-LSTM and normalization into the corresponding position to obtain the final text vector representation H v 。
Preferably, predicting the boundaries of the attribute value to be extracted in the final text vector representation to obtain the target attribute value comprises:
predicting, with two linear layers, the probability of each character in the final text vector being the start position s and the end position e, respectively:
s_i = sigmoid(FNN(H_v))
e_i = sigmoid(FNN(H_v))
where s_i is the probability that the i-th character of the text is the start position of the attribute value and e_i is the probability that the i-th character is the end position;
taking each start position and its corresponding end position as the coordinates of a target attribute value.
Preferably, the training process of the attribute extraction model includes an attribute-value boundary prediction task, whose specific steps are:
constructing a corresponding training set;
training the model with the training set until the loss function converges, the loss function comprising a start-position loss loss_s and an end-position loss loss_e, each a binary cross-entropy over the characters:
loss_s = -Σ_i [ŝ_i·log(s_i) + (1-ŝ_i)·log(1-s_i)]
loss_e = -Σ_i [ê_i·log(e_i) + (1-ê_i)·log(1-e_i)]
where ŝ_i and ê_i are the boundary labels of the true attribute value.
Preferably, the training process of the attribute extraction model includes a text-attribute-type classification task, whose specific steps are:
taking the CLS token outputs of the BERT model for the texts and questions in the training set, h^s_CLS and h^q_CLS, as the text feature representation and the attribute-type feature representation;
interacting h^s_CLS and h^q_CLS with a multi-head attention mechanism to obtain the comprehensive classification feature h_Att;
and training a classifier on the comprehensive classification feature to judge whether the text contains an attribute value related to the attribute type to be extracted, so that the model pays more attention to the attribute values in the text related to the attribute to be extracted, with the loss function:
loss_class = -Σ_j y_j·log(P_j)
where y_j is the true class label and P_j is the predicted probability for the j-th attribute type.
The invention also provides an attribute extraction apparatus, comprising:
an input module, configured to input the preprocessed question and text into a pre-trained attribute extraction model, where the question is the triple with its head and tail entities replaced by MASK tokens, i.e., the structured information;
an encoding module, configured to compute a question global vector representation and a first text global vector representation with a BERT model, and to encode the first text global vector representation with a bidirectional long short-term memory layer Bi-LSTM to obtain a second text global vector representation;
an interaction module, configured to interact the second text global vector representation with the question global vector representation via a multi-head attention mechanism to obtain a text global vector representation with question structured-information generalization features;
a word segmentation module, configured to feed the text to an automatic word segmentation tool to obtain a segmentation result and a segmentation vector representation of the text;
a word-boundary enhancement module, configured to add the segmentation vector representation at the corresponding positions of the text global vector representation with question structured-information generalization features, indexed by the absolute positions of each word's head and tail characters in the segmentation result, to obtain the final text vector representation;
and the extraction module is used for predicting the boundary of the attribute value to be extracted in the final text vector representation to obtain a target attribute value.
The invention also provides an attribute extraction device, comprising:
a memory for storing a computer program; and a processor for implementing the steps of the above attribute extraction method when executing the computer program.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a method of attribute extraction as described above.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the attribute extraction task is converted into a segment extraction type reading understanding task, the model takes BERT-Bi-LSTM as a coding module, the input text and the problem are coded respectively, head and tail entities in the triple are marked by adopting a mask label, the label is not exposed when context information is used, the structured information is used as the problem, and the generalization capability of the model is enhanced; the bidirectional long and short term memory layer can integrate forward and backward information of the text, so that the model fully captures the time sequence characteristics and semantic characteristics of the text; and a word boundary characteristic enhancement method is used for helping the model to capture the boundary characteristics of the attribute values, adding complete vocabulary information at the starting position and the ending position of the vocabulary based on the word segmentation result, and then utilizing a multi-head attention mechanism to enable the word segmentation vectors to interact with the global characteristics of the text so as to be fused into the vocabulary information. The word boundary characteristics strengthen the judgment of the model on the head and tail positions of the attribute values, deepen the control of the model on the attribute value boundaries, help the model to understand sentence structures and help the model to identify more unknown words. The invention utilizes the word boundary information and the text attribute characteristics to relieve the problems that no answer data in the machine reading understanding model is difficult to utilize and the unknown words in the attribute extraction task are difficult to extract, thereby effectively improving the effect of entity attribute extraction.
Drawings
In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description of the present disclosure taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of an implementation of the attribute extraction of the present invention;
FIG. 2 is a detailed block diagram of the vocabulary enhancement module of the present invention;
FIG. 3 is an overall block diagram of the attribute extraction model of the present invention;
fig. 4 is a block diagram of an attribute extraction apparatus according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide an attribute extraction method, apparatus, device, and computer storage medium that fully acquire context information and improve the model's generalization ability.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of attribute extraction according to the present invention; the specific operation steps are as follows:
S101, inputting the preprocessed question and text into a pre-trained attribute extraction model, where the question is the triple with its head and tail entities replaced by MASK tokens, i.e., the structured information;
before input, the text and question are preprocessed into the forms "[CLS] + text + [SEP]" and "[CLS] + question + [SEP]";
a triple is head entity - attribute - tail entity (attribute value);
with MASK tokens replacing the head and tail entities, the question becomes "[CLS] + [MASK] attribute [MASK] + [SEP]".
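As an illustrative sketch of this preprocessing (the function names are our own, not from the patent):

```python
def build_question(attribute: str) -> str:
    # The triple (head entity, attribute, tail entity) becomes the question,
    # with both entities replaced by [MASK]; only the attribute name, the
    # structured information, is exposed to the model.
    return "[CLS] [MASK] " + attribute + " [MASK] [SEP]"

def build_text(text: str) -> str:
    # The description text is wrapped in the standard BERT special tokens.
    return "[CLS] " + text + " [SEP]"
```

For example, `build_question("style")` yields `"[CLS] [MASK] style [MASK] [SEP]"`.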
S102: computing a question global vector representation and a first text global vector representation with a BERT model, and encoding the first text global vector representation with a bidirectional long short-term memory layer Bi-LSTM to obtain a second text global vector representation;
BERT stacks 12 identical Transformer encoder layers and can capture global features of a text from different angles. Each layer consists of two substructures, a multi-head attention layer and a feed-forward neural network, and the output of each substructure undergoes residual connection and layer normalization. Multi-head attention obtains self-attention vectors for several subspaces and concatenates them to form its output: U denotes the multi-head attention vector and u_i the attention vector of the i-th subspace. The feed-forward network projects the output of the multi-head attention layer:
U = Concat(u_1, u_2, ..., u_h)·W^U,
FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2,
The question Q and the text S are segmented; for Chinese, the output units are characters, which are converted into the corresponding token IDs as model input. Each character w_i is represented by the sum of a token embedding TE(w_i), a segment embedding SE(w_i) distinguishing the two sentences, and a position embedding PE(w_i), i.e. E(w_i) = TE(w_i) + SE(w_i) + PE(w_i), giving the vector representations of the question and the text;
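The input composition E(w_i) = TE(w_i) + SE(w_i) + PE(w_i) is an element-wise sum; a toy sketch with plain Python lists (real embeddings would be learned lookup tables):

```python
def input_embedding(token_emb, segment_emb, position_emb):
    # E(w_i) = TE(w_i) + SE(w_i) + PE(w_i): element-wise sum of the token,
    # segment, and position embeddings for one character.
    return [t + s + p for t, s, p in zip(token_emb, segment_emb, position_emb)]
```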
inputting the vector representations of the question and the text into the BERT model yields the encoded question global vector representation X^q = {x^q_1, ..., x^q_n} and the first text global vector representation X^s = {x^s_1, ..., x^s_m}, where x^q_i is the vector representation of each character of the BERT-encoded question and x^s_i is the vector representation of each character of the BERT-encoded text;
Although recurrent neural networks are suited to modeling sequential data, plain RNNs suffer from vanishing or exploding gradients during training and handle long-distance dependencies poorly; Bi-LSTM is therefore adopted to further integrate the text's temporal features. The bidirectional long short-term memory layer Bi-LSTM encodes the first text global vector representation X^s to obtain the second text global vector representation O = {o_1, ..., o_m}, where o_i is the vector representation of each character of the Bi-LSTM-encoded text;
forward and backward passes over the first text global vector X^s yield the encoded second text global vector representation O;
the hidden state o_i at each time step i is the concatenation of the forward LSTM hidden state h_i^forward and the backward LSTM hidden state h_i^backward: o_i = [h_i^forward ; h_i^backward].
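The bidirectional concatenation can be sketched independently of the LSTM gate equations; `step` below is a hypothetical stand-in for one recurrence (here a running sum), not the patent's trained cell:

```python
def bidirectional_encode(seq, step):
    # Run the recurrence forward and backward over the sequence, then
    # concatenate the two hidden states at each time step i:
    # o_i = [h_i^forward ; h_i^backward].
    fwd, h = [], None
    for x in seq:
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], None
    for x in reversed(seq):
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]  # list + list = concatenation

# Toy recurrence: the hidden state is the running sum of inputs.
cumsum_step = lambda x, h: [x + (h[0] if h else 0)]
```

With `seq = [1, 2, 3]`, the forward states are [1], [3], [6] and the backward states (read left to right) are [6], [5], [3], so the concatenated outputs are [1, 6], [3, 5], [6, 3].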
S103, interacting the second text global vector representation with the question global vector representation via a multi-head attention mechanism to obtain a text global vector representation with question structured-information generalization features;
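The interaction in S103 can be illustrated with single-head scaled dot-product attention (the patent uses multi-head attention; one head is shown for brevity): the text vectors act as queries over the question vectors, absorbing the question's structured-information features.

```python
import math

def attention(queries, keys, values):
    # Scaled dot-product attention: for each query (a text character vector),
    # softmax over its dot products with the keys (question vectors), then
    # take the weighted sum of the values.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

With identical keys the weights are uniform, so the output is the mean of the values.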
S104, feeding the text to an automatic word segmentation tool to obtain a segmentation result and a segmentation vector representation of the text;
S105, adding the segmentation vector representation at the corresponding positions of the text global vector representation with question structured-information generalization features, indexed by the absolute positions of each word's head and tail characters in the segmentation result, to obtain the final text vector representation;
the word positions in the text segmentation result are represented as:
P[a_i, t_i] = {p_1[a_1, t_1], p_2[a_2, t_2], ..., p_n[a_n, t_n]}, where a_i and t_i are the absolute position indices of the head and tail characters of each word in the text and p_n denotes the n-th word;
the segmentation vector representation V, which contains word timing features after Bi-LSTM encoding and normalization, is added at the corresponding positions of the text global vector representation with question structured-information generalization features to obtain the final text vector representation H_v.
The model's decoding locates the start and end positions of attribute values to complete the extraction, so the word-boundary feature enhancement method strengthens the features at the head and tail boundaries of attribute values to help the model judge them. The vocabulary enhancement module aims to integrate lexical information on top of the BERT-Bi-LSTM encoding and improve the model's grasp of attribute-value boundaries. It consists of two parts: word-boundary feature enhancement and text-segmentation information interaction. Word-boundary feature enhancement adds complete word information at the start and end positions of each word based on the segmentation result; text-segmentation information interaction uses a multi-head attention mechanism to let the segmentation vectors interact with the global text features and fuse in the lexical information.
S106, predicting the boundaries of the attribute value to be extracted in the final text vector representation to obtain the target attribute value.
Two linear layers predict, for each character in the final text vector, the probability of being the start position s and the end position e respectively; the closer the probability is to 1, the more likely the character is a start or end position:
s_i = sigmoid(FNN(H_v))
e_i = sigmoid(FNN(H_v))
where s_i is the probability that the i-th character of the text is the start position of the attribute value and e_i the probability that it is the end position. The probabilities are computed by fully connected layers with the same structure but different parameters; each character is classified independently as a start position, so several start positions may be obtained, and likewise several end positions;
each start position and its corresponding end position are taken as the coordinates of a target attribute value.
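A sketch of this decoding under one stated assumption: the text says each character is classified independently, so several starts and ends may appear; pairing each start with the nearest end at or after it is our illustrative heuristic, not specified in the patent.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_spans(start_logits, end_logits, threshold=0.5):
    # Characters whose sigmoid probability exceeds the threshold are
    # candidate starts/ends; each start is paired with the nearest end at
    # or after it (assumed pairing heuristic).
    starts = [i for i, s in enumerate(start_logits) if sigmoid(s) > threshold]
    ends = [i for i, e in enumerate(end_logits) if sigmoid(e) > threshold]
    spans = []
    for s in starts:
        for e in ends:
            if e >= s:
                spans.append((s, e))
                break
    return spans
```

For example, logits favoring positions 0 and 2 as starts and 1 and 4 as ends decode to the spans (0, 1) and (2, 4).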
Based on the above embodiments, the present embodiment further describes in detail the word enhancing module of S104-S105, specifically as follows:
The segmentation vectors are fed into a Bi-LSTM and normalized to obtain the segmentation vector representation V containing word timing features. Then, based on the absolute position indices of each word's head and tail characters, vector representations of the word information are added at the corresponding positions of the global feature vector for word-boundary feature enhancement.
Taking fig. 2 as an example, the input text is "beige sweater" (in the original Chinese, a two-character word followed by a three-character word, matching the positions below); the text is first encoded by the BERT-Bi-LSTM module. Meanwhile, the text is segmented with LTP, giving the segmentation result {"beige", "sweater"} and the corresponding segmentation vector representation V = {v_1, v_2}. Then, to locate the actual lexical boundaries in the text, the absolute position of each word is derived from the segmentation result: P[a_i, t_i] = {p_1[1, 2], p_2[3, 5]}. Finally, the complete word vectors are added to the corresponding text feature vectors at these boundary positions to obtain the final text vector representation H_v.
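The "beige sweater" walk-through can be sketched as follows, with toy one-dimensional character vectors and the 1-indexed [a_i, t_i] spans from the segmenter:

```python
def add_word_boundary_features(char_vecs, word_spans, word_vecs):
    # Add each word's vector v_i to the character vectors at the word's head
    # position a_i and tail position t_i (1-indexed), strengthening the
    # boundary features the span decoder relies on.
    out = [list(c) for c in char_vecs]
    for (a, t), v in zip(word_spans, word_vecs):
        for pos in {a - 1, t - 1}:  # a set, so a one-character word is added once
            out[pos] = [c + w for c, w in zip(out[pos], v)]
    return out
```

With five zero character vectors, spans [(1, 2), (3, 5)], and word vectors [1.0] and [2.0], the word vectors land on positions 1, 2, 3, and 5, leaving position 4 (a word-internal character) unchanged.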
Based on the above embodiments, this embodiment further describes in detail the training process of the model of the present invention, which is specifically as follows:
training the attribute extraction model includes an attribute-value boundary prediction task:
cleaning the corpus and constructing a corresponding training set;
training the model with the training set until the loss function converges, the loss function comprising a start-position loss loss_s and an end-position loss loss_e, each a binary cross-entropy over the characters:
loss_s = -Σ_i [ŝ_i·log(s_i) + (1-ŝ_i)·log(1-s_i)], loss_e = -Σ_i [ê_i·log(e_i) + (1-ê_i)·log(1-e_i)], where ŝ_i and ê_i are the boundary labels of the true attribute value.
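These per-character binary cross-entropy losses (a reconstruction; the original formula images did not survive extraction) can be sketched as:

```python
import math

def bce(p, y, eps=1e-9):
    # Binary cross-entropy for one character's predicted probability p
    # and 0/1 boundary label y; eps guards against log(0).
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def boundary_loss(start_probs, end_probs, y_start, y_end):
    # loss = loss_s + loss_e, summed over all characters; y_start/y_end are
    # the one-hot boundary labels of the true attribute value.
    loss_s = sum(bce(p, y) for p, y in zip(start_probs, y_start))
    loss_e = sum(bce(p, y) for p, y in zip(end_probs, y_end))
    return loss_s + loss_e
```

Perfect predictions give a loss near zero, while uncertain predictions (probability 0.5 everywhere) give a large loss.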
Since the prior art fails to combine external knowledge to strengthen the model's understanding of attribute types, training the attribute extraction model also includes a text-attribute-type classification task:
taking the CLS token outputs of the BERT model for the texts and questions in the training set, h^s_CLS and h^q_CLS, as the text feature representation and the attribute-type feature representation;
interacting h^s_CLS and h^q_CLS with a multi-head attention mechanism to obtain the comprehensive classification feature h_Att;
and training a classifier on the comprehensive classification feature to judge whether the text contains an attribute value related to the attribute type to be extracted, so that the model pays more attention to the attribute values in the text related to the attribute to be extracted, with the loss function:
loss_class = -Σ_j y_j·log(P_j), where y_j is the true class label and P_j is the predicted probability for the j-th attribute type.
Joint training is performed with the attribute value boundary prediction task and the text attribute type classification task, where Loss = loss_class + loss_s + loss_e.
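The joint objective can be sketched as follows. This is an illustrative sketch, not the patent's verbatim formulas: the start/end losses are assumed to be per-position binary cross-entropy (matching the sigmoid outputs s_i and e_i elsewhere in the document) and the classification loss a cross-entropy over attribute types.

```python
import math

def binary_cross_entropy(preds, labels, eps=1e-9):
    # Mean binary cross-entropy over per-position sigmoid scores.
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(preds, labels)) / len(preds)

def cross_entropy(probs, onehot, eps=1e-9):
    # Cross-entropy between predicted class probabilities and a one-hot label.
    return -sum(y * math.log(p + eps) for p, y in zip(probs, onehot))

def joint_loss(start_probs, start_labels, end_probs, end_labels,
               class_probs, class_onehot):
    # Loss = loss_class + loss_s + loss_e, as in the joint training scheme.
    loss_s = binary_cross_entropy(start_probs, start_labels)
    loss_e = binary_cross_entropy(end_probs, end_labels)
    loss_class = cross_entropy(class_probs, class_onehot)
    return loss_class + loss_s + loss_e
```

Perfect boundary and class predictions drive the total toward zero, while uninformative 0.5 scores leave it large, which is what the convergence criterion in the training step checks.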
The model (as shown in fig. 3) is evaluated on a test set; comparison with other baseline models shows that the word boundary feature enhancement method achieves the best results and that the generalization ability of the model is significantly improved.
The invention converts the attribute extraction task into a span-extraction reading comprehension task and adopts a multi-task model jointly trained on attribute extraction and text attribute judgment. The model uses BERT-Bi-LSTM as the encoding module, encodes the input text and the problem separately, and uses structured information as the problem to enhance the generalization ability of the model. A word boundary feature enhancement method then helps the model capture the boundary features of attribute values, merging word features into the global vector features in combination with a multi-head attention mechanism. Meanwhile, a text feature interaction method is designed to judge whether the attribute value corresponding to the problem exists in the text, and this is jointly trained as an auxiliary task alongside the attribute value boundary prediction task. On one hand, the word boundary features strengthen the model's judgment of the head and tail positions of attribute values, deepen its grasp of attribute value boundaries, and help it recognize more unknown words; on the other hand, the auxiliary text attribute feature perception task further improves the model's sensitivity to attribute types, alleviating the model's insufficient understanding of attribute types so that it pays more attention to the attribute values in the text related to the attribute to be extracted. Overall, the performance of the attribute extraction system is beneficially improved.
Referring to fig. 4, fig. 4 is a block diagram of an attribute extraction apparatus according to an embodiment of the present invention; the specific device may include:
the input module 100 is configured to input the preprocessed problem and the text into a pre-trained attribute extraction model, where the problem is a triple after a MASK mark replaces a head-tail entity, and the triple is structured information;
the encoding module 200 is used for calculating by using a BERT model to obtain problem global vector representation and first text global vector representation, and encoding the first text global vector representation by a Bi-directional long-short term memory layer Bi-LSTM to obtain second text global vector representation;
an interaction module 300, configured to use a multi-head attention mechanism to interact the second text global vector representation with the question global vector representation to obtain a text global vector representation with a problem structured information generalization feature;
a word segmentation module 400, configured to input the text into an automatic word segmentation tool to obtain a word segmentation result and a word segmentation vector representation of the text;
a word boundary enhancing module 500, configured to add word segmentation vector representations to corresponding positions of the text global vector representation having the problem structured information generalization feature according to the absolute position indexes of the word head and tail labels in the word segmentation result, so as to obtain final text vector representations;
and the extraction module 600 is configured to predict the boundary of the attribute value to be extracted in the final text vector representation, and obtain a target attribute value.
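The prediction performed by the extraction module can be sketched as follows, assuming the two sigmoid-scored linear layers described in the method; the best-start/best-end selection rule and all weights here are illustrative assumptions, not the patented implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(vec, weights, bias):
    # A single linear layer applied to one position's feature vector.
    return sum(v * w for v, w in zip(vec, weights)) + bias

def extract_span(H_v, w_s, b_s, w_e, b_e):
    # Score each position of the final text vector representation H_v as a
    # start (s_i) and an end (e_i), then take the best start and the best
    # end at or after it as the target attribute value's span.
    starts = [sigmoid(linear(h, w_s, b_s)) for h in H_v]
    ends = [sigmoid(linear(h, w_e, b_e)) for h in H_v]
    s = max(range(len(starts)), key=starts.__getitem__)
    e = max(range(s, len(ends)), key=ends.__getitem__)
    return s, e
```

The returned (s, e) pair corresponds to the coordinates of the target attribute value in the text.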
The attribute extraction apparatus of this embodiment is used to implement the foregoing attribute extraction method, and therefore a specific implementation of the attribute extraction apparatus may be found in the foregoing embodiment parts of the attribute extraction method; for example, the input module 100, the encoding module 200, the interaction module 300, the word segmentation module 400, the word boundary enhancement module 500, and the extraction module 600 are respectively used to implement steps S101, S102, S103, S104, S105, and S106 in the foregoing attribute extraction method, so their specific implementation may refer to the descriptions of the corresponding embodiments and will not be repeated herein.
The specific embodiment of the present invention further provides an attribute extraction device, including: a memory for storing a computer program; and the processor is used for realizing the steps of the attribute extraction method when the computer program is executed.
The specific embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of the above-mentioned attribute extraction method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. And obvious variations or modifications therefrom are within the scope of the invention.
Claims (10)
1. An attribute extraction method, comprising:
inputting the preprocessed problem and the text into a pre-trained attribute extraction model, wherein the problem is a triple in which a MASK mark replaces the head and tail entities, the triple being the structured information;
calculating by using a BERT model to obtain problem global vector representation and first text global vector representation, and encoding the first text global vector representation through a Bi-directional long-short term memory layer Bi-LSTM to obtain second text global vector representation;
interacting the second text global vector representation and the problem global vector representation by using a multi-head attention mechanism to obtain a text global vector representation with problem structural information generalization characteristics;
inputting the text into an automatic word segmentation tool to obtain a word segmentation result and a word segmentation vector representation of the text;
adding word segmentation vector representation at the corresponding position of the text global vector representation with problem structured information generalization characteristics according to the absolute position index of the word head-tail label in the word segmentation result to obtain final text vector representation;
and predicting the attribute value boundary to be extracted in the final text vector representation to obtain a target attribute value.
2. The method of extracting attributes as claimed in claim 1, wherein the calculating a problem global vector representation and a first text global vector representation by using a BERT model, and encoding the first text global vector representation by a Bi-directional long short term memory layer Bi-LSTM to obtain a second text global vector representation comprises:
segmenting the question Q and the text S into words, each word being represented by the composition of a tagged word vector TE(w_i), a word vector SE(w_i) distinguishing the two different sentences, and a location word vector PE(w_i), to obtain the vector representations of the question and the text;
inputting the vector representations of the question and the text into the BERT model to obtain the encoded question global vector representation X_q and the first text global vector representation X_s, wherein X_q consists of the vector representation of each character in the BERT-encoded question, and X_s of the vector representation of each character in the BERT-encoded text;
3. The method of claim 2, wherein encoding the first text global vector representation X_s by the Bi-directional long-short term memory layer Bi-LSTM to obtain the second text global vector representation comprises:
calculating over the first text global vector representation X_s to obtain the encoded second text global vector representation;
wherein the hidden state o_i at each time step i is obtained by splicing the hidden state of the forward LSTM, h_i(forward), and the hidden state of the backward LSTM, h_i(backward), with the following calculation formula:
o_i = [h_i(forward) ; h_i(backward)]
4. the method according to claim 1, wherein the adding the word segmentation vector representation to the corresponding position of the global text vector representation having the generalization feature of the problem structured information according to the absolute position index of the head and tail labels of the words in the word segmentation result to obtain the final text vector representation comprises:
the word positions in the text word segmentation result are represented as:
P[a_i, t_i] = {p_1[a_1, t_1], p_2[a_2, t_2] ... p_n[a_n, t_n]}, wherein a_i and t_i represent the absolute position indexes of the head and tail labels of each word in the text, and p_n represents the nth word;
adding the word segmentation vector representation V, which contains word time sequence characteristics after Bi-LSTM encoding and normalization, to the corresponding positions of the text global vector representation with problem structured information generalization characteristics, to obtain the final text vector representation H_v.
5. The method of claim 4, wherein the predicting the boundary of the attribute value to be extracted in the final text vector representation and obtaining the target attribute value comprises:
predicting, with two linear layers respectively, the probability of each word in the final text vector representation being the starting position s and the ending position e:
s_i = sigmoid(FNN(H_v))
e_i = sigmoid(FNN(H_v))
wherein s_i represents the probability of the ith character of the text being the starting position of an attribute value, and e_i represents the probability of the ith character of the text being the ending position of an attribute value;
and taking the starting position and the corresponding ending position as the coordinates of the target attribute value.
6. The attribute extraction method according to claim 5, wherein the training process of the attribute extraction model includes an attribute value boundary prediction task, and the attribute value boundary prediction task includes the following specific steps:
constructing a corresponding training set;
training the model using the training set until the loss function converges, the loss function including the starting-position loss loss_s and the ending-position loss loss_e for each word:
7. The method for extracting attributes of claim 6, wherein the training process of the attribute extraction model includes a text attribute type classification task, and the text attribute type classification task includes the following specific steps:
taking the CLS Token output of the BERT model for the texts and the problems in the training set as the text feature representation and the attribute type feature representation, respectively;
interacting the two representations via a multi-head attention mechanism to obtain the comprehensive classification feature h_Att;
Using a classifier, training the model to judge from the comprehensive classification feature whether the text contains an attribute value related to the attribute type to be extracted, so that the model pays more attention to the attribute values in the text related to the attribute to be extracted, with the following loss function:
wherein y_j represents the true value of class j, and P_j represents the predicted value for the jth class attribute type.
8. An apparatus for attribute extraction, comprising:
the input module is used for inputting the preprocessed problem and the text into a pre-trained attribute extraction model, wherein the problem is a triple in which MASK marks replace the head and tail entities, the triple being the structured information;
the coding module is used for calculating by using a BERT model to obtain problem global vector representation and first text global vector representation, and coding the first text global vector representation through a Bi-directional long-short term memory layer Bi-LSTM to obtain second text global vector representation;
the interaction module is used for interacting the second text global vector representation with the problem global vector representation by utilizing a multi-head attention mechanism to obtain a text global vector representation with problem structural information generalization characteristics;
the word segmentation module is used for inputting the text into an automatic word segmentation tool to obtain a word segmentation result and a word segmentation vector representation of the text;
the word boundary enhancing module is used for adding word segmentation vector representation to the corresponding position of the text global vector representation with the problem structured information generalization characteristics according to the absolute position index of the word head tag and the word tail tag in the word segmentation result to obtain final text vector representation;
and the extraction module is used for predicting the boundary of the attribute value to be extracted in the final text vector representation to obtain a target attribute value.
9. An apparatus for attribute extraction, comprising:
a memory for storing a computer program;
a processor for implementing the steps of a method of attribute extraction as claimed in any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of a method of attribute extraction as claimed in any one of the claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210458635.0A CN114817564A (en) | 2022-04-15 | 2022-04-15 | Attribute extraction method and device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114817564A true CN114817564A (en) | 2022-07-29 |
Family
ID=82509632
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116245078A (en) * | 2022-11-30 | 2023-06-09 | 荣耀终端有限公司 | Structured information extraction method and electronic equipment |
CN116756624A (en) * | 2023-08-17 | 2023-09-15 | 中国民用航空飞行学院 | Text classification method for civil aviation supervision item inspection record processing |
CN116756624B (en) * | 2023-08-17 | 2023-12-12 | 中国民用航空飞行学院 | Text classification method for civil aviation supervision item inspection record processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||