CN113536803A - Text information processing device and method, computer equipment and readable storage medium - Google Patents
- Publication number
- CN113536803A (application CN202010639599.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- analyzed
- semantic feature
- word
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a text information processing device and method, a computer device and a readable storage medium. One embodiment of the device comprises: an encoder based on the BERT model, used for extracting semantic feature vectors of the text to be analyzed; and an information processing module, used for minimizing an objective function comprising a first loss function and a second loss function according to the semantic feature vector of the text to be analyzed, to obtain at least one aspect word contained in the text to be analyzed and the information polarity of each aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect words in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect words. This implementation builds an end-to-end fine-grained information processing model that simultaneously realizes multi-aspect word extraction and information polarity classification, and can improve the accuracy and recall of fine-grained information processing.
Description
Technical Field
The invention relates to the technical field of text analysis, and more particularly to a text information processing apparatus and method, a computer device, and a readable storage medium.
Background
With the rise of online social networks, large numbers of users express their experiences and evaluations of life, events, products and the like by publishing text on the Internet. These textual expressions provide a data basis for text information processing research, which studies the attitudes and opinions that people express in text. Fine-grained information processing is one of its subdivided fields, studying attitudes and viewpoints at a fine granularity. It still faces many difficulties and challenges in task definition, data preparation and method effectiveness. First, classifying text information by fine-grained information polarity can be used to extract, from text, information words carrying attitudes and viewpoints, and the related research has great application value in public opinion monitoring. Most previous information classification research assumes that only one viewpoint exists in a text, ignoring the fact that social network texts often contain multiple viewpoints. Identifying all the viewpoints a text contains is challenging, especially for short texts. Second, relating the various viewpoints to their aspects is the aspect-level information processing problem, a study of fine-grained information that can be further divided into two categories: information processing for aspect words, and information processing for aspect categories. Designing a unified method that solves both aspect-level information processing problems simultaneously is challenging.
The rapid development of deep learning has provided new methods for fine-grained information processing. Google's open-source pre-trained language model BERT achieved state-of-the-art results on 11 natural language processing tasks. BERT stands for Bidirectional Encoder Representations from Transformers; it is a novel language model in that it trains a deep bidirectional representation by jointly conditioning on both directions in all Transformer layers. Pre-trained language models play an important role in many natural language processing tasks, such as the SQuAD question-answering task, named entity recognition, and opinion recognition. At present, there are two main strategies for applying a pre-trained language model to an NLP task: one is the feature-based language model, such as ELMo; the other is the fine-tuning-based language model, such as OpenAI GPT. Each type has advantages and disadvantages, and BERT combines the advantages of both, so that optimal results can be achieved on many downstream tasks. However, the existing BERT model suffers from low accuracy and recall when applied to information processing of texts such as restaurant-review social network texts.
Therefore, it is desirable to provide a new text information processing apparatus and method, a computer device, and a readable storage medium.
Disclosure of Invention
The invention aims to provide a text information processing device and method, a computer device and a readable storage medium, so as to solve at least one of the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
a first aspect of the present invention provides a text information processing apparatus comprising:
an encoder based on the BERT model, used for extracting semantic feature vectors of the text to be analyzed; and
an information processing module, used for minimizing an objective function comprising a first loss function and a second loss function according to the semantic feature vector of the text to be analyzed, to obtain at least one aspect word contained in the text to be analyzed and the information polarity of each aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect words in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect words.
According to the text information processing device provided by the first aspect of the invention, an end-to-end fine-grained information processing model that simultaneously realizes multi-aspect word extraction and information polarity classification is constructed, and the accuracy and recall of fine-grained information processing can be improved.
Optionally, the first loss function and the second loss function are cross entropy loss functions, respectively.
Optionally, the first loss function is:
$$loss_{asp} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i^{s}\log\hat{p}_i^{s} + (1-y_i^{s})\log(1-\hat{p}_i^{s}) + y_i^{e}\log\hat{p}_i^{e} + (1-y_i^{e})\log(1-\hat{p}_i^{e})\right]$$

wherein $\hat{p}^{s}$ is the probability distribution of the starting position of the aspect word over the positions of the text to be analyzed, $\hat{p}^{e}$ is the probability distribution of the ending position, and n is the word length of the text to be analyzed.
Optionally, the probability distribution $\hat{p}^{s}$ of the starting position of the aspect word at each position of the text to be analyzed is expressed as:

$$\hat{p}^{s} = \sigma(h^{L} W_{start} + b_{start})$$

wherein $W_{start}$ is a first trainable weight vector, $b_{start}$ is the first bias term, $\sigma$ is the sigmoid activation function, $S_{s}$ is the binary classification sequence labeling the starting positions, and $h^{L}$ is the semantic feature vector output by the L-th Transformer layer of the BERT model, the BERT model comprising L Transformer layers;

the probability distribution $\hat{p}^{e}$ of the ending position of the aspect word at each position of the text to be analyzed is expressed as:

$$\hat{p}^{e} = \sigma(h^{L} W_{end} + b_{end})$$

wherein $W_{end}$ is a second trainable weight vector, $b_{end}$ is the second bias term, and $S_{e}$ is the binary classification sequence labeling the ending positions.
This optional mode uses two binary classification sequences to judge the confidence that each sequence position of the semantic feature vector output by the L-th Transformer layer of the BERT model is a start or end position of an aspect word, so that the aspect words in the text to be analyzed can be extracted accurately.
Optionally, the second loss function is:
$$loss_{p} = -\sum_{c=1}^{k} y_{c}^{p}\log\hat{y}_{c}^{p}$$

wherein k is the number of classification labels of the information polarity; $y^{p}$ is the known correct classification label; $\hat{y}^{p}$ is the predicted probability of the information polarity classification of the aspect word, expressed as $\hat{y}^{p} = \mathrm{softmax}(W_{p}h_{p} + b_{p})$; $h_{p}$ is the composite semantic feature vector obtained by concatenating the semantic feature vector $h_{cls}$ corresponding to the mark symbol [CLS] in the semantic feature vector output by the L-th Transformer layer of the BERT model with the semantic feature vector $h_{asp}$ of the aspect word in that output; $W_{p}$ is a parameter matrix involved in training, $W_{p} \in R^{k \times H}$, H being the number of hidden-layer units, and all parameters of $W_{p}$ are jointly fine-tuned to achieve the minimization solution; $b_{p}$ is the third bias term.
In this optional mode, according to the extracted aspect words and their boundaries, the aspect word representation is obtained from the semantic feature vector output by the L-th Transformer layer of the BERT model acquired from the shared layer, and is concatenated with the overall semantic representation for information polarity classification, so that the information polarity of the aspect words in the text to be analyzed can be classified accurately.
The second aspect of the present invention provides a text information processing method, including:
adopting a BERT model as an encoder to extract semantic feature vectors of a text to be analyzed;
and, according to the semantic feature vector of the text to be analyzed, minimizing an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of each aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect words in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect words.
Optionally, the first loss function and the second loss function are cross entropy loss functions, respectively.
Optionally, the first loss function is:
$$loss_{asp} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i^{s}\log\hat{p}_i^{s} + (1-y_i^{s})\log(1-\hat{p}_i^{s}) + y_i^{e}\log\hat{p}_i^{e} + (1-y_i^{e})\log(1-\hat{p}_i^{e})\right]$$

wherein $\hat{p}^{s}$ is the probability distribution of the starting position of the aspect word over the positions of the text to be analyzed, $\hat{p}^{e}$ is the probability distribution of the ending position, and n is the word length of the text to be analyzed.
Optionally, the probability distribution $\hat{p}^{s}$ of the starting position of the aspect word at each position of the text to be analyzed is expressed as:

$$\hat{p}^{s} = \sigma(h^{L} W_{start} + b_{start})$$

wherein $W_{start}$ is a first trainable weight vector, $b_{start}$ is the first bias term, $\sigma$ is the sigmoid activation function, $S_{s}$ is the binary classification sequence labeling the starting positions, and $h^{L}$ is the semantic feature vector output by the L-th Transformer layer of the BERT model, the BERT model comprising L Transformer layers;

the probability distribution $\hat{p}^{e}$ of the ending position of the aspect word at each position of the text to be analyzed is expressed as:

$$\hat{p}^{e} = \sigma(h^{L} W_{end} + b_{end})$$

wherein $W_{end}$ is a second trainable weight vector, $b_{end}$ is the second bias term, and $S_{e}$ is the binary classification sequence labeling the ending positions.
Optionally, the second loss function is:
$$loss_{p} = -\sum_{c=1}^{k} y_{c}^{p}\log\hat{y}_{c}^{p}$$

wherein k is the number of classification labels of the information polarity; $y^{p}$ is the known correct classification label; $\hat{y}^{p}$ is the predicted probability of the information polarity classification of the aspect word, expressed as $\hat{y}^{p} = \mathrm{softmax}(W_{p}h_{p} + b_{p})$; $h_{p}$ is the composite semantic feature vector obtained by concatenating the semantic feature vector $h_{cls}$ corresponding to the mark symbol [CLS] in the semantic feature vector output by the L-th Transformer layer of the BERT model with the semantic feature vector $h_{asp}$ of the aspect word in that output; $W_{p}$ is a parameter matrix involved in training, $W_{p} \in R^{k \times H}$, H being the number of hidden-layer units, and all parameters of $W_{p}$ are jointly fine-tuned to achieve the minimization solution; $b_{p}$ is the third bias term.
A third aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the text information processing method provided by the second aspect of the present invention when executing the program.
A fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the text information processing method provided by the second aspect of the present invention.
The invention has the following beneficial effects:
according to the technical scheme, an end-to-end fine-grained information processing model capable of simultaneously realizing extraction of words and information polarity classification is constructed, and the accuracy and the recall rate of fine-grained information processing can be improved.
Drawings
The following detailed description of embodiments of the invention is provided in conjunction with the appended drawings:
fig. 1 is a schematic diagram illustrating an overall framework of an end-to-end fine-grained information processing model constructed by a text information processing apparatus according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a text information processing method according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer system that implements a text information processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the invention, the invention is further described below with reference to preferred embodiments and the accompanying drawings. Similar parts in the figures are denoted by the same reference numerals. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and is not to be taken as limiting the scope of the invention.
As shown in fig. 1, an embodiment of the present invention provides a text information processing apparatus including:
an encoder based on the BERT model, used for extracting semantic feature vectors of the text to be analyzed; and
an information processing module, used for minimizing an objective function comprising a first loss function and a second loss function according to the semantic feature vector of the text to be analyzed, to obtain at least one aspect word contained in the text to be analyzed and the information polarity of each aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect words in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect words.
The text information processing device provided by this embodiment constructs an end-to-end fine-grained information processing model that simultaneously realizes multi-aspect word extraction and information polarity classification, and can improve the accuracy and recall of fine-grained information processing.
The end-to-end fine-grained information processing model constructed by the text information processing device of this embodiment uses the BERT model as an encoder to extract the semantic feature vector of the text to be analyzed as a shared layer, and can simultaneously perform, for the text to be analyzed, the two tasks of multi-aspect word extraction and information polarity classification based on those aspect words. Taking an English comment text as an example, for the restaurant review "It is sad that everything about this place was great except the steak (even the service and the decor)", the model can simultaneously extract the multiple aspect words in the text, namely service, decor and steak, together with their corresponding information polarities, namely positive, positive and negative. The text information processing device of this embodiment is also applicable to Chinese texts.
As shown in Table 1, unlike the existing aspect-based information processing model (ABSA), the end-to-end fine-grained information processing model (E2E-ABSA) scheme constructed by the text information processing device of this embodiment only needs the text to be analyzed (a sequence, for example a short APP comment) as input, and can output all the aspect words (Aspect) in the text together with their corresponding information polarities (Sentiment polarity):
TABLE 1
In some optional implementations of this embodiment, the extraction of the semantic feature vector of the text to be analyzed by the BERT-model-based encoder specifically comprises: at the data input end, the input text to be analyzed is first segmented by a tokenizer to obtain the tokenized sequence X = (x_1, x_2, ..., x_n); the dictionary of the tokenizer is consistent in size with that of the BERT model and contains 30,522 wordpieces (words or word fragments). The sequence X is then encoded into a token embedding, a segment embedding and a position embedding. The three vectors are added to give the total input embedding representation (input vector representation) $h^{0}$; the input vector $h^{0}$ then passes through the L Transformer layers of the BERT model to obtain the semantic feature vector $h^{L}$ of the text to be analyzed, wherein the input vector representation $h^{0}$ and the output of the i-th Transformer layer are:

$$h^{0} = XW_{t} + W_{s} + W_{p}$$
$$h^{i} = \mathrm{Transformer}(h^{i-1}),\quad i \in [1, L]$$

wherein $W_{t}$ is the word embedding matrix, $W_{p}$ is the position embedding matrix, $W_{s}$ is the sentence embedding matrix, and $h^{i}$ is the output (hidden-layer vector) of the i-th Transformer layer.
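The embedding-and-encoding step above can be sketched in miniature as follows (NumPy, with toy dimensions, random weights and a single segment; this illustrates the $h^{0} = XW_{t} + W_{s} + W_{p}$ composition and the layer recursion, and is not the actual BERT implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

n, H, vocab = 6, 16, 100           # sequence length, hidden size, toy vocabulary
token_ids = rng.integers(0, vocab, size=n)

W_t = rng.normal(size=(vocab, H))  # word (token) embedding matrix
W_s = rng.normal(size=(1, H))      # sentence (segment) embedding, single segment
W_p = rng.normal(size=(n, H))      # position embedding matrix

# h0 = X*Wt + Ws + Wp : total input embedding representation
h0 = W_t[token_ids] + W_s + W_p
assert h0.shape == (n, H)

def transformer_layer(h):
    """Stand-in for one Transformer layer: scaled self-attention plus residual."""
    scores = h @ h.T / np.sqrt(H)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return h + attn @ h

h = h0
for _ in range(4):                 # L = 4 toy layers instead of BERT's 12 or 24
    h = transformer_layer(h)
hL = h                             # shared semantic feature vectors h^L
```

The final `hL` plays the role of the shared coding layer from which both the extraction task and the classification task read.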
In some optional implementation manners of this embodiment, the information processing module is configured to execute two tasks, which are specifically as follows:
for the multifaceted word extraction (Multi-Aspect Extractor) task:
different from the existing sequence table labeling scheme, the embodiment adopts a double-pointer labeling mode to realize an Aspect word (Aspect) extraction task. The method specifically comprises the following steps:
as shown in FIG. 1, two binary 0/1 sequences S are usedsAnd SeMarking the starting and ending positions of the Aspect word (Aspect) in the text to be analyzed, specifically determining the starting and ending positions of the Aspect word (Aspect) by judging the possibility that an input sequence (semantic feature vector output by an L-th layer Transformer network of a BERT model) is 0/1 at each position by adopting two binary classification sequences, namely determining the starting and ending positions of the Aspect word (Aspect) by SsAnd SeThe confidence level of the starting and ending positions of the Aspect word (Aspect) at each position in the system is possible to determine the Aspect word (Aspect).
Wherein the probability distribution $\hat{p}^{s}$ (confidence) that the start position of a certain aspect word (Aspect) occurs at each position in $S_{s}$ is expressed as:

$$\hat{p}^{s} = \sigma(h^{L} W_{start} + b_{start})$$

wherein $W_{start}$ is the first trainable weight vector, $b_{start}$ is the first bias term, $\sigma$ is the sigmoid activation function, $S_{s}$ is the binary classification sequence labeling the start positions, and $h^{L}$ is the semantic feature vector (coded representation) output by the L-th Transformer layer of the BERT model;

similarly, the probability distribution $\hat{p}^{e}$ (confidence) that the end position of a certain aspect word (Aspect) occurs at each position in $S_{e}$ is expressed as:

$$\hat{p}^{e} = \sigma(h^{L} W_{end} + b_{end})$$

wherein $W_{end}$ is the second trainable weight vector, $b_{end}$ is the second bias term, and $S_{e}$ is the binary classification sequence labeling the end positions.

Finally, the two vectors $\hat{p}^{s}$ and $\hat{p}^{e}$ are obtained, and the objective function of the training (specifically, a binary cross entropy loss function) is:

$$loss_{asp} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i^{s}\log\hat{p}_i^{s} + (1-y_i^{s})\log(1-\hat{p}_i^{s}) + y_i^{e}\log\hat{p}_i^{e} + (1-y_i^{e})\log(1-\hat{p}_i^{e})\right]$$
where n is the word length of the text to be analyzed (the number of words for an English text, the number of segmented words for a Chinese text).
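With these definitions, the boundary loss can be computed numerically as follows (a sketch with made-up label sequences and sigmoid outputs; the binary cross-entropy form over the start and end sequences matches the objective stated above):

```python
import numpy as np

def binary_ce(y, p, eps=1e-12):
    """Binary cross entropy averaged over the n token positions."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy start/end label sequences for a 6-token text (1 marks a boundary position).
y_start = np.array([0, 1, 0, 0, 1, 0])
y_end   = np.array([0, 1, 0, 0, 0, 1])
# Assumed sigmoid outputs p_hat at each position (illustrative values only).
p_start = np.array([0.1, 0.9, 0.2, 0.1, 0.8, 0.1])
p_end   = np.array([0.1, 0.8, 0.1, 0.2, 0.1, 0.9])

# loss_asp sums the binary cross entropies of the start and end predictions.
loss_asp = binary_ce(y_start, p_start) + binary_ce(y_end, p_end)
```

The closer each predicted probability is to its 0/1 label, the smaller `loss_asp` becomes, which is what the minimization solution drives toward.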
For example, for the text to be analyzed illustrated in FIG. 1, "It may be a bit packed on weekends, but the vibe is good and it's the best French food you will find in the area", the positions corresponding to "packed", "vibe" and "French" in the binary sequence labeling the start positions take the value 1 and all others 0; the positions corresponding to "packed", "vibe" and "food" in the binary sequence labeling the end positions take the value 1 and all others 0, whereby the aspect words "packed", "vibe" and "French food" can be extracted.
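The pairing of labeled start and end positions into aspect-word spans can be sketched as follows (a hypothetical helper that matches each start with the nearest end at or after it; the text does not specify the matching rule, so that choice is an assumption):

```python
def extract_spans(start_seq, end_seq):
    """Pair each start position with the nearest end position >= it."""
    ends = [j for j, e in enumerate(end_seq) if e == 1]
    spans = []
    for i, s in enumerate(start_seq):
        if s != 1:
            continue
        candidates = [j for j in ends if j >= i]
        if candidates:
            spans.append((i, candidates[0]))
    return spans

tokens = ["it", "may", "be", "a", "bit", "packed", "on", "weekends", ",",
          "but", "the", "vibe", "is", "good", "and", "it", "'s", "the",
          "best", "French", "food", "you", "will", "find"]
start = [0] * len(tokens)
end = [0] * len(tokens)
for w in ("packed", "vibe", "French"):   # value 1 at start positions
    start[tokens.index(w)] = 1
for w in ("packed", "vibe", "food"):     # value 1 at end positions
    end[tokens.index(w)] = 1

spans = extract_spans(start, end)
aspects = [" ".join(tokens[i:j + 1]) for i, j in spans]
# aspects -> ["packed", "vibe", "French food"]
```

Single-token aspect words appear as spans whose start and end coincide, while multi-token aspect words such as "French food" span from the start marker to the matched end marker.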
This way of realizing the aspect word extraction task uses two binary classification sequences to judge the confidence that each sequence position of the semantic feature vector output by the L-th Transformer layer of the BERT model is a start or end position of an aspect word, so that the aspect words in the text to be analyzed can be extracted accurately.
For the information polarity classification of aspect words (Sentiment Polarity Classifier) task:
for a fine-grained information processing task, the embodiment converts the information processing problem of the aspect words into a polarity classification problem, and the polarity of the aspect words is used as a classification label. In order to obtain the vector representation of the aspect words, the output result h is coded and output from the BERT model according to the boundaries of the aspect words (the starting and ending positions of the aspect words are the boundaries) obtained by the aspect word extraction taskLSemantic feature vector or coded representation h of mid-extraction aspect wordsaspAnd a semantic feature vector or a coded representation h integrating the overall semantic meaning and the meaning of the aspect wordpIn this embodiment, the BERT model is encoded and output as a special mark symbol [ CLS ]]Pooled outputs of corresponding final hidden states (i.e., by L-th layer transform of BERT model)Mark symbol [ CLS ] in semantic feature vector output by mer network]Corresponding semantic feature vector) hclsAnd according to aspect word boundaries from hLCoded representation h of extracted aspect wordsaspSplicing to obtain a comprehensive semantic representation hp:
hp=concatenate([hcls,hasp])
Wherein(s)i',ej) Representing the boundaries of the facet words.
Then, the predicted probability of the information polarity classification of the aspect word is expressed as:

$$\hat{y}^{p} = \mathrm{softmax}(W_{p}h_{p} + b_{p})$$

wherein $W_{p}$ is a parameter matrix involved in training, $W_{p} \in R^{k \times H}$, k is the number of classification labels of the information polarity, H is the number of hidden-layer units, and all parameters of $W_{p}$ are jointly fine-tuned to achieve the minimization solution; $b_{p}$ is the third bias term.

The cross entropy loss function is adopted as the objective function (loss function) of the polarity classification model:

$$loss_{p} = -\sum_{c=1}^{k} y_{c}^{p}\log\hat{y}_{c}^{p}$$

wherein $y^{p}$ is the known correct classification label and $\hat{y}^{p}$ is the predicted probability of the information polarity classification of the aspect word. By jointly fine-tuning all parameters of the parameter matrix $W_{p}$ to maximize the log-probability of the correct label, $loss_{p}$ is minimized.
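A minimal numerical sketch of this classification head follows (NumPy, random toy values; pooling the span into $h_{asp}$ as a mean over the boundary tokens is an assumption where the text leaves it unspecified, and the concatenation makes the effective classifier input width 2H):

```python
import numpy as np

rng = np.random.default_rng(1)
n, H, k = 6, 8, 3                     # tokens, hidden size, polarity labels

hL = rng.normal(size=(n, H))          # shared encoder output h^L (toy values)
h_cls = hL[0]                         # [CLS] vector: overall semantics
s, e = 2, 4                           # boundary (s, e) of one aspect word
h_asp = hL[s:e + 1].mean(axis=0)      # span representation (mean pool, assumed)

h_p = np.concatenate([h_cls, h_asp])  # aspect-aware composite vector, length 2H

W_p = rng.normal(size=(k, 2 * H))     # parameter matrix over the widened input
b_p = np.zeros(k)                     # third bias term
logits = W_p @ h_p + b_p
y_hat = np.exp(logits - logits.max())
y_hat /= y_hat.sum()                  # softmax over the k polarity labels

y_true = 2                            # index of the known correct label
loss_p = -np.log(y_hat[y_true])       # cross entropy for this aspect word
```

Jointly fine-tuning `W_p` (together with the encoder) raises `y_hat[y_true]` toward 1, which drives `loss_p` toward its minimum.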
In this way of realizing the information polarity classification task, the aspect word representation is obtained, according to the extracted aspect words and their boundaries, from the semantic feature vector output by the L-th Transformer layer of the BERT model acquired from the shared layer, and is concatenated with the overall semantic representation for information polarity classification, so that the information polarity of the aspect words in the text to be analyzed can be classified accurately.
In summary, the text information processing device provided in this embodiment constructs an end-to-end fine-grained information processing model with the overall framework shown in FIG. 1: the text to be analyzed is encoded with the BERT model to obtain the text representation vector $h^{L}$; all aspect words in the text and their boundaries $(s_{i}, e_{j})$ are extracted with two binary classifications, completing the multi-aspect word extraction task; through the aspect word boundaries and $h^{L}$, the coded (semantic) representation $h_{asp}$ of each aspect word is obtained; the overall semantic representation $h_{cls}$ is concatenated with the aspect word representation $h_{asp}$ to obtain the aspect-aware composite semantic vector $h_{p}$; and the information polarity of the corresponding aspect word in the text to be analyzed is predicted by a multi-class classification. Here $h^{L}$ serves as the shared coding layer, and the overall objective function of the two tasks is expressed as: $loss = loss_{asp} + loss_{p}$.
Finally, at the input end of the end-to-end fine-grained information processing model constructed by the text information processing device of this embodiment, only one text needs to be input, and all the aspect words in the text together with their corresponding information polarities are obtained, thereby realizing end-to-end fine-grained information processing.
The performance of the end-to-end fine-grained information processing model constructed by the text information processing device of this embodiment is further explained below with comparative experimental data.
As shown in Table 2, the LAPTOP dataset released in SemEval 2014 and the RESTAURANT datasets released in SemEval 2014/2015/2016 (the three years combined, denoted RESTAURANT (total)) are used as the experimental objects. These data give the start and end positions of each aspect word and one of three information polarities (positive: +, negative: -, neutral: 0) for the aspect word.
TABLE 2
Dataset | #Sent | #Targets | #+ | #- | #0 |
LAPTOP | 1869 | 2936 | 1326 | 990 | 620 |
RESTAURANT (total) | 3900 | 6603 | 4134 | 1538 | 931 |
To verify the validity of the scheme, the experimental results of the end-to-end fine-grained information processing model constructed by the text information processing device of this embodiment on the LAPTOP and RESTAURANT datasets are compared with those of several current end-to-end models with better performance, as shown in Table 3. The experiments use Google's open-source pre-trained BERT-Large model; the initial learning rate is 5e-5, reduced to 2e-5 from the second epoch; 50 epochs are trained, and the optimal model is saved during training:
TABLE 3
Experimental results show that, on both data sets, the end-to-end fine-grained information processing model constructed by the text information processing apparatus provided in this embodiment significantly outperforms several current end-to-end models such as UNIFIED, TAG-join and SPAN-join: the F1 value is improved by more than 4 percentage points on the LAPTOP data set and by more than 7 percentage points on the RESTAURANT data set, demonstrating the reliability and effectiveness of the end-to-end fine-grained information processing model constructed by the apparatus.
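The learning-rate schedule reported for the experiments above (5e-5 initially, reduced to 2e-5 from the second epoch, 50 epochs in total) can be sketched as a simple step function; the function and variable names are illustrative, not from the patent.

```python
def learning_rate(epoch: int, lr_init: float = 5e-5, lr_late: float = 2e-5) -> float:
    """Step schedule: 5e-5 for the first epoch, 2e-5 from the second epoch on."""
    return lr_init if epoch == 1 else lr_late

# 50 training epochs, as in the reported experiments (epochs numbered from 1).
schedule = [learning_rate(e) for e in range(1, 51)]
```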
As shown in fig. 2, another embodiment of the present invention provides a text information processing method, including:
S1, extracting semantic feature vectors of the text to be analyzed by using a BERT model as an encoder;
S2, according to the semantic feature vector of the text to be analyzed, performing minimum solution on an objective function including a first loss function and a second loss function to obtain at least one aspect word included in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
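Step S2 requires turning the two per-position binary classifications into aspect-word spans. The patent does not spell out the decoding rule, so the heuristic below (match each above-threshold start position with the nearest following above-threshold end position) is purely an illustrative assumption.

```python
import numpy as np

def decode_spans(p_start, p_end, threshold=0.5):
    """Pair per-position start/end probabilities into (start, end) spans.

    The pairing rule (nearest following end) is an assumption for
    illustration; the source only states that two binary classifiers
    label the start and end positions of aspect words.
    """
    starts = [i for i, p in enumerate(p_start) if p >= threshold]
    ends = [j for j, p in enumerate(p_end) if p >= threshold]
    spans = []
    for s in starts:
        following = [j for j in ends if j >= s]
        if following:
            spans.append((s, following[0]))
    return spans

# Invented probabilities for a 5-word text with two aspect words.
p_start = np.array([0.9, 0.1, 0.2, 0.8, 0.1])
p_end = np.array([0.1, 0.7, 0.1, 0.1, 0.9])
spans = decode_spans(p_start, p_end)  # [(0, 1), (3, 4)]
```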
In some optional implementations of this embodiment, the first loss function and the second loss function are cross entropy loss functions, respectively.
In some optional implementations of this embodiment, the first loss function is:

loss_asp = -(1/n) Σ_{i=1..n} [ y_i^start·log(p_i^start) + (1 - y_i^start)·log(1 - p_i^start) + y_i^end·log(p_i^end) + (1 - y_i^end)·log(1 - p_i^end) ]

wherein p_i^start is the probability distribution of the starting position of the aspect word at each position i in the text to be analyzed, p_i^end is the corresponding probability distribution of the termination position, y_i^start and y_i^end are the binary sequences labeling the start and termination positions, and n is the word length of the text to be analyzed.
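A numeric sketch of the binary cross-entropy form of the first loss function, computed over toy start/end sequences; the gold labels and predicted probabilities are invented for illustration.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy over the n positions of the text."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Gold binary sequences marking start / end positions (n = 5 words, invented).
y_start = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
y_end = np.array([0.0, 1.0, 0.0, 0.0, 1.0])
# Predicted probabilities p^start, p^end at each position (invented).
p_start = np.array([0.9, 0.1, 0.2, 0.8, 0.1])
p_end = np.array([0.1, 0.7, 0.1, 0.1, 0.9])

# loss_asp combines the start- and end-position cross-entropies.
loss_asp = binary_cross_entropy(y_start, p_start) + binary_cross_entropy(y_end, p_end)
```

Perfect predictions drive the loss toward zero, which is what the minimization solution over the objective function seeks.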
In some alternative implementations of the present embodiment,
the probability distribution p^start of the starting position of the aspect word at each position of the text to be analyzed is expressed as:

p^start = σ(W_start·h_L + b_start)

wherein W_start is a first trainable weight vector, b_start is the first bias term, σ is the sigmoid activation function, p^start is the process value of the binary sequence labeling the starting position, and h_L is the semantic feature vector output by the L-th layer Transformer network of the BERT model, the BERT model comprising an L-layer Transformer network;
the probability distribution p^end of the termination position of the aspect word at each position of the text to be analyzed is expressed as:

p^end = σ(W_end·h_L + b_end)

wherein W_end is a second trainable weight vector, b_end is the second bias term, and p^end is the process value of the binary sequence labeling the termination position.
In some optional implementations of this embodiment, the second loss function is:

loss_p = -Σ_{j=1..k} y_j·log(ŷ_j)

wherein k is the number of classification labels of the information polarity; y is the known correct classification label; ŷ is the predicted result probability of the information polarity classification of the aspect word, expressed as ŷ = softmax(W_p·h_p + b_p), where h_p is the comprehensive semantic feature vector obtained by splicing the semantic feature vector h_cls corresponding to the markup symbol [CLS] in the semantic feature vector output by the L-th layer Transformer network of the BERT model with the semantic feature vector h_asp of the aspect word in that same output; W_p is a parameter matrix involved in training, W_p ∈ R^(k×H), H being the number of hidden layer units, and all parameters of the parameter matrix W_p are jointly refined to achieve the minimization solution; b_p is the third bias term.
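The second loss function can likewise be sketched numerically: h_p is the splice of h_cls and h_asp, projected by the parameter matrix W_p, softmax-normalized, and scored by cross-entropy against the one-hot correct label. All values, and the use of 2H as the dimension of h_p, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

H, k = 8, 3                        # hidden units per vector, polarity classes
h_cls = rng.normal(size=H)         # sentence-level semantic vector (toy values)
h_asp = rng.normal(size=H)         # aspect-word semantic vector (toy values)
h_p = np.concatenate([h_cls, h_asp])    # spliced aspect-aware vector, dim 2H

W_p = rng.normal(size=(k, 2 * H))  # trainable parameter matrix
b_p = np.zeros(k)                  # third bias term

y_hat = softmax(W_p @ h_p + b_p)   # predicted polarity probability distribution
y = np.array([1.0, 0.0, 0.0])      # one-hot correct label (e.g. positive)
loss_p = float(-np.sum(y * np.log(y_hat + 1e-12)))   # cross-entropy loss_p
```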
It should be noted that the principle and workflow of the text information processing method provided in this embodiment are similar to those of the text information processing apparatus described above; for relevant parts, reference may be made to the foregoing description, which is not repeated here.
As shown in fig. 3, a computer system suitable for implementing the text information processing apparatus provided by the above-described embodiments includes a central processing unit (CPU) that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) or a program loaded from a storage section into a random access memory (RAM). The RAM also stores various programs and data necessary for the operation of the computer system. The CPU, ROM, and RAM are connected to one another via a bus, and an input/output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a Liquid Crystal Display (LCD), a speaker, and the like; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive as necessary, so that a computer program read therefrom is installed into the storage section as needed.
In particular, the processes described in the above flowcharts may be implemented as computer software programs according to the present embodiment. For example, the present embodiments include a computer program product comprising a computer program tangibly embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium.
The flowchart and schematic diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to the present embodiments. In this regard, each block in the flowchart or schematic diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the schematic and/or flowchart illustration, and combinations of blocks in the schematic and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
On the other hand, the present embodiment also provides a nonvolatile computer storage medium, which may be the nonvolatile computer storage medium included in the apparatus in the foregoing embodiment, or may be a nonvolatile computer storage medium that exists separately and is not assembled into a terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: adopting a BERT model as an encoder to extract semantic feature vectors of a text to be analyzed; and according to the semantic feature vector of the text to be analyzed, performing minimum solution on an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the starting and ending positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
It is to be noted that, in the description of the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
It should be understood that the above-mentioned embodiments of the present invention are only examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention, and it will be obvious to those skilled in the art that other variations and modifications can be made on the basis of the above description, and all embodiments cannot be exhaustive, and all obvious variations and modifications belonging to the technical scheme of the present invention are within the protection scope of the present invention.
Claims (12)
1. A text information processing apparatus characterized by comprising:
the coder based on the BERT model is used for extracting semantic feature vectors of the text to be analyzed;
and the information processing module is used for carrying out minimum solution on an objective function containing a first loss function and a second loss function according to the semantic feature vector of the text to be analyzed to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the starting and ending position of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
2. The apparatus of claim 1, wherein the first and second loss functions are cross-entropy loss functions, respectively.
4. The apparatus of claim 3,
the probability distribution p^start of the starting position of the aspect word at each position of the text to be analyzed is expressed as:

p^start = σ(W_start·h_L + b_start)

wherein W_start is a first trainable weight vector, b_start is the first bias term, σ is the sigmoid activation function, p^start is the process value of the two-class sequence labeling the starting position, and h_L is the semantic feature vector output by an L-th layer Transformer network of the BERT model, the BERT model comprising an L-layer Transformer network;
the probability distribution p^end of the termination position of the aspect word at each position of the text to be analyzed is expressed as:

p^end = σ(W_end·h_L + b_end).
5. The apparatus of claim 2, wherein the second loss function is:

loss_p = -Σ_{j=1..k} y_j·log(ŷ_j)

wherein k is the number of classification labels of the information polarity; y is the known correct classification label; ŷ is the predicted result probability of the information polarity classification of the aspect word, expressed as ŷ = softmax(W_p·h_p + b_p), where h_p is the comprehensive semantic feature vector obtained by splicing the semantic feature vector h_cls corresponding to the markup symbol [CLS] in the semantic feature vector output by the L-th layer Transformer network of the BERT model with the semantic feature vector h_asp of the aspect word in that same output; W_p is a parameter matrix involved in training, W_p ∈ R^(k×H), H being the number of hidden layer units, and all parameters of the parameter matrix W_p are jointly refined to achieve the minimization solution; b_p is the third bias term.
6. A text information processing method, comprising:
adopting a BERT model as an encoder to extract semantic feature vectors of a text to be analyzed;
and according to the semantic feature vector of the text to be analyzed, performing minimum solution on an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the starting and ending positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
7. The method of claim 6, wherein the first and second loss functions are cross-entropy loss functions, respectively.
9. The method of claim 8,
the probability distribution p^start of the starting position of the aspect word at each position of the text to be analyzed is expressed as:

p^start = σ(W_start·h_L + b_start)

wherein W_start is a first trainable weight vector, b_start is the first bias term, σ is the sigmoid activation function, p^start is the process value of the two-class sequence labeling the starting position, and h_L is the semantic feature vector output by an L-th layer Transformer network of the BERT model, the BERT model comprising an L-layer Transformer network;
the probability distribution p^end of the termination position of the aspect word at each position of the text to be analyzed is expressed as:

p^end = σ(W_end·h_L + b_end).
10. The method of claim 7, wherein the second loss function is:

loss_p = -Σ_{j=1..k} y_j·log(ŷ_j)

wherein k is the number of classification labels of the information polarity; y is the known correct classification label; ŷ is the predicted result probability of the information polarity classification of the aspect word, expressed as ŷ = softmax(W_p·h_p + b_p), where h_p is the comprehensive semantic feature vector obtained by splicing the semantic feature vector h_cls corresponding to the markup symbol [CLS] in the semantic feature vector output by the L-th layer Transformer network of the BERT model with the semantic feature vector h_asp of the aspect word in that same output; W_p is a parameter matrix involved in training, W_p ∈ R^(k×H), H being the number of hidden layer units, and all parameters of the parameter matrix W_p are jointly refined to achieve the minimization solution; b_p is the third bias term.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 6-10 when executing the program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 6-10.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010285983 | 2020-04-13 | ||
CN2020102859833 | 2020-04-13 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113536803A true CN113536803A (en) | 2021-10-22 |
CN113536803B CN113536803B (en) | 2024-08-13 |
Family
ID=78124125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010639599.9A Active CN113536803B (en) | 2020-04-13 | 2020-07-06 | Text information processing device and method, computer device, and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113536803B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832400A (en) * | 2017-11-01 | 2018-03-23 | 山东大学 | A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification |
US20190122145A1 (en) * | 2017-10-23 | 2019-04-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus and device for extracting information |
CN109871444A (en) * | 2019-01-16 | 2019-06-11 | 北京邮电大学 | A kind of file classification method and system |
CN109918663A (en) * | 2019-03-04 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of semantic matching method, device and storage medium |
CN110046248A (en) * | 2019-03-08 | 2019-07-23 | 阿里巴巴集团控股有限公司 | Model training method, file classification method and device for text analyzing |
CN110222178A (en) * | 2019-05-24 | 2019-09-10 | 新华三大数据技术有限公司 | Text sentiment classification method, device, electronic equipment and readable storage medium storing program for executing |
CN110334210A (en) * | 2019-05-30 | 2019-10-15 | 哈尔滨理工大学 | A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN |
CN110362817A (en) * | 2019-06-04 | 2019-10-22 | 中国科学院信息工程研究所 | A kind of viewpoint proneness analysis method and system towards product attribute |
CN110377740A (en) * | 2019-07-22 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Feeling polarities analysis method, device, electronic equipment and storage medium |
CN110516245A (en) * | 2019-08-27 | 2019-11-29 | 蓝盾信息安全技术股份有限公司 | Fine granularity sentiment analysis method, apparatus, computer equipment and storage medium |
CN110866117A (en) * | 2019-10-25 | 2020-03-06 | 西安交通大学 | Short text classification method based on semantic enhancement and multi-level label embedding |
CN110955750A (en) * | 2019-11-11 | 2020-04-03 | 北京三快在线科技有限公司 | Combined identification method and device for comment area and emotion polarity, and electronic equipment |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190122145A1 (en) * | 2017-10-23 | 2019-04-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus and device for extracting information |
CN107832400A (en) * | 2017-11-01 | 2018-03-23 | 山东大学 | A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification |
CN109871444A (en) * | 2019-01-16 | 2019-06-11 | 北京邮电大学 | A kind of file classification method and system |
CN109918663A (en) * | 2019-03-04 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of semantic matching method, device and storage medium |
CN110046248A (en) * | 2019-03-08 | 2019-07-23 | 阿里巴巴集团控股有限公司 | Model training method, file classification method and device for text analyzing |
CN110222178A (en) * | 2019-05-24 | 2019-09-10 | 新华三大数据技术有限公司 | Text sentiment classification method, device, electronic equipment and readable storage medium storing program for executing |
CN110334210A (en) * | 2019-05-30 | 2019-10-15 | 哈尔滨理工大学 | A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN |
CN110362817A (en) * | 2019-06-04 | 2019-10-22 | 中国科学院信息工程研究所 | A kind of viewpoint proneness analysis method and system towards product attribute |
CN110377740A (en) * | 2019-07-22 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Feeling polarities analysis method, device, electronic equipment and storage medium |
CN110516245A (en) * | 2019-08-27 | 2019-11-29 | 蓝盾信息安全技术股份有限公司 | Fine granularity sentiment analysis method, apparatus, computer equipment and storage medium |
CN110866117A (en) * | 2019-10-25 | 2020-03-06 | 西安交通大学 | Short text classification method based on semantic enhancement and multi-level label embedding |
CN110955750A (en) * | 2019-11-11 | 2020-04-03 | 北京三快在线科技有限公司 | Combined identification method and device for comment area and emotion polarity, and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113536803B (en) | 2024-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112685565B (en) | Text classification method based on multi-mode information fusion and related equipment thereof | |
CN107680579B (en) | Text regularization model training method and device, and text regularization method and device | |
CN111985239B (en) | Entity identification method, entity identification device, electronic equipment and storage medium | |
CN110489555A (en) | A kind of language model pre-training method of combination class word information | |
CN113051356B (en) | Open relation extraction method and device, electronic equipment and storage medium | |
CN110472235A (en) | A kind of end-to-end entity relationship joint abstracting method towards Chinese text | |
CN112257452B (en) | Training method, training device, training equipment and training storage medium for emotion recognition model | |
CN111339260A (en) | BERT and QA thought-based fine-grained emotion analysis method | |
CN109086265A (en) | A kind of semanteme training method, multi-semantic meaning word disambiguation method in short text | |
CN113723105A (en) | Training method, device and equipment of semantic feature extraction model and storage medium | |
CN112188311B (en) | Method and apparatus for determining video material of news | |
CN110874411A (en) | Cross-domain emotion classification system based on attention mechanism fusion | |
CN113553412A (en) | Question and answer processing method and device, electronic equipment and storage medium | |
CN113392209A (en) | Text clustering method based on artificial intelligence, related equipment and storage medium | |
CN110633475A (en) | Natural language understanding method, device and system based on computer scene and storage medium | |
CN114416995A (en) | Information recommendation method, device and equipment | |
CN112749556B (en) | Multi-language model training method and device, storage medium and electronic equipment | |
CN114529903A (en) | Text refinement network | |
CN113761190A (en) | Text recognition method and device, computer readable medium and electronic equipment | |
CN111145914B (en) | Method and device for determining text entity of lung cancer clinical disease seed bank | |
WO2023134085A1 (en) | Question answer prediction method and prediction apparatus, electronic device, and storage medium | |
CN117351336A (en) | Image auditing method and related equipment | |
CN115129862A (en) | Statement entity processing method and device, computer equipment and storage medium | |
CN114417874A (en) | Chinese named entity recognition method and system based on graph attention network | |
CN114092931A (en) | Scene character recognition method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |