CN113536803A - Text information processing device and method, computer equipment and readable storage medium

Text information processing device and method, computer equipment and readable storage medium

Info

Publication number
CN113536803A
Authority
CN
China
Prior art keywords
text
analyzed
semantic feature
word
loss function
Prior art date
Legal status
Granted
Application number
CN202010639599.9A
Other languages
Chinese (zh)
Other versions
CN113536803B (en)
Inventor
王炳乾
梁天新
周希波
Current Assignee
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd filed Critical BOE Technology Group Co Ltd
Publication of CN113536803A publication Critical patent/CN113536803A/en
Application granted granted Critical
Publication of CN113536803B publication Critical patent/CN113536803B/en
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30 Semantic analysis
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The invention discloses a text information processing device and method, a computer device, and a readable storage medium. One embodiment of the device comprises: an encoder based on the BERT model, used for extracting semantic feature vectors of the text to be analyzed; and an information processing module, used for minimizing, according to the semantic feature vector of the text to be analyzed, an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word. This embodiment builds an end-to-end fine-grained information processing model that performs multi-aspect word extraction and information polarity classification simultaneously, and can improve the accuracy and recall of fine-grained information processing.

Description

Text information processing device and method, computer equipment and readable storage medium
Technical Field
The invention relates to the technical field of text analysis, and more particularly to a text information processing device and method, a computer device, and a readable storage medium.
Background
With the rise of online social networks, a large number of users express their experiences of and opinions on life, events, products, and the like by publishing text on the Internet. These textual expressions provide a data basis for research on text information processing, which studies the attitudes and opinions that people express in text. Fine-grained information processing is one of its subfields and studies attitudes and opinions at a fine granularity. It still faces many difficulties and challenges in task definition, data preparation, and method effectiveness. First, research on classifying text information by fine-grained information polarity can be used to extract, from text, information words that carry attitudes and opinions, and such research has great application value in public opinion monitoring. Most previous information classification research assumes that a text contains only one opinion and ignores the fact that social network texts often contain multiple opinions. Identifying all the kinds of opinions that a text contains is challenging, especially for short texts. Second, how to associate the various opinions with their aspects is an aspect-level information processing problem; this is research on fine-grained information and can be further divided into two categories, namely information processing for aspect words and information processing for aspect categories. Designing a unified method that solves both aspect-level information processing problems simultaneously is challenging.
The rapid development of deep learning provides new methods for fine-grained information processing. The Google open-source pre-trained language model BERT achieved state-of-the-art results on 11 natural language processing tasks. BERT stands for Bidirectional Encoder Representations from Transformers and is a novel language model: it trains a pre-trained deep bidirectional representation by jointly conditioning on bidirectional Transformers in all layers. Pre-trained language models play an important role in many natural language processing problems, such as the SQuAD question-answering task, named entity recognition, and opinion recognition. At present, there are mainly two strategies for applying a pre-trained language model to an NLP task: one is the feature-based language model, such as the ELMo model; the other is the fine-tuning-based language model, such as OpenAI GPT. Each of the two types has advantages and disadvantages, and BERT combines the advantages of both, so that it can achieve optimal results on many downstream tasks. However, the existing BERT model suffers from low accuracy and recall when applied to information processing of texts such as restaurant-review social network texts.
Therefore, it is desirable to provide a new text information processing apparatus and method, a computer device, and a readable storage medium.
Disclosure of Invention
The invention aims to provide a text information processing device and method, a computer device and a readable storage medium, so as to solve at least one of the problems in the prior art.
To achieve the above purpose, the invention adopts the following technical solutions:
a first aspect of the present invention provides a text information processing apparatus comprising:
an encoder based on the BERT model, used for extracting semantic feature vectors of the text to be analyzed;
and an information processing module, used for minimizing, according to the semantic feature vector of the text to be analyzed, an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
The text information processing device provided by the first aspect of the invention constructs an end-to-end fine-grained information processing model that simultaneously performs multi-aspect word extraction and information polarity classification, and can improve the accuracy and recall of fine-grained information processing.
Optionally, the first loss function and the second loss function are cross entropy loss functions, respectively.
Optionally, the first loss function is:

loss_asp = -(1/n) Σ_{i=1}^{n} [ y_s^i log(p_s^i) + (1 - y_s^i) log(1 - p_s^i) + y_e^i log(p_e^i) + (1 - y_e^i) log(1 - p_e^i) ]

where p_s^i is the probability that position i of the text to be analyzed is the start position of an aspect word, p_e^i is the probability that position i is the end position of an aspect word, y_s^i and y_e^i are the corresponding binary (0/1) labels, and n is the word length of the text to be analyzed.
Optionally, the probability distribution p_s of the start position of an aspect word over the positions of the text to be analyzed is expressed as:

p_s^i = σ(W_start · h_L^i + b_start)

where W_start is a first trainable weight vector, b_start is a first bias term, σ is the sigmoid activation function, the values p_s^i form the two-class sequence labeling the start positions, and h_L is the semantic feature vector output by the L-th Transformer layer of the BERT model, the BERT model comprising L Transformer layers;

the probability distribution p_e of the end position of an aspect word over the positions of the text to be analyzed is expressed as:

p_e^i = σ(W_end · h_L^i + b_end)

where W_end is a second trainable weight vector, b_end is a second bias term, and the values p_e^i form the two-class sequence labeling the end positions.
In this optional mode, two binary classification sequences are used to judge, for each sequence position of the semantic feature vector output by the L-th Transformer layer of the BERT model, the confidence that the position is a start or end position of an aspect word, so that the aspect words in the text to be analyzed can be extracted accurately.
Optionally, the second loss function is:

loss_p = -Σ_{i=1}^{k} y_i log(p_i)

where k is the number of classification labels of the information polarity; y_i is the known correct classification label; p_i is the predicted probability of the i-th information polarity class for the aspect word, the prediction being expressed as

p = softmax(W_p · h_p + b_p)

h_p is the comprehensive semantic feature vector obtained by concatenating the semantic feature vector h_cls corresponding to the marker symbol [CLS] in the semantic feature vector output by the L-th Transformer layer of the BERT model with the semantic feature vector h_asp of the aspect word in that same output; W_p is a parameter matrix involved in training, W_p ∈ R^(k×H), where H is the number of hidden-layer units, and all parameters of the parameter matrix W_p are jointly fine-tuned to achieve the minimization; b_p is a third bias term.
In this optional mode, according to the extracted aspect words and their boundaries, the aspect word representation is obtained from the semantic feature vector output by the L-th Transformer layer of the BERT model, which serves as a shared layer, and is concatenated with the overall semantic representation for information polarity classification, so that the information polarity of the aspect words in the text to be analyzed can be classified accurately.
The second aspect of the present invention provides a text information processing method, including:
adopting a BERT model as an encoder to extract semantic feature vectors of a text to be analyzed;
and minimizing, according to the semantic feature vector of the text to be analyzed, an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
Optionally, the first loss function and the second loss function are cross entropy loss functions, respectively.
Optionally, the first loss function is:

loss_asp = -(1/n) Σ_{i=1}^{n} [ y_s^i log(p_s^i) + (1 - y_s^i) log(1 - p_s^i) + y_e^i log(p_e^i) + (1 - y_e^i) log(1 - p_e^i) ]

where p_s^i is the probability that position i of the text to be analyzed is the start position of an aspect word, p_e^i is the probability that position i is the end position of an aspect word, y_s^i and y_e^i are the corresponding binary (0/1) labels, and n is the word length of the text to be analyzed.
Optionally, the probability distribution p_s of the start position of an aspect word over the positions of the text to be analyzed is expressed as:

p_s^i = σ(W_start · h_L^i + b_start)

where W_start is a first trainable weight vector, b_start is a first bias term, σ is the sigmoid activation function, the values p_s^i form the two-class sequence labeling the start positions, and h_L is the semantic feature vector output by the L-th Transformer layer of the BERT model, the BERT model comprising L Transformer layers;

the probability distribution p_e of the end position of an aspect word over the positions of the text to be analyzed is expressed as:

p_e^i = σ(W_end · h_L^i + b_end)

where W_end is a second trainable weight vector, b_end is a second bias term, and the values p_e^i form the two-class sequence labeling the end positions.
Optionally, the second loss function is:

loss_p = -Σ_{i=1}^{k} y_i log(p_i)

where k is the number of classification labels of the information polarity; y_i is the known correct classification label; p_i is the predicted probability of the i-th information polarity class for the aspect word, the prediction being expressed as

p = softmax(W_p · h_p + b_p)

h_p is the comprehensive semantic feature vector obtained by concatenating the semantic feature vector h_cls corresponding to the marker symbol [CLS] in the semantic feature vector output by the L-th Transformer layer of the BERT model with the semantic feature vector h_asp of the aspect word in that same output; W_p is a parameter matrix involved in training, W_p ∈ R^(k×H), where H is the number of hidden-layer units, and all parameters of the parameter matrix W_p are jointly fine-tuned to achieve the minimization; b_p is a third bias term.
A third aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the text information processing method provided by the second aspect of the present invention when executing the program.
A fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the text information processing method provided by the second aspect of the present invention.
The invention has the following beneficial effects:
According to the technical solution of the invention, an end-to-end fine-grained information processing model that simultaneously performs multi-aspect word extraction and information polarity classification is constructed, which can improve the accuracy and recall of fine-grained information processing.
Drawings
The following detailed description of embodiments of the invention is provided in conjunction with the appended drawings:
fig. 1 is a schematic diagram illustrating an overall framework of an end-to-end fine-grained information processing model constructed by a text information processing apparatus according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a text information processing method according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer system that implements a text information processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the invention, the invention is further described below with reference to preferred embodiments and the accompanying drawings. Similar parts in the figures are denoted by the same reference numerals. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and is not to be taken as limiting the scope of the invention.
As shown in fig. 1, an embodiment of the present invention provides a text information processing apparatus including:
an encoder based on the BERT model, used for extracting semantic feature vectors of the text to be analyzed;
and an information processing module, used for minimizing, according to the semantic feature vector of the text to be analyzed, an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
The text information processing device provided by this embodiment constructs an end-to-end fine-grained information processing model that simultaneously performs multi-aspect word extraction and information polarity classification, and can improve the accuracy and recall of fine-grained information processing.
The end-to-end fine-grained information processing model constructed by the text information processing device provided by this embodiment uses the BERT model as an encoder to extract the semantic feature vector of the text to be analyzed as a shared layer, and can simultaneously perform, for the text to be analyzed, the two tasks of multi-aspect word extraction and information polarity classification based on the extracted aspect words. Taking an English review text as an example, for a restaurant comment saying that, regrettably, everything about the place except the steak was great (even the service and the decor), the end-to-end fine-grained information processing model constructed by the text information processing device provided by this embodiment can simultaneously extract the multiple aspect words in the text, namely "service", "decor", and "steak", together with their corresponding information polarities, namely positive, positive, and negative. The text information processing device provided by this embodiment is also applicable to Chinese texts.
As shown in Table 1, unlike existing aspect-based information processing models (ABSA), the end-to-end fine-grained information processing model (E2E-ABSA) constructed by the text information processing device provided by this embodiment only needs the text to be analyzed (a sequence) as input, for example a short APP review text, and can output all the aspect words (Aspect) in the text together with their corresponding information polarities (Sentiment polarity):
TABLE 1
[Table 1 compares the inputs and outputs of the ABSA and E2E-ABSA schemes; its contents are provided as an image in the original publication.]
In some optional implementations of this embodiment, the extraction of the semantic feature vector of the text to be analyzed by the BERT-based encoder specifically includes: at the data input end, the input text to be analyzed is first segmented by a tokenizer to obtain the tokenized sequence X = (x_1, x_2, ..., x_n); the vocabulary of the tokenizer is consistent with that of the BERT model and contains 30,522 wordpieces (words or word fragments). The sequence X is then encoded into a token embedding, a segment embedding, and a position embedding. The three vectors are added to form the overall input embedding representation (input vector representation) h_0. The input vector representation h_0 is passed through the L Transformer layers of the BERT model to obtain the semantic feature vector h_L of the text to be analyzed, where the input vector representation h_0 and the output of the i-th Transformer layer among the L Transformer layers are:

h_0 = X·W_t + W_s + W_p

h_i = Transformer(h_{i-1}), i ∈ [1, L]

where W_t is the word (token) embedding matrix, W_p is the position embedding matrix, W_s is the sentence (segment) embedding matrix, and h_i is the output of the i-th Transformer layer, i.e., the hidden-layer vector.
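For illustration only, the encoding step described above can be sketched with an off-the-shelf BERT implementation. The snippet below is not part of the original disclosure; it assumes the Hugging Face transformers library and the bert-base-uncased checkpoint (a 30,522-wordpiece vocabulary; the experiments described later use BERT-Large), and simply exposes the last-layer hidden states as h_L.

```python
# Illustrative sketch only: obtaining h_L with a standard BERT encoder.
# Assumes the Hugging Face "transformers" library and the "bert-base-uncased"
# checkpoint; the publication does not prescribe a specific implementation.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

text = "It may be a bit packed on weekends, but the vibe is good."

# Tokenize into wordpieces; token, segment and position embeddings
# are added inside the model to form h_0.
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# h_L: semantic feature vectors from the L-th (last) Transformer layer,
# shape (batch, sequence_length, hidden_size).
h_L = outputs.last_hidden_state
print(h_L.shape)
```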
In some optional implementation manners of this embodiment, the information processing module is configured to execute two tasks, which are specifically as follows:
for the multifaceted word extraction (Multi-Aspect Extractor) task:
different from the existing sequence table labeling scheme, the embodiment adopts a double-pointer labeling mode to realize an Aspect word (Aspect) extraction task. The method specifically comprises the following steps:
as shown in FIG. 1, two binary 0/1 sequences S are usedsAnd SeMarking the starting and ending positions of the Aspect word (Aspect) in the text to be analyzed, specifically determining the starting and ending positions of the Aspect word (Aspect) by judging the possibility that an input sequence (semantic feature vector output by an L-th layer Transformer network of a BERT model) is 0/1 at each position by adopting two binary classification sequences, namely determining the starting and ending positions of the Aspect word (Aspect) by SsAnd SeThe confidence level of the starting and ending positions of the Aspect word (Aspect) at each position in the system is possible to determine the Aspect word (Aspect).
The probability distribution p_s (confidence) that the start position of a certain aspect word (Aspect) occurs at each position in S_s is expressed as:

p_s^i = σ(W_start · h_L^i + b_start)

where W_start is a first trainable weight vector, b_start is a first bias term, σ is the sigmoid activation function, p_s^i is the process value of the binary sequence labeling the start positions, and h_L is the semantic feature vector (encoded representation) output by the L-th Transformer layer of the BERT model.

Similarly, the probability distribution p_e (confidence) that the end position of a certain aspect word (Aspect) occurs at each position in S_e is expressed as:

p_e^i = σ(W_end · h_L^i + b_end)

where W_end is a second trainable weight vector, b_end is a second bias term, and p_e^i is the process value of the binary sequence labeling the end positions.
Finally, two vectors p_s = (p_s^1, ..., p_s^n) and p_e = (p_e^1, ..., p_e^n) are obtained. The objective function of the training (specifically a cross-entropy loss function) is:

loss_asp = -(1/n) Σ_{i=1}^{n} [ y_s^i log(p_s^i) + (1 - y_s^i) log(1 - p_s^i) + y_e^i log(p_e^i) + (1 - y_e^i) log(1 - p_e^i) ]

where y_s^i and y_e^i are the binary labels from S_s and S_e, and n is the word length of the text to be analyzed (the number of words for English, the number of characters for Chinese).
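As a minimal sketch of the double-pointer head and the objective loss_asp described above, the following PyTorch code assumes that the first loss is the per-position binary cross-entropy over the two 0/1 sequences; the class and variable names are illustrative rather than taken from the publication.

```python
# Minimal sketch of the double-pointer aspect extraction head (assumption:
# per-position binary cross-entropy over the start/end 0/1 sequences).
import torch
import torch.nn as nn

class AspectPointerHead(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # W_start, b_start and W_end, b_end from the description above.
        self.start_fc = nn.Linear(hidden_size, 1)
        self.end_fc = nn.Linear(hidden_size, 1)

    def forward(self, h_L: torch.Tensor):
        # h_L: (batch, n, hidden_size) -> per-position probabilities.
        p_start = torch.sigmoid(self.start_fc(h_L)).squeeze(-1)  # (batch, n)
        p_end = torch.sigmoid(self.end_fc(h_L)).squeeze(-1)      # (batch, n)
        return p_start, p_end

def aspect_loss(p_start, p_end, y_start, y_end):
    # loss_asp: binary cross-entropy over both sequences, averaged over positions.
    bce = nn.BCELoss()
    return bce(p_start, y_start.float()) + bce(p_end, y_end.float())
```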
For example, for the text to be analyzed illustrated in Fig. 1, "It may be a bit packed on weekends, but the vibe is good and it's the best French food you will find in the area", the positions corresponding to "packed", "vibe", and "French" in the binary sequence labeling the start positions have the value 1 and the others 0; the positions corresponding to "packed", "vibe", and "food" in the binary sequence labeling the end positions have the value 1 and the others 0. From this, the aspect words "packed", "vibe", and "French food" can be extracted.
This way of realizing the aspect word extraction task uses two binary classification sequences to judge, for each sequence position of the semantic feature vector output by the L-th Transformer layer of the BERT model, the confidence that the position is a start or end position of an aspect word, so that the aspect words in the text to be analyzed can be extracted accurately.
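To make the decoding of the two 0/1 sequences concrete, the sketch below pairs each start position with the nearest end position at or after it and reproduces the "packed" / "vibe" / "French food" example above; this pairing rule is an assumption for illustration, since the publication does not spell out a matching strategy.

```python
# Sketch: turn the start/end indicator sequences into aspect-word spans.
# The nearest-following-end pairing rule is an assumption for illustration.
def decode_spans(start_seq, end_seq, tokens):
    spans = []
    for i, s in enumerate(start_seq):
        if s != 1:
            continue
        for j in range(i, len(end_seq)):
            if end_seq[j] == 1:
                spans.append(" ".join(tokens[i:j + 1]))
                break
    return spans

tokens = ["It", "may", "be", "a", "bit", "packed", "on", "weekends", ",",
          "but", "the", "vibe", "is", "good", "and", "it's", "the", "best",
          "French", "food", "you", "will", "find", "in", "the", "area"]
start_seq = [0] * len(tokens)
end_seq = [0] * len(tokens)
for word in ("packed", "vibe", "French"):
    start_seq[tokens.index(word)] = 1
for word in ("packed", "vibe", "food"):
    end_seq[tokens.index(word)] = 1

print(decode_spans(start_seq, end_seq, tokens))
# ['packed', 'vibe', 'French food']
```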
For the information polarity classification of aspect words (Sentiment Polarity Classifier) task:
for a fine-grained information processing task, the embodiment converts the information processing problem of the aspect words into a polarity classification problem, and the polarity of the aspect words is used as a classification label. In order to obtain the vector representation of the aspect words, the output result h is coded and output from the BERT model according to the boundaries of the aspect words (the starting and ending positions of the aspect words are the boundaries) obtained by the aspect word extraction taskLSemantic feature vector or coded representation h of mid-extraction aspect wordsaspAnd a semantic feature vector or a coded representation h integrating the overall semantic meaning and the meaning of the aspect wordpIn this embodiment, the BERT model is encoded and output as a special mark symbol [ CLS ]]Pooled outputs of corresponding final hidden states (i.e., by L-th layer transform of BERT model)Mark symbol [ CLS ] in semantic feature vector output by mer network]Corresponding semantic feature vector) hclsAnd according to aspect word boundaries from hLCoded representation h of extracted aspect wordsaspSplicing to obtain a comprehensive semantic representation hp
hp=concatenate([hcls,hasp])
Figure BDA0002571015410000081
Wherein(s)i',ej) Representing the boundaries of the facet words.
Then, the predicted probability of the information polarity classification of the aspect word is expressed as:

p = softmax(W_p · h_p + b_p)

where W_p is a parameter matrix involved in training, W_p ∈ R^(k×H), k is the number of classification labels of the information polarity, H is the number of hidden-layer units, and all parameters of the parameter matrix W_p are jointly fine-tuned to achieve the minimization; b_p is a third bias term.
A cross-entropy loss function is adopted as the objective function (loss function) of the polarity classification model:

loss_p = -Σ_{i=1}^{k} y_i log(p_i)

where y_i is the known correct classification label and p_i is the predicted probability of the i-th information polarity class for the aspect word, taken from p above. By jointly fine-tuning all parameters of the parameter matrix W_p to maximize the logarithmic probability of the correct label, loss_p is minimized.
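A minimal sketch of the polarity classification head, assuming that h_asp is obtained by mean-pooling h_L over the aspect-word span (the publication only states that h_asp is taken from h_L according to the aspect boundary), could look as follows; all names are illustrative.

```python
# Sketch of the aspect polarity classifier: h_p = [h_cls ; h_asp] -> softmax.
# Assumption: h_asp is the mean of h_L over the aspect span (s, e).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolarityClassifier(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int = 3):
        super().__init__()
        # W_p and b_p; the concatenated vector h_p has size 2 * hidden_size here.
        self.fc = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, h_L: torch.Tensor, span):
        s, e = span                       # aspect-word boundary (start, end), inclusive
        h_cls = h_L[:, 0, :]              # representation of the [CLS] marker
        h_asp = h_L[:, s:e + 1, :].mean(dim=1)
        h_p = torch.cat([h_cls, h_asp], dim=-1)
        logits = self.fc(h_p)
        return F.softmax(logits, dim=-1)  # predicted polarity probabilities

def polarity_loss(probs, gold_label):
    # loss_p: cross-entropy between predicted probabilities and the correct label.
    return F.nll_loss(torch.log(probs), gold_label)
```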
In this way of realizing the information polarity classification task for aspect words, according to the extracted aspect words and their boundaries, the aspect word representation is obtained from the semantic feature vector output by the L-th Transformer layer of the BERT model, which is taken from the shared layer, and is concatenated with the overall semantic representation for information polarity classification, so that the information polarity of the aspect words in the text to be analyzed can be classified accurately.
In summary, the text information processing device provided by this embodiment can construct an end-to-end fine-grained information processing model whose overall framework is shown in Fig. 1: the text to be analyzed is encoded by the BERT model to obtain the text representation vector h_L; all the aspect words in the text and their boundaries (s_i', e_j) are extracted using two binary classifications, completing the multi-aspect word extraction task; the encoded (semantic) representation h_asp of each aspect word is obtained from its boundary and h_L; the overall semantic representation h_cls is concatenated with the aspect word semantic representation h_asp to obtain the aspect-aware comprehensive semantic representation (semantic vector) h_p; and the information polarity of the corresponding aspect word in the text to be analyzed is predicted through a multi-class classification. Here h_L serves as the shared encoding layer, and the overall objective function of the two tasks is expressed as: loss = loss_asp + loss_p.
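A joint training step that minimizes the overall objective loss = loss_asp + loss_p over the shared BERT encoder and the two heads could be sketched as follows; encoder, pointer_head, and classifier refer to the illustrative components above, and the hyperparameters are placeholders rather than the values used in the experiments.

```python
# Sketch of one joint training step: loss = loss_asp + loss_p over the shared encoder.
# encoder, pointer_head, classifier, aspect_loss and polarity_loss are the
# illustrative components sketched above; hyperparameters are placeholders.
import itertools
import torch

params = itertools.chain(encoder.parameters(),
                         pointer_head.parameters(),
                         classifier.parameters())
optimizer = torch.optim.AdamW(params, lr=2e-5)

def training_step(batch):
    h_L = encoder(input_ids=batch["input_ids"],
                  attention_mask=batch["attention_mask"]).last_hidden_state
    p_start, p_end = pointer_head(h_L)
    loss_asp = aspect_loss(p_start, p_end, batch["y_start"], batch["y_end"])

    probs = classifier(h_L, batch["aspect_span"])
    loss_p = polarity_loss(probs, batch["polarity_label"])

    loss = loss_asp + loss_p          # overall objective of the two tasks
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```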
Finally, only a single text needs to be provided at the input end of the end-to-end fine-grained information processing model constructed by the text information processing device provided by this embodiment, and all the aspect words in the text together with their corresponding information polarities can be obtained, thereby achieving end-to-end fine-grained information processing.
The performance of the end-to-end fine-grained information processing model constructed by the text information processing device according to this embodiment is further explained below using data from comparative experiments.
As shown in Table 2, the LAPTOP dataset and the three years of RESTAURANT data disclosed in SemEval 2014/2015/2016 (the latter referred to as RESTAURANT (total)) are used as the experimental objects. These datasets give the start and end positions of each aspect word and one of three information polarities (positive: +, negative: -, neutral: 0) for the aspect word.
TABLE 2
Dataset #Sent #Targets #+ #- #o
LAPTOP 1869 2936 1326 990 620
RESTAURANT(total) 3900 6603 4134 1538 931
To verify the validity of the scheme, the experimental results of the end-to-end fine-grained information processing model constructed by the text information processing device provided by this embodiment on the LAPTOP and RESTAURANT datasets are compared with those of several current end-to-end models with better performance, as shown in Table 3. In the experiments, the Google open-source pre-trained BERT-Large model is used; the initial learning rate is 5e-5 and is reduced to 2e-5 from the second epoch; 50 epochs are trained, and the optimal model is saved during training:
TABLE 3
[Table 3 compares the experimental results of the different models on the LAPTOP and RESTAURANT datasets; its contents are provided as an image in the original publication.]
The experimental results show that, on both datasets, the end-to-end fine-grained information processing model constructed by the text information processing device provided by this embodiment is clearly improved over several current end-to-end models such as UNIFIED, TAG-join, and SPAN-join: the F1 score on the LAPTOP dataset is improved by more than 4 percentage points, and the F1 score on the RESTAURANT dataset is improved by more than 7 percentage points, which demonstrates the reliability and effectiveness of the end-to-end fine-grained information processing model constructed by the text information processing device provided by this embodiment.
As shown in fig. 2, another embodiment of the present invention provides a text information processing method, including:
S1, extracting the semantic feature vector of the text to be analyzed by using a BERT model as an encoder;
S2, minimizing, according to the semantic feature vector of the text to be analyzed, an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
In some optional implementations of this embodiment, the first loss function and the second loss function are cross entropy loss functions, respectively.
In some optional implementations of this embodiment, the first loss function is:

loss_asp = -(1/n) Σ_{i=1}^{n} [ y_s^i log(p_s^i) + (1 - y_s^i) log(1 - p_s^i) + y_e^i log(p_e^i) + (1 - y_e^i) log(1 - p_e^i) ]

where p_s^i is the probability that position i of the text to be analyzed is the start position of an aspect word, p_e^i is the probability that position i is the end position of an aspect word, y_s^i and y_e^i are the corresponding binary (0/1) labels, and n is the word length of the text to be analyzed.
In some alternative implementations of the present embodiment, the probability distribution p_s of the start position of an aspect word over the positions of the text to be analyzed is expressed as:

p_s^i = σ(W_start · h_L^i + b_start)

where W_start is a first trainable weight vector, b_start is a first bias term, σ is the sigmoid activation function, p_s^i is the process value of the binary sequence labeling the start positions, and h_L is the semantic feature vector output by the L-th Transformer layer of the BERT model, the BERT model comprising L Transformer layers;

the probability distribution p_e of the end position of an aspect word over the positions of the text to be analyzed is expressed as:

p_e^i = σ(W_end · h_L^i + b_end)

where W_end is a second trainable weight vector, b_end is a second bias term, and p_e^i is the process value of the binary sequence labeling the end positions.
In some optional implementations of this embodiment, the second loss function is:

loss_p = -Σ_{i=1}^{k} y_i log(p_i)

where k is the number of classification labels of the information polarity; y_i is the known correct classification label; p_i is the predicted probability of the i-th information polarity class for the aspect word, the prediction being expressed as

p = softmax(W_p · h_p + b_p)

h_p is the comprehensive semantic feature vector obtained by concatenating the semantic feature vector h_cls corresponding to the marker symbol [CLS] in the semantic feature vector output by the L-th Transformer layer of the BERT model with the semantic feature vector h_asp of the aspect word in that same output; W_p is a parameter matrix involved in training, W_p ∈ R^(k×H), where H is the number of hidden-layer units, and all parameters of the parameter matrix W_p are jointly fine-tuned to achieve the minimization; b_p is a third bias term.
It should be noted that the principle and the work flow of the text information processing method provided in this embodiment are similar to those of the text information processing apparatus, and reference may be made to the above description for relevant parts, which are not described herein again.
As shown in Fig. 3, a computer system suitable for implementing the text information processing device provided by the above embodiments includes a central processing unit (CPU) that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) or a program loaded from a storage section into a random access memory (RAM). The RAM also stores various programs and data necessary for the operation of the computer system. The CPU, ROM, and RAM are connected to one another via a bus. An input/output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a liquid crystal display (LCD), a speaker, and the like; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive as needed, so that a computer program read out therefrom is installed into the storage section as needed.
In particular, the processes described in the above flowcharts may be implemented as computer software programs according to the present embodiment. For example, the present embodiments include a computer program product comprising a computer program tangibly embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium.
The flowchart and schematic diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to the present embodiments. In this regard, each block in the flowchart or schematic diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the schematic and/or flowchart illustration, and combinations of blocks in the schematic and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
On the other hand, the present embodiment also provides a nonvolatile computer storage medium, which may be the nonvolatile computer storage medium included in the apparatus in the foregoing embodiment, or may be a nonvolatile computer storage medium that exists separately and is not assembled into a terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: adopting a BERT model as an encoder to extract semantic feature vectors of a text to be analyzed; and according to the semantic feature vector of the text to be analyzed, performing minimum solution on an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the starting and ending positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
It is to be noted that, in the description of the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
It should be understood that the above-mentioned embodiments of the present invention are only examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention, and it will be obvious to those skilled in the art that other variations and modifications can be made on the basis of the above description, and all embodiments cannot be exhaustive, and all obvious variations and modifications belonging to the technical scheme of the present invention are within the protection scope of the present invention.

Claims (12)

1. A text information processing apparatus characterized by comprising:
an encoder based on the BERT model, configured to extract semantic feature vectors of a text to be analyzed;
and an information processing module, configured to minimize, according to the semantic feature vector of the text to be analyzed, an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
2. The apparatus of claim 1, wherein the first and second loss functions are cross-entropy loss functions, respectively.
3. The apparatus of claim 2, wherein the first loss function is:

loss_asp = -(1/n) Σ_{i=1}^{n} [ y_s^i log(p_s^i) + (1 - y_s^i) log(1 - p_s^i) + y_e^i log(p_e^i) + (1 - y_e^i) log(1 - p_e^i) ]

where p_s^i is the probability that position i of the text to be analyzed is the start position of an aspect word, p_e^i is the probability that position i is the end position of an aspect word, y_s^i and y_e^i are the corresponding binary (0/1) labels, and n is the word length of the text to be analyzed.
4. The apparatus of claim 3, wherein the probability distribution p_s of the start position of an aspect word over the positions of the text to be analyzed is expressed as:

p_s^i = σ(W_start · h_L^i + b_start)

where W_start is a first trainable weight vector, b_start is a first bias term, σ is the sigmoid activation function, the values p_s^i form the two-class sequence labeling the start positions, and h_L is the semantic feature vector output by the L-th Transformer layer of the BERT model, the BERT model comprising L Transformer layers;

the probability distribution p_e of the end position of an aspect word over the positions of the text to be analyzed is expressed as:

p_e^i = σ(W_end · h_L^i + b_end)

where W_end is a second trainable weight vector, b_end is a second bias term, and the values p_e^i form the two-class sequence labeling the end positions.
5. The apparatus of claim 2, wherein the second loss function is:

loss_p = -Σ_{i=1}^{k} y_i log(p_i)

where k is the number of classification labels of the information polarity; y_i is the known correct classification label; p_i is the predicted probability of the i-th information polarity class for the aspect word, the prediction being expressed as

p = softmax(W_p · h_p + b_p)

h_p is the comprehensive semantic feature vector obtained by concatenating the semantic feature vector h_cls corresponding to the marker symbol [CLS] in the semantic feature vector output by the L-th Transformer layer of the BERT model with the semantic feature vector h_asp of the aspect word in that same output; W_p is a parameter matrix involved in training, W_p ∈ R^(k×H), where H is the number of hidden-layer units, and all parameters of the parameter matrix W_p are jointly fine-tuned to achieve the minimization; b_p is a third bias term.
6. A text information processing method, comprising:
adopting a BERT model as an encoder to extract semantic feature vectors of a text to be analyzed;
and minimizing, according to the semantic feature vector of the text to be analyzed, an objective function comprising a first loss function and a second loss function to obtain at least one aspect word contained in the text to be analyzed and the information polarity of the aspect word, wherein the objective of the first loss function is to label the start and end positions of the aspect word in the text to be analyzed, and the objective of the second loss function is to classify the information polarity of the aspect word.
7. The method of claim 6, wherein the first and second loss functions are cross-entropy loss functions, respectively.
8. The method of claim 7, wherein the first loss function is:

loss_asp = -(1/n) Σ_{i=1}^{n} [ y_s^i log(p_s^i) + (1 - y_s^i) log(1 - p_s^i) + y_e^i log(p_e^i) + (1 - y_e^i) log(1 - p_e^i) ]

where p_s^i is the probability that position i of the text to be analyzed is the start position of an aspect word, p_e^i is the probability that position i is the end position of an aspect word, y_s^i and y_e^i are the corresponding binary (0/1) labels, and n is the word length of the text to be analyzed.
9. The method of claim 8, wherein the probability distribution p_s of the start position of an aspect word over the positions of the text to be analyzed is expressed as:

p_s^i = σ(W_start · h_L^i + b_start)

where W_start is a first trainable weight vector, b_start is a first bias term, σ is the sigmoid activation function, the values p_s^i form the two-class sequence labeling the start positions, and h_L is the semantic feature vector output by the L-th Transformer layer of the BERT model, the BERT model comprising L Transformer layers;

the probability distribution p_e of the end position of an aspect word over the positions of the text to be analyzed is expressed as:

p_e^i = σ(W_end · h_L^i + b_end)

where W_end is a second trainable weight vector, b_end is a second bias term, and the values p_e^i form the two-class sequence labeling the end positions.
10. The method of claim 7, wherein the second loss function is:

loss_p = -Σ_{i=1}^{k} y_i log(p_i)

where k is the number of classification labels of the information polarity; y_i is the known correct classification label; p_i is the predicted probability of the i-th information polarity class for the aspect word, the prediction being expressed as

p = softmax(W_p · h_p + b_p)

h_p is the comprehensive semantic feature vector obtained by concatenating the semantic feature vector h_cls corresponding to the marker symbol [CLS] in the semantic feature vector output by the L-th Transformer layer of the BERT model with the semantic feature vector h_asp of the aspect word in that same output; W_p is a parameter matrix involved in training, W_p ∈ R^(k×H), where H is the number of hidden-layer units, and all parameters of the parameter matrix W_p are jointly fine-tuned to achieve the minimization; b_p is a third bias term.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 6-10 when executing the program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 6-10.
CN202010639599.9A 2020-04-13 2020-07-06 Text information processing device and method, computer device, and readable storage medium Active CN113536803B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010285983 2020-04-13
CN2020102859833 2020-04-13

Publications (2)

Publication Number Publication Date
CN113536803A true CN113536803A (en) 2021-10-22
CN113536803B CN113536803B (en) 2024-08-13

Family

ID=78124125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010639599.9A Active CN113536803B (en) 2020-04-13 2020-07-06 Text information processing device and method, computer device, and readable storage medium

Country Status (1)

Country Link
CN (1) CN113536803B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832400A (en) * 2017-11-01 2018-03-23 山东大学 A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification
US20190122145A1 (en) * 2017-10-23 2019-04-25 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and device for extracting information
CN109871444A (en) * 2019-01-16 2019-06-11 北京邮电大学 A kind of file classification method and system
CN109918663A (en) * 2019-03-04 2019-06-21 腾讯科技(深圳)有限公司 A kind of semantic matching method, device and storage medium
CN110046248A (en) * 2019-03-08 2019-07-23 阿里巴巴集团控股有限公司 Model training method, file classification method and device for text analyzing
CN110222178A (en) * 2019-05-24 2019-09-10 新华三大数据技术有限公司 Text sentiment classification method, device, electronic equipment and readable storage medium storing program for executing
CN110334210A (en) * 2019-05-30 2019-10-15 哈尔滨理工大学 A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN
CN110362817A (en) * 2019-06-04 2019-10-22 中国科学院信息工程研究所 A kind of viewpoint proneness analysis method and system towards product attribute
CN110377740A (en) * 2019-07-22 2019-10-25 腾讯科技(深圳)有限公司 Feeling polarities analysis method, device, electronic equipment and storage medium
CN110516245A (en) * 2019-08-27 2019-11-29 蓝盾信息安全技术股份有限公司 Fine granularity sentiment analysis method, apparatus, computer equipment and storage medium
CN110866117A (en) * 2019-10-25 2020-03-06 西安交通大学 Short text classification method based on semantic enhancement and multi-level label embedding
CN110955750A (en) * 2019-11-11 2020-04-03 北京三快在线科技有限公司 Combined identification method and device for comment area and emotion polarity, and electronic equipment


Also Published As

Publication number Publication date
CN113536803B (en) 2024-08-13

Similar Documents

Publication Publication Date Title
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN107680579B (en) Text regularization model training method and device, and text regularization method and device
CN111985239B (en) Entity identification method, entity identification device, electronic equipment and storage medium
CN110489555A (en) A kind of language model pre-training method of combination class word information
CN113051356B (en) Open relation extraction method and device, electronic equipment and storage medium
CN110472235A (en) A kind of end-to-end entity relationship joint abstracting method towards Chinese text
CN112257452B (en) Training method, training device, training equipment and training storage medium for emotion recognition model
CN111339260A (en) BERT and QA thought-based fine-grained emotion analysis method
CN109086265A (en) A kind of semanteme training method, multi-semantic meaning word disambiguation method in short text
CN113723105A (en) Training method, device and equipment of semantic feature extraction model and storage medium
CN112188311B (en) Method and apparatus for determining video material of news
CN110874411A (en) Cross-domain emotion classification system based on attention mechanism fusion
CN113553412A (en) Question and answer processing method and device, electronic equipment and storage medium
CN113392209A (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN110633475A (en) Natural language understanding method, device and system based on computer scene and storage medium
CN114416995A (en) Information recommendation method, device and equipment
CN112749556B (en) Multi-language model training method and device, storage medium and electronic equipment
CN114529903A (en) Text refinement network
CN113761190A (en) Text recognition method and device, computer readable medium and electronic equipment
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
WO2023134085A1 (en) Question answer prediction method and prediction apparatus, electronic device, and storage medium
CN117351336A (en) Image auditing method and related equipment
CN115129862A (en) Statement entity processing method and device, computer equipment and storage medium
CN114417874A (en) Chinese named entity recognition method and system based on graph attention network
CN114092931A (en) Scene character recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant