CN110688485A - Word vector language model based on emergency - Google Patents


Info

Publication number
CN110688485A
Authority
CN
China
Prior art keywords
emergency
representation
word
vector
word vector
Prior art date
Legal status
Granted
Application number
CN201910915299.6A
Other languages
Chinese (zh)
Other versions
CN110688485B (en)
Inventor
赵鑫
朱秋昱
张明
Current Assignee
Hangzhou Pen Sound Intelligent Technology Co Ltd
Renmin University of China
Original Assignee
Hangzhou Pen Sound Intelligent Technology Co Ltd
Renmin University of China
Priority date
Filing date
Publication date
Application filed by Hangzhou Pen Sound Intelligent Technology Co Ltd, Renmin University of China filed Critical Hangzhou Pen Sound Intelligent Technology Co Ltd
Priority to CN201910915299.6A priority Critical patent/CN110688485B/en
Publication of CN110688485A publication Critical patent/CN110688485A/en
Application granted granted Critical
Publication of CN110688485B publication Critical patent/CN110688485B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06F: ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/30: Information retrieval of unstructured textual data
                        • G06F16/35: Clustering; Classification
                            • G06F16/355: Class or cluster creation or modification
                • G06F18/00: Pattern recognition
                    • G06F18/20: Analysing
                        • G06F18/22: Matching criteria, e.g. proximity measures
                        • G06F18/24: Classification techniques
                            • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                                • G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
            • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00: Computing arrangements based on biological models
                    • G06N3/02: Neural networks
                        • G06N3/04: Architecture, e.g. interconnection topology
                            • G06N3/045: Combinations of networks

Abstract

The invention provides a word vector language model based on emergencies, which trains on context with a traditional Word2Vec model. The training comprises computing, at the input layer of the model, the hidden-layer information h_j of the input context words; a vector representation e_{b_i} of the emergency event is simultaneously added at the input layer, and h_j and e_{b_i} are combined by weighted summation into the final hidden-layer representation, so that the two jointly influence it: the generated hidden-layer representation is related not only to the context but also to the emergency. The invention provides a new emergency-related word vector model for modeling text stream data that contain emergencies. The method can learn a word vector model carrying the characteristics of the emergency in order to identify semantic changes, and the added emergency vector representation improves semantic relevance.

Description

Word vector language model based on emergency
Technical Field
The invention relates to the technical field of dynamic word vector generation models, in particular to a word vector language model based on an emergency.
Background
Word semantics change over time, and many factors affect them, including cultural shifts and the creation of new technologies. For example, the word "amazon" initially referred to the tropical rainforest, but after the appearance of the e-commerce company of the same name it came to refer mostly to that company.
An emergency event (burst event) is an event that is suddenly mentioned many times within a short period, such as "a fire in Paris" or "an earthquake in Taiwan". In burst detection models, an emergency is generally represented as <target word, start time, end time>. Previous work focused only on how semantics are expressed within a single emergency; we propose to combine emergencies with word vectors in order to capture and understand word correlations and semantic changes across emergencies.
Inspired by recent research, dynamic word vector models learn semantic representations of words in different time periods and can thereby discover dynamic changes in word semantics. A dynamic word vector model learns a sequence of time-indexed vectors for each word over consecutive time intervals and can track the semantic changes that arise in the word's usage. Furthermore, dynamic word vector models make it possible to find different words with the same meaning in different time periods, for example by retrieving word vectors from similar regions of the vector spaces corresponding to different periods. Due to the randomness of neural network training, if a word vector model is applied independently on each time slice, the output vectors of each slice land in vector spaces with different coordinate systems. To compare the semantic changes of words, these spaces need to be aligned and mapped into one common vector space. Existing methods for exploring the semantic changes of words include the following:
1) Comparing word similarity: compute the similarity of two entities at the same time point and compare how that similarity changes across time points, thereby finding semantic changes while avoiding direct comparison of word vectors from different times.
2) Linear transformation alignment: assuming that the semantics of most words do not change, learn a transformation matrix that minimizes the distance between occurrences of the same word at different times, thereby aligning the word vectors (a minimal sketch follows this list).
3) Non-random initialization alignment: initialize the word vectors at time t with the word vectors at time t-1, so that word vectors change smoothly along their time trajectories.
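As an illustration of method 2), the linear transformation is commonly found by solving an orthogonal Procrustes problem. The following is a minimal numpy sketch under that standard formulation; it is not part of the patent, and all matrix names and sizes are illustrative:

    import numpy as np

    def align_embeddings(E_prev, E_curr):
        # Orthogonal Procrustes: find the rotation W minimizing ||E_curr W - E_prev||_F,
        # under the assumption that most words keep their meaning across time slices.
        M = E_curr.T @ E_prev               # d x d cross-correlation of the two spaces
        U, _, Vt = np.linalg.svd(M)
        W = U @ Vt                          # closest orthogonal transformation
        return E_curr @ W                   # current slice mapped into the previous space

    # usage: rows are vectors for the same vocabulary at time t-1 and time t
    E_prev = np.random.randn(1000, 100)
    E_curr = np.random.randn(1000, 100)
    aligned = align_embeddings(E_prev, E_curr)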
These dynamic word vector methods face the following problems when applied to emergencies:
1) Although comparing word similarity is simple and feasible, it can only compare the semantics of fixed pairs of entities and therefore lacks flexibility.
2) Linear transformation alignment rests on the assumption that most words do not change meaning over time. The approach works well in general, but for words whose senses have in fact changed over time, it may oversmooth the differences between senses.
3) Non-random initialization solves the alignment problem, but it gives words the rich semantic information of earlier corpora and thereby weakens their ability to express the semantics of the current emergency.
To solve the above problems, the present invention proposes a new burst word vector model for capturing the word semantic shifts related to emergencies in a text stream. Unlike previous methods that split the whole time span into equal slices, the present invention, given a target word, uses the word's multiple burst segments detected by a burst detection model as the time spans. We assume that each word has a unique representation for each emergency. These burst-specific embeddings can represent very different semantics in different bursts of the target word, without linear dependencies or alignment constraints. Intuitively, the words in text related to an emergency typically share some common meaning, and together they express the semantics of the emergency. Accordingly, the present invention further introduces an emergency vector to force all related words to share similar semantics.
The information disclosed in this background section is only for enhancement of understanding of the general background of the application and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The present invention is directed to providing a word vector language model based on emergency events, so as to solve the above technical problems in the prior art.
In order to solve the technical problem, the invention provides a word vector language model based on an emergency, characterized in that the language model trains on context with a traditional Word2Vec model, the training comprising computing, at the input layer of the model, the hidden-layer information h_j of the input context words w_{j-m}, ..., w_{j+m}, while simultaneously adding at the input layer a vector representation e_{b_i} of the emergency event; h_j and e_{b_i} are combined by weighted summation to obtain the final hidden-layer representation, so that the two jointly influence it, the generated hidden-layer representation being related not only to the context but also to the emergency.
As a further technical solution, the word vector \tilde{v}_w of the input layer is composed of two parts: one part is a static word representation generated from the corpus at non-burst (i.e., normal) times, and the other part is a dynamic word vector representation carrying the characteristics of the emergency.
As a further technical solution, the hidden layer representation h_j is calculated as follows:

    h_j = \frac{1}{2m+1}\Big(e_{b_i} + \sum_{-m \le k \le m,\, k \ne 0} \tilde{v}_{w_{j+k}}\Big)

where \tilde{v}_{w_{j+k}} is the context word vector representation of the word w_{j+k}, and e_{b_i} is the emergency representation lying in the same vector space as \tilde{v}_{w_{j+k}}; here \tilde{v}_{w_{j+m}} is composed of two parts, the static representation and the emergency representation of the word w_{j+m}.
As a further technical solution, the static representation refers to word vectors obtained by Word2Vec training on non-emergency corpora; the emergency representation refers to the vector representation of words within the emergency, obtained as a parameter through training. \tilde{v}_{w_{j+m}} is calculated as follows:

    \tilde{v}_{w_{j+m}} = \gamma\, v^{b_i}_{w_{j+m}} + (1-\gamma)\, u_{w_{j+m}}

where u_{w_{j+m}} and v^{b_i}_{w_{j+m}} are, respectively, the static representation of the word w_{j+m} and its representation in the emergency; γ is a weight balancing the two representations, calculated for the word w_{j+m} as follows:

    \gamma = \frac{\bar{f}_{b_i}(w_{j+m})}{\bar{f}_{b_i}(w_{j+m}) + \bar{f}(w_{j+m})}

where \bar{f}_{b_i}(w_{j+m}) and \bar{f}(w_{j+m}) denote the average number of tweets per day containing the word during the burst period b_i and over the whole time period, respectively.
As a further technical solution, the vector representation e_{b_i} of the emergency event is affected by its relevance to other emergencies; the relevance depends on the temporal overlap of the emergencies and on the similarity of the words occurring in them.
As a further technical solution, the similarity is expressed as co-occurrence information between words in the emergency corpus, measured by PMI and calculated as follows:

    \mathrm{PMI}(w_1, w_2) = \log\frac{\#(w_1, w_2)\cdot |D|}{\#(w_1)\cdot \#(w_2)}

where \#(w_1, w_2) denotes the number of co-occurrences of the words w_1 and w_2, \#(w_1) and \#(w_2) denote the numbers of occurrences of w_1 and w_2 in the corpus, and |D| denotes the number of documents in the corpus.
By adopting the above technical scheme, the invention has the following beneficial effects:
The word vectors learned by the invention improve performance on text classification and emergency summarization tasks. In the text classification task, the average of the word vectors is used as the vector representation of the text, and classification is performed with an SVM classifier. For the emergency summarization task, the invention selects the top-10 nearest-neighbor words of the target word as the keywords of the emergency, and these keywords are seen to summarize the emergency well.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a diagram of a prior art CBOW;
FIG. 2 is a block diagram of a prior art LSTM;
FIG. 3 is a plot of the tanh function used as the activation unit of the LSTM;
FIG. 4 is a plot of the logistic regression function used to distinguish whether the target word comes from the true distribution or the noise distribution.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention will be further explained with reference to specific embodiments.
The method of the invention involves technologies such as intelligent analysis and language modeling, and can be used to generate word vectors for text stream data such that the generated word vectors carry information about emergency events.
The invention proposes a burst word vector model (BWE) based on the CBOW structure of the Word2Vec model.
The invention provides a word vector language model based on an emergency, characterized in that the language model trains on context with a traditional Word2Vec model, the training comprising computing, at the input layer of the model, the hidden-layer information h_j of the input context words w_{j-m}, ..., w_{j+m}, while simultaneously adding at the input layer a vector representation e_{b_i} of the emergency event; h_j and e_{b_i} are combined by weighted summation to obtain the final hidden-layer representation, so that the two jointly influence it, the generated hidden-layer representation being related not only to the context but also to the emergency.
As shown in fig. 1, CBOW is composed of three parts: an input layer, a hidden layer, and an output layer. At the input layer, the model feeds in the vector representation v_w of each word in the context. At the hidden layer, the input is the context word vectors and the output is a vector h_t that represents the history information. The output layer has size |V|; it accepts the history representation h_t as input and outputs the posterior probability of each word in the vocabulary, using a softmax classifier calculated as follows:

    y_t = \mathrm{softmax}(O h_t + b)    (1)

where the output vector y_t is a probability distribution whose k-th dimension is the posterior probability of the k-th word in the vocabulary, and O is the weight matrix from the hidden layer to the output layer. O is also called the output word embedding matrix, and each row of the matrix can also be regarded as a word vector.
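For concreteness, the CBOW forward pass described above (context averaging followed by the softmax of formula (1)) could be sketched as follows; this is an illustrative reading of the structure, with all names and toy parameters assumed:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())             # subtract max for numerical stability
        return e / e.sum()

    def cbow_forward(V_in, O, b, context_ids):
        # V_in: |V| x d input embeddings; O: |V| x d output embedding matrix; b: |V| biases
        h_t = V_in[context_ids].mean(axis=0)    # hidden layer: average of context vectors
        y_t = softmax(O @ h_t + b)              # formula (1): posterior over the vocabulary
        return y_t

    # usage with random toy parameters
    rng = np.random.default_rng(0)
    V_in, O, b = rng.normal(size=(50, 8)), rng.normal(size=(50, 8)), np.zeros(50)
    print(cbow_forward(V_in, O, b, context_ids=[3, 7, 12, 41]).sum())  # ~1.0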
Given the history information h_t as the condition, the posterior probability that the k-th word v_k in the vocabulary V occurs is:

    p_\theta(v_k \mid h_t) = \mathrm{softmax}\big(s(v_k, h_t; \theta)\big)    (2)

where s(v_k, h_t; θ) is an unnormalized score computed by the neural network, and θ denotes all parameters in the network, including the word vectors and the weights and biases of the neural network.
The BWE model is based on the CBOW structure and learns word vector representations from emergency corpora. At the model input layer, the hidden layer representation is generated jointly from the context words and the emergency representation. Each emergency conveys some associated semantics that can influence the words in the text related to it, which is why an emergency representation is added at the input layer. The calculation formula is as follows:

    h_j = \frac{1}{2m+1}\Big(e_{b_i} + \sum_{-m \le k \le m,\, k \ne 0} \tilde{v}_{w_{j+k}}\Big)    (3)

where h_j is the hidden layer representation, \tilde{v}_{w_{j+k}} is the context word vector representation of the word w_{j+k}, and e_{b_i} is the emergency representation, lying in the same vector space as \tilde{v}_{w_{j+k}}.

Here \tilde{v}_{w_{j+m}} is composed of two parts: the static representation and the emergency representation of the word w_{j+m}. The static representation refers to word vectors trained by Word2Vec on corpora from non-burst times. The emergency representation refers to the vector representation of words within the emergency, obtained as a parameter through training.
Figure BDA0002215965610000077
The calculation formula of (a) is as follows:
Figure BDA0002215965610000078
Figure BDA0002215965610000079
and
Figure BDA00022159656100000710
are respectively the word wj+mStatic representation of (2) and representation in an emergency. Gamma is a weight value that measures two expressions, and the word wj+mThe calculation formula is as follows:
Figure BDA00022159656100000711
Figure BDA00022159656100000712
and
Figure BDA00022159656100000713
each represents biBurst period and whole time period wjThe number of tweets present per day is averaged. It can be seen that
Figure BDA00022159656100000714
Above 1, γ is greater than 0.5. For a given context word, the present invention uses both static and bursty word vectors to represent its semantics. A burst indication should have more weight if its word frequency is significantly larger than a normal word.
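Putting formulas (3)-(5) together, a minimal sketch of how the burst-aware hidden representation could be computed; it assumes, per the statement above, that γ weights the burst representation, and all function names, frequencies, and toy vectors are illustrative:

    import numpy as np

    def gamma(freq_burst, freq_all):
        # formula (5): weight of the burst representation; > 0.5 whenever the word
        # appears more often per day during the burst than on average
        return freq_burst / (freq_burst + freq_all)

    def mixed_word_vector(u_w, v_bw, freq_burst, freq_all):
        # formula (4): weighted sum of the burst and static representations
        g = gamma(freq_burst, freq_all)
        return g * v_bw + (1.0 - g) * u_w

    def bwe_hidden(context_vecs, e_b):
        # formula (3): average of the 2m mixed context vectors and the emergency
        # vector e_b, which lies in the same vector space
        return np.vstack(context_vecs + [e_b]).mean(axis=0)

    # usage: one context word with static vector u and burst vector v
    u, v, e_b = np.ones(4), 2 * np.ones(4), np.zeros(4)
    w_tilde = mixed_word_vector(u, v, freq_burst=30.0, freq_all=10.0)  # gamma = 0.75
    print(bwe_hidden([w_tilde], e_b))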
As shown in FIG. 2, the emergency representation e_{b_i} in formula (3) is the output representation of the input X generated with an LSTM structure. Since the current emergency may be related to other, earlier emergencies, an LSTM is used to learn the emergency representation. Each row x_t of the input X represents one emergency, as a vector representation derived from the PMI information of its words, and the emergency representations are fed in ordered by time of occurrence. The formulas of the LSTM are as follows:

    i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)    (6)
    o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)    (7)
    f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)    (8)
    c_t = f_t \cdot c_{t-1} + i_t \cdot \tanh(W_c x_t + U_c h_{t-1} + b_c)    (9)
    h_t = o_t \cdot \tanh(c_t)    (10)

i_t, o_t, f_t, c_t, h_t denote, respectively, the input gate, output gate, forget gate, cell state, and the hidden-layer information of the history. The forget gate f_t controls how much information of the internal state c_{t-1} at the previous moment needs to be forgotten; the input gate i_t controls how much information of the candidate state at the current moment needs to be kept; the output gate o_t controls how much information of the internal state c_t at the current moment needs to be output to the external state h_t. W_i, W_o, W_f, W_c and U_i, U_o, U_f, U_c denote the weights applied to the input x_t and to the previous hidden state h_{t-1} in the input gate, output gate, forget gate, and cell state, respectively, and b_i, b_o, b_f, b_c denote the corresponding biases of each control gate. In addition, the activation unit selected for the LSTM is the tanh function, which maps a real input into the range [-1, 1]; its curve is shown in FIG. 3.
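A minimal numpy sketch of one LSTM step implementing formulas (6)-(10); the parameter layout (dicts keyed by gate) and the toy usage are illustrative assumptions, not the patent's implementation:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        # one step of formulas (6)-(10); W, U, b are dicts keyed by gate name
        i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate (6)
        o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate (7)
        f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate (8)
        c_hat = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])    # candidate state
        c_t = f_t * c_prev + i_t * c_hat                            # cell state (9)
        h_t = o_t * np.tanh(c_t)                                    # hidden state (10)
        return h_t, c_t

    # usage: encode a time-ordered sequence of PMI-based emergency vectors
    rng = np.random.default_rng(0)
    d_in, d_h = 16, 8
    W = {g: rng.normal(size=(d_h, d_in)) for g in 'iofc'}
    U = {g: rng.normal(size=(d_h, d_h)) for g in 'iofc'}
    b = {g: np.zeros(d_h) for g in 'iofc'}
    h, c = np.zeros(d_h), np.zeros(d_h)
    for x_t in rng.normal(size=(5, d_in)):      # five emergencies, ordered by time
        h, c = lstm_step(x_t, h, c, W, U, b)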
In the training process, the invention first learns the vector representation e_{b_i} of the emergency with the LSTM, and initializes the word vectors with those learned by the standard Word2Vec model over all corpora. To learn the burst-aware embeddings v^{b_i}_w, the invention optimizes the parameters in the temporal order of the emergencies. In this way, the emergency vector can be used as a fixed constant while learning the word vector representations that occur in the current emergency. For each emergency, single tweets are sampled first, and the parameters are then optimized by stochastic gradient descent with negative sampling.
When training with the negative sampling method, for each positive example (w_j, h_j), k negative examples are sampled from the noise distribution P_n(w). The objective function of negative sampling is:

    \log\sigma\big(u_{w_j}^{\top} h_j\big) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\big[\log\sigma\big(-u_{w_i}^{\top} h_j\big)\big]

The negative sampling objective is thus a binary classification problem: logistic regression distinguishes whether the target word comes from the true distribution or from the noise distribution. The function curve is shown in FIG. 4.
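A minimal sketch of this negative-sampling objective for a single positive pair; the way negatives are drawn from the noise distribution is simplified, and all names are illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def neg_sampling_loss(h_j, out_emb, target_id, noise_probs, k=5, rng=None):
        # negative of: log sigma(u_target . h) + sum over k negatives of log sigma(-u_neg . h),
        # with the k negatives drawn from the noise distribution P_n(w)
        rng = rng or np.random.default_rng()
        neg_ids = rng.choice(len(noise_probs), size=k, p=noise_probs)
        pos = np.log(sigmoid(out_emb[target_id] @ h_j))
        neg = np.log(sigmoid(-(out_emb[neg_ids] @ h_j))).sum()
        return -(pos + neg)     # loss to minimize by stochastic gradient descent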
In the present invention, the vector representation e_{b_i} of an emergency is affected by its relevance to other emergencies; the relevance depends on the temporal overlap of the emergencies and on the similarity of the words occurring in them. The similarity is expressed as co-occurrence information between words in the emergency corpus and is measured by PMI, calculated as follows:

    \mathrm{PMI}(w_1, w_2) = \log\frac{\#(w_1, w_2)\cdot |D|}{\#(w_1)\cdot \#(w_2)}

where \#(w_1, w_2) denotes the number of co-occurrences of the words w_1 and w_2, \#(w_1) and \#(w_2) denote the numbers of occurrences of w_1 and w_2 in the corpus, and |D| denotes the number of documents in the corpus.
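The PMI above could be computed from raw document-level counts as in the following sketch; tokenization by whitespace and all names are illustrative assumptions:

    import math
    from collections import Counter
    from itertools import combinations

    def pmi_scores(docs):
        # PMI(w1, w2) = log( #(w1, w2) * |D| / (#(w1) * #(w2)) ),
        # with occurrences and co-occurrences counted at the document level
        word_count, pair_count = Counter(), Counter()
        for doc in docs:
            words = sorted(set(doc.split()))
            word_count.update(words)
            pair_count.update(combinations(words, 2))   # each unordered pair once
        D = len(docs)
        return {(w1, w2): math.log(c * D / (word_count[w1] * word_count[w2]))
                for (w1, w2), c in pair_count.items()}

    # usage on a toy emergency corpus
    print(pmi_scores(["fire paris news", "fire paris rescue", "taiwan earthquake news"]))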
In addition, the word vectors learned by the method improve performance on text classification and emergency summarization tasks. In the text classification task, the average of the word vectors is used as the vector representation of the text, and classification is performed with an SVM classifier. For the emergency summarization task, the invention selects the top-10 nearest-neighbor words of the target word as the keywords of the emergency, and these keywords are seen to summarize the emergency well.
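As a sketch of these two evaluation setups (averaged word vectors fed to an SVM, and top-10 cosine neighbors as keywords), assuming scikit-learn and a dict of learned embeddings; the data handling is illustrative:

    import numpy as np
    from sklearn.svm import SVC

    def text_vector(text, emb, dim):
        # vector representation of a text = average of its word vectors
        vecs = [emb[w] for w in text.split() if w in emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def classify(train_texts, train_labels, test_texts, emb, dim=100):
        # text classification with an SVM over averaged word vectors
        clf = SVC(kernel='linear')
        clf.fit([text_vector(t, emb, dim) for t in train_texts], train_labels)
        return clf.predict([text_vector(t, emb, dim) for t in test_texts])

    def top_neighbors(word, emb, n=10):
        # emergency keywords = the n nearest neighbors of the target word
        # by cosine similarity in the learned vector space
        v = emb[word] / np.linalg.norm(emb[word])
        sims = {w: (u @ v) / np.linalg.norm(u) for w, u in emb.items() if w != word}
        return sorted(sims, key=sims.get, reverse=True)[:n]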
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. An emergency-based word vector language model, wherein the language model trains on context with a traditional Word2Vec model, the training comprising computing, at the input layer of the model, the hidden-layer information h_j of the input context words w_{j-m}, ..., w_{j+m}, while simultaneously adding at the input layer a vector representation e_{b_i} of the emergency event; h_j and e_{b_i} are combined by weighted summation to obtain the final hidden-layer representation, so that the two jointly influence it, the generated hidden-layer representation being related not only to the context but also to the emergency.
2. The emergency-based word vector language model of claim 1, wherein the word vector \tilde{v}_w of the input layer is composed of two parts: one part is a static word representation generated from the corpus at non-burst (i.e., normal) times, and the other part is a dynamic word vector representation carrying the characteristics of the emergency.
3. The emergency-based word vector language model of claim 1, wherein the hidden layer representation h_j is calculated as follows:

    h_j = \frac{1}{2m+1}\Big(e_{b_i} + \sum_{-m \le k \le m,\, k \ne 0} \tilde{v}_{w_{j+k}}\Big)

where \tilde{v}_{w_{j+k}} is the context word vector representation of the word w_{j+k}, and e_{b_i} is the emergency representation lying in the same vector space as \tilde{v}_{w_{j+k}}; \tilde{v}_{w_{j+m}} is composed of two parts, the static representation and the emergency representation of the word w_{j+m}.
4. The emergency-based word vector language model of claim 3, wherein the static representation refers to word vectors obtained by Word2Vec training on non-emergency corpora; the emergency representation refers to the vector representation of words within the emergency, obtained as a parameter through training; \tilde{v}_{w_{j+m}} is calculated as follows:

    \tilde{v}_{w_{j+m}} = \gamma\, v^{b_i}_{w_{j+m}} + (1-\gamma)\, u_{w_{j+m}}

where u_{w_{j+m}} and v^{b_i}_{w_{j+m}} are, respectively, the static representation of the word w_{j+m} and its representation in the emergency; γ is a weight balancing the two representations, calculated for the word w_{j+m} as follows:

    \gamma = \frac{\bar{f}_{b_i}(w_{j+m})}{\bar{f}_{b_i}(w_{j+m}) + \bar{f}(w_{j+m})}

where \bar{f}_{b_i}(w_{j+m}) and \bar{f}(w_{j+m}) denote the average number of tweets per day containing the word during the burst period b_i and over the whole time period, respectively.
5. The emergency-based word vector language model of claim 1, wherein the vector representation e_{b_i} of the emergency is affected by its relevance to other emergencies; the relevance depends on the temporal overlap of the emergencies and on the similarity of the words occurring in them.
6. The emergency-based word vector language model of claim 5, wherein the similarity is expressed as co-occurrence information between words in the emergency corpus, measured by PMI and calculated as follows:

    \mathrm{PMI}(w_1, w_2) = \log\frac{\#(w_1, w_2)\cdot |D|}{\#(w_1)\cdot \#(w_2)}

where \#(w_1, w_2) denotes the number of co-occurrences of the words w_1 and w_2, \#(w_1) and \#(w_2) denote the numbers of occurrences of w_1 and w_2 in the corpus, and |D| denotes the number of documents in the corpus.
CN201910915299.6A 2019-09-26 2019-09-26 Word vector language model based on emergency Active CN110688485B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910915299.6A CN110688485B (en) 2019-09-26 2019-09-26 Word vector language model based on emergency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910915299.6A CN110688485B (en) 2019-09-26 2019-09-26 Word vector language model based on emergency

Publications (2)

Publication Number Publication Date
CN110688485A true CN110688485A (en) 2020-01-14
CN110688485B CN110688485B (en) 2022-03-11

Family

ID=69110299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910915299.6A Active CN110688485B (en) 2019-09-26 2019-09-26 Word vector language model based on emergency

Country Status (1)

Country Link
CN (1) CN110688485B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106886567A (en) * 2017-01-12 2017-06-23 北京航空航天大学 Microblogging incident detection method and device based on semantic extension
US20190163735A1 (en) * 2017-11-30 2019-05-30 International Business Machines Corporation Context-based personality determination
CN108804417A (en) * 2018-05-21 2018-11-13 山东科技大学 A kind of documentation level sentiment analysis method based on specific area emotion word
CN108932311A (en) * 2018-06-20 2018-12-04 天津大学 The method of incident detection and prediction
CN109582785A (en) * 2018-10-31 2019-04-05 天津大学 Emergency event public sentiment evolution analysis method based on text vector and machine learning
CN109635116A (en) * 2018-12-17 2019-04-16 腾讯科技(深圳)有限公司 Training method, electronic equipment and the computer storage medium of text term vector model
CN109960798A (en) * 2019-03-01 2019-07-02 国网新疆电力有限公司信息通信公司 Uighur text emergency event element recognition methods

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DWAIPAYAN ROY: "Word Vector Compositionality based Relevance Feedback using Kernel Density Estimation", CIKM '16: Proceedings of the 25th ACM International Conference on Information and Knowledge Management
金占勇, 田亚鹏, 白莽: "Sentiment recognition of online public opinion on sudden disaster events based on long short-term memory networks" (基于长短时记忆网络的突发灾害事件网络舆情情感识别研究), Information Science (情报科学)

Also Published As

Publication number Publication date
CN110688485B (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN108984745B (en) Neural network text classification method fusing multiple knowledge maps
Jupalle et al. Automation of human behaviors and its prediction using machine learning
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
CN108733792B (en) Entity relation extraction method
CN109635109B (en) Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism
CN109992773B (en) Word vector training method, system, device and medium based on multi-task learning
US11580975B2 (en) Systems and methods for response selection in multi-party conversations with dynamic topic tracking
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
Asr et al. Comparing Predictive and Co-occurrence Based Models of Lexical Semantics Trained on Child-directed Speech.
US11023686B2 (en) Method and system for resolving abstract anaphora using hierarchically-stacked recurrent neural network (RNN)
CN112800190B (en) Intent recognition and slot value filling joint prediction method based on Bert model
CN110866113B (en) Text classification method based on sparse self-attention mechanism fine-tuning burt model
CN110532395B (en) Semantic embedding-based word vector improvement model establishing method
CN112131367A (en) Self-auditing man-machine conversation method, system and readable storage medium
CN111914553A (en) Financial information negative subject judgment method based on machine learning
CN115393933A (en) Video face emotion recognition method based on frame attention mechanism
CN114444476B (en) Information processing method, apparatus, and computer-readable storage medium
CN113435208A (en) Student model training method and device and electronic equipment
CN111522923B (en) Multi-round task type dialogue state tracking method
Zhao et al. Knowledge graph completion via complete attention between knowledge graph and entity descriptions
CN110688485B (en) Word vector language model based on emergency
CN116306869A (en) Method for training text classification model, text classification method and corresponding device
WO2023173554A1 (en) Inappropriate agent language identification method and apparatus, electronic device and storage medium
Buoy et al. Joint Khmer word segmentation and part-of-speech tagging using deep learning
Tascini Al-Chatbot: elderly aid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant