CN107273355A - Chinese word vector generation method based on word-character joint training - Google Patents
Chinese word vector generation method based on word-character joint training
- Publication number: CN107273355A (application CN201710435279.XA)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06F40/30 Semantic analysis (G Physics; G06 Computing; G06F Electric digital data processing; G06F40/00 Handling natural language data)
- G06F40/284 Lexical analysis, e.g. tokenisation or collocates (G06F40/20 Natural language analysis; G06F40/279 Recognition of textual entities)
Abstract
The invention discloses a Chinese word vector generation method based on word-character joint training, belonging to the technical field of natural language processing. Taking the Chinese characters inside words as key features, the method jointly trains Chinese word vector representations from context words and context characters. On the basis of a word-only word vector model, we introduce the characters that compose each word, predicting the target word from the context words and, at the same time, from the context characters. Applying the word-only model and the word-character joint training model separately and comparing the validity and robustness of the trained word vectors, we find that the Chinese word vectors generated by the joint training model better capture the semantic characteristics of Chinese and are also more robust. The invention provides a new method for generating Chinese word vectors, and offers a new solution for the generation and application of Chinese word vectors.
Description
Technical field
The invention belongs to the technical field of natural language processing and relates to a Chinese word vector generation method based on word-character joint training.
Background art
In recent years natural language processing technology has been applied in many aspects of everyday life, and text representation is one of the most fundamental research topics in the field. Word representation is one kind of data representation, and data representation is an early preparatory step in machine learning whose quality has a great impact on the performance of machine learning models. For the problems faced in Chinese natural language processing, we hope that computers can automatically learn text representations directly from large-scale unlabeled text data, and that the semantic information of words and texts is embodied in these representations. Conventional word embedding models such as Word2Vec and GloVe do not fit the linguistic characteristics of Chinese; word vector models that perform better on Chinese and capture semantic information more accurately still await further exploration by researchers.
Summary of the invention
The purpose of the present invention is mainly to address shortcomings of existing research by proposing a Chinese word vector generation method based on word-character joint training, i.e., the ECWE model, which combines the characters inside words with the external context words and characters to obtain high-quality Chinese word embeddings. ECWE learns Chinese word vectors by combining internal characters with external words through a simple but general method. Through the internal characters and the external context characters, we create more connections between otherwise isolated words; by modeling Chinese characters effectively, the model strengthens the relations between characters and between characters and words, and at the same time enriches the contextual information of words, so that the word representations contain more semantic information and the quality of the word representations is improved.
Technical solution:
A Chinese word vector generation method based on word-character joint training, with the following steps:
(1) Chinese text data processing stage
Generating word representation vectors requires the support of a large corpus. The corpus can be built in-house or purchased. Once the corpus is available, it is first segmented into words. Many segmentation tools are available at present; this step is not claimed as a distinguishing feature of the method.
(2) Chinese word representation vector generation stage
In Chinese, a word usually consists of several characters and carries rich internal meaning; the meaning of a word is usually related to the characters that compose it. For example, the meaning of the Chinese word 科技 (science and technology) can be learned from its surrounding context in the corpus, and it can also be inferred from its component characters 科 and 技. This motivates the idea of using character information to improve the Chinese word embedding model and learn Chinese word representation vectors.
In the initial stage, we randomly generate the vector representations w of words and c of characters; the dimension is 100 and each dimension value is a random decimal between 0 and 1.
2.1) Predicting the target word from context words
Given a sentence D = {x_1, …, x_M}, where M is the sentence length and x_j is the j-th word in the sentence, the target word is predicted from the context words within a fixed window of size K. Considering the characteristics of Chinese, this step takes the sum of each context word's vector and the averaged vectors of its internal characters as the vector representation of the context of the target word w. For each character, depending on its position, there are three different vector representations (c^B, c^M, c^E), corresponding to the beginning, middle, and end of a word. The context-word vector is computed as:

$$x_w = \frac{1}{2K}\sum_{j}\Big(w_j + \frac{1}{N_j}\big(c_1^{B} + \sum_{k=2}^{N_j-1} c_k^{M} + c_{N_j}^{E}\big)\Big),\qquad j = w-K,\dots,w-1,\,w+1,\dots,w+K$$

where w_j is the word vector of x_j, N_j is the number of characters in x_j, and c_k is the vector of the k-th character of x_j.

From the above formula we obtain the context-word representation x_w, from which the target word x_i is predicted; the objective is to maximize the conditional probability of the target word given its context words:

$$L(D) = \frac{1}{M}\sum_{i=K}^{M-K}\log P(x_i \mid x_w)$$

where M is the sentence length and K the window size.
2.2) Predicting the target word from context characters
For a sentence D = {x_1, …, x_M}, the sentence is first traversed and each character in every word is mapped to its vector by table lookup, excluding the target word. The target word is predicted from the context within a fixed window; considering the characteristics of Chinese, this step uses the average of the vectors of the internal characters of the context words as the context-character representation. For each character, depending on its position, there are three different vector representations (c^B, c^M, c^E), corresponding to the beginning, middle, and end of a word. The context-character vector is computed as:

$$c_w = \frac{1}{2K}\sum_{j}\frac{1}{N_j}\Big(c_1^{B} + \sum_{k=2}^{N_j-1} c_k^{M} + c_{N_j}^{E}\Big),\qquad j = w-K,\dots,w-1,\,w+1,\dots,w+K$$

From the above formula we obtain the context-character representation c_w, from which the target word x_i is predicted; the objective is to maximize the conditional probability of the target word given its context characters:

$$L(D) = \frac{1}{M}\sum_{i=K}^{M-K}\log P(x_i \mid c_w)$$

where M is the sentence length and K the window size.
2.3) Predicting the target word jointly from words and characters
The previous steps define objectives for predicting the target word from words and from characters. In this step, for a sentence D = {x_1, …, x_M}, the two objectives are combined so that words and characters are trained jointly; that is, while optimizing the conditional probability of the target word given its context words, the conditional probability of the target word given each character in the context is optimized at the same time:

$$L(\theta) = \frac{1}{M}\sum_{w\in W}\Big[(1-\beta)\log P\big(w \mid \mathrm{Context}(w)\big) + \beta\log P\big(w \mid \mathrm{Circum}(w)\big)\Big]$$

where M is the sentence length, W is the word vocabulary, w is the target word (x_i above), Context(w) is the word context of w (x_w above), Circum(w) is the character context of w (c_w above), and β is a decimal between 0 and 1 that controls the proportion of character-based modeling.
2.4) Iterative updating
To reduce computational complexity, this step optimizes the objective with negative sampling; specifically, the conditional probabilities are computed as:

$$P\big(w \mid \mathrm{Context}(w)\big) = \prod_{u\in\{w\}\cup NEG(w)} \big[\sigma(x_w^{\top}\theta^{u})\big]^{L^{w}(u)}\cdot\big[1-\sigma(x_w^{\top}\theta^{u})\big]^{1-L^{w}(u)}$$

$$P\big(w \mid \mathrm{Circum}(w)\big) = \prod_{u\in\{w\}\cup NEG(w)} \big[\sigma(c_w^{\top}\theta^{u})\big]^{L^{w}(u)}\cdot\big[1-\sigma(c_w^{\top}\theta^{u})\big]^{1-L^{w}(u)}$$

where NEG(w) is the negative sample set (the number of negative samples is set to 5); L^w(u) is the label of a sample u, with L^w(u) = 1 when u is the target word w and L^w(u) = 0 otherwise; x_w is the context-word representation of the target word w, c_w is its context-character representation, and θ^u is the parameter vector.

Finally the objective is solved with the stochastic gradient descent algorithm; the update formulas are:

$$v(\tilde{w}) := v(\tilde{w}) + \eta\sum_{u\in\{w\}\cup NEG(w)}\frac{\partial L(w,u)}{\partial x_w},\qquad \tilde{w}\in \mathrm{Context}(w)$$

$$v(\tilde{c}) := v(\tilde{c}) + \eta\sum_{u\in\{w\}\cup NEG(w)}\frac{\partial L(w,u)}{\partial c_w},\qquad \tilde{c}\in \mathrm{Circum}(w)$$

After iterative training ends, the word vector parameter set w is the Chinese word vector representation generated by our model.
The beneficial effect of the present invention is that it discloses a Chinese word vector generation method based on word-character joint training, which takes the characters inside words as key features and jointly trains Chinese word vector representations from context words and context characters. On the basis of the word-only word vector model, we introduce the characters that compose each word, predicting the target word from the context words and, at the same time, from the context characters. Applying the word-only model and the word-character joint training model separately and comparing the validity and robustness of the trained word vectors, we find that the Chinese word vectors generated by the joint training model better capture the semantic characteristics of Chinese and are also more robust. The invention provides a new method for generating Chinese word vectors, and offers a new solution for the generation and application of Chinese word vectors.
Brief description of the drawings
Fig. 1 is the overall architecture diagram of the method of the invention.
Fig. 2 shows the evaluation results of the method on the semantic similarity task (ECWE is the abbreviation of the model of the invention); the figure shows that the Chinese word vectors generated by the invention contain more accurate semantic information.
Fig. 3 shows the evaluation results on the analogical reasoning task; the figure shows that the Chinese word vectors generated by the invention contain more accurate semantic information.
Fig. 4 shows the evaluation results on the text classification task; the figure shows that the Chinese word vectors generated by the invention are better suited to Chinese natural language processing tasks.
Fig. 5 shows the evaluation results under different corpus sizes, demonstrating that the invention is more robust.
Fig. 6 shows the evaluation results under different character-modeling ratios, demonstrating that the invention is more robust.
Embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the specific embodiments of the present invention are described in further detail below.
The invention provides a Chinese word vector generation method based on word-character joint training, which includes:
(1) Chinese text data processing stage
Generating word representation vectors requires the support of a large corpus, which can be built in-house or purchased; here we take the Chinese Wikipedia dataset as an example.
1.1) The Chinese Wikipedia dataset is chosen as the training corpus. It covers a wide range of domains; the corpus contains 182 million Chinese words, the word vocabulary size is 457,000, and the character vocabulary size is 9,000.
The Chinese Wikipedia dataset is then preprocessed. Wikipedia's Chinese data mixes traditional and simplified script: it contains mainland simplified, Taiwan traditional, and Hong Kong/Macau traditional text, and different paragraphs of the same article sometimes use different variants. If simplified and traditional forms of the same word coexist they conflict, so for normalization the traditional characters must be converted: we use the open source project opencc to convert the traditional characters in the corpus to simplified characters.
1.2) Once the corpus is ready, it is segmented into words. There are many segmentation methods; here we introduce a character-tagging-based Chinese word segmentation method.
The basic assumption of character-tagging-based segmentation is that the characters inside a word are highly cohesive, while the coupling across word boundaries is low. Word boundaries are learned with statistical machine learning, performing BMES tagging with a sequence labeling model. A single-character word is labeled S; for a multi-character word, the first character is labeled B, the last character E, and the middle characters M. After every character of the training data has been labeled, a 3-layer neural network is trained for the per-character tagging task: for each character in a sentence, a total of win characters, the current character plus its context window with (win-1)/2 characters on each side, are used as features. The raw text of the win characters is first converted to character vector representations e(w), and the win vectors are concatenated into a win*|e|-dimensional vector x, which forms the input layer of the network. The hidden layer h is designed as in a common feedforward network, with every input node connected to each of the |h| hidden nodes; the hidden layer uses tanh as the activation function.
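As an illustration, the following is a minimal sketch of the forward pass of such a window-based tagger in Python/NumPy. The layer sizes, the character vocabulary size, and the scoring of the four BMES labels are hypothetical details chosen for the sketch, not values fixed by the description above:

```python
import numpy as np

# Hypothetical sizes: |e| = 50-dim character vectors, win = 5, |h| = 100 hidden units.
EMB_DIM, WIN, HIDDEN, N_LABELS = 50, 5, 100, 4   # labels: B, M, E, S

rng = np.random.default_rng(0)
char_emb = rng.uniform(0, 1, (9000, EMB_DIM))     # character lookup table e(w)
W1 = rng.normal(0, 0.1, (WIN * EMB_DIM, HIDDEN))  # input layer -> hidden layer, fully connected
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_LABELS))       # hidden layer -> BMES scores
b2 = np.zeros(N_LABELS)

def bmes_scores(char_ids, pos):
    """Score the BMES labels of the character at `pos` from a window of WIN characters."""
    half = (WIN - 1) // 2                         # (win-1)/2 characters on each side
    window = [char_ids[max(0, min(len(char_ids) - 1, pos + d))]
              for d in range(-half, half + 1)]    # clamp indices at the sentence edges
    x = char_emb[window].reshape(-1)              # concatenate into a win*|e|-dim vector
    h = np.tanh(x @ W1 + b1)                      # tanh hidden layer
    return h @ W2 + b2                            # unnormalized scores for B, M, E, S
```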
For example, suppose the training corpus before segmentation contains the passage: "On April 6-7, with the close cooperation of multiple departments of the campus district, the Development Zone campus successfully completed the 2017 voluntary blood donation drive. In the early registration, 5 faculty members, 9 graduate students, and 463 undergraduates signed up to donate blood; in the end 420 people donated successfully, including 4 faculty members, 6 graduate students, and 410 undergraduates." After word segmentation the same passage is emitted with delimiters inserted between the recognized words, splitting off dates, institution names, and counts as separate tokens.
1.3) Finally, stop words and punctuation marks are removed.
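To make the pipeline concrete, here is a minimal preprocessing sketch assuming the opencc Python bindings mentioned above and the jieba segmenter; jieba and the stop word list are illustrative choices, since the description does not prescribe a particular segmentation tool:

```python
import re
import jieba                    # illustrative segmenter; the method does not depend on a specific tool
from opencc import OpenCC       # open source traditional-to-simplified converter used above

cc = OpenCC('t2s')              # t2s: traditional Chinese to simplified Chinese
STOP_WORDS = {'的', '了', '在'}  # hypothetical stop word list

def preprocess(line: str) -> list[str]:
    """Convert to simplified Chinese, segment into words, drop stop words and punctuation."""
    simplified = cc.convert(line)
    tokens = jieba.lcut(simplified)
    return [t for t in tokens
            if t not in STOP_WORDS and not re.fullmatch(r'[\W_]+', t)]

# corpus = [preprocess(line) for line in open('zhwiki.txt', encoding='utf-8')]
```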
(2) Chinese word representation vector generation stage
In Chinese, a word usually consists of several characters and carries rich internal meaning; the meaning of a word is usually related to the characters that compose it. For example, the meaning of the Chinese word 科技 (science and technology) can be learned from its surrounding context in the corpus, and it can also be inferred from its component characters 科 and 技. This motivates using character information to improve the Chinese word embedding model and learn Chinese word representation vectors. Fig. 1 is the framework diagram of our model. Word embeddings (light grey boxes in the figure) combine with character embeddings (white boxes) to form new vectors (grey boxes), and these new vectors are summed to obtain the vector that predicts the target word (left dark grey box). At the same time, the character embeddings are also summed into a new vector (right dark grey box) that predicts the target word.
In the initial stage, we traverse the corpus, add each word to a vocabulary, and sort the vocabulary by word frequency; words with frequency below 5 are deleted. Then the vector representations of words, characters, and parameters, w, c, and θ, are randomly generated (the dimension is typically set to 100). Next, we design an objective function and iteratively optimize the parameters with the stochastic gradient descent algorithm.
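A minimal sketch of this initialization, assuming a pre-segmented corpus as produced in stage (1); the variable names follow the description (word vectors w, character vectors c, parameters θ; dimension 100; frequency threshold 5), while the zero initialization of θ is word2vec convention rather than something the text specifies:

```python
from collections import Counter
import numpy as np

DIM, MIN_COUNT = 100, 5
rng = np.random.default_rng(0)

def build_vocab_and_init(corpus):
    """corpus: list of sentences, each a list of word strings."""
    freq = Counter(w for sent in corpus for w in sent)
    words = sorted((w for w, n in freq.items() if n >= MIN_COUNT),
                   key=freq.get, reverse=True)            # sort by frequency, drop rare words
    chars = sorted({ch for w in words for ch in w})
    w_vec = {w: rng.uniform(0, 1, DIM) for w in words}    # word vectors w, values in (0, 1)
    # three position-dependent vectors per character: beginning / middle / end of a word
    c_vec = {(ch, pos): rng.uniform(0, 1, DIM)
             for ch in chars for pos in ('B', 'M', 'E')}
    theta = {w: np.zeros(DIM) for w in words}             # output parameters θ
    return w_vec, c_vec, theta
```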
2.1) Predicting the target word from context words
Given a sentence D = {x_1, …, x_M}, where M is the sentence length and x_j is the j-th word in the sentence, the target word is predicted from the context words within a fixed window of size K. Considering the characteristics of Chinese, this step takes the sum of each context word's vector and the averaged vectors of its internal characters as the vector representation of the context of the target word w. For each character, depending on its position, there are three different vector representations (c^B, c^M, c^E), corresponding to the beginning, middle, and end of a word. The context-word vector is computed as:

$$x_w = \frac{1}{2K}\sum_{j}\Big(w_j + \frac{1}{N_j}\big(c_1^{B} + \sum_{k=2}^{N_j-1} c_k^{M} + c_{N_j}^{E}\big)\Big),\qquad j = w-K,\dots,w-1,\,w+1,\dots,w+K$$

where w_j is the word vector of x_j, N_j is the number of characters in x_j, and c_k is the vector of the k-th character of x_j.

From the above formula we obtain the context-word representation x_w, from which the target word x_i is predicted; the objective is to maximize the conditional probability of the target word given its context words:

$$L(D) = \frac{1}{M}\sum_{i=K}^{M-K}\log P(x_i \mid x_w)$$

where M is the sentence length and K the window size.
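A sketch of this context representation in Python/NumPy, reusing the `w_vec` and `c_vec` tables initialized above; the handling of single-character and out-of-vocabulary words is a simplification not covered by the formula:

```python
import numpy as np

def char_avg(word, c_vec, dim=100):
    """Average the position-dependent character vectors of a word:
    first character -> c^B, interior characters -> c^M, last character -> c^E."""
    if len(word) == 1:
        return c_vec.get((word, 'B'), np.zeros(dim))   # single-character word: a simplification
    vecs = [c_vec[(word[0], 'B')]]
    vecs += [c_vec[(ch, 'M')] for ch in word[1:-1]]
    vecs.append(c_vec[(word[-1], 'E')])
    return np.mean(vecs, axis=0)

def context_word_vector(sent, i, K, w_vec, c_vec, dim=100):
    """x_w: average over the 2K window positions of (word vector + averaged character vectors)."""
    total = np.zeros(dim)
    for j in range(i - K, i + K + 1):
        if j == i or j < 0 or j >= len(sent):
            continue
        word = sent[j]
        total += w_vec.get(word, np.zeros(dim)) + char_avg(word, c_vec, dim)
    return total / (2 * K)
```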
2.2) Predicting the target word from context characters
Our invention predicts the target word not only from the context words but also from the context characters. Similarly, for a sentence D = {x_1, …, x_M}, the sentence is first traversed and each character in every word is mapped to its vector by table lookup, excluding the target word. The target word is predicted from the context within a fixed window; considering the characteristics of Chinese, this step uses the average of the vectors of the internal characters of the context words as the context-character representation. For each character, depending on its position, there are three different vector representations (c^B, c^M, c^E), corresponding to the beginning, middle, and end of a word. The context-character vector is computed as:

$$c_w = \frac{1}{2K}\sum_{j}\frac{1}{N_j}\Big(c_1^{B} + \sum_{k=2}^{N_j-1} c_k^{M} + c_{N_j}^{E}\Big),\qquad j = w-K,\dots,w-1,\,w+1,\dots,w+K$$

From the above formula we obtain the context-character representation c_w, from which the target word x_i is predicted; the objective is to maximize the conditional probability of the target word given its context characters:

$$L(D) = \frac{1}{M}\sum_{i=K}^{M-K}\log P(x_i \mid c_w)$$

where M is the sentence length and K the window size.
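The context-character representation c_w drops the word vectors and keeps only the averaged character vectors; a short sketch under the same assumptions, reusing `char_avg` from above:

```python
import numpy as np

def context_char_vector(sent, i, K, c_vec, dim=100):
    """c_w: average over the 2K window positions of the words' averaged character vectors."""
    total = np.zeros(dim)
    for j in range(i - K, i + K + 1):
        if j == i or j < 0 or j >= len(sent):
            continue
        total += char_avg(sent[j], c_vec, dim)
    return total / (2 * K)
```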
2.3) Predicting the target word jointly from words and characters
In predicting the target word from context words, the meanings of the characters are added directly into the word meaning, so for words containing the same character the model tends to produce similar word vectors. We therefore add context information to weaken the negative influence of internal characters on the word meaning, and, to give words richer semantic information, we introduce the character information of the external context. In this way each character in a word's context serves as part of the distributed representation of that word, placing the word in the semantic space of characters and modeling it more effectively. For a sentence D = {x_1, …, x_M}, this step combines the objective of predicting the target word from context words with the objective of predicting it from context characters, training words and characters jointly; that is, while optimizing the conditional probability of the target word given its context words, the conditional probability of the target word given each character in the context is optimized at the same time:

$$L(\theta) = \frac{1}{M}\sum_{w\in W}\Big[(1-\beta)\log P\big(w \mid \mathrm{Context}(w)\big) + \beta\log P\big(w \mid \mathrm{Circum}(w)\big)\Big]$$

where M is the sentence length, W is the word vocabulary, w is the target word (x_i above), Context(w) is the word context of w (x_w above), Circum(w) is the character context of w (c_w above), and β is a decimal between 0 and 1 that controls the proportion of character-based modeling.
2.4) Iterative updating
To reduce computational complexity, this step optimizes the objective with negative sampling; specifically, the conditional probabilities are computed as:

$$P\big(w \mid \mathrm{Context}(w)\big) = \prod_{u\in\{w\}\cup NEG(w)} \big[\sigma(x_w^{\top}\theta^{u})\big]^{L^{w}(u)}\cdot\big[1-\sigma(x_w^{\top}\theta^{u})\big]^{1-L^{w}(u)}$$

$$P\big(w \mid \mathrm{Circum}(w)\big) = \prod_{u\in\{w\}\cup NEG(w)} \big[\sigma(c_w^{\top}\theta^{u})\big]^{L^{w}(u)}\cdot\big[1-\sigma(c_w^{\top}\theta^{u})\big]^{1-L^{w}(u)}$$

where NEG(w) is the negative sample set (the number of negative samples is set to 5); L^w(u) is the label of a sample u, with L^w(u) = 1 when u is the target word w and L^w(u) = 0 otherwise; x_w is the context-word representation of the target word w, c_w is its context-character representation, and θ^u is the parameter vector.

Finally the objective is solved with the stochastic gradient descent algorithm; the update formulas are:

$$v(\tilde{w}) := v(\tilde{w}) + \eta\sum_{u\in\{w\}\cup NEG(w)}\frac{\partial L(w,u)}{\partial x_w},\qquad \tilde{w}\in \mathrm{Context}(w)$$

$$v(\tilde{c}) := v(\tilde{c}) + \eta\sum_{u\in\{w\}\cup NEG(w)}\frac{\partial L(w,u)}{\partial c_w},\qquad \tilde{c}\in \mathrm{Circum}(w)$$

Representing the vectors of words and characters with different expressions yields better word vector representations and further drives the model to obtain more effective word embeddings.
After iterative training ends, the word vector parameter set w is the Chinese word vector representation generated by our model.
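Putting the pieces together, the following sketch performs one negative-sampling SGD step for a single target word. The uniform negative sampler and the fixed learning rate are illustrative simplifications (word2vec-style implementations usually sample negatives from a smoothed unigram distribution), and the back-propagation into the individual character vectors is only indicated by a comment:

```python
import random
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(sent, i, K, w_vec, c_vec, theta, vocab,
               beta=0.5, lr=0.025, n_neg=5, dim=100):
    """One joint update: predict sent[i] from x_w (weight 1-beta) and from c_w (weight beta)."""
    target = sent[i]
    x_w = context_word_vector(sent, i, K, w_vec, c_vec, dim)
    c_w = context_char_vector(sent, i, K, c_vec, dim)
    samples = [(target, 1.0)] + [(random.choice(vocab), 0.0) for _ in range(n_neg)]

    for ctx, weight in ((x_w, 1.0 - beta), (c_w, beta)):
        grad_ctx = np.zeros(dim)
        for u, label in samples:                 # L^w(u) = 1 for the target, 0 for negatives
            g = weight * lr * (label - sigmoid(ctx @ theta[u]))
            grad_ctx += g * theta[u]             # accumulate gradient w.r.t. the context vector
            theta[u] += g * ctx                  # update parameter vector θ^u
        # propagate the accumulated gradient back to the vectors that produced x_w / c_w
        for j in range(i - K, i + K + 1):
            if j == i or j < 0 or j >= len(sent):
                continue
            if ctx is x_w and sent[j] in w_vec:
                w_vec[sent[j]] += grad_ctx / (2 * K)   # v(w~) := v(w~) + η ∂L/∂x_w
            # the position-dependent character vectors c^B, c^M, c^E are updated analogously
```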
(3) Experimental results
The linguistic properties of the word vectors are evaluated by semantic similarity computation and analogical reasoning; the results in Fig. 2 and Fig. 3 show that the word vectors generated by the invention (the ECWE model) outperform the other models on both tasks. Fig. 4 scores the word vectors on a text classification task and shows that using the word vectors generated by the invention as features in Chinese natural language processing tasks yields better results. Fig. 5 shows performance on semantic similarity as the training corpus is gradually enlarged: the invention still performs well when the corpus is small, because introducing external context characters expands the contextual information of words, so the model can train words effectively even on a smaller corpus; as the corpus grows from small to large, the ECWE model consistently outperforms the other models and quickly reaches good performance. Fig. 6 shows performance on semantic similarity as the character-modeling ratio is gradually increased; the invention performs better across the different settings. This demonstrates that the invention is indeed a Chinese word vector model with better performance, more accurate capture of semantic information, and greater stability; the evaluation results on each task also confirm the feasibility of the proposed Chinese word vector generation method based on word-character joint training.
The above describes a specific embodiment of the present invention and the technical principles it employs. Any change conceived under the present invention, so long as the function it produces does not go beyond the spirit covered by the specification and drawings, shall fall within the protection scope of the present invention.
Claims (1)
1. A Chinese word vector generation method based on word-character joint training, characterized in that the character information inside Chinese words is taken as a key feature and Chinese word vector representations are jointly trained from context words and context characters, with the following steps:
(1) Chinese text data processing stage
The generation of word representation vectors is based on a corpus; the corpus is first segmented into words;
(2) Chinese word representation vector generation stage
In Chinese, a word consists of several characters, and the meaning of a word is related to the characters that compose it; the method uses character information to improve the Chinese word embedding model and learn Chinese word representation vectors;
In the initial stage, the vector representations w of words and c of characters are randomly generated; the dimension is 100 and each dimension value is a random decimal between 0 and 1;
2.1) Predicting the target word from context words
Given a sentence D = {x_1, …, x_M}, where M is the sentence length and x_j is the j-th word in the sentence, the target word is predicted from the context words within a fixed window of size K; considering the characteristics of Chinese, the sum of each context word's vector and the averaged vectors of its internal characters is taken as the vector representation of the context of the target word w; for each character, depending on its position, there are three different vector representations (c^B, c^M, c^E), corresponding to the beginning, middle, and end of a word; the context-word vector is computed as:
$$x_w = \frac{1}{2K}\sum_{j}\Big(w_j + \frac{1}{N_j}\big(c_1^{B} + \sum_{k=2}^{N_j-1} c_k^{M} + c_{N_j}^{E}\big)\Big),\qquad j = w-K,\dots,w-1,\,w+1,\dots,w+K$$
where w_j is the word vector of x_j, N_j is the number of characters in x_j, and c_k is the vector of the k-th character of x_j;
From the above formula the context-word representation x_w is obtained, from which the target word x_i is predicted; the objective is to maximize the conditional probability of the target word given its context words:
$$L(D) = \frac{1}{M}\sum_{i=K}^{M-K}\log P(x_i \mid x_w)$$
where M is the sentence length and K the window size;
2.2) Predicting the target word from context characters
For a sentence D = {x_1, …, x_M}, the sentence is first traversed and each character in every word is mapped to its vector by table lookup, excluding the target word; the target word is predicted from the context within a fixed window, using the average of the vectors of the internal characters of the context words as the context-character representation; for each character, depending on its position, there are three different vector representations (c^B, c^M, c^E), corresponding to the beginning, middle, and end of a word; the context-character vector is computed as:
$$c_w = \frac{1}{2K}\sum_{j}\frac{1}{N_j}\Big(c_1^{B} + \sum_{k=2}^{N_j-1} c_k^{M} + c_{N_j}^{E}\Big),\qquad j = w-K,\dots,w-1,\,w+1,\dots,w+K$$
From the above formula the context-character representation c_w is obtained, from which the target word x_i is predicted; the objective is to maximize the conditional probability of the target word given its context characters:
$$L(D) = \frac{1}{M}\sum_{i=K}^{M-K}\log P(x_i \mid c_w)$$
where M is the sentence length and K the window size;
2.3) Predicting the target word jointly from words and characters
For a sentence D = {x_1, …, x_M}, the objective of predicting the target word from context words is combined with the objective of predicting it from context characters, training words and characters jointly; while optimizing the conditional probability of the target word given its context words, the conditional probability of the target word given each character in the context is optimized at the same time:
$$L(\theta) = \frac{1}{M}\sum_{w\in W}\Big[(1-\beta)\log P\big(w \mid \mathrm{Context}(w)\big) + \beta\log P\big(w \mid \mathrm{Circum}(w)\big)\Big]$$
where M is the sentence length, W is the word vocabulary, w is the target word (x_i above), Context(w) is the word context of w (x_w above), Circum(w) is the character context of w (c_w above), and β is a decimal between 0 and 1 that controls the proportion of character-based modeling;
2.4) Iterative updating
The objective is optimized with negative sampling; the conditional probabilities are computed as:
$$P\big(w \mid \mathrm{Context}(w)\big) = \prod_{u\in\{w\}\cup NEG(w)} \big[\sigma(x_w^{\top}\theta^{u})\big]^{L^{w}(u)}\cdot\big[1-\sigma(x_w^{\top}\theta^{u})\big]^{1-L^{w}(u)}$$

$$P\big(w \mid \mathrm{Circum}(w)\big) = \prod_{u\in\{w\}\cup NEG(w)} \big[\sigma(c_w^{\top}\theta^{u})\big]^{L^{w}(u)}\cdot\big[1-\sigma(c_w^{\top}\theta^{u})\big]^{1-L^{w}(u)}$$
where NEG(w) is the negative sample set (the number of negative samples is set to 5); L^w(u) is the label of a sample u, with L^w(u) = 1 when u is the target word w and L^w(u) = 0 otherwise; x_w is the context-word representation of the target word w, c_w is its context-character representation, and θ^u is the parameter vector;
Finally the objective is solved with the stochastic gradient descent algorithm; the update formulas are:
$$v(\tilde{w}) := v(\tilde{w}) + \eta\sum_{u\in\{w\}\cup NEG(w)}\frac{\partial L(w,u)}{\partial x_w},\qquad \tilde{w}\in \mathrm{Context}(w)$$

$$v(\tilde{c}) := v(\tilde{c}) + \eta\sum_{u\in\{w\}\cup NEG(w)}\frac{\partial L(w,u)}{\partial c_w},\qquad \tilde{c}\in \mathrm{Circum}(w)$$
After iterative training ends, the word vector parameter set w is the Chinese word vector representation generated by the model.
Priority application
- CN201710435279.XA, filed 2017-06-12: Chinese word vector generation method based on word and phrase joint training (granted as CN107273355B)
Publications
- CN107273355A, published 2017-10-20
- CN107273355B, published 2020-07-14