CN106897265A - Word vector training method and device - Google Patents

Word vector training method and device

Info

Publication number
CN106897265A
CN106897265A (application CN201710022458.0A)
Authority
CN
China
Prior art keywords
vocabulary
lexicon
word vector
Huffman
old
Prior art date
Legal status
Granted
Application number
CN201710022458.0A
Other languages
Chinese (zh)
Other versions
CN106897265B (en)
Inventor
李建欣
刘垚鹏
彭浩
张日崇
陈汉腾
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN201710022458.0A
Publication of CN106897265A
Application granted
Publication of CN106897265B
Legal status: Active (granted)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor, of structured data, e.g. relational data
    • G06F 16/23: Updating
    • G06F 16/2365: Ensuring data consistency and integrity
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/237: Lexical tools
    • G06F 40/242: Dictionaries

Abstract

The present invention provides a word vector training method and device, belonging to the field of machine learning. The word vector training method includes: obtaining a newly added lexicon, where the words in the newly added lexicon and the words in an old lexicon together form a new lexicon, and each word in the old lexicon already has a corresponding old word vector; initializing the words in the new lexicon, so that the word vector of a word in the new lexicon that belongs to the old lexicon is its old word vector, and the word vector of a word in the new lexicon that belongs to the newly added lexicon is a random word vector; and updating the word vectors of the words in the new lexicon according to a first Huffman tree corresponding to the new lexicon and a second Huffman tree corresponding to the old lexicon. The word vector training method and device provided by the present invention improve the training efficiency of word vectors.

Description

Word vector training method and device
Technical field
The present invention relates to the field of machine learning, and in particular to a word vector training method and device.
Background
In machine learning, in order for a machine to understand the meaning of human language, the vocabulary representation tool of a neural network language model converts each word of human language into the form of a word vector, so that a computer can learn the meaning of each word of human language through its word vector.
In the prior art, after new words are added to a lexicon, it is usually necessary to retrain all of the words in the new lexicon in order to obtain a new word vector for each word. This approach, however, makes word vector training inefficient.
Summary of the invention
The present invention provides a word vector training method and device that improve the training efficiency of word vectors.
An embodiment of the present invention provides a word vector training method, including:
obtaining a newly added lexicon, where the words in the newly added lexicon and the words in an old lexicon form a new lexicon, and each word in the old lexicon has a corresponding old word vector;
initializing the words in the new lexicon, so that the word vector of a word in the new lexicon that belongs to the old lexicon is its old word vector, and the word vector of a word in the new lexicon that belongs to the newly added lexicon is a random word vector;
updating the word vectors of the words in the new lexicon according to a first Huffman tree corresponding to the new lexicon and a second Huffman tree corresponding to the old lexicon.
In an embodiment of the present invention, updating the word vectors of the words in the new lexicon according to the first Huffman tree corresponding to the new lexicon and the second Huffman tree corresponding to the old lexicon includes:
obtaining a preset objective function corresponding to a first word, where the first word is a word in the new lexicon;
performing gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word.
In an embodiment of the present invention, obtaining the preset objective function corresponding to the first word includes:
if the first word belongs to the old lexicon, factorizing the original objective function of the Skip-gram model with respect to the first word, to obtain the preset objective function corresponding to the first word;
if the first word belongs to the newly added lexicon, taking the original objective function of the Skip-gram model as the preset objective function corresponding to the first word.
In an embodiment of the present invention, obtaining the preset objective function corresponding to the first word includes:
if the first word belongs to the old lexicon, factorizing the original objective function of the CBOW model with respect to the first word, to obtain the preset objective function corresponding to the first word;
if the first word belongs to the newly added lexicon, taking the original objective function of the CBOW model as the preset objective function corresponding to the first word.
In an embodiment of the present invention, factorizing the original objective function of the Skip-gram model with respect to the first word to obtain the preset objective function corresponding to the first word includes:
if the first word belongs to the old lexicon, factorizing with respect to the first word according to
$$\mathcal{L}(w)=\sum_{u\in C(w)}\Big[\sum_{i=2}^{n_u+1}l(w,u,i)+\sum_{j=n_u+2}^{l'_u}l(w,u,j)\Big],\qquad l(w,u,j)=\big(1-d_j^{u}\big)\log\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)+d_j^{u}\log\Big(1-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\Big),$$
to obtain the preset objective function corresponding to the first word;
if the first word belongs to the newly added lexicon, the preset objective function corresponding to the first word is the original objective function of the Skip-gram model
$$\mathcal{L}(w)=\sum_{u\in C(w)}\sum_{j=2}^{l'_u}l(w,u,j);$$
where w denotes the first word, W denotes the old lexicon, ΔW denotes the newly added lexicon, C(w) denotes the lexicon formed by the context words of w, u denotes a context word of w, n_u denotes the length of the Huffman code of the non-leaf nodes matched between the second Huffman tree and the first Huffman tree, l'_u denotes the length of the Huffman code of u on the first Huffman tree, i indexes the nodes on the matched segment of the Huffman path and j the nodes on the differing segment, θ_{j-1}^u denotes the word vector of the (j-1)-th node on the first Huffman path corresponding to u, d_j^u denotes the Huffman code of the j-th node on the Huffman path corresponding to u, σ denotes the activation function, and v(w) denotes the word vector corresponding to w.
In an embodiment of the present invention, factorizing the original objective function of the CBOW model with respect to the first word to obtain the preset objective function corresponding to the first word includes:
if the first word belongs to the old lexicon, factorizing with respect to the first word according to
$$\mathcal{L}(w)=\sum_{i=2}^{n_w+1}l(w,i)+\sum_{j=n_w+2}^{l'_w}l(w,j),\qquad l(w,j)=\big(1-d_j^{w}\big)\log\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)+d_j^{w}\log\Big(1-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\Big),$$
to obtain the preset objective function corresponding to the first word;
if the first word belongs to the newly added lexicon, the preset objective function corresponding to the first word is the original objective function of the CBOW model
$$\mathcal{L}(w)=\sum_{j=2}^{l'_w}l(w,j);$$
where d_j^w denotes the Huffman code of the j-th node on the Huffman path corresponding to w, l'_w denotes the length of the Huffman code of w on the first Huffman tree, and x_w denotes the sum of the word vectors of all words in C(w).
In an embodiment of the present invention, performing gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word, includes:
if the first word belongs to the old lexicon, and the code of the first word in the first Huffman tree shares a common prefix with its code in the second Huffman tree, performing stochastic gradient ascent on the vectors of the nodes corresponding to the differing part of the Huffman code of the first word on the first Huffman tree according to
$$\theta_{j-1}^{u}\leftarrow\theta_{j-1}^{u}+\eta'\big[1-d_j^{u}-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\big]v(w),$$
and performing stochastic gradient descent on the vectors of the corresponding nodes on the second Huffman tree according to
$$\theta_{j-1}^{u}\leftarrow\theta_{j-1}^{u}-\eta'\big[1-d_j^{u}-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\big]v(w);$$
if the first word belongs to the newly added lexicon, performing stochastic gradient ascent on the first word according to
$$v(w)\leftarrow v(w)+\eta'\sum_{u\in C(w)}\sum_{j=2}^{l'_u}\big[1-d_j^{u}-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\big]\theta_{j-1}^{u},$$
to obtain the word vector corresponding to the first word;
where η' denotes the learning rate.
In an embodiment of the present invention, performing gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word, includes:
if the first word belongs to the old lexicon, and the code of the first word in the first Huffman tree shares a common prefix with its code in the second Huffman tree, performing stochastic gradient ascent on the vectors of the nodes corresponding to the differing part of the Huffman code of the first word on the first Huffman tree according to
$$\theta_{j-1}^{w}\leftarrow\theta_{j-1}^{w}+\eta'\big[1-d_j^{w}-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]x_w,$$
and performing stochastic gradient descent on the vectors of the corresponding nodes on the second Huffman tree according to
$$\theta_{j-1}^{w}\leftarrow\theta_{j-1}^{w}-\eta'\big[1-d_j^{w}-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]x_w;$$
if the first word belongs to the newly added lexicon, performing stochastic gradient ascent on the first word according to
$$v(u)\leftarrow v(u)+\eta'\sum_{j=2}^{l'_w}\big[1-d_j^{w}-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]\theta_{j-1}^{w},\quad u\in C(w),$$
to obtain the word vector corresponding to the first word;
where θ_{i-1}^w denotes the word vector of the (i-1)-th node on the first Huffman path corresponding to w.
An embodiment of the present invention further provides a word vector training device, including:
an acquisition module, configured to obtain a newly added lexicon, where the words in the newly added lexicon and the words in an old lexicon form a new lexicon, and each word in the old lexicon has a corresponding old word vector;
an initialization module, configured to initialize the words in the new lexicon, so that the word vector of a word in the new lexicon that belongs to the old lexicon is its old word vector, and the word vector of a word in the new lexicon that belongs to the newly added lexicon is a random word vector;
an update module, configured to update the word vectors of the words in the new lexicon according to a first Huffman tree corresponding to the new lexicon and a second Huffman tree corresponding to the old lexicon.
In an embodiment of the present invention, the update module is specifically configured to obtain a preset objective function corresponding to a first word, where the first word is a word in the new lexicon, and to perform gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word.
According to the word vector training method and device provided by the embodiments of the present invention, a newly added lexicon is obtained and the words in the new lexicon are initialized, so that the word vector of a word in the new lexicon that belongs to the old lexicon is its old word vector and the word vector of a word that belongs to the newly added lexicon is a random word vector; the word vectors of the words in the new lexicon are then updated according to the first Huffman tree corresponding to the new lexicon and the second Huffman tree corresponding to the old lexicon, which improves the training efficiency of word vectors.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a word vector training method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of updating the word vectors of the words in the new lexicon according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a word vector training device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, the claims, and the accompanying drawings are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable in appropriate circumstances, so that the embodiments of the invention described herein can, for example, be implemented in orders other than those illustrated or described herein. Moreover, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a schematic flowchart of a word vector training method according to an embodiment of the present invention. The word vector training method may be performed by a word vector training device, and the word vector training device may be integrated in a processor or provided separately; the present invention is not specifically limited in this respect. Specifically, as shown in Fig. 1, the word vector training method may include:
S101: Obtain a newly added lexicon.
The words in the newly added lexicon and the words in an old lexicon form a new lexicon, and each word in the old lexicon has a corresponding old word vector.
In the embodiments of the present invention, the words in the old lexicon have already been trained to obtain their corresponding old word vectors, while the words in the newly added lexicon have not yet been trained. For example, the old lexicon is an existing lexicon whose word vectors have been trained, and the newly added lexicon contains newly added words; the words in the old lexicon, whose word vectors have been trained, and the newly added words are merged into the new lexicon.
S102: Initialize the words in the new lexicon, so that the word vector of a word in the new lexicon that belongs to the old lexicon is its old word vector, and the word vector of a word in the new lexicon that belongs to the newly added lexicon is a random word vector.
For example, in the embodiments of the present invention, denote the old lexicon by W, where the trained word vector of a word w in the old lexicon is denoted v(w), and denote the newly added lexicon by ΔW; the new lexicon is then W' = W + ΔW. Denote the second Huffman tree corresponding to the old lexicon W by T, and the first Huffman tree corresponding to the new lexicon W' by T'. For a first word w in the new lexicon: if w is in the old lexicon W, its word vector has already been trained in the old lexicon, so the word is not trained again and the original v(w) is inherited; if the first word w belongs to the newly added lexicon, the word vector corresponding to w is randomly initialized.
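Both Huffman trees are built over word frequencies in the usual way. As a minimal illustrative sketch (not the patent's code; the frequency-table input and the function name are assumptions), the following returns each word's '0'/'1' Huffman code, from which the shared code prefixes used below can be compared:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman tree over word frequencies and return each word's
    '0'/'1' code string; running this on the old and the new lexicon's
    frequency tables yields the codes of T and T' respectively."""
    tiebreak = count()  # unique counter so heap tuples never compare payloads
    heap = [(f, next(tiebreak), [w]) for w, f in freqs.items()]
    heapq.heapify(heap)
    codes = {w: "" for w in freqs}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        for w in left:                     # left branch: prepend '0'
            codes[w] = "0" + codes[w]
        for w in right:                    # right branch: prepend '1'
            codes[w] = "1" + codes[w]
        heapq.heappush(heap, (f1 + f2, next(tiebreak), left + right))
    return codes
```

Comparing the code a word receives from the old lexicon's tree with the code it receives from the new lexicon's tree gives the shared prefix that drives the initialization described next.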
For example, taking the first word as an example, where the first word is any word in the new lexicon, the position of the first word on the first Huffman tree falls into two cases. Case one: the first word is a leaf node of the first Huffman tree. Case two: the first word is a non-leaf node of the first Huffman tree.
Case one: if the first word is a leaf node of the first Huffman tree, the first word may be initialized according to the following Formula 1:
$$v'(w)=\begin{cases}v(w), & w\in W\\ \text{random}, & w\in\Delta W\end{cases}\qquad\text{(Formula 1)}$$
where w denotes the first word, v(w) denotes the word vector of w on the second Huffman tree T, and v'(w) denotes the word vector of w on the first Huffman tree T'.
As can be seen from Formula 1, if the first word belongs to the old lexicon W, the word vector of the first word is the old word vector corresponding to the first word in the old lexicon; if the first word does not belong to the old lexicon W, that is, the first word is a newly added word, the word vector of the first word is randomly initialized, i.e., the word vector of the first word is a random word vector.
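A minimal sketch of this Formula 1 initialization in Python follows (again an illustration, not the patent's code; `old_vectors`, `new_lexicon`, and `dim` are assumed names, and the random interval anticipates the [-0.5/m, 0.5/m] range described below):

```python
import numpy as np

def init_leaf_vectors(old_vectors, new_lexicon, dim):
    """Formula 1: a word already in the old lexicon W inherits its trained
    vector v(w); a word from the newly added lexicon gets a random vector."""
    new_vectors = {}
    for w in new_lexicon:
        if w in old_vectors:                       # w in W: inherit v(w)
            new_vectors[w] = old_vectors[w].copy()
        else:                                      # w in delta W: random init
            new_vectors[w] = (np.random.rand(dim) - 0.5) / dim
    return new_vectors
```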
Case two: if the first word is a non-leaf node of the first Huffman tree, the non-leaf node has a parameter vector. To distinguish the parameter vectors, let θ_i^T denote the parameter vector of the i-th node on the corresponding Huffman path in the second Huffman tree T, and θ_i^{T'} the parameter vector of the i-th node on the corresponding Huffman path in the first Huffman tree T'; when the two paths pass through the same node, θ_i^{T'} = θ_i^T. Suppose, for example, that the code of word w on the second Huffman tree is "0010" and its code on the first Huffman tree becomes "00011"; because the two Huffman codes share the common prefix "00", the vectors on the nodes corresponding to this common prefix remain unchanged. Let L_w and L'_w denote the code length of the first word w on the second Huffman tree and on the first Huffman tree, respectively. The first word may then be initialized according to the following Formula 2:
$$\theta_i^{T'}=\begin{cases}\theta_i^{T}, & 1\le i\le n_w\\ \mathbf{0}, & n_w<i\le L'_w\end{cases}\qquad\text{(Formula 2)}$$
where the Huffman codes of the i-th nodes on the first and second Huffman paths of non-leaf node w determine the matching, and n_w denotes the length of the Huffman-code prefix of non-leaf node w that matches between the second Huffman tree and the first Huffman tree; the Huffman code of non-leaf node w on the first Huffman tree is thus divided into a prefix-matching part and a remaining part.
As can be seen from Formula 2, if the first word is a non-leaf node of the first Huffman tree, the vectors on the first Huffman tree corresponding to the prefix part matched with the second Huffman tree are the existing parameter vectors, while the vectors corresponding to the unmatched part of the code are initialized as zero vectors.
It is worth noting that, in the embodiments of the present invention, for the first word: if the first word is a leaf node of the first Huffman tree, random initialization is used; if it is a non-leaf node, the vector is initialized as a zero vector. Specifically, each component of an initial word vector may be set to fall in the interval [-0.5/m, 0.5/m], where m is the length of the word vector.
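The prefix-matching rule of Formula 2 can be sketched as follows, assuming Huffman codes are '0'/'1' strings and each word's root-to-leaf node vectors are stored as a list (`old_path_vectors` is an assumed name):

```python
import numpy as np

def shared_prefix_len(code_old: str, code_new: str) -> int:
    """n_w: length of the common prefix of a word's Huffman codes in the
    second (old) tree and the first (new) tree."""
    n = 0
    for a, b in zip(code_old, code_new):
        if a != b:
            break
        n += 1
    return n

def init_path_vectors(code_old, code_new, old_path_vectors, dim):
    """Formula 2: inherit node vectors along the shared code prefix and
    zero-initialize the vectors of the remaining nodes on the new path."""
    n_w = shared_prefix_len(code_old, code_new)
    new_path = [old_path_vectors[i].copy() for i in range(n_w)]
    new_path += [np.zeros(dim) for _ in range(len(code_new) - n_w)]
    return new_path
```

For the example above (codes "0010" and "00011"), the shared prefix "00" gives n_w = 2, so two node vectors are inherited and the remaining three nodes of the new path start from zero vectors.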
After the words in the new lexicon are initialized, the word vectors corresponding to the words in the new lexicon can be updated.
S103: Update the word vectors of the words in the new lexicon according to the first Huffman tree corresponding to the new lexicon and the second Huffman tree corresponding to the old lexicon.
According to the word vector training method provided by this embodiment of the present invention, a newly added lexicon is obtained and the words in the new lexicon are initialized, so that the word vector of a word in the new lexicon that belongs to the old lexicon is its old word vector and the word vector of a word that belongs to the newly added lexicon is a random word vector; the word vectors of the words in the new lexicon are then updated according to the first Huffman tree corresponding to the new lexicon and the second Huffman tree corresponding to the old lexicon, which improves the training efficiency of word vectors.
Optionally, in the embodiments of the present invention, S103, updating the word vectors of the words in the new lexicon according to the first Huffman tree corresponding to the new lexicon and the second Huffman tree corresponding to the old lexicon, may be implemented as follows. See Fig. 2, which is a schematic flowchart of updating the word vectors of the words in the new lexicon according to an embodiment of the present invention.
S201: Obtain the preset objective function corresponding to the first word.
The first word is a word in the new lexicon.
Optionally, in S201 the preset objective function corresponding to the first word can be obtained through either of the following two models:
For the first model, the Skip-gram model: if the first word belongs to the old lexicon, the original objective function of the Skip-gram model is factorized with respect to the first word, to obtain the preset objective function corresponding to the first word; if the first word belongs to the newly added lexicon, the preset objective function corresponding to the first word is the original objective function of the Skip-gram model.
For example, in the embodiments of the present invention, if the first word belongs to the old lexicon, the objective for each word in W is factorized according to the identical and differing parts of its Huffman codes, which yields the preset objective function corresponding to the first word, that is, factorizing with respect to the first word according to
$$\mathcal{L}(w)=\sum_{u\in C(w)}\Big[\sum_{i=2}^{n_u+1}l(w,u,i)+\sum_{j=n_u+2}^{l'_u}l(w,u,j)\Big],\qquad l(w,u,j)=\big(1-d_j^{u}\big)\log\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)+d_j^{u}\log\Big(1-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\Big).$$
If the first word belongs to the newly added lexicon, the preset objective function corresponding to the first word is the original objective function of the Skip-gram model:
$$\mathcal{L}(w)=\sum_{u\in C(w)}\sum_{j=2}^{l'_u}l(w,u,j).$$
Here w denotes the first word, W denotes the old lexicon, ΔW denotes the newly added lexicon, C(w) denotes the lexicon formed by the context words of w, u denotes a context word of w, n_u denotes the length of the Huffman code of the non-leaf nodes matched between the second Huffman tree and the first Huffman tree, l'_u denotes the length of the Huffman code of u on the first Huffman tree, i indexes the nodes on the matched segment of the Huffman path and j the nodes on the differing segment, θ_{j-1}^u denotes the word vector of the (j-1)-th node on the first Huffman path corresponding to u, d_j^u denotes the Huffman code of the j-th node on the Huffman path corresponding to u, σ denotes the activation function, and v(w) denotes the word vector corresponding to w. The first sum runs over the nodes of the shared code prefix, whose vectors are inherited; the second sum runs over the remaining nodes, whose vectors are initialized as zero vectors for the other words in the new lexicon.
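To make the factorization concrete, the following sketch shows the per-node log-likelihood term and the prefix/suffix split for one context word u (an illustration under the assumptions of the earlier sketches; `sigmoid` stands in for the activation function σ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def node_term(v_w, theta, d):
    """l(w, u, j): log-likelihood of one binary decision on the Huffman path."""
    p = sigmoid(np.dot(v_w, theta))
    return (1 - d) * np.log(p) + d * np.log(1.0 - p)

def skipgram_objective(v_w, path_vectors, code, n_u):
    """Factorized Skip-gram objective for one context word u: the first n_u
    path nodes form the shared prefix (inherited vectors); the rest belong
    to the differing segment of the new tree (zero-initialized vectors)."""
    shared = sum(node_term(v_w, path_vectors[i], int(code[i]))
                 for i in range(n_u))
    differ = sum(node_term(v_w, path_vectors[j], int(code[j]))
                 for j in range(n_u, len(code)))
    return shared + differ
```

Only the `differ` part has to be relearned for an old-lexicon word, which is the source of the computational saving noted below.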
For the second model, the CBOW model: if the first word belongs to the old lexicon, the original objective function of the CBOW model is factorized with respect to the first word, to obtain the preset objective function corresponding to the first word; if the first word belongs to the newly added lexicon, the preset objective function corresponding to the first word is the original objective function of the CBOW model.
For example, in the embodiments of the present invention, if the first word belongs to the old lexicon, the objective for each word in W is factorized according to the identical and differing parts of its Huffman codes, which yields the preset objective function corresponding to the first word, that is:
factorizing with respect to the first word according to
$$\mathcal{L}(w)=\sum_{i=2}^{n_w+1}l(w,i)+\sum_{j=n_w+2}^{l'_w}l(w,j),\qquad l(w,j)=\big(1-d_j^{w}\big)\log\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)+d_j^{w}\log\Big(1-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\Big),$$
to obtain the preset objective function corresponding to the first word.
If the first word belongs to the newly added lexicon, the preset objective function corresponding to the first word is the original objective function of the CBOW model:
$$\mathcal{L}(w)=\sum_{j=2}^{l'_w}l(w,j).$$
Here d_j^w denotes the Huffman code of the j-th node on the Huffman path corresponding to w, and x_w denotes the sum of the word vectors of all words in C(w).
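The CBOW variant differs only in that the path decisions are scored against the context sum x_w rather than v(w); a sketch reusing `node_term` from the Skip-gram example above (same illustrative assumptions):

```python
import numpy as np

def cbow_objective(context_vectors, path_vectors, code, n_w):
    """Factorized CBOW objective for word w whose Huffman code is given:
    x_w is the sum of the word vectors of the context words in C(w)."""
    x_w = np.sum(context_vectors, axis=0)
    shared = sum(node_term(x_w, path_vectors[i], int(code[i]))
                 for i in range(n_w))
    differ = sum(node_term(x_w, path_vectors[j], int(code[j]))
                 for j in range(n_w, len(code)))
    return shared + differ
```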
It is worth noting that, in the embodiments of the present invention, factorizing the objective for each word in W according to the identical and differing parts of its Huffman codes saves computation during word vector training, thereby improving computational efficiency.
After the preset objective function corresponding to the first word is obtained, gradient processing can be performed on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word.
S202: Perform gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word.
With reference to step S201, this can be implemented with either of the two models:
For the first model, the Skip-gram model: if the first word belongs to the old lexicon, and the code of the first word in the first Huffman tree shares a common prefix with its code in the second Huffman tree, stochastic gradient ascent is performed on the vectors of the nodes corresponding to the differing part of the Huffman code of the first word on the first Huffman tree according to
$$\theta_{j-1}^{u}\leftarrow\theta_{j-1}^{u}+\eta'\big[1-d_j^{u}-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\big]v(w),$$
and stochastic gradient descent is performed on the vectors of the corresponding nodes on the second Huffman tree according to
$$\theta_{j-1}^{u}\leftarrow\theta_{j-1}^{u}-\eta'\big[1-d_j^{u}-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\big]v(w).$$
If the first word belongs to the newly added lexicon, stochastic gradient ascent is performed on the first word according to
$$v(w)\leftarrow v(w)+\eta'\sum_{u\in C(w)}\sum_{j=2}^{l'_u}\big[1-d_j^{u}-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\big]\theta_{j-1}^{u},$$
and the word vector corresponding to the first word is obtained, where η' denotes the learning rate.
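One incremental update step for an old-lexicon word under the Skip-gram model might look like the following sketch, which implements the update rules as reconstructed above (ascent on the new tree's differing nodes, descent on the old tree's); it reuses `sigmoid` from the earlier sketch, and `new_path`/`old_path` (lists of node vectors, one per code bit) are assumed names:

```python
import numpy as np

def incremental_step(v_w, new_path, new_code, old_path, old_code, n_u, lr):
    """Ascend on the differing segment of the first (new) Huffman tree and
    descend on the differing segment of the second (old) tree; the shared
    prefix [0, n_u) keeps its inherited vectors and the standard update."""
    grad_w = np.zeros_like(v_w)
    for j in range(n_u, len(new_code)):          # gradient ascent, new tree
        g = lr * (1 - int(new_code[j]) - sigmoid(np.dot(v_w, new_path[j])))
        grad_w += g * new_path[j]
        new_path[j] = new_path[j] + g * v_w
    for j in range(n_u, len(old_code)):          # gradient descent, old tree
        g = lr * (1 - int(old_code[j]) - sigmoid(np.dot(v_w, old_path[j])))
        old_path[j] = old_path[j] - g * v_w
    return v_w + grad_w                          # updated word vector v(w)
```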
For the second model, the CBOW model: if the first word belongs to the old lexicon, and the code of the first word in the first Huffman tree shares a common prefix with its code in the second Huffman tree, stochastic gradient ascent is performed on the vectors of the nodes corresponding to the differing part of the Huffman code of the first word on the first Huffman tree according to
$$\theta_{j-1}^{w}\leftarrow\theta_{j-1}^{w}+\eta'\big[1-d_j^{w}-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]x_w,$$
and stochastic gradient descent is performed on the vectors of the corresponding nodes on the second Huffman tree according to
$$\theta_{j-1}^{w}\leftarrow\theta_{j-1}^{w}-\eta'\big[1-d_j^{w}-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]x_w.$$
If the first word belongs to the newly added lexicon, stochastic gradient ascent is performed on the first word according to
$$v(u)\leftarrow v(u)+\eta'\sum_{j=2}^{l'_w}\big[1-d_j^{w}-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]\theta_{j-1}^{w},\quad u\in C(w),$$
and the word vector corresponding to the first word is obtained.
Here θ_{i-1}^w denotes the word vector of the (i-1)-th node on the first Huffman path corresponding to w.
Here η' denotes the learning rate. For example, the initial learning rate is set to η₀ = 0.025, and after every 1000 words are processed the learning rate is adjusted according to
$$\eta=\eta_0\left(1-\frac{\mathrm{word\_count\_actual}}{\mathrm{train\_words}+1}\right),$$
where word_count_actual denotes the number of words processed so far, and the +1 in train_words + 1 prevents the denominator from becoming zero. A threshold η_min = 10^{-4} is introduced at the same time, that is, η is bounded below by η_min, which prevents the learning rate from becoming too small. During incremental learning, the word counter must additionally include the word count of the original corpus, and η is computed subject to the η_min bound.
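A sketch of this learning-rate schedule under the formula as reconstructed above (illustrative names; `original_words` stands for the word count of the original corpus added during incremental training):

```python
ETA_0 = 0.025    # initial learning rate
ETA_MIN = 1e-4   # floor that keeps the learning rate from becoming too small

def adjusted_learning_rate(word_count_actual, train_words, original_words=0):
    """Recomputed after every 1000 processed words; during incremental
    training the counters also include the original corpus."""
    processed = word_count_actual + original_words
    total = train_words + original_words
    return max(ETA_0 * (1 - processed / (total + 1)), ETA_MIN)
```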
Fig. 3 is a schematic structural diagram of a word vector training device 30 according to an embodiment of the present invention. Of course, the embodiment of the present invention is described by taking Fig. 3 as an example, but the present invention is not limited thereto. As shown in Fig. 3, the word vector training device 30 may include:
an acquisition module 301, configured to obtain a newly added lexicon, where the words in the newly added lexicon and the words in the old lexicon form a new lexicon, and each word in the old lexicon has a corresponding old word vector;
an initialization module 302, configured to initialize the words in the new lexicon, so that the word vector of a word in the new lexicon that belongs to the old lexicon is its old word vector, and the word vector of a word in the new lexicon that belongs to the newly added lexicon is a random word vector;
an update module 303, configured to update the word vectors of the words in the new lexicon according to the first Huffman tree corresponding to the new lexicon and the second Huffman tree corresponding to the old lexicon.
Optionally, the update module 303 is specifically configured to obtain the preset objective function corresponding to the first word, where the first word is a word in the new lexicon, and to perform gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word.
The word vector training device 30 shown in this embodiment of the present invention can perform the technical solution corresponding to the word vector training method shown in the above method embodiments; its implementation principle and beneficial effects are similar and are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some or all of their technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A word vector training method, characterized in that it comprises:
obtaining a newly added lexicon, wherein the words in the newly added lexicon and the words in an old lexicon form a new lexicon, and each word in the old lexicon has a corresponding old word vector;
initializing the words in the new lexicon, so that the word vector of a word in the new lexicon that belongs to the old lexicon is its old word vector, and the word vector of a word in the new lexicon that belongs to the newly added lexicon is a random word vector;
updating the word vectors of the words in the new lexicon according to a first Huffman tree corresponding to the new lexicon and a second Huffman tree corresponding to the old lexicon.
2. The method according to claim 1, characterized in that updating the word vectors of the words in the new lexicon according to the first Huffman tree corresponding to the new lexicon and the second Huffman tree corresponding to the old lexicon comprises:
obtaining a preset objective function corresponding to a first word, wherein the first word is a word in the new lexicon;
performing gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word.
3. The method according to claim 2, characterized in that obtaining the preset objective function corresponding to the first word comprises:
if the first word belongs to the old lexicon, factorizing the original objective function of the Skip-gram model with respect to the first word, to obtain the preset objective function corresponding to the first word;
if the first word belongs to the newly added lexicon, taking the original objective function of the Skip-gram model as the preset objective function corresponding to the first word.
4. The method according to claim 2, characterized in that obtaining the preset objective function corresponding to the first word comprises:
if the first word belongs to the old lexicon, factorizing the original objective function of the CBOW model with respect to the first word, to obtain the preset objective function corresponding to the first word;
if the first word belongs to the newly added lexicon, taking the original objective function of the CBOW model as the preset objective function corresponding to the first word.
5. The method according to claim 3, characterized in that factorizing the original objective function of the Skip-gram model with respect to the first word to obtain the preset objective function corresponding to the first word comprises:
if the first word belongs to the old lexicon, factorizing with respect to the first word according to
$$\mathcal{L}(w)=\sum_{u\in C(w)}\Big[\sum_{i=2}^{n_u+1}l(w,u,i)+\sum_{j=n_u+2}^{l'_u}l(w,u,j)\Big],\qquad l(w,u,j)=\big(1-d_j^{u}\big)\log\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)+d_j^{u}\log\Big(1-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\Big),$$
to obtain the preset objective function corresponding to the first word;
if the first word belongs to the newly added lexicon, the preset objective function corresponding to the first word is the original objective function of the Skip-gram model
$$\mathcal{L}(w)=\sum_{u\in C(w)}\sum_{j=2}^{l'_u}l(w,u,j);$$
wherein w denotes the first word, W denotes the old lexicon, ΔW denotes the newly added lexicon, C(w) denotes the lexicon formed by the context words of w, u denotes a context word of w, n_u denotes the length of the Huffman code of the non-leaf nodes matched between the second Huffman tree and the first Huffman tree, l'_u denotes the length of the Huffman code of u on the first Huffman tree, i indexes the nodes on the matched segment of the Huffman path and j the nodes on the differing segment, θ_{j-1}^u denotes the word vector of the (j-1)-th node on the first Huffman path corresponding to u, d_j^u denotes the Huffman code of the j-th node on the Huffman path corresponding to u, σ denotes the activation function, and v(w) denotes the word vector corresponding to w.
6. The method according to claim 4, characterized in that factorizing the original objective function of the CBOW model with respect to the first word to obtain the preset objective function corresponding to the first word comprises:
if the first word belongs to the old lexicon, factorizing with respect to the first word according to
$$\mathcal{L}(w)=\sum_{i=2}^{n_w+1}l(w,i)+\sum_{j=n_w+2}^{l'_w}l(w,j),\qquad l(w,j)=\big(1-d_j^{w}\big)\log\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)+d_j^{w}\log\Big(1-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\Big),$$
to obtain the preset objective function corresponding to the first word;
if the first word belongs to the newly added lexicon, the preset objective function corresponding to the first word is the original objective function of the CBOW model
$$\mathcal{L}(w)=\sum_{j=2}^{l'_w}l(w,j);$$
wherein d_j^w denotes the Huffman code of the j-th node on the Huffman path corresponding to w, and x_w denotes the sum of the word vectors of all words in C(w).
7. The method according to claim 5, characterized in that performing gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word, comprises:
if the first word belongs to the old lexicon, and the code of the first word in the first Huffman tree shares a common prefix with its code in the second Huffman tree, performing stochastic gradient ascent on the vectors of the nodes corresponding to the differing part of the Huffman code of the first word on the first Huffman tree according to
$$\theta_{j-1}^{u}\leftarrow\theta_{j-1}^{u}+\eta'\big[1-d_j^{u}-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\big]v(w),$$
and performing stochastic gradient descent on the vectors of the corresponding nodes on the second Huffman tree according to
$$\theta_{j-1}^{u}\leftarrow\theta_{j-1}^{u}-\eta'\big[1-d_j^{u}-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\big]v(w);$$
if the first word belongs to the newly added lexicon, performing stochastic gradient ascent on the first word according to
$$v(w)\leftarrow v(w)+\eta'\sum_{u\in C(w)}\sum_{j=2}^{l'_u}\big[1-d_j^{u}-\sigma\big(v(w)^{\top}\theta_{j-1}^{u}\big)\big]\theta_{j-1}^{u},$$
to obtain the word vector corresponding to the first word;
wherein η' denotes the learning rate.
8. The method according to claim 6, characterized in that performing gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word, comprises:
if the first word belongs to the old lexicon, and the code of the first word in the first Huffman tree shares a common prefix with its code in the second Huffman tree, performing stochastic gradient ascent on the vectors of the nodes corresponding to the differing part of the Huffman code of the first word on the first Huffman tree according to
$$\theta_{j-1}^{w}\leftarrow\theta_{j-1}^{w}+\eta'\big[1-d_j^{w}-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]x_w,$$
and performing stochastic gradient descent on the vectors of the corresponding nodes on the second Huffman tree according to
$$\theta_{j-1}^{w}\leftarrow\theta_{j-1}^{w}-\eta'\big[1-d_j^{w}-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]x_w;$$
if the first word belongs to the newly added lexicon, performing stochastic gradient ascent on the first word according to
$$v(u)\leftarrow v(u)+\eta'\sum_{j=2}^{l'_w}\big[1-d_j^{w}-\sigma\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]\theta_{j-1}^{w},\quad u\in C(w),$$
to obtain the word vector corresponding to the first word;
wherein θ_{i-1}^w denotes the word vector of the (i-1)-th node on the first Huffman path corresponding to w.
9. A word vector training device, characterized in that it comprises:
an acquisition module, configured to obtain a newly added lexicon, wherein the words in the newly added lexicon and the words in an old lexicon form a new lexicon, and each word in the old lexicon has a corresponding old word vector;
an initialization module, configured to initialize the words in the new lexicon, so that the word vector of a word in the new lexicon that belongs to the old lexicon is its old word vector, and the word vector of a word in the new lexicon that belongs to the newly added lexicon is a random word vector;
an update module, configured to update the word vectors of the words in the new lexicon according to a first Huffman tree corresponding to the new lexicon and a second Huffman tree corresponding to the old lexicon.
10. The device according to claim 9, characterized in that
the update module is specifically configured to obtain a preset objective function corresponding to a first word, wherein the first word is a word in the new lexicon, and to perform gradient processing on the preset objective function according to the attributes of the first word in the first Huffman tree and in the second Huffman tree, to obtain the word vector corresponding to the first word.
CN201710022458.0A 2017-01-12 2017-01-12 Word vector training method and device Active CN106897265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710022458.0A CN106897265B (en) 2017-01-12 2017-01-12 Word vector training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710022458.0A CN106897265B (en) 2017-01-12 2017-01-12 Word vector training method and device

Publications (2)

Publication Number Publication Date
CN106897265A true CN106897265A (en) 2017-06-27
CN106897265B CN106897265B (en) 2020-07-10

Family

ID=59198669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710022458.0A Active CN106897265B (en) 2017-01-12 2017-01-12 Word vector training method and device

Country Status (1)

Country Link
CN (1) CN106897265B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509422A (en) * 2018-04-04 2018-09-07 广州荔支网络技术有限公司 Incremental learning method and device for word vectors, and electronic device
WO2019095836A1 (en) * 2017-11-14 2019-05-23 阿里巴巴集团控股有限公司 Method, device, and apparatus for word vector processing based on clusters
CN110020303A (en) * 2017-11-24 2019-07-16 腾讯科技(深圳)有限公司 Method, device, and storage medium for determining candidate display content
CN110210557A (en) * 2019-05-31 2019-09-06 南京工程学院 Online incremental clustering method for unknown text in a real-time stream processing mode
CN111325026A (en) * 2020-02-18 2020-06-23 北京声智科技有限公司 Training method and system for word vector model
US10769383B2 (en) 2017-10-23 2020-09-08 Alibaba Group Holding Limited Cluster-based word vector processing method, device, and apparatus
US11822447B2 (en) 2020-10-06 2023-11-21 Direct Cursus Technology L.L.C Methods and servers for storing data associated with users and digital items of a recommendation system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930318A (en) * 2016-04-11 2016-09-07 深圳大学 Word vector training method and system
CN106055623A (en) * 2016-05-26 2016-10-26 《中国学术期刊(光盘版)》电子杂志社有限公司 Cross-language recommendation method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930318A (en) * 2016-04-11 2016-09-07 深圳大学 Word vector training method and system
CN106055623A (en) * 2016-05-26 2016-10-26 《中国学术期刊(光盘版)》电子杂志社有限公司 Cross-language recommendation method and system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10769383B2 (en) 2017-10-23 2020-09-08 Alibaba Group Holding Limited Cluster-based word vector processing method, device, and apparatus
WO2019095836A1 (en) * 2017-11-14 2019-05-23 阿里巴巴集团控股有限公司 Method, device, and apparatus for word vector processing based on clusters
US10846483B2 (en) 2017-11-14 2020-11-24 Advanced New Technologies Co., Ltd. Method, device, and apparatus for word vector processing based on clusters
CN110020303A (en) * 2017-11-24 2019-07-16 腾讯科技(深圳)有限公司 Method, device, and storage medium for determining candidate display content
CN108509422A (en) * 2018-04-04 2018-09-07 广州荔支网络技术有限公司 Incremental learning method and device for word vectors, and electronic device
CN108509422B (en) * 2018-04-04 2020-01-24 广州荔支网络技术有限公司 Incremental learning method and device for word vectors and electronic equipment
CN110210557A (en) * 2019-05-31 2019-09-06 南京工程学院 Online incremental clustering method for unknown text in a real-time stream processing mode
CN110210557B (en) * 2019-05-31 2024-01-12 南京工程学院 Online incremental clustering method for unknown text in real-time stream processing mode
CN111325026A (en) * 2020-02-18 2020-06-23 北京声智科技有限公司 Training method and system for word vector model
CN111325026B (en) * 2020-02-18 2023-10-10 北京声智科技有限公司 Training method and system for word vector model
US11822447B2 (en) 2020-10-06 2023-11-21 Direct Cursus Technology L.L.C Methods and servers for storing data associated with users and digital items of a recommendation system

Also Published As

Publication number Publication date
CN106897265B (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN106897265A (en) Term vector training method and device
CN103620624B (en) For the method and apparatus causing the local competition inquiry learning rule of sparse connectivity
CN106802888A (en) Term vector training method and device
CN108229582A (en) Entity recognition dual training method is named in a kind of multitask towards medical domain
CN108665175A (en) A kind of processing method, device and the processing equipment of insurance business risk profile
CN108334496A (en) Human-computer dialogue understanding method and system and relevant device for specific area
CN109784474A (en) A kind of deep learning model compression method, apparatus, storage medium and terminal device
Cho et al. Exponentially increasing the capacity-to-computation ratio for conditional computation in deep learning
CN106980650A (en) A kind of emotion enhancing word insertion learning method towards Twitter opinion classifications
CN108021908A (en) Face age bracket recognition methods and device, computer installation and readable storage medium storing program for executing
CN109299264A (en) File classification method, device, computer equipment and storage medium
CN107273352A (en) A kind of word insertion learning model and training method based on Zolu functions
CN108197653A (en) A kind of time series classification method based on convolution echo state network
CN107025463A (en) Based on the bedroom apparatus for grouping and method for merging grouping algorithm
CN107194151A (en) Determine the method and artificial intelligence equipment of emotion threshold value
CN109242089B (en) Progressive supervised deep learning neural network training method, system, medium and device
CN114154839A (en) Course recommendation method based on online education platform data
CN111324736B (en) Man-machine dialogue model training method, man-machine dialogue method and system
CN110069781B (en) Entity label identification method and related equipment
CN107886163A (en) Single-object problem optimization method and device based on AGN and CNN
KR20180127890A (en) Method and apparatus for user adaptive speech recognition
CN109871448A (en) A kind of method and system of short text classification
CN110516228A (en) Name entity recognition method, device, computer installation and computer readable storage medium
Tsihrintzis et al. Surveys in artificial intelligence-based technologies
Shinde et al. Mining classification rules from fuzzy min-max neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant