CN107402914A - Natural language deep learning system and method - Google Patents
Natural language deep learning system and method
- Publication number: CN107402914A
- Application number: CN201610341719.0A
- Authority: CN (China)
- Prior art keywords: word, sample, loss function
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
Abstract
The present invention relates to a natural language deep learning system and method. The system includes an error calculation unit configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on sample pairs. The loss function is a combination of a similarity loss function and a classification loss function, wherein the similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair. In this system, the cost of learning from a sample-pair loss is reduced compared with a per-sample loss function.
Description
Technical field
The present invention relates to the field of information processing, and more specifically to a natural language deep learning system and method.
Background technology
The combination of deep learning and natural language processing has been a research hotspot in recent years. In existing deep learning models, using the word as the basic feature unit is the common form of natural language processing deep learning architectures. Research shows that natural language processing features can effectively improve learning performance on a variety of tasks, so researchers usually introduce several different word features for learning. However, two problems arise in practice:
1. The natural language tools that generate word features all rely on word segmentation, and different segmentation techniques cause the same natural language text to produce different word sequences, so the word features differ. The resulting problem is that multi-source word features carry a fusion error.
2. Word embedding is an important step of deep learning in the field of natural language processing. Its main function is to map a word to a word-representation vector. Normally, a good word representation should place semantically similar words close together in the vector space, and dissimilar words far apart. Since learning a good vector representation from random initial vectors generally requires a large corpus, tasks with insufficient samples often use pre-trained word embeddings as initial values. Unseen (out-of-vocabulary) words then inevitably occur. Although some trained embeddings initialize all unseen words to one identical vector, the cost of moving different unseen words toward semantically similar vocabulary clearly differs. Simply initializing all unseen words to one identical vector makes local convergence of the neural network too slow.
Besides the problems of handling word features, the lack of training samples is also a major obstacle to combining deep learning with natural language processing. Without sufficient samples, how to describe the error with a better loss function has become a hot research topic.
Accordingly, it is desirable to provide a natural language deep learning system and method that can solve the above problems.
Summary of the invention
A brief overview of the present invention is given below to provide a basic understanding of certain aspects of the invention. It should be appreciated that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit the scope of the invention. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description discussed later.
A primary object of the present invention is to provide a natural language deep learning system including an error calculation unit configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on sample pairs. The loss function is a combination of a similarity loss function and a classification loss function, wherein the similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair.
According to an aspect of the present invention, there is provided a natural language deep learning method including: when the natural language deep learning system is being trained, calculating the error value of a sample according to a loss function based on sample pairs, the loss function being a combination of a similarity loss function and a classification loss function, wherein the similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair.
In addition, embodiments of the invention provide a computer program for realizing the above method.
Embodiments of the invention further provide a computer program product in at least the form of a computer-readable medium on which computer program code for realizing the above method is recorded.
These and other advantages of the invention will be apparent from the following detailed description of the preferred embodiments of the invention in conjunction with the accompanying drawings.
Brief description of the drawings
The above and other objects, features and advantages of the present invention can be more easily understood with reference to the following description of embodiments of the invention in conjunction with the accompanying drawings. The parts in the drawings are only intended to show the principle of the invention. In the drawings, the same or similar technical features or parts are represented by the same or similar reference signs.
Fig. 1 is a block diagram showing an exemplary configuration of a natural language deep learning system 100 according to an embodiment of the invention;
Fig. 2 is a block diagram showing an exemplary configuration of a natural language deep learning system 200 according to another embodiment of the invention;
Fig. 3 is a block diagram showing an exemplary configuration of a natural language deep learning system 300 according to still another embodiment of the invention;
Fig. 4 is a flow chart showing an exemplary process of a natural language deep learning method 400 according to an embodiment of the invention;
Fig. 5 is a flow chart showing an exemplary process of a natural language deep learning method 500 according to another embodiment of the invention;
Fig. 6 is a flow chart showing an exemplary process of a natural language deep learning method 600 according to still another embodiment of the invention; and
Fig. 7 is an exemplary structure diagram showing a computing device that can be used to implement the natural language deep learning system and method of the present invention.
Detailed description of embodiments
Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual embodiment, many implementation-specific decisions must be made to achieve the developer's specific goals, such as compliance with system- and business-related constraints, and these constraints may vary from one implementation to another. Moreover, although such development work might be complex and time-consuming, it is merely a routine undertaking for those skilled in the art having the benefit of this disclosure.
It should also be noted that, to avoid obscuring the present invention with unnecessary details, only the device structures and/or processing steps closely related to the solution of the present invention are shown in the drawings, while other details of little relevance to the invention are omitted.
The present invention proposes a natural language deep learning system and method that can solve the following problems in combining natural language processing with deep learning:
1. Design of the loss function when samples are insufficient
The accuracy of a per-sample loss function is limited by the sample size. The loss can be portrayed more accurately by further describing the relations between samples, but a naive sample-pair loss may reduce the efficiency of learning updates. For this problem, the invention addresses how to design a loss function based on sample pairs while reducing the cost of learning from a sample-pair loss.
2. Initialization of unseen words
Analysis shows that unseen (out-of-vocabulary) words fall into the following cases:
a. Differences in capitalization or punctuation prevent a match (e.g., "worker" is in the word embedding while "worker." is unseen);
b. Numerals cannot be matched (e.g., "15", "1802", "1000" are in the word embedding while "199" is unseen);
c. Words sharing a lemma (Chairman/Chairwoman, or video/videotape/videomask, etc.).
It is therefore possible to initialize an unseen word with the embedding of its most similar vocabulary item.
3. Word sequence matching
Analysis shows that the fusion error of multi-source word features comes from the segmentation differences between different systems or tools. Therefore, the invention also provides a multi-source word sequence matching algorithm to reduce the error of word feature fusion.
The natural language deep learning system and method according to embodiments of the invention are described in detail below with reference to the drawings, in the following order:
1. Natural language deep learning system
2. Natural language deep learning method
3. Computing device to implement the system and method of the application
[1. natural language deep learning system]
Fig. 1 is a block diagram showing an exemplary configuration of a natural language deep learning system 100 according to an embodiment of the invention.
As shown in Fig. 1, the natural language deep learning system 100 includes an error calculation unit 102. The error calculation unit 102 is configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on sample pairs. The loss function is a combination of a similarity loss function and a classification loss function. The similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair.
A traditional loss function takes the classification error of a single sample as its judgment criterion; with a limited number of samples, the information that can be learned is also limited, which affects the final learning performance. To overcome this shortcoming of per-sample loss functions, a loss function based on sample pairs is proposed.
In one example, under a mini-batch SGD (stochastic gradient descent) learning framework, the similarity between the neural-network output vectors of samples is calculated within each batch. An example formula for the similarity loss function pair_simi_cost is as follows:
An example formula for the classification loss function pair_label_cost is as follows:
The loss function pair_cost can then be defined as: pair_cost = pair_simi_cost + pair_label_cost.
Here, the function abs takes the absolute value, argmax returns the index of the maximum dimension of a vector, and sgn is the sign function; i is the index of the first sample of a pair and j the index of the second sample; ypred_i denotes the class-prediction vector value of the first sample i, and ypred_j that of the second sample j; y_i denotes the true class of the first sample i, and y_j the true class of the second sample j.
In another example, the similarity loss function pair_simi_cost can be defined on a distance metric:
The classification loss function can use the same function as in the above example:
The loss function pair_cost can then be defined as:
pair_cost = λ1 * pair_simi_cost + λ2 * pair_label_cost.
That is, the loss function pair_cost is a linear weighting of the similarity loss function and the classification loss function, where λ1 and λ2 are the respective weights and λ1 + λ2 = 1.
By defining the loss function as a combination of a similarity loss function and a classification loss function according to the above criteria, the error can be described with a better loss function when samples are insufficient, and the cost of learning from a sample-pair loss can be reduced.
Fig. 2 is a block diagram showing an exemplary configuration of a natural language deep learning system 200 according to another embodiment of the invention.
As shown in Fig. 2, the natural language deep learning system 200 includes an error calculation unit 202 and an initialization unit 204. The error calculation unit 202 in Fig. 2 is similar in function to the error calculation unit 102 in Fig. 1 and is not described again here.
The natural language deep learning system 200 shown in Fig. 2 includes, in addition to the error calculation unit 202, the initialization unit 204.
The initialization unit 204 applies to the case, during operation of the natural language deep learning system, in which a word to be mapped for learning is an unseen word that does not exist in the existing word embedding dictionary; it initializes that word.
The initialization unit 204 is configured so that, if the word to be mapped is found in the lemma dictionary, it is initialized with the corresponding vector in the lemma dictionary; otherwise, if the word to be mapped is found in the stem dictionary, it is initialized with the corresponding vector in the stem dictionary.
In one example, specifically, two new dictionaries, stem_dict and lemma_dict, are first initialized.
Then, stem and lemma extraction is performed on the words in a third-party Word Embedding dictionary.
Next, the word vectors sharing the same stem or the same lemma are taken out, and the centroid of each group of vectors is computed. The stems and lemmas, together with their corresponding centroid vectors, are stored into stem_dict and lemma_dict respectively.
For numeric entries, which are found by a regular expression, a centroid is computed as well and saved as NUM.
Upon initialization:
1. If the word currently being mapped exists in the original dictionary, it is initialized with the corresponding vector in the original dictionary; otherwise go to step 2;
2. If the lemma of the current word can be found in the lemma dictionary, it is initialized with the vector corresponding to that lemma in lemma_dict; otherwise go to step 3;
3. If the stem of the current word can be found in the stem dictionary, it is initialized with the vector corresponding to that stem in stem_dict; otherwise go to step 4;
4. If the current word is a numeral, it is initialized to NUM; otherwise go to step 5;
5. The entry is mapped to the unseen-word vector of the Word Embedding (if there is none, it is mapped to a random vector).
Through the above initialization process, the problem that local convergence of the neural network is too slow because all unseen words are initialized to one identical vector can be avoided.
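The five-step cascade above can be sketched as follows. This is a hypothetical illustration: the `lemmatize` and `stem` callables, the dictionary layout, and the random range are all assumptions, not the patent's implementation.

```python
import numpy as np

def init_vector(word, word_emb, lemma_dict, stem_dict, num_vec,
                lemmatize, stem, dim=50, rng=np.random):
    """Cascading OOV initialization (steps 1-5), as a sketch."""
    if word in word_emb:                  # 1. exact entry in the original dictionary
        return word_emb[word]
    if lemmatize(word) in lemma_dict:     # 2. centroid of words sharing the lemma
        return lemma_dict[lemmatize(word)]
    if stem(word) in stem_dict:           # 3. centroid of words sharing the stem
        return stem_dict[stem(word)]
    if word.isdigit():                    # 4. numerals share the NUM centroid
        return num_vec
    return rng.uniform(-0.1, 0.1, dim)    # 5. fallback: random unseen-word vector
```

In practice the `lemmatize`/`stem` callables could come from any morphological toolkit; the cascade only requires that they map a surface form to a dictionary key.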
Fig. 3 is a block diagram showing an exemplary configuration of a natural language deep learning system 300 according to still another embodiment of the invention.
As shown in Fig. 3, the natural language deep learning system 300 includes an error calculation unit 302 and a matching unit 306. The error calculation unit 302 in Fig. 3 is similar in function to the error calculation unit 102 in Fig. 1 and is not described again here.
The natural language deep learning system 300 shown in Fig. 3 includes, in addition to the error calculation unit 302, the matching unit 306.
The matching unit 306 is configured to perform dynamic programming matching on two different segmentation sequences of a sentence obtained by different segmentation techniques, based on the similarity between their respective words, so as to perform word feature fusion.
The natural language tools that generate word features all rely on word segmentation, and different segmentation techniques cause the same natural language text to produce different word sequences, so word features differ. The resulting problem is that multi-source word features carry a fusion error, which comes from the segmentation differences between systems or tools. The matching unit in the natural language deep learning system 300 can reduce the error of word feature fusion.
In one example, specifically, suppose there are two segmentation sequences A and B; let A = [a1, a2, a3, ..., am] be the sequence to be matched and B = [b1, b2, b3, ..., bn] the target sequence.
First, the following formula can be used to compute the Levenshtein ratio between each word a_i in A and each word b_j in B:
Here len() denotes the length of a sequence, and editdistance denotes a Levenshtein-style edit distance in which insertion and deletion operations cost 1 and a substitution operation costs 2.
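The ratio formula itself is an image in the source. A common definition consistent with the stated edit costs (insert/delete 1, substitute 2) is (len(a) + len(b) − editdistance) / (len(a) + len(b)), which gives 1.0 for identical strings and 0.0 for entirely different ones; the sketch below assumes that definition.

```python
def edit_distance(a, b):
    """Levenshtein-style distance: insert/delete cost 1, substitute cost 2."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 2
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + sub) # match / substitution
    return d[m][n]

def lev_ratio(a, b):
    """Assumed ratio: 1.0 for identical strings, 0.0 for disjoint ones."""
    total = len(a) + len(b)
    return (total - edit_distance(a, b)) / total if total else 1.0
```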
This yields an m × n matrix S with elements S(i, j); note that the i, j indices start from 0.
The problem of dynamic programming matching based on the similarity between the words of the two segmentation sequences is thus converted into the following: in the matrix S, find a path whose length equals that of the sequence A to be matched, such that the sum of the Levenshtein ratios of all points (i, j) on the path is maximal. To reduce the cost of the path search, the following measures can be taken:
a. Define a set walked to record whether a node has been searched; when initializing walked, add the nodes (i, j) with S(i, j) = 0 to the walked set.
b. Define a sequence path to record the sequence of nodes currently traversed; it is initialized to empty.
c. Define max_weight, the maximal Levenshtein-ratio sum among the candidate paths of the current search, initialized to 0.
d. Define the search range of the next candidate node, cscope = |len(a_i) − len(b_j)| + δ, where δ is an integer constant, preferably 4 or larger; the larger δ is, the larger the search volume.
e. Define a sequence paths to store the candidate paths found; each element of paths is (path sequence, Levenshtein-ratio sum of the path).
The specific optimal word sequence matching algorithm can be described as follows:
Finally, the path in paths with the maximal Levenshtein-ratio sum is taken as the matching path.
In the above algorithm, the similarity between the words of the two different segmentation sequences is obtained by computing the Levenshtein distance between two words.
Based on the above algorithm, the matching unit 306 is configured to: set the two segmentation sequences as the sequence to be matched and the target sequence, respectively; build a matrix whose elements are the pairwise similarities between each word in the sequence to be matched and each word in the target sequence; and heuristically search a path in the matrix, finding a path whose length equals that of the sequence to be matched and whose sum of element similarities is maximal.
In one example, the heuristic search extends the path preferentially in the direction of the sequence to be matched.
Based on the above algorithm, the matching unit 306 is further configured to: limit the search region based on the i and j index values of the current element, the lengths of the corresponding words, and the number of words in the target sequence; compute the mean and standard deviation of all matrix elements within the search region; and take the elements in the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
In the above algorithm, for the current element (i, j), the search region is row i+1, columns j to end, where end = min(j + cscope − |i − j|, n), cscope is the absolute difference between the length of the i-th word in the sequence to be matched and the length of the j-th word in the target sequence plus a predetermined constant, and n is the number of words in the target sequence.
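The search-region bound can be sketched directly from the definitions above. This is an illustrative reading, not the patent's code; in particular, treating end as an exclusive column bound is an assumption.

```python
def search_region(i, j, seq_a, seq_b, delta=4):
    """Candidate columns in row i+1 for the current node (i, j), as a sketch.

    cscope widens with the length mismatch of the current word pair;
    delta is the integer constant (preferably >= 4) from the text.
    """
    n = len(seq_b)
    cscope = abs(len(seq_a[i]) - len(seq_b[j])) + delta
    end = min(j + cscope - abs(i - j), n)
    return list(range(j, end))  # candidate column indices, end exclusive
```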
The above algorithm realizes multi-source word sequence matching and can reduce the error of word feature fusion.
It should be noted here that the structures of the natural language deep learning systems 100-300 and their constituent units shown in Figs. 1-3 are merely exemplary, and those skilled in the art may modify the block diagrams shown in Figs. 1-3 as needed. For example, the structures of the natural language deep learning systems of Fig. 2 and Fig. 3 can be combined to form a natural language deep learning system including an error calculation unit, an initialization unit and a matching unit.
[2. natural language deep learning method]
Fig. 4 is a flow chart showing an exemplary process of a natural language deep learning method 400 according to an embodiment of the invention.
As shown in Fig. 4, the natural language deep learning method 400 includes an error calculating step S402. In step S402, when the natural language deep learning system is being trained, the error value of a sample is calculated according to a loss function based on sample pairs. The loss function is a combination of a similarity loss function and a classification loss function, wherein the similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair.
Specifically, the loss function pair_cost can be:
pair_cost = pair_simi_cost + pair_label_cost,
where the similarity loss function pair_simi_cost is:
and the classification loss function pair_label_cost is:
Here, i is the index of the first sample of a pair and j the index of the second sample; ypred_i denotes the class-prediction vector value of the first sample i, and ypred_j that of the second sample j; y_i denotes the true class of the first sample i, and y_j the true class of the second sample j.
In another example, the similarity loss function pair_simi_cost can be defined on a distance metric:
The classification loss function can use the same function as in the above example:
The loss function pair_cost can then be defined as:
pair_cost = λ1 * pair_simi_cost + λ2 * pair_label_cost.
That is, pair_cost is defined as a linear weighting of the similarity loss function and the classification loss function, where λ1 and λ2 are the respective weights and λ1 + λ2 = 1.
The natural language deep learning system is trained under a mini-batch stochastic gradient descent learning framework, and the sample pairs are selected from each batch.
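The text does not specify how pairs are drawn from a batch; one plausible reading is to form every unordered pair within the mini-batch, sketched below. This pairing scheme is an assumption for illustration only.

```python
from itertools import combinations

def sample_pairs(batch):
    # Every unordered pair within one mini-batch, so the pair loss
    # sees b*(b-1)/2 pairs per batch of b samples.
    return list(combinations(batch, 2))
```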
Fig. 5 is a flow chart showing an exemplary process of a natural language deep learning method 500 according to another embodiment of the invention.
Step S502 in Fig. 5 is similar to step S402 described with reference to Fig. 4 and is not repeated here.
Step S504 in Fig. 5 is the step of initializing a word to be mapped when the word to be mapped for learning in the natural language deep learning system is an unseen word that does not exist in the existing word embedding dictionary.
In step S504, if the word to be mapped is found in the lemma dictionary, it is initialized with the corresponding vector in the lemma dictionary; otherwise, if it is found in the stem dictionary, it is initialized with the corresponding vector in the stem dictionary. Here, the lemma dictionary stores the centroid vectors of the word vectors of the words in the existing word embedding dictionary that share the same lemma, together with the corresponding lemmas, and the stem dictionary stores the centroid vectors of the word vectors of the words that share the same stem, together with the corresponding stems.
Fig. 6 is a flow chart showing an exemplary process of a natural language deep learning method 600 according to still another embodiment of the invention.
Step S602 in Fig. 6 is similar to step S402 described with reference to Fig. 4 and is not repeated here.
Step S606 in Fig. 6 is a matching step. In S606, dynamic programming matching is performed on two different segmentation sequences of a sentence obtained by different segmentation techniques, based on the similarity between their respective words, so as to perform word feature fusion.
The similarity is obtained by computing the Levenshtein distance between two words.
Step S606 further comprises: setting the two segmentation sequences as the sequence to be matched and the target sequence, respectively; building a matrix whose elements are the pairwise similarities between each word in the sequence to be matched and each word in the target sequence; and heuristically searching a path in the matrix, finding a path whose length equals that of the sequence to be matched and whose sum of element similarities is maximal.
In the heuristic search, the path is extended preferentially in the direction of the sequence to be matched.
Step S606 further comprises: limiting the search region based on the i and j index values of the current element, the lengths of the corresponding words, and the number of words in the target sequence; computing the mean and standard deviation of all matrix elements within the search region; and taking the elements in the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
For the current element (i, j), the search region is row i+1, columns j to end, where end = min(j + cscope − |i − j|, n), cscope is the absolute difference between the length of the i-th word in the sequence to be matched and the length of the j-th word in the target sequence plus a predetermined constant, and n is the number of words in the target sequence.
For the operations and details of the steps of the natural language deep learning methods 400-600, reference may be made to the embodiments of the natural language deep learning system of the invention described in conjunction with Figs. 1-3; they are not detailed here.
The present invention proposes a natural language deep learning system and method, through which the following advantages are obtained:
1. It solves the problems of designing a loss function based on sample pairs and reducing the cost of learning from a sample-pair loss.
2. It solves the problem of initializing unseen words in tasks with insufficient samples.
3. It proposes a multi-source word sequence matching algorithm to reduce the error of word feature fusion.
[3. Computing device to implement the methods and apparatus of the present application]
The general principle of the present invention has been described above in connection with specific embodiments. However, it should be noted that those of ordinary skill in the art will understand that all or any steps or parts of the methods and apparatus of the present invention can be implemented in any computing device (including a processor, a storage medium, etc.) or a network of computing devices, in hardware, firmware, software or a combination thereof; having read the description of the present invention, those of ordinary skill in the art can accomplish this using their basic programming skills.
Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing device, which may be a well-known general-purpose device. Thus, the object of the present invention can also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium may be any known storage medium or any storage medium developed in the future.
In the case where the embodiments of the present invention are implemented by software and/or firmware, a program constituting the software is installed from a storage medium or a network into a computer having a dedicated hardware structure, for example the general-purpose computer 700 shown in Fig. 7, which, when various programs are installed, is capable of performing various functions.
In Fig. 7, a central processing unit (CPU) 701 performs various processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. In the RAM 703, data required when the CPU 701 performs the various processes is also stored as needed. The CPU 701, the ROM 702, and the RAM 703 are linked to one another via a bus 704. An input/output interface 705 is also linked to the bus 704.
The following components are linked to the input/output interface 705: an input section 706 (including a keyboard, a mouse, etc.), an output section 707 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, etc.), a storage section 708 (including a hard disk, etc.), and a communication section 709 (including a network interface card such as a LAN card, a modem, etc.). The communication section 709 performs communication processing via a network such as the Internet. A drive 710 may also be linked to the input/output interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 710 as needed, so that a computer program read therefrom is installed into the storage section 708 as needed.
In the case where the above-described series of processes is implemented by software, a program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 711.
Those skilled in the art will understand that such a storage medium is not limited to the removable medium 711 shown in Fig. 7, in which the program is stored and which is distributed separately from the device to provide the program to a user. Examples of the removable medium 711 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 702, a hard disk included in the storage section 708, or the like, in which the computer program is stored and which is distributed to a user together with the device containing it.
The present invention also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the above-described method according to the embodiments of the present invention can be performed.
Correspondingly, a storage medium carrying the above-described program product storing the machine-readable instruction code is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disk, a magneto-optical disk, a memory card, a memory stick, and the like. Those skilled in the art should understand that this list is exemplary and the present invention is not limited thereto.
In this specification, expressions such as "first", "second", and "n-th" are used to distinguish the described features literally, so as to describe the present invention clearly. Therefore, they should not be regarded as having any limiting meaning.
As an example, each step of the above-described method and each module and/or unit of the above-described device may be implemented as software, firmware, hardware, or a combination thereof, and serve as part of the corresponding device. The specific means or manners that can be used when the modules and units of the above-described apparatus are configured by software, firmware, hardware, or a combination thereof are well known to those skilled in the art and will not be repeated here.
As an example, in the case of implementation by software or firmware, a program constituting the software may be installed from a storage medium or a network into a computer having a dedicated hardware structure (for example, the general-purpose computer 700 shown in Fig. 7), which, when various programs are installed, is capable of performing various functions.
In the above description of the specific embodiments of the present invention, features described and/or shown for one embodiment may be used in one or more other embodiments in the same or a similar manner, combined with features in other embodiments, or substituted for features in other embodiments.
It should be emphasized that the term "comprises/comprising", when used herein, refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.
In addition, the methods of the present invention are not limited to being performed in the chronological order described in the specification; they may also be performed in other chronological orders, in parallel, or independently. Therefore, the execution order of the methods described in this specification does not limit the technical scope of the present invention.
It should be understood that various changes, substitutions, and alterations can be made to the present invention and its advantages without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present invention is not limited to the specific embodiments of the processes, devices, means, methods, and steps described in the specification. From the disclosure of the present invention, one of ordinary skill in the art will readily appreciate that processes, devices, means, methods, or steps, presently existing or to be developed in the future, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, devices, means, methods, or steps.
Based on the above description, it is known that the disclosure at least discloses the following technical solutions:
1. A natural language deep learning system, comprising:
an error calculation unit configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on a sample pair, wherein the loss function is a combination of a similarity loss function and a classification loss function,
wherein the similarity loss function is defined based on the following criterion: when the true classes of the sample pair are the same, the difference between their class prediction vector values should be small, and when the true classes of the sample pair are different, the difference between their class prediction vector values should be large, and
the classification loss function is defined based on the classification errors of the sample pair.
2. The system according to note 1, wherein the loss function pair_cost is:
pair_cost = pair_simi_cost + pair_label_cost,
wherein the similarity loss function pair_simi_cost is:
pair_simi_cost = abs(abs((y_pred_i · y_pred_j) / (|y_pred_i| · |y_pred_j|)) − sgn(y_i == y_j)),
and the classification loss function pair_label_cost is:
pair_label_cost = 2 − sgn(argmax(y_pred_i) == y_i) − sgn(argmax(y_pred_j) == y_j),
wherein i is the index of the first sample of the sample pair, j is the index of the second sample of the sample pair, y_pred_i denotes the class prediction vector value of the first sample i, y_pred_j denotes the class prediction vector value of the second sample j, y_i denotes the true class of the first sample i, and y_j denotes the true class of the second sample j.
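As an illustrative sketch (not part of the claimed embodiments), the pair loss of note 2 can be written out directly. Interpreting the normalized dot product as a cosine similarity between the two class prediction vectors, and the helper name pair_cost, are assumptions made for this example.

```python
import numpy as np

def pair_cost(y_pred_i, y_pred_j, y_i, y_j):
    """Loss for one sample pair: similarity term plus classification term."""
    y_pred_i = np.asarray(y_pred_i, dtype=float)
    y_pred_j = np.asarray(y_pred_j, dtype=float)
    # Cosine similarity of the two class prediction vectors.
    cos = y_pred_i @ y_pred_j / (np.linalg.norm(y_pred_i) * np.linalg.norm(y_pred_j))
    # Similarity loss: small when a same-class pair has similar predictions
    # and a different-class pair has dissimilar predictions.
    pair_simi_cost = abs(abs(cos) - float(y_i == y_j))
    # Classification loss: counts how many of the two samples are misclassified.
    pair_label_cost = (2.0
                       - float(np.argmax(y_pred_i) == y_i)
                       - float(np.argmax(y_pred_j) == y_j))
    return pair_simi_cost + pair_label_cost
```

For a correctly classified same-class pair with identical predictions, both terms vanish and the loss is zero.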
3. The system according to note 1, wherein the natural language deep learning system learns in a mini-batch stochastic gradient descent learning framework, and the sample pairs are selected from within each batch.
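Note 3 only states that sample pairs are drawn from within each mini-batch. One hypothetical way to form such pairs — shuffle the batch, then pair consecutive samples — is sketched below; the pairing strategy itself is an assumption, not taken from the text.

```python
import random

def pairs_from_batch(batch, seed=None):
    """Shuffle a mini-batch and pair consecutive samples into (i, j) pairs."""
    rng = random.Random(seed)
    indices = list(range(len(batch)))
    rng.shuffle(indices)
    # Pair positions 0-1, 2-3, ...; an odd trailing sample is dropped here.
    return [(batch[indices[k]], batch[indices[k + 1]])
            for k in range(0, len(indices) - 1, 2)]
```

Each pair could then be fed to the sample-pair loss during one SGD step.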
4. The system according to note 1, further comprising an initialization unit configured to initialize a word to be mapped for learning in the natural language deep learning system when the word to be mapped is an unseen word that does not exist in an existing word embedding dictionary, wherein the initialization unit is configured to:
if the word to be mapped is found in a lemma dictionary, initialize the word to be mapped using the corresponding vector in the lemma dictionary; otherwise, if the word to be mapped is found in a stem dictionary, initialize the word to be mapped using the corresponding vector in the stem dictionary,
wherein the lemma dictionary is used to store the centroid vectors of the word vectors of multiple words having the same lemma in the existing word embedding dictionary together with the corresponding lemmas, and the stem dictionary is used to store the centroid vectors of the word vectors of multiple words having the same stem in the existing word embedding dictionary together with the corresponding stems.
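A minimal sketch of the unseen-word initialization of note 4, assuming the lemma and stem dictionaries are plain mappings from lemma/stem to centroid vector. The lemmatize and stem callables (defaulting here to a lowercasing stand-in for a real lemmatizer/stemmer) and the zero-vector fallback are assumptions not spelled out in the text.

```python
import numpy as np

def init_vector(word, lemma_dict, stem_dict, dim=50,
                lemmatize=str.lower, stem=str.lower):
    """Initialize an embedding for a word absent from the pretrained dictionary."""
    lemma = lemmatize(word)
    if lemma in lemma_dict:       # centroid of vectors sharing this lemma
        return lemma_dict[lemma]
    root = stem(word)
    if root in stem_dict:         # centroid of vectors sharing this stem
        return stem_dict[root]
    return np.zeros(dim)          # fallback when neither lookup succeeds
```

The lemma lookup is tried first, matching the priority order in note 4.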
5. The system according to note 1, further comprising a matching unit configured to: perform dynamic programming matching, based on the similarities between their respective words, on two different segmentation sequences of a sentence obtained by different word segmentation techniques, so as to carry out word feature fusion.
6. The system according to note 5, wherein the similarity is obtained by calculating the Levenshtein distance between two words.
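For reference, the Levenshtein distance of note 6 can be computed by the classic dynamic-programming recurrence; turning the distance into a similarity in [0, 1] by normalizing with the longer string's length is a common choice, assumed here rather than given by the text.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Map edit distance into [0, 1]: identical strings score 1.0."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```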
7. The system according to note 5, wherein the matching unit is configured to:
set the two segmentation sequences as a segmentation sequence to be matched and a target segmentation sequence, respectively;
build a matrix whose elements are the pairwise similarities between each word in the segmentation sequence to be matched and each word in the target segmentation sequence; and
dynamically search the matrix for a path whose length equals the length of the sequence to be matched and for which the sum of the similarities of all elements on the path is maximal.
8. The system according to note 7, wherein the dynamic path search preferentially searches in the direction of the segmentation sequence to be matched.
9. The system according to note 7, wherein the matching unit is further configured to:
limit the search region based on the i and j index values of the current element, the lengths of the words corresponding to them, and the number of words in the target segmentation sequence;
compute the mean and standard deviation of all elements of the matrix within the search region; and
take the elements within the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
10. The system according to note 9, wherein, for a current element (i, j), the search region is: row i+1, from column j to column end, where end = min(j + cscope − |i − j|, n), cscope is the absolute value of the difference between the length of the i-th word in the segmentation sequence to be matched and the length of the j-th word in the target segmentation sequence plus a predetermined constant, and n is the number of words in the target segmentation sequence.
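Notes 7-10 can be sketched as follows. This simplified version builds the pairwise-similarity matrix of note 7 and extends the path one row at a time inside the restricted search region of note 10; the greedy row-by-row step stands in for the full dynamic-programming search, and the mean-plus-standard-deviation candidate filter of note 9 is omitted, so this is an assumption-laden illustration rather than the claimed algorithm.

```python
import numpy as np

def match_segmentations(source, target, sim, cconst=2):
    """Align each word of `source` (sequence to be matched) to a word of
    `target` (target segmentation sequence); returns one column index per row."""
    m, n = len(source), len(target)
    # Matrix of pairwise word similarities (note 7).
    M = np.array([[sim(s, t) for t in target] for s in source])
    path = [int(np.argmax(M[0]))]  # best-matching start in the first row
    for i in range(1, m):
        j = path[-1]
        # Search region for current element (i-1, j): row i, columns j..end,
        # with end = min(j + cscope - |(i-1) - j|, n) and cscope the word-length
        # difference plus a predetermined constant (note 10).
        cscope = abs(len(source[i - 1]) - len(target[j])) + cconst
        end = min(j + cscope - abs((i - 1) - j), n)
        hi = max(end, j + 1)  # keep the region non-empty
        path.append(j + int(np.argmax(M[i, j:hi])))
    return path
```

With an exact-match similarity, identical sequences align along the diagonal.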
11. A natural language deep learning method, comprising:
when the natural language deep learning system is being trained, calculating the error value of a sample according to a loss function based on a sample pair, wherein the loss function is a combination of a similarity loss function and a classification loss function,
wherein the similarity loss function is defined based on the following criterion: when the true classes of the sample pair are the same, the difference between their class prediction vector values should be small, and when the true classes of the sample pair are different, the difference between their class prediction vector values should be large, and
the classification loss function is defined based on the classification errors of the sample pair.
12. The method according to note 11, wherein the loss function pair_cost is:
pair_cost = pair_simi_cost + pair_label_cost,
wherein the similarity loss function pair_simi_cost is:
pair_simi_cost = abs(abs((y_pred_i · y_pred_j) / (|y_pred_i| · |y_pred_j|)) − sgn(y_i == y_j)),
and the classification loss function pair_label_cost is:
pair_label_cost = 2 − sgn(argmax(y_pred_i) == y_i) − sgn(argmax(y_pred_j) == y_j),
wherein i is the index of the first sample of the sample pair, j is the index of the second sample of the sample pair, y_pred_i denotes the class prediction vector value of the first sample i, y_pred_j denotes the class prediction vector value of the second sample j, y_i denotes the true class of the first sample i, and y_j denotes the true class of the second sample j.
13. The method according to note 11, wherein the natural language deep learning system learns in a mini-batch stochastic gradient descent learning framework, and the sample pairs are selected from within each batch.
14. The method according to note 11, further comprising an initialization step of initializing a word to be mapped for learning in the natural language deep learning system when the word to be mapped is an unseen word that does not exist in an existing word embedding dictionary, wherein the initialization step comprises:
if the word to be mapped is found in a lemma dictionary, initializing the word to be mapped using the corresponding vector in the lemma dictionary; otherwise, if the word to be mapped is found in a stem dictionary, initializing the word to be mapped using the corresponding vector in the stem dictionary,
wherein the lemma dictionary is used to store the centroid vectors of the word vectors of multiple words having the same lemma in the existing word embedding dictionary together with the corresponding lemmas, and the stem dictionary is used to store the centroid vectors of the word vectors of multiple words having the same stem in the existing word embedding dictionary together with the corresponding stems.
15. The method according to note 11, further comprising a matching step, the matching step comprising: performing dynamic programming matching, based on the similarities between their respective words, on two different segmentation sequences of a sentence obtained by different word segmentation techniques, so as to carry out word feature fusion.
16. The method according to note 15, wherein the similarity is obtained by calculating the Levenshtein distance between two words.
17. The method according to note 15, wherein the matching step further comprises:
setting the two segmentation sequences as a segmentation sequence to be matched and a target segmentation sequence, respectively;
building a matrix whose elements are the pairwise similarities between each word in the segmentation sequence to be matched and each word in the target segmentation sequence; and
dynamically searching the matrix for a path whose length equals the length of the sequence to be matched and for which the sum of the similarities of all elements on the path is maximal.
18. The method according to note 17, wherein the dynamic path search preferentially searches in the direction of the segmentation sequence to be matched.
19. The method according to note 17, wherein the matching step further comprises:
limiting the search region based on the i and j index values of the current element, the lengths of the words corresponding to them, and the number of words in the target segmentation sequence;
computing the mean and standard deviation of all elements of the matrix within the search region; and
taking the elements within the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
20. The method according to note 19, wherein, for a current element (i, j), the search region is: row i+1, from column j to column end, where end = min(j + cscope − |i − j|, n), cscope is the absolute value of the difference between the length of the i-th word in the segmentation sequence to be matched and the length of the j-th word in the target segmentation sequence plus a predetermined constant, and n is the number of words in the target segmentation sequence.
Claims (10)
1. A natural language deep learning system, comprising:
an error calculation unit configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on a sample pair, wherein the loss function is a combination of a similarity loss function and a classification loss function,
wherein the similarity loss function is defined based on the following criterion: when the true classes of the sample pair are the same, the difference between their class prediction vector values should be small, and when the true classes of the sample pair are different, the difference between their class prediction vector values should be large, and
the classification loss function is defined based on the classification errors of the sample pair.
2. The system according to claim 1, wherein the loss function pair_cost is:
pair_cost = pair_simi_cost + pair_label_cost,
wherein the similarity loss function pair_simi_cost is:
pair_simi_cost = abs(abs((y_pred_i · y_pred_j) / (|y_pred_i| · |y_pred_j|)) − sgn(y_i == y_j)),
and the classification loss function pair_label_cost is:
pair_label_cost = 2 − sgn(argmax(y_pred_i) == y_i) − sgn(argmax(y_pred_j) == y_j),
wherein i is the index of the first sample of the sample pair, j is the index of the second sample of the sample pair, y_pred_i denotes the class prediction vector value of the first sample i, y_pred_j denotes the class prediction vector value of the second sample j, y_i denotes the true class of the first sample i, and y_j denotes the true class of the second sample j.
3. The system according to claim 1, wherein the natural language deep learning system learns in a mini-batch stochastic gradient descent learning framework, and the sample pairs are selected from within each batch.
4. The system according to claim 1, further comprising an initialization unit configured to initialize a word to be mapped for learning in the natural language deep learning system when the word to be mapped is an unseen word that does not exist in an existing word embedding dictionary, wherein the initialization unit is configured to:
if the word to be mapped is found in a lemma dictionary, initialize the word to be mapped using the corresponding vector in the lemma dictionary; otherwise, if the word to be mapped is found in a stem dictionary, initialize the word to be mapped using the corresponding vector in the stem dictionary,
wherein the lemma dictionary is used to store the centroid vectors of the word vectors of multiple words having the same lemma in the existing word embedding dictionary together with the corresponding lemmas, and the stem dictionary is used to store the centroid vectors of the word vectors of multiple words having the same stem in the existing word embedding dictionary together with the corresponding stems.
5. The system according to claim 1, further comprising a matching unit configured to: perform dynamic programming matching, based on the similarities between their respective words, on two different segmentation sequences of a sentence obtained by different word segmentation techniques, so as to carry out word feature fusion.
6. The system according to claim 5, wherein the similarity is obtained by calculating the Levenshtein distance between two words.
7. The system according to claim 5, wherein the matching unit is configured to:
set the two segmentation sequences as a segmentation sequence to be matched and a target segmentation sequence, respectively;
build a matrix whose elements are the pairwise similarities between each word in the segmentation sequence to be matched and each word in the target segmentation sequence; and
dynamically search the matrix for a path whose length equals the length of the sequence to be matched and for which the sum of the similarities of all elements on the path is maximal.
8. The system according to claim 7, wherein the dynamic path search preferentially searches in the direction of the segmentation sequence to be matched.
9. The system according to claim 7, wherein the matching unit is further configured to:
limit the search region based on the i and j index values of the current element, the lengths of the words corresponding to them, and the number of words in the target segmentation sequence;
compute the mean and standard deviation of all elements of the matrix within the search region; and
take the elements within the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
10. A natural language deep learning method, comprising:
when the natural language deep learning system is being trained, calculating the error value of a sample according to a loss function based on a sample pair, wherein the loss function is a combination of a similarity loss function and a classification loss function,
wherein the similarity loss function is defined based on the following criterion: when the true classes of the sample pair are the same, the difference between their class prediction vector values should be small, and when the true classes of the sample pair are different, the difference between their class prediction vector values should be large, and
the classification loss function is defined based on the classification errors of the sample pair.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610341719.0A CN107402914B (en) | 2016-05-20 | 2016-05-20 | Deep learning system and method for natural language |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107402914A true CN107402914A (en) | 2017-11-28 |
CN107402914B CN107402914B (en) | 2020-12-15 |
Family
ID=60389365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610341719.0A Active CN107402914B (en) | 2016-05-20 | 2016-05-20 | Deep learning system and method for natural language |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107402914B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110219012A1 (en) * | 2010-03-02 | 2011-09-08 | Yih Wen-Tau | Learning Element Weighting for Similarity Measures |
CN102713945A (en) * | 2010-01-14 | 2012-10-03 | 日本电气株式会社 | Pattern recognition device, pattern recognition method and pattern recognition-use program |
CN103699529A (en) * | 2013-12-31 | 2014-04-02 | 哈尔滨理工大学 | Method and device for fusing machine translation systems by aid of word sense disambiguation |
CN104391902A (en) * | 2014-11-12 | 2015-03-04 | 清华大学 | Maximum entropy topic model-based online document classification method and device |
CN104850539A (en) * | 2015-05-28 | 2015-08-19 | 宁波薄言信息技术有限公司 | Natural language understanding method and travel question-answering system based on same |
CN104915386A (en) * | 2015-05-25 | 2015-09-16 | 中国科学院自动化研究所 | Short text clustering method based on deep semantic feature learning |
US20160019458A1 (en) * | 2014-07-16 | 2016-01-21 | Deep Learning Analytics, LLC | Systems and methods for recognizing objects in radar imagery |
CN105427869A (en) * | 2015-11-02 | 2016-03-23 | 北京大学 | Session emotion autoanalysis method based on depth learning |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110750987A (en) * | 2019-10-28 | 2020-02-04 | 腾讯科技(深圳)有限公司 | Text processing method, device and storage medium |
CN110750987B (en) * | 2019-10-28 | 2021-02-05 | 腾讯科技(深圳)有限公司 | Text processing method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107402914B (en) | 2020-12-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||