CN107402914A - Natural language deep learning system and method - Google Patents
Natural language deep learning system and method
- Publication number: CN107402914A
- Application number: CN201610341719.0A
- Authority: CN (China)
- Prior art keywords: word, sample, loss function
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
Abstract
The present invention relates to a natural language deep learning system and method. The system includes an error calculation unit configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on sample pairs. The loss function is a combination of a similarity loss function and a classification loss function, wherein the similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair. In this system, the cost of learning from a sample-pair loss is reduced compared with a per-sample loss function.
Description
Technical field
The present invention relates to the field of information processing, and more specifically to a natural language deep learning system and method.
Background technology
The combination of deep learning and natural language processing has been a research hotspot in recent years. In existing deep learning models, using the word as the basic feature unit is the common form of natural language processing deep learning architectures. Research shows that natural language processing features can effectively improve learning performance on a variety of tasks, so researchers usually introduce several different word features for learning. However, two problems arise in practice:
1. The natural language tools that generate word features all rely on word segmentation, and different segmentation techniques cause the same natural language text to produce different word sequences, so the word features differ. The resulting problem is that multi-source word features carry a fusion error.
2. Word embedding is an important step of deep learning in the field of natural language processing. Its main function is to map a word to a word-representation vector. Normally, a good word representation should place semantically similar words close together in the vector space, and dissimilar words far apart. Since learning a good vector representation from random initial vectors generally requires a large corpus, tasks with insufficient samples often use pre-trained word embeddings as initial values. Unseen (out-of-vocabulary) words then inevitably occur. Although some trained embeddings initialize all unseen words to one identical vector, the cost of moving different unseen words toward semantically similar vocabulary clearly differs. Simply initializing all unseen words to one identical vector makes local convergence of the neural network too slow.
Besides the problems of handling word features, the lack of training samples is also a major obstacle to combining deep learning with natural language processing. Without sufficient samples, how to describe the error with a better loss function has become a hot research topic.
Accordingly, it is desirable to provide a natural language deep learning system and method that can solve the above problems.
Summary of the invention
A brief overview of the present invention is given below to provide a basic understanding of certain aspects of the invention. It should be appreciated that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit the scope of the invention. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description discussed later.
A primary object of the present invention is to provide a natural language deep learning system including an error calculation unit configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on sample pairs. The loss function is a combination of a similarity loss function and a classification loss function, wherein the similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair.
According to an aspect of the present invention, there is provided a natural language deep learning method including: when the natural language deep learning system is being trained, calculating the error value of a sample according to a loss function based on sample pairs, the loss function being a combination of a similarity loss function and a classification loss function, wherein the similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair.
In addition, embodiments of the invention provide a computer program for realizing the above method.
Embodiments of the invention further provide a computer program product in at least the form of a computer-readable medium on which computer program code for realizing the above method is recorded.
These and other advantages of the invention will be apparent from the following detailed description of the preferred embodiments of the invention in conjunction with the accompanying drawings.
Brief description of the drawings
The above and other objects, features and advantages of the present invention can be more easily understood with reference to the following description of embodiments of the invention in conjunction with the accompanying drawings. The parts in the drawings are only intended to show the principle of the invention. In the drawings, the same or similar technical features or parts are represented by the same or similar reference signs.
Fig. 1 is a block diagram showing an exemplary configuration of a natural language deep learning system 100 according to an embodiment of the invention;
Fig. 2 is a block diagram showing an exemplary configuration of a natural language deep learning system 200 according to another embodiment of the invention;
Fig. 3 is a block diagram showing an exemplary configuration of a natural language deep learning system 300 according to still another embodiment of the invention;
Fig. 4 is a flow chart showing an exemplary process of a natural language deep learning method 400 according to an embodiment of the invention;
Fig. 5 is a flow chart showing an exemplary process of a natural language deep learning method 500 according to another embodiment of the invention;
Fig. 6 is a flow chart showing an exemplary process of a natural language deep learning method 600 according to still another embodiment of the invention; and
Fig. 7 is an exemplary structure diagram showing a computing device that can be used to implement the natural language deep learning system and method of the present invention.
Detailed description of embodiments
Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual embodiment, many implementation-specific decisions must be made to achieve the developer's specific goals, such as compliance with system- and business-related constraints, and these constraints may vary from one implementation to another. Moreover, although such development work might be complex and time-consuming, it is merely a routine undertaking for those skilled in the art having the benefit of this disclosure.
It should also be noted that, to avoid obscuring the present invention with unnecessary details, only the device structures and/or processing steps closely related to the solution of the present invention are shown in the drawings, while other details of little relevance to the invention are omitted.
The present invention proposes a natural language deep learning system and method that can solve the following problems in combining natural language processing with deep learning:
1. Design of the loss function when samples are insufficient
The accuracy of a per-sample loss function is limited by the sample size. The loss can be portrayed more accurately by further describing the relations between samples, but a naive sample-pair loss may reduce the efficiency of learning updates. For this problem, the invention addresses how to design a loss function based on sample pairs while reducing the cost of learning from a sample-pair loss.
2. Initialization of unseen words
Analysis shows that unseen (out-of-vocabulary) words fall into the following cases:
a. Differences in capitalization or punctuation prevent a match (e.g., "worker" is in the word embedding while "worker." is unseen);
b. Numerals cannot be matched (e.g., "15", "1802", "1000" are in the word embedding while "199" is unseen);
c. Words sharing a lemma (Chairman/Chairwoman, or video/videotape/videomask, etc.).
It is therefore possible to initialize an unseen word with the embedding of its most similar vocabulary item.
3. Word sequence matching
Analysis shows that the fusion error of multi-source word features comes from the segmentation differences between different systems or tools. Therefore, the invention also provides a multi-source word sequence matching algorithm to reduce the error of word feature fusion.
The natural language deep learning system and method according to embodiments of the invention are described in detail below with reference to the drawings, in the following order:
1. Natural language deep learning system
2. Natural language deep learning method
3. Computing device to implement the system and method of the application
[1. natural language deep learning system]
Fig. 1 is a block diagram showing an exemplary configuration of a natural language deep learning system 100 according to an embodiment of the invention.
As shown in Fig. 1, the natural language deep learning system 100 includes an error calculation unit 102. The error calculation unit 102 is configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on sample pairs. The loss function is a combination of a similarity loss function and a classification loss function. The similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair.
A traditional loss function takes the classification error of a single sample as its judgment criterion; with a limited number of samples, the information that can be learned is also limited, which affects the final learning performance. To overcome this shortcoming of per-sample loss functions, a loss function based on sample pairs is proposed.
In one example, under a mini-batch SGD (stochastic gradient descent) learning framework, the similarity between the neural-network output vectors of samples is calculated within each batch. An example formula for the similarity loss function pair_simi_cost is as follows:
An example formula for the classification loss function pair_label_cost is as follows:
The loss function pair_cost can then be defined as: pair_cost = pair_simi_cost + pair_label_cost.
Here, the function abs takes the absolute value, argmax returns the index of the maximum dimension of a vector, and sgn is the sign function; i is the index of the first sample of a pair and j the index of the second sample; ypred_i denotes the class-prediction vector value of the first sample i, and ypred_j that of the second sample j; y_i denotes the true class of the first sample i, and y_j the true class of the second sample j.
In another example, the similarity loss function pair_simi_cost can be defined on a distance metric:
The classification loss function can use the same function as in the above example:
The loss function pair_cost can then be defined as:
pair_cost = λ1 * pair_simi_cost + λ2 * pair_label_cost.
That is, the loss function pair_cost is a linear weighting of the similarity loss function and the classification loss function, where λ1 and λ2 are the respective weights and λ1 + λ2 = 1.
By defining the loss function as a combination of a similarity loss function and a classification loss function according to the above criteria, the error can be described with a better loss function when samples are insufficient, and the cost of learning from a sample-pair loss can be reduced.
Fig. 2 is a block diagram showing an exemplary configuration of a natural language deep learning system 200 according to another embodiment of the invention.
As shown in Fig. 2, the natural language deep learning system 200 includes an error calculation unit 202 and an initialization unit 204. The error calculation unit 202 in Fig. 2 is similar in function to the error calculation unit 102 in Fig. 1 and is not described again here.
The natural language deep learning system 200 shown in Fig. 2 includes, in addition to the error calculation unit 202, the initialization unit 204.
The initialization unit 204 applies to the case, during operation of the natural language deep learning system, in which a word to be mapped for learning is an unseen word that does not exist in the existing word embedding dictionary; it initializes that word.
The initialization unit 204 is configured so that, if the word to be mapped is found in the lemma dictionary, it is initialized with the corresponding vector in the lemma dictionary; otherwise, if the word to be mapped is found in the stem dictionary, it is initialized with the corresponding vector in the stem dictionary.
In one example, specifically, two new dictionaries, stem_dict and lemma_dict, are first initialized.
Then, stem and lemma extraction is performed on the words in a third-party Word Embedding dictionary.
Next, the word vectors sharing the same stem or the same lemma are taken out, and the centroid of each group of vectors is computed. The stems and lemmas, together with their corresponding centroid vectors, are stored into stem_dict and lemma_dict respectively.
For numeric entries, which are found by a regular expression, a centroid is computed as well and saved as NUM.
Upon initialization:
1. If the word currently being mapped exists in the original dictionary, it is initialized with the corresponding vector in the original dictionary; otherwise go to step 2;
2. If the lemma of the current word can be found in the lemma dictionary, it is initialized with the vector corresponding to that lemma in lemma_dict; otherwise go to step 3;
3. If the stem of the current word can be found in the stem dictionary, it is initialized with the vector corresponding to that stem in stem_dict; otherwise go to step 4;
4. If the current word is a numeral, it is initialized to NUM; otherwise go to step 5;
5. The entry is mapped to the unseen-word vector of the Word Embedding (if there is none, it is mapped to a random vector).
Through the above initialization process, the problem that local convergence of the neural network is too slow because all unseen words are initialized to one identical vector can be avoided.
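The five-step cascade above can be sketched as follows. This is a hypothetical illustration: the `lemmatize` and `stem` callables, the dictionary layout, and the random range are all assumptions, not the patent's implementation.

```python
import numpy as np

def init_vector(word, word_emb, lemma_dict, stem_dict, num_vec,
                lemmatize, stem, dim=50, rng=np.random):
    """Cascading OOV initialization (steps 1-5), as a sketch."""
    if word in word_emb:                  # 1. exact entry in the original dictionary
        return word_emb[word]
    if lemmatize(word) in lemma_dict:     # 2. centroid of words sharing the lemma
        return lemma_dict[lemmatize(word)]
    if stem(word) in stem_dict:           # 3. centroid of words sharing the stem
        return stem_dict[stem(word)]
    if word.isdigit():                    # 4. numerals share the NUM centroid
        return num_vec
    return rng.uniform(-0.1, 0.1, dim)    # 5. fallback: random unseen-word vector
```

In practice the `lemmatize`/`stem` callables could come from any morphological toolkit; the cascade only requires that they map a surface form to a dictionary key.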
Fig. 3 is a block diagram showing an exemplary configuration of a natural language deep learning system 300 according to still another embodiment of the invention.
As shown in Fig. 3, the natural language deep learning system 300 includes an error calculation unit 302 and a matching unit 306. The error calculation unit 302 in Fig. 3 is similar in function to the error calculation unit 102 in Fig. 1 and is not described again here.
The natural language deep learning system 300 shown in Fig. 3 includes, in addition to the error calculation unit 302, the matching unit 306.
The matching unit 306 is configured to perform dynamic programming matching on two different segmentation sequences of a sentence obtained by different segmentation techniques, based on the similarity between their respective words, so as to perform word feature fusion.
The natural language tools that generate word features all rely on word segmentation, and different segmentation techniques cause the same natural language text to produce different word sequences, so word features differ. The resulting problem is that multi-source word features carry a fusion error, which comes from the segmentation differences between systems or tools. The matching unit in the natural language deep learning system 300 can reduce the error of word feature fusion.
In one example, specifically, suppose there are two segmentation sequences A and B; let A = [a1, a2, a3, ..., am] be the sequence to be matched and B = [b1, b2, b3, ..., bn] the target sequence.
First, the following formula can be used to compute the Levenshtein ratio between each word a_i in A and each word b_j in B:
Here len() denotes the length of a sequence, and editdistance denotes a Levenshtein-style edit distance in which insertion and deletion operations cost 1 and a substitution operation costs 2.
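The ratio formula itself is an image in the source. A common definition consistent with the stated edit costs (insert/delete 1, substitute 2) is (len(a) + len(b) − editdistance) / (len(a) + len(b)), which gives 1.0 for identical strings and 0.0 for entirely different ones; the sketch below assumes that definition.

```python
def edit_distance(a, b):
    """Levenshtein-style distance: insert/delete cost 1, substitute cost 2."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 2
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + sub) # match / substitution
    return d[m][n]

def lev_ratio(a, b):
    """Assumed ratio: 1.0 for identical strings, 0.0 for disjoint ones."""
    total = len(a) + len(b)
    return (total - edit_distance(a, b)) / total if total else 1.0
```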
This yields an m × n matrix S with elements S(i, j); note that the i, j indices start from 0.
The problem of dynamic programming matching based on the similarity between the words of the two segmentation sequences is thus converted into the following: in the matrix S, find a path whose length equals that of the sequence A to be matched, such that the sum of the Levenshtein ratios of all points (i, j) on the path is maximal. To reduce the cost of the path search, the following measures can be taken:
a. Define a set walked to record whether a node has been searched; when initializing walked, add the nodes (i, j) with S(i, j) = 0 to the walked set.
b. Define a sequence path to record the sequence of nodes currently traversed; it is initialized to empty.
c. Define max_weight, the maximal Levenshtein-ratio sum among the candidate paths of the current search, initialized to 0.
d. Define the search range of the next candidate node, cscope = |len(a_i) − len(b_j)| + δ, where δ is an integer constant, preferably 4 or larger; the larger δ is, the larger the search volume.
e. Define a sequence paths to store the candidate paths found; each element of paths is (path sequence, Levenshtein-ratio sum of the path).
The specific optimal word sequence matching algorithm can be described as follows:
Finally, the path in paths with the maximal Levenshtein-ratio sum is taken as the matching path.
In the above algorithm, the similarity between the words of the two different segmentation sequences is obtained by computing the Levenshtein distance between two words.
Based on the above algorithm, the matching unit 306 is configured to: set the two segmentation sequences as the sequence to be matched and the target sequence, respectively; build a matrix whose elements are the pairwise similarities between each word in the sequence to be matched and each word in the target sequence; and heuristically search a path in the matrix, finding a path whose length equals that of the sequence to be matched and whose sum of element similarities is maximal.
In one example, the heuristic search extends the path preferentially in the direction of the sequence to be matched.
Based on the above algorithm, the matching unit 306 is further configured to: limit the search region based on the i and j index values of the current element, the lengths of the corresponding words, and the number of words in the target sequence; compute the mean and standard deviation of all matrix elements within the search region; and take the elements in the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
In the above algorithm, for the current element (i, j), the search region is row i+1, columns j to end, where end = min(j + cscope − |i − j|, n), cscope is the absolute difference between the length of the i-th word in the sequence to be matched and the length of the j-th word in the target sequence plus a predetermined constant, and n is the number of words in the target sequence.
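The search-region bound can be sketched directly from the definitions above. This is an illustrative reading, not the patent's code; in particular, treating end as an exclusive column bound is an assumption.

```python
def search_region(i, j, seq_a, seq_b, delta=4):
    """Candidate columns in row i+1 for the current node (i, j), as a sketch.

    cscope widens with the length mismatch of the current word pair;
    delta is the integer constant (preferably >= 4) from the text.
    """
    n = len(seq_b)
    cscope = abs(len(seq_a[i]) - len(seq_b[j])) + delta
    end = min(j + cscope - abs(i - j), n)
    return list(range(j, end))  # candidate column indices, end exclusive
```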
The above algorithm realizes multi-source word sequence matching and can reduce the error of word feature fusion.
It should be noted here that the structures of the natural language deep learning systems 100-300 and their constituent units shown in Figs. 1-3 are merely exemplary, and those skilled in the art may modify the block diagrams shown in Figs. 1-3 as needed. For example, the structures of the natural language deep learning systems of Fig. 2 and Fig. 3 can be combined to form a natural language deep learning system including an error calculation unit, an initialization unit and a matching unit.
[2. natural language deep learning method]
Fig. 4 is a flow chart showing an exemplary process of a natural language deep learning method 400 according to an embodiment of the invention.
As shown in Fig. 4, the natural language deep learning method 400 includes an error calculating step S402. In step S402, when the natural language deep learning system is being trained, the error value of a sample is calculated according to a loss function based on sample pairs. The loss function is a combination of a similarity loss function and a classification loss function, wherein the similarity loss function is defined on the following criterion: when the true classes of a sample pair are the same, the difference between their class-prediction vector values should be small, and when the true classes differ, the difference should be large; the classification loss function is defined on the classification errors of the sample pair.
Specifically, the loss function pair_cost can be:
pair_cost = pair_simi_cost + pair_label_cost,
where the similarity loss function pair_simi_cost is:
and the classification loss function pair_label_cost is:
Here, i is the index of the first sample of a pair and j the index of the second sample; ypred_i denotes the class-prediction vector value of the first sample i, and ypred_j that of the second sample j; y_i denotes the true class of the first sample i, and y_j the true class of the second sample j.
In another example, the similarity loss function pair_simi_cost can be defined on a distance metric:
The classification loss function can use the same function as in the above example:
The loss function pair_cost can then be defined as:
pair_cost = λ1 * pair_simi_cost + λ2 * pair_label_cost.
That is, pair_cost is defined as a linear weighting of the similarity loss function and the classification loss function, where λ1 and λ2 are the respective weights and λ1 + λ2 = 1.
The natural language deep learning system is trained under a mini-batch stochastic gradient descent learning framework, and the sample pairs are selected from each batch.
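The text does not specify how pairs are drawn from a batch; one plausible reading is to form every unordered pair within the mini-batch, sketched below. This pairing scheme is an assumption for illustration only.

```python
from itertools import combinations

def sample_pairs(batch):
    # Every unordered pair within one mini-batch, so the pair loss
    # sees b*(b-1)/2 pairs per batch of b samples.
    return list(combinations(batch, 2))
```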
Fig. 5 is a flow chart showing an exemplary process of a natural language deep learning method 500 according to another embodiment of the invention.
Step S502 in Fig. 5 is similar to step S402 described with reference to Fig. 4 and is not repeated here.
Step S504 in Fig. 5 is the step of initializing a word to be mapped when the word to be mapped for learning in the natural language deep learning system is an unseen word that does not exist in the existing word embedding dictionary.
In step S504, if the word to be mapped is found in the lemma dictionary, it is initialized with the corresponding vector in the lemma dictionary; otherwise, if it is found in the stem dictionary, it is initialized with the corresponding vector in the stem dictionary. Here, the lemma dictionary stores the centroid vectors of the word vectors of the words in the existing word embedding dictionary that share the same lemma, together with the corresponding lemmas, and the stem dictionary stores the centroid vectors of the word vectors of the words that share the same stem, together with the corresponding stems.
Fig. 6 is a flow chart showing an exemplary process of a natural language deep learning method 600 according to still another embodiment of the invention.
Step S602 in Fig. 6 is similar to step S402 described with reference to Fig. 4 and is not repeated here.
Step S606 in Fig. 6 is a matching step. In S606, dynamic programming matching is performed on two different segmentation sequences of a sentence obtained by different segmentation techniques, based on the similarity between their respective words, so as to perform word feature fusion.
The similarity is obtained by computing the Levenshtein distance between two words.
Step S606 further comprises: setting the two segmentation sequences as the sequence to be matched and the target sequence, respectively; building a matrix whose elements are the pairwise similarities between each word in the sequence to be matched and each word in the target sequence; and heuristically searching a path in the matrix, finding a path whose length equals that of the sequence to be matched and whose sum of element similarities is maximal.
In the heuristic search, the path is extended preferentially in the direction of the sequence to be matched.
Step S606 further comprises: limiting the search region based on the i and j index values of the current element, the lengths of the corresponding words, and the number of words in the target sequence; computing the mean and standard deviation of all matrix elements within the search region; and taking the elements in the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
For the current element (i, j), the search region is row i+1, columns j to end, where end = min(j + cscope − |i − j|, n), cscope is the absolute difference between the length of the i-th word in the sequence to be matched and the length of the j-th word in the target sequence plus a predetermined constant, and n is the number of words in the target sequence.
For the operations and details of the steps of the natural language deep learning methods 400-600, reference may be made to the embodiments of the natural language deep learning system of the invention described in conjunction with Figs. 1-3; they are not detailed here.
The present invention proposes a natural language deep learning system and method, through which the following advantages are obtained:
1. It solves the problems of designing a loss function based on sample pairs and reducing the cost of learning from a sample-pair loss.
2. It solves the problem of initializing unseen words in tasks with insufficient samples.
3. It proposes a multi-source word sequence matching algorithm to reduce the error of word feature fusion.
[3. Computing device to implement the methods and apparatus of the present application]
The general principle of the present invention has been described above in connection with specific embodiments. However, it should be noted that those of ordinary skill in the art will understand that all or any steps or parts of the methods and apparatus of the present invention can be implemented in any computing device (including a processor, a storage medium, etc.) or a network of computing devices, in hardware, firmware, software or a combination thereof; having read the description of the present invention, those of ordinary skill in the art can accomplish this using their basic programming skills.
Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing device, which may be a well-known general-purpose device. Thus, the object of the present invention can also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium may be any known storage medium or any storage medium developed in the future.
In the case where the embodiments of the present invention are implemented by software and/or firmware, a program constituting the software is installed from a storage medium or a network into a computer having a dedicated hardware structure, for example the general-purpose computer 700 shown in Fig. 7, which, when various programs are installed, is capable of performing various functions.
In Fig. 7, a central processing unit (CPU) 701 performs various processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. In the RAM 703, data required when the CPU 701 performs the various processes is also stored as needed. The CPU 701, the ROM 702, and the RAM 703 are linked to one another via a bus 704. An input/output interface 705 is also linked to the bus 704.
The following components are linked to the input/output interface 705: an input section 706 (including a keyboard, a mouse, etc.), an output section 707 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, etc.), a storage section 708 (including a hard disk, etc.), and a communication section 709 (including a network interface card such as a LAN card, a modem, etc.). The communication section 709 performs communication processing via a network such as the Internet. A drive 710 may also be linked to the input/output interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 710 as needed, so that a computer program read therefrom is installed into the storage section 708 as needed.
In the case where the above-described series of processes is implemented by software, a program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 711.
Those skilled in the art will understand that such a storage medium is not limited to the removable medium 711 shown in Fig. 7, in which the program is stored and which is distributed separately from the device to provide the program to a user. Examples of the removable medium 711 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 702, a hard disk included in the storage section 708, or the like, in which the computer program is stored and which is distributed to a user together with the device containing it.
The present invention also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the above-described method according to the embodiments of the present invention can be performed.
Correspondingly, a storage medium carrying the above-described program product storing the machine-readable instruction code is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disk, a magneto-optical disk, a memory card, a memory stick, and the like. Those skilled in the art should understand that this list is exemplary and the present invention is not limited thereto.
In this specification, expressions such as "first", "second", and "n-th" are used to distinguish the described features literally, so as to describe the present invention clearly. Therefore, they should not be regarded as having any limiting meaning.
As an example, each step of the above-described method and each module and/or unit of the above-described device may be implemented as software, firmware, hardware, or a combination thereof, and serve as part of the corresponding device. The specific means or manners that can be used when the modules and units of the above-described apparatus are configured by software, firmware, hardware, or a combination thereof are well known to those skilled in the art and will not be repeated here.
As an example, in the case of implementation by software or firmware, a program constituting the software may be installed from a storage medium or a network into a computer having a dedicated hardware structure (for example, the general-purpose computer 700 shown in Fig. 7), which, when various programs are installed, is capable of performing various functions.
In the above description of the specific embodiments of the present invention, features described and/or shown for one embodiment may be used in one or more other embodiments in the same or a similar manner, combined with features in other embodiments, or substituted for features in other embodiments.
It should be emphasized that the term "comprises/comprising", when used herein, refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.
In addition, the methods of the present invention are not limited to being performed in the chronological order described in the specification; they may also be performed in other chronological orders, in parallel, or independently. Therefore, the execution order of the methods described in this specification does not limit the technical scope of the present invention.
It should be understood that various changes, substitutions, and alterations can be made to the present invention and its advantages without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present invention is not limited to the specific embodiments of the processes, devices, means, methods, and steps described in the specification. From the disclosure of the present invention, one of ordinary skill in the art will readily appreciate that processes, devices, means, methods, or steps, presently existing or to be developed in the future, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, devices, means, methods, or steps.
Based on the above description, it is known that the disclosure at least discloses the following technical solutions:
1. A natural language deep learning system, comprising:
an error calculation unit configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on a sample pair, wherein the loss function is a combination of a similarity loss function and a classification loss function,
wherein the similarity loss function is defined based on the following criterion: when the true classes of the sample pair are the same, the difference between their class prediction vector values should be small, and when the true classes of the sample pair are different, the difference between their class prediction vector values should be large, and
the classification loss function is defined based on the classification errors of the sample pair.
2. The system according to note 1, wherein the loss function pair_cost is:
pair_cost = pair_simi_cost + pair_label_cost,
wherein the similarity loss function pair_simi_cost is:
pair_simi_cost = abs(abs((y_pred_i · y_pred_j) / (|y_pred_i| · |y_pred_j|)) − sgn(y_i == y_j)),
and the classification loss function pair_label_cost is:
pair_label_cost = 2 − sgn(argmax(y_pred_i) == y_i) − sgn(argmax(y_pred_j) == y_j),
wherein i is the index of the first sample of the sample pair, j is the index of the second sample of the sample pair, y_pred_i denotes the class prediction vector value of the first sample i, y_pred_j denotes the class prediction vector value of the second sample j, y_i denotes the true class of the first sample i, and y_j denotes the true class of the second sample j.
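As an illustrative sketch (not part of the claimed embodiments), the pair loss of note 2 can be written out directly. Interpreting the normalized dot product as a cosine similarity between the two class prediction vectors, and the helper name pair_cost, are assumptions made for this example.

```python
import numpy as np

def pair_cost(y_pred_i, y_pred_j, y_i, y_j):
    """Loss for one sample pair: similarity term plus classification term."""
    y_pred_i = np.asarray(y_pred_i, dtype=float)
    y_pred_j = np.asarray(y_pred_j, dtype=float)
    # Cosine similarity of the two class prediction vectors.
    cos = y_pred_i @ y_pred_j / (np.linalg.norm(y_pred_i) * np.linalg.norm(y_pred_j))
    # Similarity loss: small when a same-class pair has similar predictions
    # and a different-class pair has dissimilar predictions.
    pair_simi_cost = abs(abs(cos) - float(y_i == y_j))
    # Classification loss: counts how many of the two samples are misclassified.
    pair_label_cost = (2.0
                       - float(np.argmax(y_pred_i) == y_i)
                       - float(np.argmax(y_pred_j) == y_j))
    return pair_simi_cost + pair_label_cost
```

For a correctly classified same-class pair with identical predictions, both terms vanish and the loss is zero.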
3. The system according to note 1, wherein the natural language deep learning system learns in a mini-batch stochastic gradient descent learning framework, and the sample pairs are selected from within each batch.
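Note 3 only states that sample pairs are drawn from within each mini-batch. One hypothetical way to form such pairs — shuffle the batch, then pair consecutive samples — is sketched below; the pairing strategy itself is an assumption, not taken from the text.

```python
import random

def pairs_from_batch(batch, seed=None):
    """Shuffle a mini-batch and pair consecutive samples into (i, j) pairs."""
    rng = random.Random(seed)
    indices = list(range(len(batch)))
    rng.shuffle(indices)
    # Pair positions 0-1, 2-3, ...; an odd trailing sample is dropped here.
    return [(batch[indices[k]], batch[indices[k + 1]])
            for k in range(0, len(indices) - 1, 2)]
```

Each pair could then be fed to the sample-pair loss during one SGD step.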
4. The system according to note 1, further comprising an initialization unit configured to initialize a word to be mapped for learning in the natural language deep learning system when the word to be mapped is an unseen word that does not exist in an existing word embedding dictionary, wherein the initialization unit is configured to:
if the word to be mapped is found in a lemma dictionary, initialize the word to be mapped using the corresponding vector in the lemma dictionary; otherwise, if the word to be mapped is found in a stem dictionary, initialize the word to be mapped using the corresponding vector in the stem dictionary,
wherein the lemma dictionary is used to store the centroid vectors of the word vectors of multiple words having the same lemma in the existing word embedding dictionary together with the corresponding lemmas, and the stem dictionary is used to store the centroid vectors of the word vectors of multiple words having the same stem in the existing word embedding dictionary together with the corresponding stems.
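A minimal sketch of the unseen-word initialization of note 4, assuming the lemma and stem dictionaries are plain mappings from lemma/stem to centroid vector. The lemmatize and stem callables (defaulting here to a lowercasing stand-in for a real lemmatizer/stemmer) and the zero-vector fallback are assumptions not spelled out in the text.

```python
import numpy as np

def init_vector(word, lemma_dict, stem_dict, dim=50,
                lemmatize=str.lower, stem=str.lower):
    """Initialize an embedding for a word absent from the pretrained dictionary."""
    lemma = lemmatize(word)
    if lemma in lemma_dict:       # centroid of vectors sharing this lemma
        return lemma_dict[lemma]
    root = stem(word)
    if root in stem_dict:         # centroid of vectors sharing this stem
        return stem_dict[root]
    return np.zeros(dim)          # fallback when neither lookup succeeds
```

The lemma lookup is tried first, matching the priority order in note 4.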
5. The system according to note 1, further comprising a matching unit configured to: perform dynamic programming matching, based on the similarities between their respective words, on two different segmentation sequences of a sentence obtained by different word segmentation techniques, so as to carry out word feature fusion.
6. The system according to note 5, wherein the similarity is obtained by calculating the Levenshtein distance between two words.
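For reference, the Levenshtein distance of note 6 can be computed by the classic dynamic-programming recurrence; turning the distance into a similarity in [0, 1] by normalizing with the longer string's length is a common choice, assumed here rather than given by the text.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Map edit distance into [0, 1]: identical strings score 1.0."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```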
7. The system according to note 5, wherein the matching unit is configured to:
set the two segmentation sequences as a segmentation sequence to be matched and a target segmentation sequence, respectively;
build a matrix whose elements are the pairwise similarities between each word in the segmentation sequence to be matched and each word in the target segmentation sequence; and
dynamically search the matrix for a path whose length equals the length of the sequence to be matched and for which the sum of the similarities of all elements on the path is maximal.
8. The system according to note 7, wherein the dynamic path search preferentially searches in the direction of the segmentation sequence to be matched.
9. The system according to note 7, wherein the matching unit is further configured to:
limit the search region based on the i and j index values of the current element, the lengths of the words corresponding to them, and the number of words in the target segmentation sequence;
compute the mean and standard deviation of all elements of the matrix within the search region; and
take the elements within the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
10. The system according to note 9, wherein, for a current element (i, j), the search region is: row i+1, from column j to column end, where end = min(j + cscope − |i − j|, n), cscope is the absolute value of the difference between the length of the i-th word in the segmentation sequence to be matched and the length of the j-th word in the target segmentation sequence plus a predetermined constant, and n is the number of words in the target segmentation sequence.
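Notes 7-10 can be sketched as follows. This simplified version builds the pairwise-similarity matrix of note 7 and extends the path one row at a time inside the restricted search region of note 10; the greedy row-by-row step stands in for the full dynamic-programming search, and the mean-plus-standard-deviation candidate filter of note 9 is omitted, so this is an assumption-laden illustration rather than the claimed algorithm.

```python
import numpy as np

def match_segmentations(source, target, sim, cconst=2):
    """Align each word of `source` (sequence to be matched) to a word of
    `target` (target segmentation sequence); returns one column index per row."""
    m, n = len(source), len(target)
    # Matrix of pairwise word similarities (note 7).
    M = np.array([[sim(s, t) for t in target] for s in source])
    path = [int(np.argmax(M[0]))]  # best-matching start in the first row
    for i in range(1, m):
        j = path[-1]
        # Search region for current element (i-1, j): row i, columns j..end,
        # with end = min(j + cscope - |(i-1) - j|, n) and cscope the word-length
        # difference plus a predetermined constant (note 10).
        cscope = abs(len(source[i - 1]) - len(target[j])) + cconst
        end = min(j + cscope - abs((i - 1) - j), n)
        hi = max(end, j + 1)  # keep the region non-empty
        path.append(j + int(np.argmax(M[i, j:hi])))
    return path
```

With an exact-match similarity, identical sequences align along the diagonal.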
11. A natural language deep learning method, comprising:
when the natural language deep learning system is being trained, calculating the error value of a sample according to a loss function based on a sample pair, wherein the loss function is a combination of a similarity loss function and a classification loss function,
wherein the similarity loss function is defined based on the following criterion: when the true classes of the sample pair are the same, the difference between their class prediction vector values should be small, and when the true classes of the sample pair are different, the difference between their class prediction vector values should be large, and
the classification loss function is defined based on the classification errors of the sample pair.
12. The method according to note 11, wherein the loss function pair_cost is:
pair_cost = pair_simi_cost + pair_label_cost,
wherein the similarity loss function pair_simi_cost is:
pair_simi_cost = abs(abs((y_pred_i · y_pred_j) / (|y_pred_i| · |y_pred_j|)) − sgn(y_i == y_j)),
and the classification loss function pair_label_cost is:
pair_label_cost = 2 − sgn(argmax(y_pred_i) == y_i) − sgn(argmax(y_pred_j) == y_j),
wherein i is the index of the first sample of the sample pair, j is the index of the second sample of the sample pair, y_pred_i denotes the class prediction vector value of the first sample i, y_pred_j denotes the class prediction vector value of the second sample j, y_i denotes the true class of the first sample i, and y_j denotes the true class of the second sample j.
13. The method according to note 11, wherein the natural language deep learning system learns in a mini-batch stochastic gradient descent learning framework, and the sample pairs are selected from within each batch.
14. The method according to note 11, further comprising an initialization step of initializing a word to be mapped for learning in the natural language deep learning system when the word to be mapped is an unseen word that does not exist in an existing word embedding dictionary, wherein the initialization step comprises:
if the word to be mapped is found in a lemma dictionary, initializing the word to be mapped using the corresponding vector in the lemma dictionary; otherwise, if the word to be mapped is found in a stem dictionary, initializing the word to be mapped using the corresponding vector in the stem dictionary,
wherein the lemma dictionary is used to store the centroid vectors of the word vectors of multiple words having the same lemma in the existing word embedding dictionary together with the corresponding lemmas, and the stem dictionary is used to store the centroid vectors of the word vectors of multiple words having the same stem in the existing word embedding dictionary together with the corresponding stems.
15. The method according to note 11, further comprising a matching step, the matching step comprising: performing dynamic programming matching, based on the similarities between their respective words, on two different segmentation sequences of a sentence obtained by different word segmentation techniques, so as to carry out word feature fusion.
16. The method according to note 15, wherein the similarity is obtained by calculating the Levenshtein distance between two words.
17. The method according to note 15, wherein the matching step further comprises:
setting the two segmentation sequences as a segmentation sequence to be matched and a target segmentation sequence, respectively;
building a matrix whose elements are the pairwise similarities between each word in the segmentation sequence to be matched and each word in the target segmentation sequence; and
dynamically searching the matrix for a path whose length equals the length of the sequence to be matched and for which the sum of the similarities of all elements on the path is maximal.
18. The method according to note 17, wherein the dynamic path search preferentially searches in the direction of the segmentation sequence to be matched.
19. The method according to note 17, wherein the matching step further comprises:
limiting the search region based on the i and j index values of the current element, the lengths of the words corresponding to them, and the number of words in the target segmentation sequence;
computing the mean and standard deviation of all elements of the matrix within the search region; and
taking the elements within the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
20. The method according to note 19, wherein, for a current element (i, j), the search region is: row i+1, from column j to column end, where end = min(j + cscope − |i − j|, n), cscope is the absolute value of the difference between the length of the i-th word in the segmentation sequence to be matched and the length of the j-th word in the target segmentation sequence plus a predetermined constant, and n is the number of words in the target segmentation sequence.
Claims (10)
1. A natural language deep learning system, comprising:
an error calculation unit configured to, when the natural language deep learning system is being trained, calculate the error value of a sample according to a loss function based on a sample pair, wherein the loss function is a combination of a similarity loss function and a classification loss function,
wherein the similarity loss function is defined based on the following criterion: when the true classes of the sample pair are the same, the difference between their class prediction vector values should be small, and when the true classes of the sample pair are different, the difference between their class prediction vector values should be large, and
the classification loss function is defined based on the classification errors of the sample pair.
2. The system according to claim 1, wherein the loss function pair_cost is:
pair_cost = pair_simi_cost + pair_label_cost,
wherein the similarity loss function pair_simi_cost is:
pair_simi_cost = abs(abs((y_pred_i · y_pred_j) / (|y_pred_i| · |y_pred_j|)) − sgn(y_i == y_j)),
and the classification loss function pair_label_cost is:
pair_label_cost = 2 − sgn(argmax(y_pred_i) == y_i) − sgn(argmax(y_pred_j) == y_j),
wherein i is the index of the first sample of the sample pair, j is the index of the second sample of the sample pair, y_pred_i denotes the class prediction vector value of the first sample i, y_pred_j denotes the class prediction vector value of the second sample j, y_i denotes the true class of the first sample i, and y_j denotes the true class of the second sample j.
3. The system according to claim 1, wherein the natural language deep learning system learns in a mini-batch stochastic gradient descent learning framework, and the sample pairs are selected from within each batch.
4. The system according to claim 1, further comprising an initialization unit configured to initialize a word to be mapped for learning in the natural language deep learning system when the word to be mapped is an unseen word that does not exist in an existing word embedding dictionary, wherein the initialization unit is configured to:
if the word to be mapped is found in a lemma dictionary, initialize the word to be mapped using the corresponding vector in the lemma dictionary; otherwise, if the word to be mapped is found in a stem dictionary, initialize the word to be mapped using the corresponding vector in the stem dictionary,
wherein the lemma dictionary is used to store the centroid vectors of the word vectors of multiple words having the same lemma in the existing word embedding dictionary together with the corresponding lemmas, and the stem dictionary is used to store the centroid vectors of the word vectors of multiple words having the same stem in the existing word embedding dictionary together with the corresponding stems.
5. The system according to claim 1, further comprising a matching unit configured to: perform dynamic programming matching, based on the similarities between their respective words, on two different segmentation sequences of a sentence obtained by different word segmentation techniques, so as to carry out word feature fusion.
6. The system according to claim 5, wherein the similarity is obtained by calculating the Levenshtein distance between two words.
7. The system according to claim 5, wherein the matching unit is configured to:
set the two segmentation sequences as a segmentation sequence to be matched and a target segmentation sequence, respectively;
build a matrix whose elements are the pairwise similarities between each word in the segmentation sequence to be matched and each word in the target segmentation sequence; and
dynamically search the matrix for a path whose length equals the length of the sequence to be matched and for which the sum of the similarities of all elements on the path is maximal.
8. The system according to claim 7, wherein the dynamic path search preferentially searches in the direction of the segmentation sequence to be matched.
9. The system according to claim 7, wherein the matching unit is further configured to:
limit the search region based on the i and j index values of the current element, the lengths of the words corresponding to them, and the number of words in the target segmentation sequence;
compute the mean and standard deviation of all elements of the matrix within the search region; and
take the elements within the search region that exceed the sum of the mean and the standard deviation as candidate elements to be matched.
10. A natural language deep learning method, comprising:
when the natural language deep learning system is being trained, calculating the error value of a sample according to a loss function based on a sample pair, wherein the loss function is a combination of a similarity loss function and a classification loss function,
wherein the similarity loss function is defined based on the following criterion: when the true classes of the sample pair are the same, the difference between their class prediction vector values should be small, and when the true classes of the sample pair are different, the difference between their class prediction vector values should be large, and
the classification loss function is defined based on the classification errors of the sample pair.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610341719.0A CN107402914B (en) | 2016-05-20 | 2016-05-20 | Deep learning system and method for natural language |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107402914A true CN107402914A (en) | 2017-11-28 |
CN107402914B CN107402914B (en) | 2020-12-15 |
Family
ID=60389365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610341719.0A Active CN107402914B (en) | 2016-05-20 | 2016-05-20 | Deep learning system and method for natural language |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107402914B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110219012A1 (en) * | 2010-03-02 | 2011-09-08 | Yih Wen-Tau | Learning Element Weighting for Similarity Measures |
CN102713945A (en) * | 2010-01-14 | 2012-10-03 | 日本电气株式会社 | Pattern recognition device, pattern recognition method and pattern recognition-use program |
CN103699529A (en) * | 2013-12-31 | 2014-04-02 | 哈尔滨理工大学 | Method and device for fusing machine translation systems by aid of word sense disambiguation |
CN104391902A (en) * | 2014-11-12 | 2015-03-04 | 清华大学 | Maximum entropy topic model-based online document classification method and device |
CN104850539A (en) * | 2015-05-28 | 2015-08-19 | 宁波薄言信息技术有限公司 | Natural language understanding method and travel question-answering system based on same |
CN104915386A (en) * | 2015-05-25 | 2015-09-16 | 中国科学院自动化研究所 | Short text clustering method based on deep semantic feature learning |
US20160019458A1 (en) * | 2014-07-16 | 2016-01-21 | Deep Learning Analytics, LLC | Systems and methods for recognizing objects in radar imagery |
CN105427869A (en) * | 2015-11-02 | 2016-03-23 | 北京大学 | Session emotion autoanalysis method based on depth learning |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110750987A (en) * | 2019-10-28 | 2020-02-04 | 腾讯科技(深圳)有限公司 | Text processing method, device and storage medium |
CN110750987B (en) * | 2019-10-28 | 2021-02-05 | 腾讯科技(深圳)有限公司 | Text processing method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107402914B (en) | 2020-12-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||