CN109388793A - Entity annotation method, intent recognition method, corresponding apparatus, and computer storage medium - Google Patents


Info

Publication number: CN109388793A (application CN201710655187.2A)
Authority: CN (China)
Legal status: Granted; Active
Inventor: 胡于响
Assignee (original and current): Alibaba Group Holding Ltd
Other languages: Chinese (zh)
Other versions: CN109388793B (granted publication)
Priority: CN201710655187.2A; PCT/CN2018/096640 (WO2019024704A1)

Classifications

    • G06F40/237 — Lexical tools (under G Physics › G06 Computing; Calculating or Counting › G06F Electric digital data processing › G06F40/00 Handling natural language data › G06F40/20 Natural language analysis)
    • G06F40/295 — Named entity recognition (under G06F40/20 Natural language analysis › G06F40/279 Recognition of textual entities › G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking)

Abstract

The present invention provides an entity annotation method, an intent recognition method, corresponding apparatuses, and a computer storage medium. The entity annotation method includes: performing word encoding on the attribute labels of at least some words in a sentence using a knowledge graph, to obtain a first expression vector for each of those words; performing word encoding on at least some words in the sentence based on sentence structure, to obtain a second expression vector for each of those words; and fusing the first and second expression vectors to obtain an entity annotation result for the sentence. The intent recognition method includes: jointly encoding the attribute labels of at least some words in a sentence using a knowledge graph, to obtain a first sentence vector; encoding the sentence based on sentence structure, to obtain a second sentence vector; and fusing the first and second sentence vectors to obtain an intent recognition result for the sentence. The approach provided by the invention improves the accuracy of entity annotation and intent recognition.

Description

Entity annotation method, intent recognition method, corresponding apparatus, and computer storage medium
[technical field]
The present invention relates to the field of computer application technology, and in particular to an entity annotation method, an intent recognition method, corresponding apparatuses, and a computer storage medium.
[background art]
Natural language processing is an important, even core, part of artificial intelligence. Its purpose is to understand what a sentence is trying to express, which mainly involves two tasks: entity annotation and intent recognition. Entity annotation marks the attribute labels of the entity words in a sentence; intent recognition identifies the intention or purpose the sentence wants to realize. For example, given the sentence "Which films has Zhou Jielun acted in?", the entity annotation task labels the entity word "Zhou Jielun" with the label Movie_actor, where Movie_actor denotes a film or television actor; intent recognition determines that the sentence asks which films an actor has appeared in.
Current entity annotation and intent recognition methods are all based purely on sentence structure, and such purely structure-based approaches often lead to problems such as low accuracy in both intent recognition and entity annotation.
[summary of the invention]
In view of this, the present invention provides an entity annotation method, an intent recognition method, corresponding apparatuses, and a computer storage medium, in order to improve the accuracy of entity annotation and intent recognition.
The specific technical solution is as follows:
The present invention provides an entity annotation method, comprising:
performing word encoding on the attribute labels of at least some words in a sentence using a knowledge graph, to obtain a first expression vector for each of those words;
performing word encoding on at least some words in the sentence based on sentence structure, to obtain a second expression vector for each of those words;
fusing the first expression vectors and the second expression vectors to obtain an entity annotation result for the sentence.
The present invention also provides an intent recognition method, comprising:
jointly encoding the attribute labels of at least some words in a sentence using a knowledge graph, to obtain a first sentence vector for the sentence;
encoding the sentence based on sentence structure, to obtain a second sentence vector for the sentence;
fusing the first sentence vector and the second sentence vector to obtain an intent recognition result for the sentence.
The present invention also provides an entity annotation apparatus, characterized in that the apparatus comprises:
a first word encoding unit, configured to perform word encoding on the attribute labels of at least some words in a sentence using a knowledge graph, to obtain a first expression vector for each of those words;
a second word encoding unit, configured to perform word encoding on at least some words in the sentence based on sentence structure, to obtain a second expression vector for each of those words;
a vector fusion unit, configured to fuse the first expression vectors and the second expression vectors to obtain an entity annotation result for the sentence.
The present invention also provides an intent recognition apparatus, comprising:
a first encoding unit, configured to jointly encode the attribute labels of at least some words in a sentence using a knowledge graph, to obtain a first sentence vector for the sentence;
a second encoding unit, configured to encode the sentence based on sentence structure, to obtain a second sentence vector for the sentence;
a vector fusion unit, configured to fuse the first sentence vector and the second sentence vector to obtain an intent recognition result for the sentence.
The present invention provides a device, comprising:
a memory, including one or more programs;
one or more processors, coupled to the memory, executing the one or more programs to implement the operations performed in the above methods.
The present invention also provides a computer storage medium encoded with a computer program which, when executed by one or more computers, causes the one or more computers to perform the operations performed in the above methods.
As can be seen from the above technical solutions, the present invention introduces the knowledge graph into entity annotation and intent recognition, i.e., it fuses the entity attribute information in the knowledge graph with the sentence-structure-based approach to perform entity annotation and intent recognition; compared with the prior art based purely on sentence structure, this improves accuracy.
[description of the drawings]
Fig. 1 is a flowchart of the entity annotation method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of word encoding using a knowledge graph, provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of word encoding based on sentence structure, provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of entity annotation by fusing the knowledge-graph and sentence-structure approaches, provided by an embodiment of the present invention;
Fig. 5 is a flowchart of the intent recognition method provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of sentence encoding using a knowledge graph, provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of intent recognition by fusing the knowledge-graph and sentence-structure approaches, provided by an embodiment of the present invention;
Fig. 8 is a structural diagram of the entity annotation apparatus provided by an embodiment of the present invention;
Fig. 9 is a structural diagram of the intent recognition apparatus provided by an embodiment of the present invention;
Fig. 10 is a structural diagram of an example device provided by an embodiment of the present invention.
[detailed description]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
The core idea of the invention is to introduce the knowledge graph into entity annotation and intent recognition, i.e., to fuse the entity attribute information in the knowledge graph with the sentence-structure-based approach when performing entity annotation and intent recognition, thereby improving accuracy. The method and apparatus provided by the invention are described in detail below with reference to embodiments.
Fig. 1 is a flowchart of the entity annotation method provided by an embodiment of the present invention. As shown in Fig. 1, the method may include the following steps:
In 101, the knowledge graph is preprocessed.
A knowledge graph stores the relationships between entities and the attribute information corresponding to each entity. However, a knowledge graph is usually divided by domain/category. For example, in the music domain/category the entity "Zhou Jielun" has the attribute labels "singer", "composer", and "lyricist", while the same entity "Zhou Jielun" also exists in the film and television domain/category with the attribute label "actor". To facilitate use of the knowledge graph, embodiments of the present invention may first preprocess it. Specifically, the preprocessing may include the following steps:
S11. First, the attribute labels of each entity in the knowledge graph are merged across all domains, yielding the full set of attribute labels for each entity.
Taking the entity "Zhou Jielun" as an example again, after merging its attribute labels from each domain, the complete set of labels for "Zhou Jielun" is: "singer", "composer", "lyricist", "actor".
S12. The attribute labels of each entity are stored in a key-value storage engine.
After the attribute labels of each entity are obtained, each entity serves as the key and its attribute labels serve as the value, and each key-value pair is stored in the key-value storage engine.
It should be noted that the purpose of the above preprocessing is to enable fast subsequent lookup of entity attribute labels in the knowledge graph; it is not a mandatory step of the present invention, and other ways of preprocessing the knowledge graph may of course also be used.
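The preprocessing of steps S11-S12 can be sketched as follows, with a plain dictionary standing in for the key-value storage engine; the toy two-domain graph, entity name, and label strings are illustrative assumptions, not the patent's actual data.

```python
# Sketch of steps S11-S12: merge each entity's attribute labels across
# domains, then store them keyed by entity name. A dict plays the role of
# the key-value storage engine here.

def build_label_store(graph_by_domain):
    store = {}
    for domain, entities in graph_by_domain.items():
        for entity, labels in entities.items():
            merged = store.setdefault(entity, [])
            for label in labels:
                if label not in merged:   # de-duplicate across domains
                    merged.append(label)
    return store

graph = {
    "music": {"Zhou Jielun": ["singer", "composer", "lyricist"]},
    "film":  {"Zhou Jielun": ["actor"]},
}
store = build_label_store(graph)
print(store["Zhou Jielun"])  # ['singer', 'composer', 'lyricist', 'actor']
```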
In 102, word encoding is performed on the attribute labels of each word in the sentence using the knowledge graph, yielding a first expression vector for each word.
In this step, the first expression vector of each word is obtained using the knowledge graph, so that each word's first expression vector incorporates the entity attribute information from the knowledge graph. Specifically, this can be realized by the following steps:
S21. Identify the entities in the sentence and their attribute labels using the knowledge graph.
In this step, the sentence can be matched against the knowledge graph using the longest-match principle to identify the entities it contains. Specifically, all n-grams of the sentence are obtained; in embodiments of the present invention, an n-gram is a sequence of n consecutive characters, where n takes each value from 1 upward. Each n-gram is matched against the knowledge graph to see which n-grams match an entity; when multiple overlapping n-grams all match entities, the longest one is taken as the recognized entity.
For example, for the sentence "周杰伦演过哪些电影" ("Which films has Zhou Jielun acted in?"), the n-grams include:
1-gram: "周", "杰", "伦", …, "影";
2-gram: "周杰", "杰伦", "伦演", …, "电影";
3-gram: "周杰伦", "杰伦演", "伦演过", …, "些电影";
……
Here "周杰" (Zhou Jie) matches an entity in the knowledge graph, and "周杰伦" (Zhou Jielun) also matches an entity; since the two overlap, the longest match "周杰伦" is taken as the recognized entity.
When determining the attribute labels of each recognized entity, the key-value storage engine can be queried using the entity as the key to look up the corresponding value.
S22. Segment the sentence using the recognition result, and assign attribute labels to the resulting words.
When segmenting the sentence, each recognized entity is treated as an independent word, and the remaining content of the sentence is then segmented. Taking "Which films has Zhou Jielun acted in?" as an example again, segmentation yields: "Zhou Jielun", "acted", "in", "which", "films".
After labeling, "Zhou Jielun" is marked with "singer", "composer", "lyricist", and "actor"; since "acted", "in", "which", and "films" are not entities in the knowledge graph, they are marked with "O" to indicate that they have no corresponding attribute labels.
It should be noted that embodiments of the present invention are described using "each word" in the sentence as an example, but processing only "at least some words" of the sentence is not excluded. For example, in this step, after segmentation, attribute labels may be assigned to only some of the resulting words, e.g., only to the entities found in the knowledge graph.
S23. Perform word encoding on the attribute labels of each word, and pass the encoding result through a fully connected layer to obtain the first expression vector of each word.
In this step, encoding the attribute labels of each word converts each word's set of attribute labels into a code a computer can process. The encoding used in this embodiment may include, but is not limited to, one-hot encoding.
As shown in Fig. 2, one-hot encoding each word's set of attribute labels produces one encoding result per word. The length of the encoding result can equal the total number of attribute labels: if the knowledge graph contains M attribute labels, the encoding result has M positions, each corresponding to one attribute label, and the value at each position indicates whether the word has that attribute label. For example, in the encoding result of "Zhou Jielun", four positions are 1, indicating that "Zhou Jielun" has the four corresponding attribute labels.
The one-hot encoding result is then converted by a fully connected layer, the purpose of which is to map each word's attribute-label encoding onto entity tags — the labels used for entity annotation of words in the sentence. After the fully connected layer conversion, the first expression vector of each word is obtained.
In embodiments of the present invention, the fully connected layer can be trained in advance. Training may proceed as follows: sentences already annotated with entity tags serve as training samples; the entity recognition, segmentation, attribute labeling, and one-hot encoding described above are applied to each training sentence, and the result is the input of the fully connected layer; the vector formed by the entity tags of the words in the sentence is the target output; the fully connected layer is then trained. The trained layer thus maps one-hot encodings of attribute labels to entity tags.
Continuing with Fig. 2, after each word's one-hot encoding result is converted by the fully connected layer, the first expression vectors of the words are obtained, denoted T-dict1, T-dict2, T-dict3, T-dict4, and T-dict5.
In 103, word encoding is performed on each word in the sentence based on sentence structure, yielding a second expression vector for each word.
This step may specifically comprise:
S31. Determine the word vector of each word in the sentence.
Existing word-vector tools such as word2vec can be used: with a pre-trained word2vec model, a word vector of identical length can be generated for each word. This way of determining word vectors is semantics-based, so the distance between word vectors reflects the semantic relatedness of the words — the more related two words are semantically, the smaller the distance between their vectors. Since semantics-based word vectors are existing technology, they are not described in detail here.
S32. Feed the word vectors into a pre-trained neural network to obtain the second expression vector of each word.
Feeding the word vectors into a pre-trained neural network encodes the sentence at word granularity. The neural network may be, for example, a bidirectional RNN (recurrent neural network), a unidirectional RNN, or a CNN (convolutional neural network); a bidirectional RNN is preferred because it encodes the sentence in both directions. The basic idea of a bidirectional RNN is to run two RNNs over each training sequence, one forward and one backward, both connected to a single output layer. This structure provides the output layer with the full past and future context of every point in the input sequence. In the present invention, when a segmented sentence of n words is input, the bidirectional RNN produces n output vectors, one per word. Because of the RNN's memory, the i-th vector contains information about all preceding words, so the output vector of the last word is also called the "sentence vector", since it theoretically contains information about all the words before it.
Taking "Which films has Zhou Jielun acted in?" as an example again, as shown in Fig. 3, after the word vectors of the segmented words "Zhou Jielun", "acted", "in", "which", "films" are determined and fed into the bidirectional RNN, the second expression vector of each word is obtained, denoted output1, output2, output3, output4, and output5. Each word's second expression vector contains contextual information, i.e., it reflects the influence of sentence structure. In particular, output5 contains information about the whole sentence and can be called the sentence vector.
It should be noted that the knowledge-graph-based processing and the sentence-structure-based processing in steps 102 and 103 can be executed in either order, or simultaneously; the order shown in this embodiment is only one possibility.
In 104, the first expression vectors and the second expression vectors are fused to obtain the entity annotation result for the sentence.
Fusing the first and second expression vectors in this step effectively merges the knowledge-graph-based entity annotation with the sentence-structure-based entity annotation. Specifically, the following steps may be executed:
S41. Splice each word's first expression vector with its second expression vector to obtain the word's third expression vector.
In this step, the two vectors can be spliced in a preset order, yielding a longer vector, which is the third expression vector.
It should be noted that, besides splicing the first and second expression vectors, other fusion methods such as element-wise addition may be used. However, splicing keeps the knowledge-graph contribution and the sentence-structure contribution separate, so the subsequent fully connected layer can apply different parameters to each; splicing is therefore preferred.
S42. Convert each word's third expression vector into a result vector via a fully connected layer.
Each word's third expression vector is fed into a pre-trained fully connected layer, which maps it onto entity tags, producing a result vector. The length of the result vector equals the total number of entity tags; each position corresponds to one entity tag, and its value is that tag's score.
In embodiments of the present invention, this fully connected layer can be trained in advance. Training may proceed as follows: sentences already annotated with entity tags serve as training samples; steps 102 and 103 are executed to obtain each word's first and second expression vectors; the spliced result (the third expression vector) is the input of the fully connected layer, and the sentence's entity tags are its output; the layer is then trained. The trained layer thus maps each word's third expression vector to entity tags.
S43. Perform entity annotation on the sentence according to the result vector of each word.
Each word corresponds to one result vector; the entity tag with the highest score in the result vector can be selected to annotate the word.
Taking "Which films has Zhou Jielun acted in?" as an example again, as shown in Fig. 4, each word's first and second expression vectors are spliced to obtain its third expression vector. In Fig. 4, splicing the first expression vector T-dict1 and second expression vector Output1 of "Zhou Jielun" yields the third expression vector K1, and similarly for the other words. The third expression vectors K1, K2, …, K5 are then each fed into the fully connected layer, producing the result vector of each word. In the result vector of "Zhou Jielun", the entity tag "Actor_name" (actor name) has the highest score, so "Zhou Jielun" is annotated with "Actor_name"; in the result vectors of the other words, the tag "O" (not an entity) scores highest, so the other words are annotated with "O".
Fig. 5 is a flowchart of the intent recognition method provided by an embodiment of the present invention. As shown in Fig. 5, the method may include the following steps:
In 501, the attribute labels of each word in the sentence are jointly encoded using the knowledge graph, yielding the first sentence vector of the sentence.
As with entity annotation, the knowledge graph can first be preprocessed before this step; the preprocessing is not described again here — see the description of step 101 in Fig. 1.
In this step, the first sentence vector is obtained using the knowledge graph so that it incorporates the entity attribute information from the knowledge graph. Specifically, this can be realized by the following steps:
S51. Identify the entities in the sentence and their attribute labels using the knowledge graph.
The details of this step match step S21 of step 102 in the embodiment of Fig. 1 and are not repeated here.
S52. Jointly encode the attribute labels of the words, and pass the encoding result through a fully connected layer to obtain the first sentence vector.
After the attribute labels of each word in the sentence are obtained, the attribute labels of all words are encoded jointly, producing a single encoding result. This encoding result is a vector whose length equals the total number of attribute labels; each position corresponds to one attribute label, and its value is that label's weight in the sentence.
When determining the weight of an attribute label in the sentence, the weight can be determined from the number of occurrences of the attribute label in the sentence and the number of attribute labels attached to the corresponding entity. Specifically, the weight of attribute label label_i can be determined using the following formula:
weight(label_i) = sum_{m=1..M} a_im
where m denotes the m-th word in the sentence and M is the number of words in the sentence; a_im denotes the value of label_i for the m-th word: if label_i is not an attribute label of the m-th word, a_im takes the value 0; if label_i is an attribute label of the m-th word, a_im takes the value 1/count(label_m), where count(label_m) is the number of attribute labels of the m-th word.
Taking the sentence "Which films has Zhou Jielun acted in?" as an example again, "Zhou Jielun" has the attribute labels "singer", "composer", "lyricist", and "actor", and the other words have no corresponding attribute labels in the knowledge graph. For the attribute label "singer", its weight in the sentence is 1/4 = 0.25, so in the encoding result the position corresponding to "singer" has the value 0.25. Similarly, the positions corresponding to "composer", "lyricist", and "actor" are each 0.25, and the positions corresponding to all other attribute labels are 0.
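This sentence-level weighting can be sketched as follows, assuming weight(label_i) = sum over words m of a_im with a_im = 1/count(label_m) when word m carries label_i (which reproduces the 0.25 in the example); the segmented sentence and label inventory are illustrative assumptions.

```python
# Sketch of the joint label encoding in step S52: each label's weight is the
# sum over words of 1/(number of labels on that word) when the word carries
# the label, 0 otherwise.

def label_weights(word_label_sets, all_labels):
    weights = {lab: 0.0 for lab in all_labels}
    for labels in word_label_sets:
        if not labels:
            continue
        share = 1.0 / len(labels)          # 1 / count(label_m)
        for lab in labels:
            weights[lab] += share
    return weights

# 5 words: "Zhou Jielun" with 4 labels, the other 4 words with none.
words = [{"singer", "composer", "lyricist", "actor"}, set(), set(), set(), set()]
ws = label_weights(words, ["singer", "composer", "lyricist", "actor", "director"])
print(ws["singer"], ws["director"])  # 0.25 0.0
```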
After the above encoding result is obtained, it is converted by a fully connected layer, the purpose of which is to map the sentence's attribute-label-based encoding onto entity tags — the labels used for entity annotation of words in the sentence. After the fully connected layer conversion, the first sentence vector is obtained. Its length equals the total number of entity tags, and each of its values is the weight of the corresponding entity tag in the sentence.
In embodiments of the present invention, this fully connected layer can be trained in advance. Training may proceed as follows: sentences already annotated with entity tags serve as training samples; the entity recognition, segmentation, attribute labeling, and joint encoding described above are applied to each training sentence, and the resulting encoding is the input of the fully connected layer; the vector formed by the entity tags of the words in the sentence is the target output; the layer is then trained. The trained layer thus maps joint encodings to entity tags.
The process of this step is illustrated in Fig. 6: for "Which films has Zhou Jielun acted in?", the attribute labels of the words are jointly encoded, and the encoding result passes through the fully connected layer, finally yielding the first sentence vector, denoted S-dict.
In 502, the sentence is encoded based on sentence structure, yielding the second sentence vector of the sentence.
This step may specifically comprise:
S61. Determine the word vector of each word in the sentence.
S62. Feed the word vectors into a pre-trained neural network to obtain the second sentence vector.
Specifically, after the word vectors are fed into the pre-trained neural network, the second expression vector of each word is obtained, and the second expression vector of the last word is taken as the second sentence vector.
The determination of word vectors and the pre-trained neural network here match step 103 in the embodiment of Fig. 1 and are not repeated. The only difference is that, after the second expression vectors of the words are obtained, only the last word's second expression vector is used as the second sentence vector; the other words' second expression vectors are not used for intent recognition. That is, output5 in Fig. 3 serves as the second sentence vector.
In 503, the first sentence vector and the second sentence vector are fused to obtain the intent recognition result for the sentence.
Fusing the first and second sentence vectors effectively merges the knowledge-graph-based intent information with the sentence-structure-based intent information. The knowledge-graph-based entity annotation has a large influence on intent recognition: taking "Which films has Zhou Jielun acted in?" as an example again, correctly labeling "Zhou Jielun" as "actor" strongly supports the correct intent "which films has an actor appeared in", whereas mislabeling the entity "Zhou Jielun" as "singer" may make that intent unattainable.
Specifically, this step may comprise steps of:
S71, first vector sum, second vector is spliced, obtains third sentence vector.
In this step, two vectors can be spliced according to preset sequence, so that a longer vector is obtained, The vector is third sentence vector.
It should be noted that, besides concatenating the first and second sentence vectors, other fusion methods such as element-wise addition may also be used. However, since concatenation keeps the knowledge-graph-based influence and the structure-based influence separate, so that different parameters can be applied to each in the subsequent fully connected layer's transformation, concatenation is preferred.
S72: convert the third sentence vector into a result vector through a fully connected layer.
The third sentence vector is fed into a pre-trained fully connected layer, which maps it onto the space of sentence intents and outputs a result vector. The length of the result vector equals the number of intent classes, and each position of the result vector holds the score of the corresponding intent.
In an embodiment of the present invention, the fully connected layer can be trained in advance. Training may proceed as follows: sentences with known intents are taken as training samples; steps 501 and 502 above are performed on each sample sentence to obtain its first and second sentence vectors; the concatenation of the two (i.e. the third sentence vector) is used as the input of the fully connected layer, and the sentence's intent as its target output. The trained fully connected layer thus maps a sentence's third sentence vector to a sentence intent.
S73: determine the sentence intent from the result vector.
In this step, the sentence intent can be determined from the score of each intent class in the result vector; for example, the highest-scoring intent is taken as the recognized intent of the sentence.
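To make the mapping and the score-based decision concrete, here is a sketch with a hand-set weight matrix; the real layer's parameters come from the training described above, and the intent names and numbers are illustrative:

```python
INTENTS = ["which films did an actor act in", "which songs did a singer sing"]

def fully_connected(vec, weights, bias):
    """One dense layer: position i of the output is the score of intent i."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def predict_intent(third_vec, weights, bias):
    """Map the third sentence vector to intent scores and pick the best."""
    scores = fully_connected(third_vec, weights, bias)
    best = max(range(len(scores)), key=scores.__getitem__)
    return INTENTS[best], scores

intent, scores = predict_intent([1.0, 0.0],
                                weights=[[2.0, 0.0], [0.0, 1.0]],
                                bias=[0.0, 0.0])
```

With these toy weights the first position of the input dominates, so the first intent wins; in the trained layer the weights encode which combinations of knowledge-graph and structure features favor which intent.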
Again taking "which films did Jay Chou act in" as an example, as shown in Fig. 7, the first sentence vector S-dict and the second sentence vector Output5 are concatenated to obtain the third sentence vector K. K is then fed into the fully connected layer, which finally produces a result vector. The highest-scoring intent in that result vector is: "which films did an actor act in".
The above is a detailed description of the methods provided by the present invention. The apparatuses provided by the present invention are described in detail below with reference to embodiments.
Fig. 8 is a structural diagram of an entity annotation apparatus provided by an embodiment of the present invention. As shown in Fig. 8, the apparatus may comprise: a first word encoding unit 10, a second word encoding unit 20 and a vector fusion unit 30, and may further comprise a graph preprocessing unit 40. The main functions of these units are as follows:
The first word encoding unit 10 performs word encoding on the attribute tags of each word in the sentence using the knowledge graph, obtaining a first expression vector for each word.
Specifically, the first word encoding unit 10 may comprise: a matching subunit 11, a segmentation subunit 12 and a first word encoding subunit 13.
The matching subunit 11 identifies the entities in the sentence and the attribute tags corresponding to those entities using the knowledge graph. Specifically, the matching subunit 11 can match the sentence against the knowledge graph under the longest-match principle to identify the entities in the sentence. For example, all n-grams of the sentence can be enumerated, for every n of 1 or more. Each n-gram is matched against the knowledge graph to see which n-grams hit entities in the graph; when several overlapping n-grams all hit entities, the longest of them is taken as the identified entity.
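The n-gram enumeration with "longest overlapping match wins" can be sketched as below; the token list and the mini knowledge graph are hypothetical stand-ins for a real segmented sentence and graph:

```python
def match_entities(tokens, kg_entities):
    """Enumerate n-grams from longest to shortest and keep non-overlapping
    hits, so that when overlapping n-grams all match entities in the
    knowledge graph, the longest one wins."""
    n_tokens = len(tokens)
    used = [False] * n_tokens
    matches = []
    for n in range(n_tokens, 0, -1):            # longest n-grams first
        for i in range(n_tokens - n + 1):
            if any(used[i:i + n]):              # overlaps an earlier, longer hit
                continue
            gram = "".join(tokens[i:i + n])
            if gram in kg_entities:
                matches.append((i, gram))
                for j in range(i, i + n):
                    used[j] = True
    return [gram for _, gram in sorted(matches)]

result = match_entities(["Jay", "Chou", "acted"], {"JayChou", "Jay"})
```

Here both "Jay" and "JayChou" are entities in the toy graph, and the longer span is kept, mirroring the overlap rule described above.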
The segmentation subunit 12 segments the sentence using the recognition result of the matching subunit 11 and labels each resulting word with attribute tags. When segmenting the sentence, the segmentation subunit treats each entity identified by the matching subunit 11 as an independent word.
The first word encoding subunit 13 performs word encoding on the attribute tags of each word; for example, it can one-hot encode the attribute tags of each word and transform the encoding result through a fully connected layer, obtaining the first expression vector of each word.
The purpose of transforming the one-hot encoding result through the fully connected layer is to map the encoding of each word's attribute tags onto an entity tag, i.e. the label assigned to the word by entity annotation. After the fully connected layer's transformation, the first expression vector of each word is obtained.
In an embodiment of the present invention, this fully connected layer can be trained in advance. Training may proceed as follows: sentences already labeled with entity tags are taken as training samples; each sample sentence undergoes the above entity recognition, segmentation, attribute-tag labeling and one-hot encoding using the knowledge graph, the result serving as the input of the fully connected layer; the vector formed from the entity tag of each word in the sentence serves as the target output of the fully connected layer. The trained fully connected layer in effect maps one-hot encoding results to entity tags.
The graph preprocessing unit 40 integrates the attribute tags of each entity in the knowledge graph across the various domains to obtain an attribute tag set for each entity, and stores each entity's attribute tag set in a key-value storage engine. Accordingly, the matching subunit 11 can match the sentence in the key-value storage engine using the longest-match algorithm.
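The preprocessing step — merging one entity's attribute tags across domains and keeping them in a key-value store — might be sketched as follows; a plain dict stands in for the storage engine, and the domains and tags are invented:

```python
def build_tag_store(domain_tags):
    """Merge each entity's attribute tags across domains into one set,
    keyed by entity name (a production system might use an engine such
    as Redis in place of this dict)."""
    store = {}
    for domain, entity_tags in domain_tags.items():
        for entity, tags in entity_tags.items():
            store.setdefault(entity, set()).update(tags)
    return store

store = build_tag_store({
    "film":  {"Jay Chou": {"Actor_name"}},
    "music": {"Jay Chou": {"Singer_name"}},
})
```

Keying the merged set by entity name is what lets the matching subunit resolve a matched span to all of its candidate attribute tags in one lookup.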
The second word encoding unit 20 performs word encoding on each word in the sentence based on the sentence structure, obtaining a second expression vector for each word. Specifically, the second word encoding unit 20 can first determine the word vector of each word in the sentence, and then feed the word vectors into a pre-trained neural network to obtain the second expression vector of each word.
When determining the word vectors of the words in the sentence, the second word encoding unit 20 can use an existing word-vector tool such as word2vec: a word2vec model is trained in advance on semantics and then used to generate a word vector for each word, the word vectors of all words having the same length. Because this way of determining word vectors is semantics-based, the distance between word vectors reflects the degree of semantic relatedness between words: the more semantically related two words are, the smaller the distance between their word vectors.
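The property that semantic relatedness shows up as vector distance can be checked with cosine distance; the three vectors below are hand-made stand-ins, not real word2vec output:

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity: smaller means more related."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return 1.0 - dot / norm

# Hypothetical embeddings: after semantic training, "film" and "movie"
# should end up closer to each other than either is to "banana".
vectors = {
    "film":   [0.9, 0.1, 0.0],
    "movie":  [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
d_related = cosine_distance(vectors["film"], vectors["movie"])
d_unrelated = cosine_distance(vectors["film"], vectors["banana"])
```

This nearness property is what the downstream neural network exploits: words used in similar contexts arrive as similar inputs.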
The above neural network may be, for example, a bidirectional RNN (recurrent neural network), a unidirectional RNN or a CNN (convolutional neural network), a bidirectional RNN being preferred.
The vector fusion unit 30 fuses the first expression vectors and the second expression vectors, obtaining the entity annotation result for the sentence.
Specifically, the vector fusion unit 30 can concatenate the first and second expression vectors of each word to obtain a third expression vector for each word; then convert each word's third expression vector into that word's result vector through a fully connected layer, where the length of the result vector equals the total number of entity tags, each position of the result vector corresponds to one entity tag, and the value at each position reflects the score of the corresponding entity tag; and finally annotate the entities of the sentence according to the result vector of each word.
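A compact sketch of this per-word fusion follows, with a hand-set weight matrix and a two-tag label set; the real layer is trained as described above, and the vectors are illustrative:

```python
LABELS = ["O", "Actor_name"]

def tag_words(first_vecs, second_vecs, weights):
    """Concatenate each word's first and second expression vectors, score
    every entity tag with one dense layer (bias omitted), and keep the
    highest-scoring tag per word."""
    tags = []
    for f, s in zip(first_vecs, second_vecs):
        third = list(f) + list(s)          # third expression vector
        scores = [sum(w * v for w, v in zip(row, third)) for row in weights]
        tags.append(LABELS[max(range(len(scores)), key=scores.__getitem__)])
    return tags

tags = tag_words(first_vecs=[[1.0, 0.0], [0.0, 1.0]],
                 second_vecs=[[0.0, 0.0], [0.0, 0.0]],
                 weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
```

Unlike intent recognition, which produces one decision per sentence, this step produces one tag decision per word.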
When annotating the sentence according to the result vectors, the vector fusion unit 30 can label each word in the sentence with the highest-scoring entity tag in that word's result vector.
Fig. 9 is a structural diagram of an intent recognition apparatus provided by an embodiment of the present invention. As shown in Fig. 9, the apparatus may comprise: a first encoding unit 50, a second encoding unit 60 and a vector fusion unit 70, and may further comprise a graph preprocessing unit 80. The main functions of these units are as follows:
The first encoding unit 50 performs combined encoding on the attribute tags of the words in the sentence using the knowledge graph, obtaining the first sentence vector of the sentence.
Specifically, the first encoding unit 50 may comprise: a matching subunit 51, a segmentation subunit 52 and a combined encoding subunit 53.
The matching subunit 51 identifies the entities in the sentence and the attribute tags corresponding to those entities using the knowledge graph. Specifically, the matching subunit 51 can match the sentence against the knowledge graph using the longest-match algorithm to identify the entities in the sentence.
The segmentation subunit 52 segments the sentence using the recognition result and labels each resulting word with attribute tags, treating each entity identified by the matching subunit 51 as an independent word during segmentation.
The combined encoding subunit 53 performs combined encoding on the attribute tags of the words and transforms the encoding result through a fully connected layer, obtaining the first sentence vector of the sentence; the length of the first sentence vector equals the total number of entity tags, and the value at each position of the first sentence vector is the weight of the corresponding entity tag in the sentence.
The graph preprocessing unit 80 integrates the attribute tags of each entity in the knowledge graph across the various domains to obtain an attribute tag set for each entity, and stores each entity's attribute tag set in a key-value storage engine. Accordingly, the matching subunit 51 can match the sentence in the key-value storage engine using the longest-match algorithm.
The second encoding unit 60 encodes the sentence based on the sentence structure, obtaining the second sentence vector of the sentence. Specifically, the second encoding unit 60 can first determine the word vector of each word in the sentence, and then feed the word vectors into a pre-trained neural network to obtain the second sentence vector.
When determining the word vector of each word in the sentence, the second encoding unit 60 uses a word2vec model trained in advance on semantics to generate a word vector for each word in the sentence.
The above neural network may be, for example, a bidirectional RNN (recurrent neural network), a unidirectional RNN or a CNN (convolutional neural network), a bidirectional RNN being preferred.
When feeding the word vectors into the pre-trained neural network to obtain the second sentence vector, the second encoding unit 60 can specifically feed the word vectors into the network to obtain a second expression vector for each word, and take the second expression vector of the last word as the second sentence vector of the sentence.
The vector fusion unit 70 fuses the first sentence vector and the second sentence vector of the sentence, obtaining the intent recognition result for the sentence. Specifically, the first and second sentence vectors can be concatenated to obtain a third sentence vector; the third sentence vector is converted into a result vector through a fully connected layer, where the length of the result vector equals the number of intent classes, each position of the result vector corresponds to one intent class, and the value at each position reflects the score of the corresponding intent; and the sentence intent is determined from the result vector.
When determining the sentence intent from the result vector, the vector fusion unit 70 can take the highest-scoring intent in the result vector as the intent of the sentence.
The above entity annotation and intent recognition methods can be applied to various scenarios based on natural language processing. One application scenario is given here as an example:
In the field of intelligent question answering, suppose a user enters the question "which films did Jay Chou act in" in an intelligent QA client on a mobile phone. After the above entity annotation and intent recognition, the entity "Jay Chou" is labeled "Actor_name" and the intent is "which films did an actor act in". The processing logic corresponding to that intent is then to look up, in a movie database, the film titles corresponding to the entity labeled "Actor_name" in the sentence. Suppose the film titles found for "Jay Chou" in the movie database are "Secret", "The Treasure Hunter", "The Rooftop", "Curse of the Golden Flower", ...; the intelligent QA client can then directly return these titles to the user as the answer.
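The dispatch from recognized intent and entity tags to a database lookup might look like the sketch below; the intent string, the tag pairs and the database contents are all hypothetical:

```python
def answer(entity_tags, intent, movie_db):
    """Dispatch on the recognized intent: for 'which films did an actor
    act in', look up every word tagged Actor_name in the movie database."""
    if intent != "which films did an actor act in":
        return None
    films = []
    for word, tag in entity_tags:
        if tag == "Actor_name":
            films.extend(movie_db.get(word, []))
    return films

db = {"Jay Chou": ["Secret", "Curse of the Golden Flower"]}
result = answer([("Jay Chou", "Actor_name")],
                "which films did an actor act in", db)
```

The key point is that the intent selects *which* query to run, while the entity tag selects *what* to query with; neither piece alone is enough to answer the question.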
Fig. 10 schematically illustrates an example device 1000 according to various embodiments. The device 1000 may comprise one or more processors 1002, system control logic 1001 coupled to at least one processor 1002, non-volatile memory (NVM)/storage 1004 coupled to the system control logic 1001, and a network interface 1006 coupled to the system control logic 1001.
The processor 1002 may comprise one or more single-core or multi-core processors. The processor 1002 may comprise any combination of general-purpose processors or special-purpose processors (such as image processors, application processors, baseband processors).
In one embodiment, the system control logic 1001 may comprise any suitable interface controllers to provide any suitable interface to at least one of the processors 1002, and/or any suitable interface to any suitable device or component communicating with the system control logic 1001.
In one embodiment, the system control logic 1001 may comprise one or more memory controllers to provide an interface to the system memory 1003. The system memory 1003 is used to load and store data and/or instructions. For example, for the device 1000, in one embodiment, the system memory 1003 may comprise any suitable volatile memory.
The NVM/storage 1004 may comprise one or more tangible non-transitory computer-readable media for storing data and/or instructions. For example, the NVM/storage 1004 may comprise any suitable non-volatile storage devices, such as one or more hard disk drives (HDD), one or more compact discs (CD), and/or one or more digital versatile discs (DVD).
The NVM/storage 1004 may comprise a storage resource that is physically part of a device on which the system is installed or accessible, but not necessarily part of the device itself. For example, the NVM/storage 1004 may be accessed over the network via the network interface 1006.
The system memory 1003 and the NVM/storage 1004 may each comprise a copy of temporary or persistent instructions 1010. The instructions 1010 may comprise instructions that, when executed by at least one of the processors 1002, cause the device 1000 to implement one or a combination of the methods described with reference to Fig. 1 or Fig. 5. In various embodiments, the instructions 1010, or hardware, firmware and/or software components, may additionally/alternatively reside in the system control logic 1001, the network interface 1006 and/or the processors 1002.
The network interface 1006 may comprise a receiver to provide the device 1000 with a wireless interface for communicating with one or more networks and/or any suitable devices. The network interface 1006 may comprise any suitable hardware and/or firmware. The network interface 1006 may comprise multiple antennas to provide a MIMO wireless interface. In one embodiment, the network interface 1006 may comprise a network adapter, a wireless network adapter, a telephone modem and/or a wireless modem.
In one embodiment, at least one of the processors 1002 may be packaged together with logic of one or more controllers of the system control logic. In one embodiment, at least one of the processors may be packaged together with logic of one or more controllers of the system control logic to form a system-in-package. In one embodiment, at least one of the processors may be integrated on the same die with logic of one or more controllers of the system control logic. In one embodiment, at least one of the processors may be integrated on the same die with logic of one or more controllers of the system control logic to form a system-on-chip.
The device 1000 may further comprise input/output devices 1005. The input/output devices 1005 may comprise a user interface intended to enable a user to interact with the device 1000, may comprise a peripheral component interface designed so that peripheral components can interact with the system, and/or may comprise sensors intended to determine environmental conditions and/or location information relating to the device 1000.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into units is only a division by logical function, and other divisions are possible in actual implementation.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (46)

1. An entity annotation method, characterized in that the method comprises:
performing word encoding on attribute tags of at least some of the words in a sentence using a knowledge graph, obtaining first expression vectors of the at least some words;
performing word encoding on the at least some words in the sentence based on the sentence structure, obtaining second expression vectors of the at least some words;
fusing the first expression vectors and the second expression vectors, obtaining an entity annotation result for the sentence.
2. The method according to claim 1, characterized in that performing word encoding on the attribute tags of at least some of the words in the sentence using the knowledge graph comprises:
identifying entities in the sentence and the attribute tags corresponding to the entities using the knowledge graph;
segmenting the sentence using the recognition result, and labeling the obtained at least some words with attribute tags;
performing word encoding on the attribute tags of the at least some words, and transforming the encoding result through a fully connected layer, obtaining the first expression vectors of the at least some words.
3. The method according to claim 2, characterized in that identifying entities in the sentence using the knowledge graph comprises:
matching the sentence against the knowledge graph under the longest-match principle, identifying the entities in the sentence.
4. The method according to claim 3, characterized in that the method further comprises: integrating the attribute tags of each entity in the knowledge graph across the various domains, obtaining an attribute tag set corresponding to each entity; storing the attribute tag set corresponding to each entity in a key-value storage engine;
matching the sentence against the knowledge graph comprises: matching the sentence in the key-value storage engine using the longest-match algorithm.
5. The method according to claim 2, characterized in that segmenting the sentence using the recognition result comprises:
segmenting the sentence, wherein each identified entity is treated as an independent word.
6. The method according to claim 2, characterized in that performing word encoding on the attribute tags of the at least some words comprises:
performing one-hot encoding on the attribute tags of the at least some words.
7. The method according to claim 1, characterized in that performing word encoding on the at least some words in the sentence based on the sentence structure comprises:
determining word vectors of the at least some words in the sentence;
feeding the word vectors into a pre-trained neural network, obtaining the second expression vectors of the at least some words.
8. The method according to claim 7, characterized in that determining the word vectors of the at least some words in the sentence comprises:
generating word vectors for the at least some words in the sentence using a word2vec model trained in advance on semantics.
9. The method according to claim 7, characterized in that the neural network comprises a bidirectional recurrent neural network.
10. The method according to claim 1, characterized in that fusing the first expression vectors and the second expression vectors to obtain the entity annotation result for the sentence comprises:
concatenating the first and second expression vectors of each of the at least some words, obtaining third expression vectors of the at least some words;
converting the third expression vectors of the at least some words into result vectors of the at least some words through a fully connected layer, wherein the length of each result vector equals the total number of entity tags, each position of the result vector corresponds to one entity tag, and the value at each position reflects the score of the corresponding entity tag;
performing entity annotation on the sentence according to the result vectors of the at least some words.
11. The method according to claim 10, characterized in that performing entity annotation on the sentence according to the result vectors of the at least some words comprises:
labeling each of the at least some words in the sentence with the highest-scoring entity tag in that word's result vector.
12. An intent recognition method, characterized in that the method comprises:
performing combined encoding on attribute tags of at least some of the words in a sentence using a knowledge graph, obtaining a first sentence vector of the sentence;
encoding the sentence based on the sentence structure, obtaining a second sentence vector of the sentence;
fusing the first sentence vector and the second sentence vector of the sentence, obtaining an intent recognition result for the sentence.
13. The method according to claim 12, characterized in that performing combined encoding on the attribute tags of at least some of the words in the sentence using the knowledge graph comprises:
identifying entities in the sentence and the attribute tags corresponding to the entities using the knowledge graph;
segmenting the sentence using the recognition result, and labeling the obtained at least some words with attribute tags;
performing combined encoding on the attribute tags of the at least some words, and transforming the encoding result through a fully connected layer, obtaining the first sentence vector of the sentence, wherein the length of the first sentence vector equals the total number of entity tags, and the value at each position of the first sentence vector is the weight of the corresponding entity tag in the sentence.
14. The method according to claim 13, characterized in that identifying entities in the sentence using the knowledge graph comprises:
matching the sentence against the knowledge graph using the longest-match algorithm, identifying the entities in the sentence.
15. The method according to claim 13, characterized in that the method further comprises: integrating the attribute tags of each entity in the knowledge graph across the various domains, obtaining the attribute tags corresponding to each entity; storing the attribute tags corresponding to each entity in a key-value storage engine;
matching the sentence against the knowledge graph comprises: matching the sentence in the key-value storage engine using the longest-match algorithm.
16. The method according to claim 12, characterized in that encoding the sentence based on the sentence structure to obtain the second sentence vector of the sentence comprises:
determining word vectors of at least some of the words in the sentence;
feeding the word vectors into a pre-trained neural network, obtaining the second sentence vector of the sentence.
17. The method according to claim 16, characterized in that determining the word vectors of the at least some words in the sentence comprises:
generating word vectors for the at least some words in the sentence using a word2vec model trained in advance on semantics.
18. The method according to claim 16, characterized in that the neural network comprises a bidirectional recurrent neural network.
19. The method according to claim 16, characterized in that feeding the word vectors into the pre-trained neural network to obtain the second sentence vector of the sentence comprises:
feeding the word vectors into the pre-trained neural network, obtaining second expression vectors of the at least some words;
taking the second expression vector of the last word as the second sentence vector of the sentence.
20. The method according to claim 12, characterized in that fusing the first sentence vector and the second sentence vector of the sentence to obtain the intent recognition result for the sentence comprises:
concatenating the first sentence vector and the second sentence vector, obtaining a third sentence vector;
converting the third sentence vector into a result vector through a fully connected layer, wherein the length of the result vector equals the number of intent classes, each position of the result vector corresponds to one intent class, and the value at each position reflects the score of the corresponding intent;
determining the sentence intent according to the result vector.
21. The method according to claim 20, characterized in that determining the sentence intent according to the result vector comprises:
taking the highest-scoring intent in the result vector as the intent of the sentence.
22. An entity annotation apparatus, characterized in that the apparatus comprises:
a first word encoding unit, configured to perform word encoding on attribute tags of at least some of the words in a sentence using a knowledge graph, obtaining first expression vectors of the at least some words;
a second word encoding unit, configured to perform word encoding on the at least some words in the sentence based on the sentence structure, obtaining second expression vectors of the at least some words;
a vector fusion unit, configured to fuse the first expression vectors and the second expression vectors, obtaining an entity annotation result for the sentence.
23. The apparatus according to claim 22, characterized in that the first word encoding unit comprises:
a matching subunit, configured to identify entities in the sentence and the attribute tags corresponding to the entities using the knowledge graph;
a segmentation subunit, configured to segment the sentence using the recognition result of the matching subunit, and to label the obtained at least some words with attribute tags;
a first word encoding subunit, configured to perform word encoding on the attribute tags of the at least some words and transform the encoding result through a fully connected layer, obtaining the first expression vectors of the at least some words.
24. The apparatus according to claim 23, characterized in that the matching subunit is specifically configured to:
match the sentence against the knowledge graph under the longest-match principle, identifying the entities in the sentence.
25. The apparatus according to claim 24, characterized in that the apparatus further comprises:
a graph preprocessing unit, configured to integrate the attribute tags of each entity in the knowledge graph across the various domains, obtaining an attribute tag set corresponding to each entity, and to store the attribute tag set corresponding to each entity in a key-value storage engine;
the matching subunit matching the sentence in the key-value storage engine using the longest-match algorithm.
26. The apparatus according to claim 23, characterized in that the segmentation subunit is specifically configured to: segment the sentence, wherein each entity identified by the matching subunit is treated as an independent word.
27. The apparatus according to claim 23, characterized in that, when performing word encoding on the attribute tags of the at least some words, the first word encoding subunit specifically:
performs one-hot encoding on the attribute tags of the at least some words.
28. device according to claim 22, which is characterized in that the second Chinese word coding unit is specifically used for:
Determine in the sentence at least partly term vector of word;
By term vector input neural network trained in advance, the second expression vector of at least partly word is respectively obtained.
29. device according to claim 28, which is characterized in that the second Chinese word coding unit is in determining the sentence At least partly the term vector of word when, it is specific to execute:
Using based on semantic word2vec trained in advance, term vector is generated respectively at least partly word in the sentence.
30. The device according to claim 28, wherein the neural network includes: a bidirectional recurrent neural network.
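Claims 28–30 describe feeding pre-trained word vectors through a bidirectional recurrent network. A toy sketch follows, with random stand-ins for word2vec vectors and a weightless tanh cell; a real model would use trained recurrent weights (e.g. an LSTM or GRU), so everything here is illustrative.

```python
import math, random

random.seed(0)
DIM = 4  # toy embedding / hidden size

def rnn_pass(vectors):
    """One recurrent pass: h_t = tanh(x_t + h_{t-1}) elementwise (toy cell, no weights)."""
    h, states = [0.0] * DIM, []
    for x in vectors:
        h = [math.tanh(xi + hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states

def bidirectional_encode(word_vectors):
    fwd = rnn_pass(word_vectors)
    bwd = list(reversed(rnn_pass(list(reversed(word_vectors)))))
    # second expression vector of each word = [forward state ; backward state]
    return [f + b for f, b in zip(fwd, bwd)]

words = ["play", "song", "by", "singer"]
vecs = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in words]
encoded = bidirectional_encode(vecs)
print(len(encoded), len(encoded[0]))  # one 2*DIM vector per word
```

Concatenating the forward and backward states is what gives each word's second expression vector context from both sides of the sentence.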
31. The device according to claim 22, wherein the vector fusion unit is specifically configured to:
concatenate the first expression vector and the second expression vector of each of the at least some words to obtain a third expression vector of each of the at least some words;
convert the third expression vectors of the at least some words into result vectors of the at least some words through a fully connected layer, wherein the length of each result vector corresponds to the total number of entity tags, each position of the result vector corresponds to an entity tag, and the value at each position reflects the score of the corresponding entity tag;
perform entity annotation on the sentence according to the result vectors of the at least some words.
32. The device according to claim 31, wherein, when performing entity annotation on the sentence according to the result vectors of the at least some words, the vector fusion unit specifically performs:
annotating each of the at least some words in the sentence with the highest-scoring entity tag in that word's result vector.
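The fusion and annotation steps of claims 31–32 amount to a concatenation, a linear (fully connected) layer, and an argmax per word. A sketch with invented tags and random untrained weights, purely to show the data flow:

```python
import random

random.seed(1)
TAGS = ["O", "B-singer", "B-song"]  # illustrative entity tags

def fully_connected(vec, weights, bias):
    """Plain linear layer: one output score per row of weights."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def annotate_word(first_vec, second_vec, weights, bias):
    third = first_vec + second_vec                  # concatenated third expression vector
    result = fully_connected(third, weights, bias)  # result vector: one score per tag
    return TAGS[max(range(len(result)), key=result.__getitem__)]

in_dim = 4  # len(first_vec) + len(second_vec)
W = [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in TAGS]
b = [0.0] * len(TAGS)
print(annotate_word([1.0, 0.0], [0.3, -0.2], W, b))
```

With trained weights the result vector's argmax is the per-word entity tag the claim annotates with.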
33. An intention recognition device, wherein the device includes:
a first encoding unit, configured to jointly encode the attribute tags of at least some words in a sentence using a knowledge graph, to obtain a first vector of the sentence;
a second encoding unit, configured to encode the sentence based on the sentence structure, to obtain a second vector of the sentence;
a vector fusion unit, configured to fuse the first vector and the second vector of the sentence to obtain an intention recognition result for the sentence.
34. The device according to claim 33, wherein the first encoding unit specifically includes:
a matching subunit, configured to identify the entities in the sentence and the attribute tags corresponding to the entities using the knowledge graph;
a segmentation subunit, configured to segment the sentence using the recognition result, and to annotate the obtained at least some words with attribute tags;
a joint encoding subunit, configured to jointly encode the attribute tags of the at least some words, and to pass the encoding result through a fully connected layer to obtain the first vector of the sentence, wherein the length of the first vector corresponds to the total number of entity tags, and the value at each position of the first vector is the weight of the corresponding entity tag in the sentence.
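The sentence-level first vector of claim 34 pools the attribute tags of all matched words into one vector of per-tag weights. A rough sketch, using normalized counts in place of the weights the patent's fully connected layer would produce; the tag names are invented:

```python
TAGS = ["singer", "song", "actor", "movie"]  # illustrative tag vocabulary

def sentence_tag_vector(word_tag_sets):
    """One weight per entity tag; a normalized count stands in for the
    weights a trained fully connected layer would produce."""
    counts = [0.0] * len(TAGS)
    for tags in word_tag_sets:       # attribute tags of one word
        for t in tags:
            counts[TAGS.index(t)] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]

# e.g. a sentence containing one <song> word and one <singer/actor> word
print(sentence_tag_vector([{"song"}, {"singer", "actor"}]))
```

The point of the fixed length is that the vector is comparable across sentences: position k always speaks for the same entity tag.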
35. The device according to claim 34, wherein the matching subunit being specifically configured to identify the entities in the sentence using the knowledge graph includes:
matching the sentence in the knowledge graph using a longest-prefix-match algorithm, to identify the entities in the sentence.
36. The device according to claim 35, wherein the device further includes:
a graph preprocessing unit, configured to aggregate the attribute tags of each entity in the knowledge graph across the respective domains to obtain an attribute tag set corresponding to each entity, and to store the attribute tag set corresponding to each entity in a key-value storage engine;
wherein the matching subunit matches the sentence in the key-value storage engine using a longest-prefix-match algorithm.
37. The device according to claim 33, wherein the second encoding unit is specifically configured to:
determine word vectors of at least some words in the sentence;
input the word vectors into a pre-trained neural network to obtain the second vector of the sentence.
38. The device according to claim 37, wherein, when determining the word vectors of the at least some words in the sentence, the second encoding unit specifically performs:
generating word vectors for the at least some words in the sentence using a pre-trained semantics-based word2vec model.
39. The device according to claim 38, wherein the neural network includes: a bidirectional recurrent neural network.
40. The device according to claim 33, wherein the second encoding unit is specifically configured to:
input the word vectors into a pre-trained neural network to obtain second expression vectors of the at least some words respectively;
take the second expression vector of the last word as the second vector of the sentence.
41. The device according to claim 33, wherein the vector fusion unit is specifically configured to:
concatenate the first vector and the second vector to obtain a third sentence vector;
convert the third sentence vector into a result vector through a fully connected layer, wherein the length of the result vector corresponds to the number of sentence intent categories, each position of the result vector corresponds to a sentence intent, and the value at each position reflects the score of the corresponding sentence intent;
determine the sentence intent according to the result vector.
42. The device according to claim 41, wherein, when determining the sentence intent according to the result vector, the vector fusion unit specifically performs:
taking the highest-scoring sentence intent in the result vector as the intent of the sentence.
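Claims 41–42 fuse the two sentence vectors the same way the per-word path does: concatenate, apply a fully connected layer sized to the intent categories, and take the argmax. A sketch with invented intent names and random untrained weights:

```python
import random

random.seed(2)
INTENTS = ["play_music", "play_video", "ask_weather"]  # illustrative intents

def predict_intent(first_vec, second_vec, weights, bias):
    fused = first_vec + second_vec                  # third sentence vector
    scores = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]     # one score per intent
    return INTENTS[max(range(len(scores)), key=scores.__getitem__)]

dim = 4  # len(first_vec) + len(second_vec)
W = [[random.uniform(-1, 1) for _ in range(dim)] for _ in INTENTS]
b = [0.0] * len(INTENTS)
print(predict_intent([0.6, 0.4], [0.1, -0.3], W, b))
```

Because the knowledge-graph vector and the structure-based vector are concatenated rather than averaged, the layer can weight tag evidence and word-order evidence independently.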
43. A device, including:
a memory containing one or more programs;
one or more processors coupled to the memory, the one or more processors executing the one or more programs to perform the operations performed in the method of any one of claims 1 to 11.
44. A device, including:
a memory containing one or more programs;
one or more processors coupled to the memory, the one or more processors executing the one or more programs to perform the operations performed in the method of any one of claims 12 to 21.
45. A computer storage medium encoded with a computer program, wherein the program, when executed by one or more computers, causes the one or more computers to perform the operations performed in the method of any one of claims 1 to 11.
46. A computer storage medium encoded with a computer program, wherein the program, when executed by one or more computers, causes the one or more computers to perform the operations performed in the method of any one of claims 12 to 21.
CN201710655187.2A 2017-08-03 2017-08-03 Entity marking method, intention identification method, corresponding device and computer storage medium Active CN109388793B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710655187.2A CN109388793B (en) 2017-08-03 2017-08-03 Entity marking method, intention identification method, corresponding device and computer storage medium
PCT/CN2018/096640 WO2019024704A1 (en) 2017-08-03 2018-07-23 Entity annotation method, intention recognition method and corresponding devices, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710655187.2A CN109388793B (en) 2017-08-03 2017-08-03 Entity marking method, intention identification method, corresponding device and computer storage medium

Publications (2)

Publication Number Publication Date
CN109388793A true CN109388793A (en) 2019-02-26
CN109388793B CN109388793B (en) 2023-04-07

Family

ID=65233308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710655187.2A Active CN109388793B (en) 2017-08-03 2017-08-03 Entity marking method, intention identification method, corresponding device and computer storage medium

Country Status (2)

Country Link
CN (1) CN109388793B (en)
WO (1) WO2019024704A1 (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992671A (en) * 2019-04-10 2019-07-09 出门问问信息科技有限公司 Intension recognizing method, device, equipment and storage medium
CN110059320A (en) * 2019-04-23 2019-07-26 腾讯科技(深圳)有限公司 Entity relation extraction method, apparatus, computer equipment and storage medium
CN110188362A (en) * 2019-06-10 2019-08-30 北京百度网讯科技有限公司 Text handling method and device
CN110377691A (en) * 2019-07-23 2019-10-25 上海应用技术大学 Method, apparatus, equipment and the storage medium of text classification
CN110472230A (en) * 2019-07-11 2019-11-19 平安科技(深圳)有限公司 The recognition methods of Chinese text and device
CN110738041A (en) * 2019-10-16 2020-01-31 天津市爱贝叶斯信息技术有限公司 statement labeling method, device, server and storage medium
CN111027667A (en) * 2019-12-06 2020-04-17 北京金山安全软件有限公司 Intention category identification method and device
CN111104803A (en) * 2019-12-31 2020-05-05 科大讯飞股份有限公司 Semantic understanding processing method, device and equipment and readable storage medium
CN111159546A (en) * 2019-12-24 2020-05-15 腾讯科技(深圳)有限公司 Event pushing method and device, computer readable storage medium and computer equipment
CN111274815A (en) * 2020-01-15 2020-06-12 北京百度网讯科技有限公司 Method and device for mining entity attention points in text
CN111309872A (en) * 2020-03-26 2020-06-19 北京百度网讯科技有限公司 Search processing method, device and equipment
CN111353310A (en) * 2020-02-28 2020-06-30 腾讯科技(深圳)有限公司 Named entity identification method and device based on artificial intelligence and electronic equipment
CN111613341A (en) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Entity linking method and device based on semantic components
CN111753024A (en) * 2020-06-24 2020-10-09 河北工程大学 Public safety field-oriented multi-source heterogeneous data entity alignment method
CN111813447A (en) * 2019-04-12 2020-10-23 杭州中天微系统有限公司 Processing method and processing device for data splicing instruction
CN112201250A (en) * 2020-09-30 2021-01-08 中移(杭州)信息技术有限公司 Semantic analysis method and device, electronic equipment and storage medium
CN112364664A (en) * 2020-11-19 2021-02-12 北京京东尚科信息技术有限公司 Method and device for training intention recognition model and intention recognition and storage medium
CN112543932A (en) * 2020-01-22 2021-03-23 华为技术有限公司 Semantic analysis method, device, equipment and storage medium
CN112749556A (en) * 2020-08-04 2021-05-04 腾讯科技(深圳)有限公司 Multi-language model training method and device, storage medium and electronic equipment
CN113064997A (en) * 2021-04-22 2021-07-02 中国平安财产保险股份有限公司 Intent analysis method, device, equipment and medium based on BERT model
CN113157892A (en) * 2021-05-24 2021-07-23 中国平安人寿保险股份有限公司 User intention processing method and device, computer equipment and storage medium
CN113343692A (en) * 2021-07-15 2021-09-03 杭州网易云音乐科技有限公司 Search intention recognition method, model training method, device, medium and equipment
CN113360751A (en) * 2020-03-06 2021-09-07 百度在线网络技术(北京)有限公司 Intention recognition method, apparatus, device and medium
CN113377969A (en) * 2021-08-16 2021-09-10 中航信移动科技有限公司 Intention recognition data processing system
CN113505587A (en) * 2021-06-23 2021-10-15 科大讯飞华南人工智能研究院(广州)有限公司 Entity extraction method, related device, equipment and storage medium
CN113705236A (en) * 2021-04-02 2021-11-26 腾讯科技(深圳)有限公司 Entity comparison method, device, equipment and computer readable storage medium
CN113723114A (en) * 2021-08-31 2021-11-30 平安普惠企业管理有限公司 Semantic analysis method, device and equipment based on multi-intent recognition and storage medium

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840280B (en) * 2019-03-05 2023-07-18 百度在线网络技术(北京)有限公司 Text classification method and device and computer readable storage medium
CN111859976A (en) * 2019-04-30 2020-10-30 广东小天才科技有限公司 Method and device for expanding regular expression based on knowledge graph
CN110245198B (en) * 2019-06-18 2021-08-27 北京百度网讯科技有限公司 Multi-source ticketing data management method and system, server and computer readable medium
CN110597961B (en) * 2019-09-18 2023-10-27 腾讯云计算(北京)有限责任公司 Text category labeling method and device, electronic equipment and storage medium
CN114175017A (en) * 2019-10-30 2022-03-11 深圳市欢太科技有限公司 Model construction method, classification method, device, storage medium and electronic equipment
CN111753495A (en) * 2019-11-07 2020-10-09 北京沃东天骏信息技术有限公司 Method, device, equipment and storage medium for constructing prediction model of intention statement
CN111160033B (en) * 2019-12-18 2024-02-27 车智互联(北京)科技有限公司 Named entity identification method based on neural network, computing equipment and storage medium
CN111124350B (en) * 2019-12-20 2023-10-27 科大讯飞股份有限公司 Skill determination method and related equipment
CN111160034B (en) * 2019-12-31 2024-02-27 东软集团股份有限公司 Entity word labeling method, device, storage medium and equipment
CN111368527B (en) * 2020-02-28 2023-06-20 上海汇航捷讯网络科技有限公司 Key value matching method
CN111680207B (en) * 2020-03-11 2023-08-04 华中科技大学鄂州工业技术研究院 Method and device for determining search intention of user
US11379666B2 (en) 2020-04-08 2022-07-05 International Business Machines Corporation Suggestion of new entity types with discriminative term importance analysis
CN111400480B (en) * 2020-04-21 2023-05-12 支付宝(杭州)信息技术有限公司 User intention recognition method and device for multi-round dialogue
CN113642302B (en) * 2020-04-27 2024-04-02 阿里巴巴集团控股有限公司 Training method and device for text filling model, text processing method and device
CN113742523B (en) * 2020-05-29 2023-06-27 北京百度网讯科技有限公司 Labeling method and device for text core entity
CN111695345B (en) * 2020-06-12 2024-02-23 腾讯科技(深圳)有限公司 Method and device for identifying entity in text
CN111708873B (en) * 2020-06-15 2023-11-24 腾讯科技(深圳)有限公司 Intelligent question-answering method, intelligent question-answering device, computer equipment and storage medium
CN111767726B (en) * 2020-06-24 2024-02-06 北京奇艺世纪科技有限公司 Data processing method and device
CN111832282B (en) * 2020-07-16 2023-04-14 平安科技(深圳)有限公司 External knowledge fused BERT model fine adjustment method and device and computer equipment
CN111797245B (en) * 2020-07-27 2023-07-25 中国平安人寿保险股份有限公司 Knowledge graph model-based information matching method and related device
CN111914568B (en) * 2020-07-31 2024-02-06 平安科技(深圳)有限公司 Method, device and equipment for generating text sentence and readable storage medium
CN111950288B (en) * 2020-08-25 2024-02-23 海信视像科技股份有限公司 Entity labeling method in named entity recognition and intelligent device
CN112100397A (en) * 2020-09-07 2020-12-18 南京航空航天大学 Electric power plan knowledge graph construction method and system based on bidirectional gating circulation unit
CN112015921B (en) * 2020-09-15 2024-04-16 重庆广播电视大学重庆工商职业学院 Natural language processing method based on learning auxiliary knowledge graph
CN112101009B (en) * 2020-09-23 2024-03-26 中国农业大学 Method for judging similarity of red-building dream character relationship frames based on knowledge graph
CN112699685B (en) * 2021-01-08 2024-03-29 北京工业大学 Named entity recognition method based on label-guided word fusion
CN113378574B (en) * 2021-06-30 2023-10-24 武汉大学 KGANN-based named entity identification method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095195A (en) * 2015-07-03 2015-11-25 北京京东尚科信息技术有限公司 Method and system for human-machine questioning and answering based on knowledge graph
CN105117487A (en) * 2015-09-19 2015-12-02 杭州电子科技大学 Book semantic retrieval method based on content structures
CN105335519A (en) * 2015-11-18 2016-02-17 百度在线网络技术(北京)有限公司 Model generation method and device as well as recommendation method and device
US20160132648A1 (en) * 2014-11-06 2016-05-12 ezDI, LLC Data Processing System and Method for Computer-Assisted Coding of Natural Language Medical Text
US9367608B1 (en) * 2009-01-07 2016-06-14 Guangsheng Zhang System and methods for searching objects and providing answers to queries using association data
CN106649394A (en) * 2015-11-03 2017-05-10 中兴通讯股份有限公司 Fusion knowledge base processing method and device and knowledge base management system
CN106776711A (en) * 2016-11-14 2017-05-31 浙江大学 A kind of Chinese medical knowledge mapping construction method based on deep learning
CN106776562A (en) * 2016-12-20 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of keyword extracting method and extraction system
CN106815252A (en) * 2015-12-01 2017-06-09 阿里巴巴集团控股有限公司 A kind of searching method and equipment
CN106815192A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and sentence emotion identification method and device
CN106875940A (en) * 2017-03-06 2017-06-20 吉林省盛创科技有限公司 A kind of Machine self-learning based on neutral net builds knowledge mapping training method
CN106897568A (en) * 2017-02-28 2017-06-27 北京大数医达科技有限公司 The treating method and apparatus of case history structuring

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9367608B1 (en) * 2009-01-07 2016-06-14 Guangsheng Zhang System and methods for searching objects and providing answers to queries using association data
US20160132648A1 (en) * 2014-11-06 2016-05-12 ezDI, LLC Data Processing System and Method for Computer-Assisted Coding of Natural Language Medical Text
CN105095195A (en) * 2015-07-03 2015-11-25 北京京东尚科信息技术有限公司 Method and system for human-machine questioning and answering based on knowledge graph
CN105117487A (en) * 2015-09-19 2015-12-02 杭州电子科技大学 Book semantic retrieval method based on content structures
WO2017076263A1 (en) * 2015-11-03 2017-05-11 中兴通讯股份有限公司 Method and device for integrating knowledge bases, knowledge base management system and storage medium
CN106649394A (en) * 2015-11-03 2017-05-10 中兴通讯股份有限公司 Fusion knowledge base processing method and device and knowledge base management system
CN105335519A (en) * 2015-11-18 2016-02-17 百度在线网络技术(北京)有限公司 Model generation method and device as well as recommendation method and device
CN106815192A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and sentence emotion identification method and device
CN106815252A (en) * 2015-12-01 2017-06-09 阿里巴巴集团控股有限公司 A kind of searching method and equipment
CN106776711A (en) * 2016-11-14 2017-05-31 浙江大学 A kind of Chinese medical knowledge mapping construction method based on deep learning
CN106776562A (en) * 2016-12-20 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of keyword extracting method and extraction system
CN106897568A (en) * 2017-02-28 2017-06-27 北京大数医达科技有限公司 The treating method and apparatus of case history structuring
CN106875940A (en) * 2017-03-06 2017-06-20 吉林省盛创科技有限公司 A kind of Machine self-learning based on neutral net builds knowledge mapping training method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Kang et al.: "Research progress and prospects of knowledge base question answering based on representation learning", Acta Automatica Sinica *
Liu Yujiao et al.: "Named entity recognition for Chinese microblogs based on deep learning", Journal of Sichuan University (Engineering Science Edition) *

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992671A (en) * 2019-04-10 2019-07-09 出门问问信息科技有限公司 Intension recognizing method, device, equipment and storage medium
CN111813447B (en) * 2019-04-12 2022-11-08 杭州中天微系统有限公司 Processing method and processing device for data splicing instruction
CN111813447A (en) * 2019-04-12 2020-10-23 杭州中天微系统有限公司 Processing method and processing device for data splicing instruction
CN110059320A (en) * 2019-04-23 2019-07-26 腾讯科技(深圳)有限公司 Entity relation extraction method, apparatus, computer equipment and storage medium
CN110188362A (en) * 2019-06-10 2019-08-30 北京百度网讯科技有限公司 Text handling method and device
CN110188362B (en) * 2019-06-10 2021-04-20 北京百度网讯科技有限公司 Text processing method and device
CN110472230B (en) * 2019-07-11 2023-09-05 平安科技(深圳)有限公司 Chinese text recognition method and device
CN110472230A (en) * 2019-07-11 2019-11-19 平安科技(深圳)有限公司 The recognition methods of Chinese text and device
CN110377691A (en) * 2019-07-23 2019-10-25 上海应用技术大学 Method, apparatus, equipment and the storage medium of text classification
CN110738041B (en) * 2019-10-16 2023-12-01 天津市爱贝叶斯信息技术有限公司 Statement labeling method, device, server and storage medium
CN110738041A (en) * 2019-10-16 2020-01-31 天津市爱贝叶斯信息技术有限公司 statement labeling method, device, server and storage medium
CN111027667A (en) * 2019-12-06 2020-04-17 北京金山安全软件有限公司 Intention category identification method and device
CN111027667B (en) * 2019-12-06 2023-10-17 北京金山安全软件有限公司 Method and device for identifying intention category
CN111159546B (en) * 2019-12-24 2023-10-24 深圳市雅阅科技有限公司 Event pushing method, event pushing device, computer readable storage medium and computer equipment
CN111159546A (en) * 2019-12-24 2020-05-15 腾讯科技(深圳)有限公司 Event pushing method and device, computer readable storage medium and computer equipment
CN111104803B (en) * 2019-12-31 2024-02-13 科大讯飞股份有限公司 Semantic understanding processing method, device, equipment and readable storage medium
CN111104803A (en) * 2019-12-31 2020-05-05 科大讯飞股份有限公司 Semantic understanding processing method, device and equipment and readable storage medium
CN111274815A (en) * 2020-01-15 2020-06-12 北京百度网讯科技有限公司 Method and device for mining entity attention points in text
CN111274815B (en) * 2020-01-15 2024-04-12 北京百度网讯科技有限公司 Method and device for mining entity focus point in text
US11775761B2 (en) 2020-01-15 2023-10-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for mining entity focus in text
CN112543932A (en) * 2020-01-22 2021-03-23 华为技术有限公司 Semantic analysis method, device, equipment and storage medium
CN111353310A (en) * 2020-02-28 2020-06-30 腾讯科技(深圳)有限公司 Named entity identification method and device based on artificial intelligence and electronic equipment
CN111353310B (en) * 2020-02-28 2023-08-11 腾讯科技(深圳)有限公司 Named entity identification method and device based on artificial intelligence and electronic equipment
CN113360751A (en) * 2020-03-06 2021-09-07 百度在线网络技术(北京)有限公司 Intention recognition method, apparatus, device and medium
CN111309872A (en) * 2020-03-26 2020-06-19 北京百度网讯科技有限公司 Search processing method, device and equipment
CN111309872B (en) * 2020-03-26 2023-08-08 北京百度网讯科技有限公司 Search processing method, device and equipment
CN111613341A (en) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Entity linking method and device based on semantic components
CN111613341B (en) * 2020-05-22 2024-02-02 云知声智能科技股份有限公司 Entity linking method and device based on semantic components
CN111753024B (en) * 2020-06-24 2024-02-20 河北工程大学 Multi-source heterogeneous data entity alignment method oriented to public safety field
CN111753024A (en) * 2020-06-24 2020-10-09 河北工程大学 Public safety field-oriented multi-source heterogeneous data entity alignment method
CN112749556B (en) * 2020-08-04 2022-09-13 腾讯科技(深圳)有限公司 Multi-language model training method and device, storage medium and electronic equipment
CN112749556A (en) * 2020-08-04 2021-05-04 腾讯科技(深圳)有限公司 Multi-language model training method and device, storage medium and electronic equipment
CN112201250A (en) * 2020-09-30 2021-01-08 中移(杭州)信息技术有限公司 Semantic analysis method and device, electronic equipment and storage medium
CN112201250B (en) * 2020-09-30 2024-03-19 中移(杭州)信息技术有限公司 Semantic analysis method and device, electronic equipment and storage medium
CN112364664A (en) * 2020-11-19 2021-02-12 北京京东尚科信息技术有限公司 Method and device for training intention recognition model and intention recognition and storage medium
CN112364664B (en) * 2020-11-19 2023-12-05 北京京东尚科信息技术有限公司 Training of intention recognition model, intention recognition method, device and storage medium
CN113705236A (en) * 2021-04-02 2021-11-26 腾讯科技(深圳)有限公司 Entity comparison method, device, equipment and computer readable storage medium
CN113064997A (en) * 2021-04-22 2021-07-02 中国平安财产保险股份有限公司 Intent analysis method, device, equipment and medium based on BERT model
CN113157892A (en) * 2021-05-24 2021-07-23 中国平安人寿保险股份有限公司 User intention processing method and device, computer equipment and storage medium
CN113505587A (en) * 2021-06-23 2021-10-15 科大讯飞华南人工智能研究院(广州)有限公司 Entity extraction method, related device, equipment and storage medium
CN113505587B (en) * 2021-06-23 2024-04-09 科大讯飞华南人工智能研究院(广州)有限公司 Entity extraction method, related device, equipment and storage medium
CN113343692A (en) * 2021-07-15 2021-09-03 杭州网易云音乐科技有限公司 Search intention recognition method, model training method, device, medium and equipment
CN113343692B (en) * 2021-07-15 2023-09-12 杭州网易云音乐科技有限公司 Search intention recognition method, model training method, device, medium and equipment
CN113377969A (en) * 2021-08-16 2021-09-10 中航信移动科技有限公司 Intention recognition data processing system
CN113723114A (en) * 2021-08-31 2021-11-30 平安普惠企业管理有限公司 Semantic analysis method, device and equipment based on multi-intent recognition and storage medium

Also Published As

Publication number Publication date
CN109388793B (en) 2023-04-07
WO2019024704A1 (en) 2019-02-07

Similar Documents

Publication Publication Date Title
CN109388793A (en) 2019-02-26 Entity annotation method, intention recognition method and corresponding devices, computer storage medium
US11853879B2 (en) Generating vector representations of documents
US11514247B2 (en) Method, apparatus, computer device and readable medium for knowledge hierarchical extraction of a text
US10437929B2 (en) Method and system for processing an input query using a forward and a backward neural network specific to unigrams
CN109522553A (en) Named entity recognition method and device
CN113377971B (en) Multimedia resource generation method and device, electronic equipment and storage medium
CN105869642A (en) Voice text error correction method and device
US10803380B2 (en) Generating vector representations of documents
Siddique et al. Linguistically-enriched and context-aware zero-shot slot filling
CN109241319A (en) A kind of picture retrieval method, device, server and storage medium
CN109214417A (en) User intention mining method and device, computer equipment and readable medium
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN113836992B (en) Label identification method, label identification model training method, device and equipment
CN109271624A (en) Target word determination method, apparatus and storage medium
CN113343692B (en) Search intention recognition method, model training method, device, medium and equipment
WO2021159812A1 (en) Cancer staging information processing method and apparatus, and storage medium
CN109660621A (en) A kind of content delivery method and service equipment
CN113761188A (en) Text label determination method and device, computer equipment and storage medium
CN112784156A (en) Search feedback method, system, device and storage medium based on intention recognition
CN110874408B (en) Model training method, text recognition device and computing equipment
US20230153533A1 (en) Pre-training techniques for entity extraction in low resource domains
CN116955707A (en) Content tag determination method, device, equipment, medium and program product
Amiriparian et al. Humans inside: cooperative big multimedia data mining
CN114329064A (en) Video processing method, video processing device, computer equipment and storage medium
CN106570116A (en) Aggregation method and device for search results based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant