CN108563790A

CN108563790A - A kind of semantic understanding method and device, equipment, computer-readable medium

Info

Publication number: CN108563790A
Application number: CN201810401066.XA
Authority: CN
Inventors: 王兴宝; 蒋亚冲; 雷琴辉; 黄倪莎
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2018-04-28
Filing date: 2018-04-28
Publication date: 2018-09-21
Anticipated expiration: 2038-04-28
Also published as: CN108563790B

Abstract

This application provides a kind of semantic understanding method and devices, equipment, according to preset domain classification, sentence to be understood is divided into the first generic clause, again according to preset intent classifier, each first generic clause is divided into the second generic clause, the finally extraction tank information from each second generic clause, wherein, each first generic clause belongs to a classification in domain classification, each second generic clause belongs to a classification in intent classifier, it can be seen that, semantic understanding method and device described herein, using different partition dimensions, in the way of dividing clause step by step, it can obtain the slot information under multi-field more intentions, therefore, the sentence more being intended to can be directed to, obtain the higher understanding results being intended to of accuracy more.

Description

A kind of semantic understanding method and device, equipment, computer-readable medium

Technical field

This application involves electronic information field more particularly to a kind of semantic understanding method and devices, equipment, computer-readable Medium.

Background technology

With the continuous breakthrough of artificial intelligence technology and becoming increasingly popular for various intelligent terminals, human-computer interaction is in people The frequency occurred in routine work and life is higher and higher.Interactive voice has become as most easily one of interactive mode The important means of human-computer interaction.Human-computer interaction conversational system is used in various intelligent terminals, such as：TV, mobile phone, vehicle Load, smart home and robot etc..And how to understand that being intended that for user is the most key in human-computer interaction conversational system Technology.

Since human language has diversity, complexity and ambiguousness, a variety of intentions can be often in short expressed, but work as Preceding human-computer interaction conversational system is merely able to provide a kind of semantic understanding as a result, (be intended to for the voice for being related to more being intended to more Can be understood as at least two events, such as " open air-conditioning, close radio "), it may appear that no result only provides one kind The problem of semantic understanding result, such as：" first open air-conditioning and be then shut off radio ", may be without output as a result, output result： " opening air-conditioning ", in another example：" navigate to WanDa Plaza and select a not stifled route " may export result：" navigation is not blocked up Route ".

As it can be seen that existing semantic understanding method, the understanding of the sentence for more being intended to are not accurate enough.

Invention content

This application provides a kind of semantic understanding method and device, equipment, computer-readable mediums, it is therefore intended that solution pair In the problem that the understanding for the sentence more being intended to is not accurate enough.

To achieve the goals above, this application provides following technical schemes：

A kind of semantic understanding method, including：

According to preset domain classification, sentence to be understood is divided into the first generic clause, wherein each first kind Clause belongs to a classification in the domain classification；

According to preset intent classifier, each first generic clause is divided into the second generic clause, wherein each described Second generic clause belongs to a classification in the intent classifier；

The extraction tank information from each second generic clause.

Optionally, described according to preset domain classification, sentence to be understood, which is divided into the first generic clause, includes：

Using each word in the sentence to be understood as input, using field discrimination model, obtain described waiting understanding Sentence in each word field label, the field label of any one word in the sentence to be understood includes： The title and first position information of domain classification belonging to the word, the first position information are that the word is meeting first condition All words in location information, the first condition is to belong to identical domain classification with the word in the sentence to be understood Word, the position includes starting, intermediate and terminate.

Optionally, the training process of the field discrimination model includes：

For each classification in the preset domain classification, the first preset quantity sentence is collected respectively, and right The sentence of collection marks the field label；

It collects the second preset quantity item and is not belonging to the sentence of any one of described preset domain classification classification, and mark Note is not belonging to the label of the preset domain classification；

Using the object function of sentence training the first sequence marking model after mark, the field discrimination model is obtained.

Optionally, described according to preset intent classifier, each first generic clause is divided into the second generic clause packet It includes：

Using each word with same area specific name in first generic clause as input, using with described the The corresponding intention discrimination module of domain classification belonging to one generic clause, obtains the intention mark of each word in first generic clause It signs, the intention labels of any one word in first generic clause include：The title of intent classifier belonging to the word with And second position information, the second position information is location information of the word in all words for meeting second condition, described Second condition is the word for belonging to identical domain classification and identical intent classifier in first generic clause with the word, described Position includes starting, centre and terminates, wherein each domain classification corresponds to an intention discrimination module, different field point The corresponding intention discrimination module of class is different.

Optionally, the corresponding training method for being intended to discrimination module of any one of domain classification includes：

The third preset quantity sentence for belonging to the domain classification is collected, and the intention is marked to every sentence of collection Label；

The sentence that the 4th preset quantity is not belonging to the classification of any one of preset intent classifier is collected, and marks and does not belong to In the label of the preset domain classification；

Using the object function of sentence training the second sequence marking model after mark, the corresponding institute of the domain classification is obtained State intention discrimination module.

Optionally, the extraction tank information from each second generic clause includes：

Using the intention vector of each word and second generic clause in the second generic clause as input, using with it is described The corresponding slot information discrimination model of domain classification belonging to second generic clause, obtains the slot of each word in second generic clause The slot information labels of information labels, any one word in second generic clause include：The title of slot information belonging to the word And the third place information, the third place information are location information of the word in all words for meeting third condition, institute It is to belong to identical domain classification, identical intent classifier in second generic clause with the word and have phase to state third condition The word of same slot information；

Wherein, each domain classification corresponds to a slot information discrimination module, the corresponding slot information of different domain classifications Discrimination module is different.

Optionally, the training method of the corresponding slot information discrimination module of any one of domain classification includes：

The 5th preset quantity article sentence for belonging to the domain classification is collected, and slot information labels are marked to the sentence of collection；

The sentence that the 6th preset quantity is not belonging to any one of preset slot information slot information is collected, and marks and does not belong to In the label of the preset slot information；

The object function for the marking model that sorted using the sentence training third after mark, obtains the corresponding institute of the domain classification State slot information discrimination module.

Optionally, further include after extraction tank information in each second generic clause described：

By the institute in the word segmentation result of the sentence to be understood, slot information candidate path and the sentence to be understood The intention vector of the second generic clause is stated as input, using preset model, is scored slot information candidate path, the slot It is combined and is formed by least one slot information extracted from each second generic clause in information candidate path；

According to the scoring, slot information extraction result is determined.

A kind of semantic understanding device, including：

First division module, for according to preset domain classification, sentence to be understood to be divided into the first generic clause, In, each first generic clause belongs to a classification in the domain classification；

Second division module, for according to preset intent classifier, each first generic clause to be divided into the second class Clause, wherein each second generic clause belongs to a classification in the intent classifier；

Extraction module, for the extraction tank information from each second generic clause.

Optionally, first division module is used for according to preset domain classification, and sentence to be understood is divided into the One generic clause includes：

First division module is specifically used for, and using each word in the sentence to be understood as input, uses neck Domain discrimination model obtains the field label of each word in the sentence to be understood, arbitrary in the sentence to be understood The field label of one word includes：The title and first position information of domain classification belonging to the word, described first Confidence breath is location information of the word in all words for meeting first condition, and the first condition is the sentence to be understood In belong to the word of identical domain classification with the word, the position includes starting, intermediate and terminate.

Optionally, first division module is additionally operable to, with the following method, the training field discrimination model：

Optionally, second division module is used for according to preset intent classifier, and each first generic clause is drawn Being divided into the second generic clause includes：

Second division module is specifically used for, by each in first generic clause with same area specific name Word as input, using intention discrimination module corresponding with the domain classification belonging to first generic clause, obtain described the The intention labels of the intention labels of each word in one generic clause, any one word in first generic clause include： The title and second position information of intent classifier belonging to the word, the second position information are that the word is meeting second condition All words in location information, the second condition be first generic clause in the word belong to identical domain classification with And the word of identical intent classifier, the position include starting, centre and terminate, wherein each domain classification corresponds to one It is intended to discrimination module, the corresponding intention discrimination module of different domain classifications is different.

Optionally, second division module is additionally operable to, in accordance with the following methods, any one of domain classification pair of training The intention discrimination module answered：

Optionally, the extraction module be used for from each second generic clause extraction tank information include：

The extraction module is specifically used for, by the intention of each word and second generic clause in the second generic clause to Amount is as input, using slot information discrimination model corresponding with the domain classification belonging to second generic clause, obtains described the The slot information labels of the slot information labels of each word in two generic clauses, any one word in second generic clause include： The title and the third place information of slot information belonging to the word, the third place information are that the word is meeting third condition Location information in all words, the third condition are to belong to identical domain classification, phase with the word in second generic clause The word of same intent classifier and slot information having the same；

Optionally, the extraction module is additionally operable to, and in accordance with the following methods, any one of domain classification of training is corresponding The slot information discrimination module：

Optionally, further include：

Screening module is used in the extraction module after extraction tank information in each second generic clause, by institute State second generic clause in word segmentation result, slot information candidate path and the sentence to be understood of sentence to be understood Intention vector as input, using preset model, score slot information candidate path, slot information candidate path It is combined and is formed by least one slot information extracted from each second generic clause；And according to the scoring, determine that slot is believed Breath extraction result.

A kind of semantic understanding equipment, including：

Memory and processor；

The memory is for storing one or more programs；

The processor is for executing one or more of programs, so that the semantic understanding equipment realizes the above institute State semantic understanding method.

A kind of computer-readable medium is stored with instruction in the computer readable storage medium, when its on computers When operation so that computer executes semantic understanding method described above.

Semantic understanding method and device described herein, equipment, according to preset domain classification, by sentence to be understood It is divided into the first generic clause, then according to preset intent classifier, each first generic clause is divided into the second generic clause, most The extraction tank information from each second generic clause afterwards, wherein each first generic clause belongs to a classification in domain classification, often A second generic clause belongs to a classification in intent classifier, it is seen then that semantic understanding method and device described herein uses Different partition dimensions can obtain the slot information under multi-field more intentions, therefore, energy in the way of dividing clause step by step It is enough to be directed to the sentence more being intended to, obtain the higher understanding results being intended to of accuracy more.

Description of the drawings

In order to illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, to embodiment or will show below There is attached drawing needed in technology description to be briefly described, it should be apparent that, the accompanying drawings in the following description is only this Some embodiments of application for those of ordinary skill in the art without creative efforts, can be with Obtain other attached drawings according to these attached drawings.

Fig. 1 is a kind of flow chart of semantic understanding method disclosed in the embodiment of the present application；

Fig. 2 is the flow chart of the specific implementation process of semantic understanding method disclosed in the embodiment of the present application；

Fig. 3 is to choose a road in semantic understanding method disclosed in the embodiment of the present application from a plurality of slot information candidate path The flow chart of diameter；

Fig. 4 is a kind of structural schematic diagram of semantic understanding device disclosed in the embodiment of the present application.

Specific implementation mode

The embodiment of the present application discloses a kind of semantic understanding method and device, can apply with human-computer interaction function In electronic equipment, such as the equipment such as TV, mobile phone, vehicle-mounted, smart home and robot.

Below in conjunction with the attached drawing in the embodiment of the present application, by taking intelligent vehicle-mounted system as an example, in the embodiment of the present application Technical solution is clearly and completely described, it is clear that and described embodiments are only a part of embodiments of the present application, without It is whole embodiment.Based on the embodiment in the application, those of ordinary skill in the art are not before making creative work The every other embodiment obtained is put, shall fall in the protection scope of this application.

Fig. 1 is a kind of semantic understanding method disclosed in the embodiment of the present application, is included the following steps：

S101：According to preset domain classification, sentence to be understood is divided into clause, wherein each clause belongs to one A domain classification.For the ease of subsequently illustrating, the clause divided in the step is known as the first generic clause.

Wherein, sentence to be understood can be the sentence gone out according to the speech recognition of collected user.

Table 1 is by taking onboard system as an example, and preset domain classification includes：Map, phone, radio, weather and order control System.

Table 1

For example, sentence to be understood is：" open air-conditioning and play Anhui Traffic Announcement ", the first generic clause marked off (belong to the radio neck in table 1 for " opening air-conditioning " (the order control field belonged in table 1) and " playing Anhui Traffic Announcement " Domain).

In another example sentence to be understood is：" navigate to WanDa Plaza and select most fast route ", the first generic clause marked off Still it is former sentence：" navigate to WanDa Plaza and select most fast route " (belonging to the map field in table 1).

S102：According to preset intent classifier, each first generic clause is divided into the second generic clause, wherein each Second generic clause belongs to an intent classifier.

Table 2 shows the intent classifier under each domain classification shown in table 1 by taking onboard system as an example, for example, order control Under domain classification processed, including intent classifier have：It opens, close and returns.

Table 2

Example is connected, sentence to be understood is：" open air-conditioning and play Anhui Traffic Announcement ", the first generic clause marked off is " opening air-conditioning " and " playing Anhui Traffic Announcement ".Wherein, the second generic clause that clause's " opening air-conditioning " marks off still is " to open Air-conditioning " (belong to opening be intended to) under the order control field in table 2, clause's " playing Anhui Traffic Announcement " mark off second Generic clause is still " playing Anhui Traffic Announcement " (broadcasting belonged under the radio field in table 2 is intended to).

In another example sentence to be understood is：" navigate to WanDa Plaza and select most fast route ", the first generic clause marked off For：" navigate to WanDa Plaza and select most fast route ", the second generic clause marked off are that " navigating to WanDa Plaza " (belongs to table 2 In map field under navigation purposes) and " selection most fast route " (route belonged under the map field in table 2 is intended to).

S103：The extraction tank information from each second generic clause.

Wherein, slot information can be understood as keyword.

Table 3 is the example of the slot information under domain classification and intent classifier shown in table 2.

Table 3

For example, the slot information of the second generic clause " navigating to WanDa Plaza " extraction is " WanDa Plaza " (endLoc).It needs Illustrating, each is intended to the part that the lower slot information extracted can be pre-defined slot information, and not all, for example, The slot information of " navigating to WanDa Plaza " extraction be " WanDa Plaza " (endLoc) rather than startLoc endLoc Landmark whole in distance.

It should be noted that table shown in the present embodiment is only the citing for intelligent vehicle-mounted system, in practice, Different domain classifications, intent classifier and slot information can be formulated according to different product demands.

The semantic understanding method described in the present embodiment is can be seen that from process shown in FIG. 1, first from the dimension of domain classification Degree, is divided into the first generic clause, then from the dimension of intent classifier, each first generic clause is divided by sentence to be understood Second generic clause finally extracts the slot information of each second generic clause, included therefore, it is possible to obtain in sentence to be understood Different field classify and different intent classifier under slot information, that is to say, that the result of semantic understanding include different field not Therefore for the sentence to be identified more being intended to, it can completely understand with the recognition result under being intended to and multiple intentions, improve reason The accuracy of solution.

Also, applicant has found in the course of the study, existing semantic understanding method, why can not understand it is more A intention is because if using existing semantic understanding algorithm, it is understood that the algorithm performs difficulty for going out multiple intentions is high：Although Theoretically, it can understand the modes of multiple intentions from understanding that the mode for an intention expands to, it is contemplated that algorithm The reasons such as complexity and the support of hardware, in fact, being appreciated that out that the algorithm of multiple intentions is difficult to be performed.And Fig. 1 institutes The semantic understanding method shown, splits clause step by step, the object that each clause that upper level is split splits as next stage, it is this by The mode that grade splits, understands step by step has higher enforceability.

Each step in Fig. 1, can use trained model realization, be described more detail below how training pattern And how to use trained model realization step shown in FIG. 1.

Fig. 2 is the specific implementation flow of step shown in FIG. 1, is included the following steps：

S201：Using field discrimination model, the field label of each word in sentence to be understood is obtained.

The input of sentence that will be to be understood as field discrimination model, field discrimination model export in sentence to be understood Each word field label.

The field label of any one word in sentence to be understood includes：The title of domain classification belonging to the word and First position information, first position information are location information of the word in all words for meeting first condition, wherein first Part is the word for belonging to identical domain classification in sentence to be understood with the word.Specifically, position includes starting (such as start information Indicated with b), intermediate (such as average information is indicated with m) and terminate (such as ending message is indicated with e).It should be noted that if One word is not belonging to the classification of any one of preset domain classification, then output is specific, can distinguish and preset field The mark of bookmark name, such as specifically it is identified as s.It should be noted that domain classification is labeled as the word of s, it can be without The identification of follow-up intention and slot information, to save resource.

Table 4 is the example of the field label of each word of output after the input by sentence field discrimination model understood.This reality The example in example is applied, by taking the domain classification, intent classifier and slot information illustrated in table 3 as an example.

Table 4

Example 1：It opens air-conditioning and plays Anhui Traffic Announcement

pos

It beats

It opens

It is empty

It adjusts

It broadcasts

It puts

Peace

Emblem

It hands over

It is logical

Extensively

It broadcasts

pos

↓

pos

cmd_b

cmd_m

cmd_e

radio_b

radio_m

raido_m

radio_e

pos

Example 2：It navigates to WanDa Plaza and selects most fast route

pos

It leads

Boat

It arrives

Ten thousand

It reaches

Extensively

Choosing

It selects

Most

Soon

Road

Line

pos

↓

pos

map_b

map_m

map_e

pos

Example 3：Inquire the price of No. 92 gasoline

pos

It looks into

It askes

One

Under

92

Number

Vapour

Oil

's

Valence

Lattice

pos

↓

pos

s

pos

By taking example 1 as an example, in " beating " corresponding field label cmd_b, cmd indicates the title of domain classification：Order control, B indicates that " beating " position shared in all words for belonging to the fields cmd is initial position, i.e., " beating " is belong to the fields cmd the One word.

In example 3, it is not determined by the domain classification of each word, therefore, is exported as s.

Based on first position information, it can be seen that after obtaining the field label of each word, that is, mark off the first generic clause.

In the present embodiment, each word can be used as a vectorial input field discrimination model, and vectorial length can be 100.

In above each exemplary table, POS indicates separator.

In the present embodiment, field discrimination model uses but is not limited to " the word insertion two-way long short-term memories of embedding+ Model lstm+ condition random field crf " neural network models (other sequence marking models).The training of field discrimination model Method is：For each classification in preset domain classification, the first preset quantity such as 100,000 sentences are collected respectively, often Sentence marks field label.It collects the second preset quantity such as 200,000 and is not belonging to any one of preset domain classification The sentence of domain classification, and the field label for marking every sentence is s.Using the corpus data with label of above-mentioned collection, The object function of training neural network model：.Wherein,For prediction field label it is general Rate value.Specific training algorithm may refer to the prior art, and which is not described herein again.

S202：Using intention discrimination module corresponding with the domain classification belonging to the first generic clause, the meaning of each word is obtained Icon label.Wherein, the domain classification belonging to the first generic clause is by the field tag representation that is obtained in S201.

In the present embodiment, each domain classification in preset domain classification, corresponding intention discrimination module.By Word of each in one generic clause with same area specific name is as the corresponding input for being intended to discrimination module, it is intended that differentiates Module exports the intention labels of each word.

The intention labels of any one word in first generic clause include：The title of intent classifier belonging to the word and Two location informations, second position information are location information of the word in all words for meeting second condition, and second condition is institute State the word for belonging to identical domain classification and identical intent classifier in the first generic clause with the word.Specifically, position includes It originates (such as start information is indicated with b), intermediate (such as average information is indicated with m) and terminates (such as ending message is indicated with e).Such as One word of fruit is not belonging to any one classification in preset intent classifier, then exports specific identifier, such as s.

Table 5 be belong to order control and map domain classification word input corresponding intention discrimination module respectively after, output Each word intention labels example.

Table 5

Example 1：It opens air-conditioning and plays Anhui Traffic Announcement

Example 2：It navigates to WanDa Plaza and selects most fast route

For " the beating " of example 1 in table 4, it is intended that in label open_b, open indicates the opening under order control field It is intended to, b indicates to be in the word that " beating " belongs to identical intent classifier in the clause's " opening air-conditioning " for belonging to order control Beginning position.

Based on second position information, it can be seen that after obtaining the intention labels of each word, that is, mark off the second generic clause.

In the present embodiment, each word can be used as a vector input and be intended to domain discrimination model, and vectorial length can be 100。

In the present embodiment, it is intended that discrimination model uses but is not limited to " the two-way lstm+crf of embedding+ " neural network mould Type (other sequence marking models).Each domain classification corresponds to an intention discrimination model.Any one domain classification pair The training method for the intention discrimination model answered is：Third preset quantity such as 100,000 sentences for belonging to the domain classification are collected, Every sentence marks intention labels.Collect such as 200,000 articles of the 4th preset quantity is not belonging in preset intent classifier any one The sentence of a classification, and the intention labels for marking every sentence are s.Use the corpus data with label of above-mentioned collection, instruction Practice the object function of neural network model： For the probability of the intention labels of prediction Value.Specific training algorithm may refer to the prior art, and which is not described herein again.

S203：To each word (i.e. word in the second generic clause) with field label and intention labels, determine that slot is believed Cease label.

The slot information labels of any one word in second generic clause include：Slot information belonging to the word title (such as Shown in table 3) and the third place information, the third place information is position letter of the word in all words for meeting third condition Breath, third condition are to belong to identical domain classification, identical intent classifier and having the same in the second generic clause with the word The word of slot information.Specifically, position include starting (such as start information is indicated with b), intermediate (such as average information is indicated with m) and Terminate (such as ending message is indicated with e).

Because different types of slot information has the characteristics that itself, therefore, in order to preferably extract slot information, the present embodiment In, using different extraction modes：

1, grammar rule mode, which is suitable for enumerate but the extraction of regular strong slot information.Such as the time, Amount of money etc. analyzes extraction tank information using the following abnf syntax, that is, regular expression：

$ digital=0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9；

$ date=$ digital<2,4>Year $ digital<1-2>Moon $ digital<1-2>Day；

$ money=$ digital<0,->[hundred million | ten thousand | thousand | hundred | ten] member.

2, dictionary mode, which are suitable for the extraction of ordinal type slot information.By taking city name as an example, dictionary list (city)={ Beijing, Shanghai, Guangzhou, Shenzhen, Hangzhou ... }.By the way of dictionary matching, extraction tank information.

The specific implementation of both the above mode may refer to the prior art, and which is not described herein again.

3, it for that can not enumerate and the not strong slot information of regularity, in the present embodiment, is extracted using slot information discrimination model：

The input of slot information discrimination model is the intention vector of each word and the second generic clause in the second generic clause.Its In, the example of the intention vector of any one the second generic clause is：The length for being intended to vector is the intention under a domain classification The total quantity of classification, the intent classifier under the domain classification correspond an element in vector, the intention point described in the word The corresponding element of class is 1, and other elements are 0.

Each word can be used as a vector, and vectorial length can be 100.

Table 6 is the slot Examples of information of clause under the navigation purposes of map field：

Table 6

In the present embodiment, slot information discrimination model uses but is not limited to " the two-way lstm+crf of embedding+ " neural network Model (other sequence marking models).Each domain classification corresponds to a slot information discrimination model.Any one field point The training method of the corresponding slot information discrimination model of class is：Collect such as 100,000 articles of the 5th preset quantity for belonging to the domain classification Sentence, every sentence mark slot information labels.Such as 200,000 articles of the 6th preset quantity is collected to be not belonging in preset slot information The sentence of any one slot information, and the slot information labels for marking every sentence are s.Use the language with label of above-mentioned collection Expect data, the object function of training neural network model：Wherein,For the slot information of prediction The probability value of label.Specific training algorithm may refer to the prior art, and which is not described herein again.

It should be noted that three of the above slot information extraction mode, can use parallel, i.e., each word is carried out with The extraction of upper three kinds of modes, in practical applications, some possible modes are without output as a result, knot may also be exported there are many mode Fruit extracts different slot information, it is also possible to which a kind of mode exports multiple slot information, for the final result of slot information Selection, will subsequently illustrate.

It is of course also possible to train a slot information discrimination model, this case not to limit herein for each intent classifier. In the case where each intent classifier corresponds to a slot information discrimination model respectively, the input data of slot information discrimination model In, can not include the intention vector of the second generic clause.

So far, the domain classification, intent classifier and slot information that each word in sentence to be understood is determined respectively, because This, completes for sentence comprehension to be understood.

Based on the mode for gradually understanding domain classification, intent classifier and slot information, each step is respectively trained accordingly Model, so, the enforceability of model training is higher.Also, for different domain classifications, differentiated using different intentions Module and slot information discrimination module so that for the accuracy higher for understanding result more being intended to.

It should be noted that it is shown in Fig. 2 in the way of model judgement, a kind of specific implementation of only Fig. 1 Citing can also use other manner, such as sentenced according to the mode that preset correspondence judges in addition to utilizing model Fixed, which is not described herein again.

As previously mentioned, the slot information finally extracted may have it is multigroup, it is also possible to one group of slot information includes multiple slot information, Multiple slot information can form different slot information paths, for example, slot information path 1 (path1)=<Slot_a, slot_b, slot_c>, path2=<Slot_a, slot_d>.That is slot information a, b and c forms a kind of slot information candidate path, slot information a and Another slot information candidate of d compositions path.Slot information candidate path described in the present embodiment from the second generic clause by extracting At least one slot information combine to be formed, in order to improve efficiency of selection, optionally, different slot information candidate paths is at least There are one different slot information for tool.

Therefore, method described herein further can choose a road from a plurality of slot information candidate path Diameter.Detailed process is as shown in figure 3, include the following steps：

S301：Sentence to be understood is subjected to word segmentation processing, obtains word segmentation result.

S302：Using order models, the intention vector of word segmentation result, slot information candidate path and the second generic clause is made For input, the scoring in each slot information candidate path is determined.

Wherein, word segmentation result can be the vector of each of word segmentation result participle composition, and vectorial length can be m* 100, wherein m are the number of participle.

Slot information candidate path can be the vector in each slot information candidate path and each participle composition, vectorial length Can be n*100, n is the number of path candidate participle.

The intention vector of second generic clause is as previously mentioned, which is not described herein again.

It should be noted that three of the above vector might not be used as input, can at least select a kind of as sequence The input of model, but vector as input is more, then and the result exported is more accurate.

In the present embodiment, order models use but are not limited to CNN models (other order models).For different necks Domain is classified, and identical sorting module can be used.The training method of order models is：The training language of multiplexing field discrimination module Sentence, and correct path pathT is marked to each sentence, erroneous path pathF is forged by program, finally by correct path and mistake Accidentally path forms pair pairs<PathT, pathF>.Using the corpus data with label of above-mentioned collection, CNN models are trained Object function：

S(θ)>thre

Wherein, PT (θ)_iIndicate correct path score, PF (θ)_iIndicate that erroneous path score, final score S (θ) are more than threshold Value thre.

S303：According to the scoring in each slot information candidate path, final slot information path is determined.

In general, the highest slot information path that will score is as final slot information path.

For example, the slot information candidate path of clause " ten streets Li Miao for navigating to 3 kilometers " extraction includes：

path1:Navigate to the $ endLoc of $ distance

path2:Navigate to the mausoleums the $ distance street of $ distance

Executing the scoring obtained after process shown in Fig. 3 is respectively：Path1,0.95.Path2,0.2.Therefore, by path1 As final slot information path, the slot information in the path, for finally determining slot information.

Fig. 4 is a kind of semantic understanding device disclosed in the embodiment of the present application, including：First division module, second divide mould Block and extraction module can also include optionally screening module.

Wherein, the first division module is used for according to preset domain classification, and sentence to be understood is divided into first kind Sentence.Second division module is used to, according to preset intent classifier, each first generic clause is divided into the second generic clause.It carries Modulus block is used for the extraction tank information from each second generic clause.Screening module is used in the extraction module from each institute It states in the second generic clause after extraction tank information, scores slot information candidate path, and according to the scoring, determine that slot is believed Breath extraction result.

The specific implementation process of the function of the above modules, may refer to above method part, which is not described herein again.

Semantic understanding device shown in Fig. 3, using different partition dimensions (i.e. field, intention, slot information), according to step by step The mode for dividing clause can obtain the slot information under multi-field more intentions, therefore, it is possible to for the sentence more being intended to, obtain The higher understanding results being intended to of accuracy more.

The embodiment of the present application also discloses a kind of semantic understanding equipment and computer-readable medium.

Specifically, semantic understanding equipment includes memory and processor.Memory is for storing one or more programs.Place Reason device is for executing one or more of programs, so that the semantic understanding equipment realizes following functions previous embodiment institute The semantic understanding function of stating.

Instruction is stored in computer readable storage medium, when run on a computer so that before computer executes State the semantic understanding function described in embodiment.

If the function described in the embodiment of the present application method is realized in the form of SFU software functional unit and as independent production Product are sold or in use, can be stored in a computing device read/write memory medium.Based on this understanding, the application is real Applying the part of a part that contributes to existing technology or the technical solution can be expressed in the form of software products, The software product is stored in a storage medium, including some instructions are used so that a computing device (can be personal meter Calculation machine, server, mobile computing device or network equipment etc.) execute each embodiment the method for the application whole or portion Step by step.And storage medium above-mentioned includes：USB flash disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), with Machine accesses various Jie that can store program code such as memory (RAM, Random Access Memory), magnetic disc or CD Matter.

Each embodiment is described by the way of progressive in this specification, the highlights of each of the examples are with it is other The difference of embodiment, just to refer each other for same or similar part between each embodiment.

The foregoing description of the disclosed embodiments enables professional and technical personnel in the field to realize or use the application. Various modifications to these embodiments will be apparent to those skilled in the art, as defined herein General Principle can in other embodiments be realized in the case where not departing from spirit herein or range.Therefore, the application It is not intended to be limited to the embodiments shown herein, and is to fit to and the principles and novel features disclosed herein phase one The widest range caused.

Claims

1. a kind of semantic understanding method, which is characterized in that including：

According to preset domain classification, sentence to be understood is divided into the first generic clause, wherein each first generic clause Belong to a classification in the domain classification；

The extraction tank information from each second generic clause.

2. according to the method described in claim 1, it is characterized in that, described according to preset domain classification, by language to be understood Sentence is divided into the first generic clause and includes：

Using each word in the sentence to be understood as input, using field discrimination model, the language to be understood is obtained The field label of the field label of each word in sentence, any one word in the sentence to be understood includes：The word The title and first position information of affiliated domain classification, the first position information are the word in the institute for meeting first condition It is to belong to identical domain classification with the word in the sentence to be understood to have the location information in word, the first condition Word, the position include starting, centre and terminate.

3. according to the method described in claim 2, it is characterized in that, the training process of the field discrimination model includes：

For each classification in the preset domain classification, the first preset quantity sentence is collected respectively, and to collecting Sentence mark the field label；

The sentence that the second preset quantity item is not belonging to the classification of any one of described preset domain classification is collected, and is marked not Belong to the label of the preset domain classification；

It, will each described the 4. according to the method described in claim 1, it is characterized in that, described according to preset intent classifier One generic clause is divided into the second generic clause：

Using each word with same area specific name in first generic clause as input, using with the first kind The corresponding intention discrimination module of domain classification belonging to clause, obtains the intention labels of each word in first generic clause, The intention labels of any one word in first generic clause include：The title of intent classifier belonging to the word and Two location informations, the second position information be location information of the word in all words for meeting second condition, described second Condition is to belong to the word of identical domain classification and identical intent classifier, the position in first generic clause with the word Including starting, centre and end, wherein each domain classification corresponds to an intention discrimination module, different domain classifications pair The intention discrimination module answered is different.

5. according to the method described in claim 4, it is characterized in that, the corresponding intention of any one of domain classification is sentenced The training method of other module includes：

The third preset quantity sentence for belonging to the domain classification is collected, and the intention is marked to every sentence of collection and is marked Label；

The sentence that the 4th preset quantity is not belonging to the classification of any one of preset intent classifier is collected, and marks and is not belonging to institute State the label of preset domain classification；

Using the object function of sentence training the second sequence marking model after mark, the corresponding meaning of the domain classification is obtained Figure discrimination module.

6. according to the method described in claim 1, it is characterized in that, the extraction tank information from each second generic clause Including：

Using the intention vector of each word and second generic clause in second generic clause as input, using with it is described The corresponding slot information discrimination model of domain classification belonging to second generic clause, obtains the slot of each word in second generic clause The slot information labels of information labels, any one word in second generic clause include：The title of slot information belonging to the word And the third place information, the third place information are location information of the word in all words for meeting third condition, institute It is to belong to identical domain classification, identical intent classifier in second generic clause with the word and have phase to state third condition The word of same slot information；

Wherein, each domain classification corresponds to a slot information discrimination module, and the corresponding slot information of different domain classifications differentiates Module is different.

7. according to the method described in claim 6, it is characterized in that, the corresponding slot information of any one of domain classification The training method of discrimination module includes：

The sentence that the 6th preset quantity is not belonging to any one of preset slot information slot information is collected, and marks and is not belonging to institute State the label of preset slot information；

The object function for the marking model that sorted using the sentence training third after mark, obtains the corresponding slot of the domain classification Information discrimination module.

8. according to claim 1-7 any one of them methods, which is characterized in that described from each second generic clause After extraction tank information, further include：

By described in the word segmentation result of the sentence to be understood, slot information candidate path and the sentence to be understood The intention vector of two generic clauses, using preset model, scores to slot information candidate path as input, the slot information Path candidate is combined by least one slot information extracted from each second generic clause and is formed；

According to the scoring, slot information extraction result is determined.

9. a kind of semantic understanding device, which is characterized in that including：

First division module, for according to preset domain classification, sentence to be understood to be divided into the first generic clause, wherein Each first generic clause belongs to a classification in the domain classification；

Second division module, for according to preset intent classifier, each first generic clause to be divided into the second generic clause, Wherein, each second generic clause belongs to a classification in the intent classifier；

10. a kind of semantic understanding equipment, which is characterized in that including：

Memory and processor；

The memory is for storing one or more programs；

The processor is for executing one or more of programs, so that the semantic understanding equipment realizes claim 1- Semantic understanding method described in any one of 8.

11. a kind of computer-readable medium, which is characterized in that instruction is stored in the computer readable storage medium, when it When running on computers so that computer perform claim requires the semantic understanding method described in any one of 1-8.