CN104166643A - Dialogue act analyzing method in intelligent question-answering system

Info

Publication number: CN104166643A
Application number: CN201410410275.2A
Authority: CN
Inventors: 吴云芳; 王异秀
Original assignee: NANJING JINWAWA SOFTWARE TECHNOLOGY Co Ltd
Current assignee: NANJING JINWAWA SOFTWARE TECHNOLOGY Co Ltd
Priority date: 2014-08-19
Filing date: 2014-08-19
Publication date: 2014-11-26

Abstract

The invention discloses a dialogue act analysis method for an intelligent question-answering system. First, a dialogue act label set suited to spoken Chinese is compiled, and a hierarchical classifier for analyzing dialogue act units is built. Each user utterance is then segmented into dialogue act units, and each unit is classified by the hierarchical classifier to obtain its label. With this technical scheme, dialogue acts can be analyzed automatically, effectively, and quickly, laying a foundation for intelligent dialogue.

Description

A dialogue act analysis method in an intelligent question-answering system

Technical field

The present invention relates to a method for automatically analyzing dialogue acts, and in particular to a sequence labeling method for dialogue act units and a hierarchical machine learning method.

Background art

Dialogue act research mainly studies the information a dialogue conveys beyond its semantics, such as dialogue intention and conversation structure. For English, dialogue act analysis is a long-established research area with complete label systems and automatic analysis methods. For Chinese, this research has not yet been developed. Progress in English dialogue act research is briefly reviewed below.

In 2000, Stolcke proposed using syntactic and lexical features on dialogues already annotated with complete dialogue act units. The dialogue is treated as a stream of independent sentences, and a hidden Markov model (HMM) performs dialogue act classification, with n-gram models, decision trees, and neural networks used to generate effective features. However, this method does not automatically segment dialogue act units.

In 2005, Ang proposed using a decision tree model to segment dialogue act units, with the pause (inactivity) interval within the same speaker's speech as a feature, and reached 44% accuracy on dialogue act unit segmentation. This accuracy is still low for dialogue act analysis.

Summary of the invention

Object of the invention: in view of the above prior art, the invention provides a dialogue act analysis method for an intelligent question-answering system. It addresses the technical problem that automatic segmentation of dialogue acts in spoken Chinese currently has low accuracy, which in turn lowers the accuracy of automatic dialogue act recognition.

Technical scheme: to solve the above technical problem, the invention provides the following technical scheme:

First, a dialogue act label set suited to spoken Chinese is compiled. Each user utterance is then segmented into dialogue act units, and a hierarchical classifier for analyzing dialogue act units is built. Each dialogue act unit passes through the hierarchical classifier and is finally assigned to one of the labels in the dialogue act label set. The label set for spoken Chinese is compiled with reference to the English DAMSL general-purpose label set, combined with the spoken-language characteristics of intelligent question-answering systems. In the present invention, the label set comprises 4 label categories, each containing its own labels, as follows:

Label category 1: reflects the speaker's intention. The labels and their meanings are as follows:

<S>: states a belief or an event; referred to as a statement;

<Q>: an open question that cannot be answered with yes or no; referred to as an open question;

<QYN>: a yes-no question;

<R>: a question that expects the listener to perform an action or give a response; referred to as a request question;

<TH>: a courtesy expression such as thanks; referred to as a thanks sentence;

Label category 2: reflects the speaker's response to the preceding utterance. The labels and their meanings are as follows:

<SA>: a response to the preceding statement;

<RA>: a response to the preceding request question;

<AY>: an affirmative answer to a yes-no question;

<AN>: a negative answer to a yes-no question;

<AQ>: an answer to an open question;

<D>: a polite response to a thanks sentence;

Label category 3: conversation structure information, reflecting the role of the utterance in the conversation structure. The labels and their meanings are as follows:

<CO>: opens a dialogue;

<CC>: closes a dialogue;

<CT>: continues a dialogue;

Label category 4: other cases not covered by the three categories above. The label and its meaning are as follows:

<U>: indicates uncertain information.
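The label set above can be written down as a small lookup table, which is convenient when the later classification steps need to refer to labels by category. The following is only an illustrative sketch: the `Category` enum, the `LABEL_SET` dictionary, and the helper `labels_in` are names introduced here and do not appear in the patent.

```python
from enum import Enum


class Category(Enum):
    """The four label categories described in the text (names are illustrative)."""
    INTENTION = 1   # reflects the speaker's intention
    RESPONSE = 2    # reflects the response to the preceding utterance
    STRUCTURE = 3   # conversation structure information
    OTHER = 4       # everything not covered above


# label -> (category, short meaning), transcribed from the list above
LABEL_SET = {
    "S":   (Category.INTENTION, "statement"),
    "Q":   (Category.INTENTION, "open question"),
    "QYN": (Category.INTENTION, "yes-no question"),
    "R":   (Category.INTENTION, "request question"),
    "TH":  (Category.INTENTION, "thanks sentence"),
    "SA":  (Category.RESPONSE, "response to a statement"),
    "RA":  (Category.RESPONSE, "response to a request question"),
    "AY":  (Category.RESPONSE, "affirmative answer to a yes-no question"),
    "AN":  (Category.RESPONSE, "negative answer to a yes-no question"),
    "AQ":  (Category.RESPONSE, "answer to an open question"),
    "D":   (Category.RESPONSE, "polite response to thanks"),
    "CO":  (Category.STRUCTURE, "opens a dialogue"),
    "CC":  (Category.STRUCTURE, "closes a dialogue"),
    "CT":  (Category.STRUCTURE, "continues a dialogue"),
    "U":   (Category.OTHER, "uncertain information"),
}


def labels_in(category):
    """Return the labels of one category, in the order listed in the text."""
    return [label for label, (cat, _) in LABEL_SET.items() if cat is category]
```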

Because a single sentence may contain several dialogue acts, the sentence is first segmented into dialogue act units using the following algorithm:

Step 1.1: split the utterance into basic units, using commas and spaces as separators, and extract the text features of each basic unit;

Step 1.2: use a sequence labeling algorithm to determine whether each basic unit is the beginning of a new dialogue act unit;

Step 1.3: according to the result of step 1.2, combine the basic units that belong to the same dialogue act unit into one dialogue act unit; the dialogue act units together form a dialogue act unit chain.
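Steps 1.1 and 1.3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the separator set (ASCII comma, full-width comma, whitespace) is an assumption, and the boundary decisions of step 1.2 are simply passed in as a list of booleans.

```python
import re


def split_basic_units(utterance):
    """Step 1.1 (sketch): split on commas (ASCII or full-width) and whitespace."""
    return [unit for unit in re.split(r"[,,\s]+", utterance) if unit]


def group_units(units, starts_new):
    """Step 1.3 (sketch): merge basic units into dialogue act units.

    starts_new[i] is True when units[i] begins a new dialogue act unit,
    i.e. the decision produced by the sequence labeler in step 1.2.
    """
    chain = []
    for unit, is_start in zip(units, starts_new):
        if is_start or not chain:
            chain.append([unit])      # open a new dialogue act unit
        else:
            chain[-1].append(unit)    # extend the current dialogue act unit
    return chain
```

The first basic unit always opens a dialogue act unit, whatever the labeler says, so the chain is never left without a starting element.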

Further, in the present invention, the extracted text features comprise the following 10 kinds:

The length of the basic unit;

Whether it contains a verb;

Whether it contains a conjunction;

Whether it contains a numeral;

Whether it contains a pronoun;

Whether it contains a high-frequency stop word;

Whether it contains a high-frequency word that can form a complete sentence on its own;

Whether the unit pair formed by this basic unit and the preceding basic unit contains a conjunction pair;

Whether the unit pair formed by this basic unit and the preceding basic unit contains an identical word;

Whether the unit pair formed by this basic unit and the preceding basic unit contains similar words.

Preferably, in the present invention, the sequence labeling algorithm is a linear-chain conditional random field (CRF). This algorithm is widely used for sequence labeling problems.
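To make the role of the linear-chain model concrete, the sketch below shows Viterbi decoding, the standard way to find the best label sequence once per-position scores and label-to-label transition scores are available. It is not the patent's trained model: the two labels "B" (begins a new dialogue act unit) and "I" (continues the current one), and all scores passed in, are assumptions for illustration. In a real CRF the scores would come from learned feature weights.

```python
def viterbi(emissions, transitions, labels=("B", "I")):
    """Return the highest-scoring label sequence for a linear chain.

    emissions:   list of dicts, one per basic unit, mapping label -> score
    transitions: dict mapping (previous_label, current_label) -> score
    """
    if not emissions:
        return []
    # best[t][label] = (best score of a path ending in label at t, back-pointer)
    best = [{lab: (emissions[0][lab], None) for lab in labels}]
    for t in range(1, len(emissions)):
        row = {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[t - 1][p][0] + transitions[(p, cur)])
            score = best[t - 1][prev][0] + transitions[(prev, cur)] + emissions[t][cur]
            row[cur] = (score, prev)
        best.append(row)
    # Trace back from the best final label
    last = max(labels, key=lambda lab: best[-1][lab][0])
    path = [last]
    for t in range(len(emissions) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    return path[::-1]
```

A basic unit decoded as "B" corresponds to "is the beginning of a new dialogue act unit" in step 1.2.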

Because the determination of some dialogue acts depends on the preceding context, while other dialogue acts serve a relatively narrow purpose and take similar surface forms, the present invention builds a hierarchical classifier based on these characteristics of dialogue acts. The classification method of the hierarchical classifier comprises the following steps:

Step 2.1: determine whether the dialogue act unit can be assigned to one of the 6 labels <CO>, <CC>, <CT>, <TH>, <D>, <AN>; if so, exit the hierarchical classifier, return the label, and end classification; if not, continue with step 2.2;

Step 2.2: determine whether the dialogue act unit can be assigned to one of the 3 labels <SA>, <RA>, <AY>; if so, go to step 2.4; otherwise go to step 2.3;

Step 2.3: determine whether the dialogue act unit can be assigned to one of the 3 labels <Q>, <QYN>, <R>; if it can be assigned to a specific one of <Q>, <QYN>, <R>, exit the hierarchical classifier, return the label, and end classification; if it cannot be assigned to any of these 3 labels, go to step 2.5;

Step 2.4: using the dialogue acts already determined in the preceding dialogue, determine which of <SA>, <RA>, <AY> the dialogue act unit belongs to, then exit the hierarchical classifier, return the label, and end classification;

Step 2.5: build an SVM classifier and use it to determine whether the dialogue act unit is assigned to label <S> or label <AQ>, then exit the hierarchical classifier, return the label, and end classification.
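The control flow of steps 2.1 to 2.5 can be sketched as a cascade of pluggable judges. This is an illustration of the ordering only: the `judges` dictionary, its keys, and the signature `(unit, context)` are hypothetical, and the patent does not specify how the individual judges in steps 2.1, 2.2, and 2.4 are implemented.

```python
EASY = ("CO", "CC", "CT", "TH", "D", "AN")   # step 2.1
POSITIVE = ("SA", "RA", "AY")                # steps 2.2 and 2.4
QUESTION = ("Q", "QYN", "R")                 # step 2.3


def classify(unit, context, judges):
    """Run one dialogue act unit through the cascade (sketch).

    judges is a dict of hypothetical callables taking (unit, context):
      "easy"        -> a label from EASY, or None
      "is_positive" -> True if the unit is one of POSITIVE
      "positive"    -> a label from POSITIVE, chosen using the context
      "question"    -> a label from QUESTION, or None
      "svm"         -> "S" or "AQ"
    """
    label = judges["easy"](unit, context)            # step 2.1
    if label in EASY:
        return label
    if judges["is_positive"](unit, context):         # step 2.2
        return judges["positive"](unit, context)     # step 2.4
    label = judges["question"](unit, context)        # step 2.3
    if label in QUESTION:
        return label
    return judges["svm"](unit, context)              # step 2.5
```

Each stage returns as soon as it succeeds, so the SVM in step 2.5 only sees the units that none of the earlier stages could label.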

In the present invention, <CO>, <CC>, and <CT> in step 2.1 all represent conversation structure information; together with <TH>, <D>, and <AN>, these 6 labels are the easiest to identify, so they are judged first. Next, the labels <SA>, <RA>, and <AY> in step 2.2 are all positive responses. After this step, the remaining work splits into two directions: determining the specific type of positive response, and determining whether the unit is a question, where questions comprise the 3 labels <Q>, <QYN>, and <R>. Step 2.3 then determines whether the unit is a question, with two outcomes: if it is a question, the specific type (<Q>, <QYN>, or <R>) is determined; if it is not, the unit is assigned to either label <S> or label <AQ>. As this process shows, the order of judgments in the hierarchical classifier is set manually and proceeds from easy to difficult, which improves the efficiency and accuracy of the operation.

Further, in the present invention, step 2.3 determines whether the dialogue act unit can be assigned to one of <Q>, <QYN>, <R> as follows: first determine whether the dialogue act unit can be assigned to label <QYN>; if not, use frequent sequence features and a bag-of-words model to determine whether it can be assigned to label <R>; if not, assign the dialogue act unit to label <Q>.
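The ordering inside step 2.3, and the two kinds of features named for the <R> decision, can be sketched as below. This is a loose illustration under stated assumptions: the patent does not say how frequent sequences are mined or matched, so here a "frequent sequence" is taken to be a word pattern that occurs in order (not necessarily contiguously), and the pattern list and vocabulary are supplied by the caller. All function names are hypothetical.

```python
from collections import Counter


def contains_sequence(tokens, pattern):
    """True if pattern occurs in tokens as an ordered subsequence (assumption)."""
    it = iter(tokens)
    return all(word in it for word in pattern)   # each 'in' consumes the iterator


def request_features(tokens, frequent_patterns, vocabulary):
    """Bag-of-words counts followed by one 0/1 flag per frequent pattern."""
    bag = Counter(tokens)
    bow = [bag[word] for word in vocabulary]
    seq = [int(contains_sequence(tokens, p)) for p in frequent_patterns]
    return bow + seq


def classify_question(unit, is_yes_no, is_request):
    """Step 2.3 ordering for a unit already known to be a question:
    try <QYN> first, then <R>, and fall back to <Q>."""
    if is_yes_no(unit):
        return "QYN"
    if is_request(unit):
        return "R"
    return "Q"
```

In such a setup, `is_request` would typically wrap a classifier trained on the vectors produced by `request_features`.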

Further, in the present invention, the SVM classifier in step 2.5 is built from the following features: the classification results of the dialogue acts already determined in the preceding context; the verbs and nouns contained in this dialogue act unit and in the dialogue act units labeled <Q> within the preceding 5 utterances; the distance between this dialogue act unit and the nearest dialogue act unit labeled <Q>; and whether this dialogue act unit and the dialogue act units labeled <Q> within the preceding 5 utterances contain a repeated word pair.
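Some of the context features listed for step 2.5 can be computed as in the sketch below, before being handed to an SVM. Several simplifications are assumptions made here, not statements from the patent: the history is a list of already labeled units rather than utterances, a distance of 0 stands for "no <Q> unit in the window", and the "repeated word pair" feature is reduced to literal token overlap. The verb and noun features are omitted because they would need a part-of-speech tagger.

```python
def context_features(tokens, history, window=5):
    """Compute context features for one dialogue act unit (sketch).

    tokens:  list of words in the current dialogue act unit
    history: list of (label, tokens) for preceding units, oldest first
    """
    recent = history[-window:]
    distance = 0            # 0 means no <Q> unit found in the window (assumption)
    shares_word = False
    # Walk backwards: offset 1 is the immediately preceding unit
    for offset, (label, prev_tokens) in enumerate(reversed(recent), start=1):
        if label != "Q":
            continue
        if distance == 0:
            distance = offset                     # nearest <Q> unit
        if set(tokens) & set(prev_tokens):
            shares_word = True                    # simplified "repeated word" check
    return {
        "prev_label": history[-1][0] if history else None,
        "distance_to_q": distance,
        "shares_word_with_q": shares_word,
    }
```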

Beneficial effects:

The method of the invention first compiles a dialogue act label set suited to spoken Chinese, based on the dialogue act labeling standard for English, and provides an automatic dialogue act analysis method that addresses both the automatic segmentation of dialogue act units and the automatic recognition of dialogue acts.

The principle of the invention is to first extract text features and use a sequence labeling algorithm to segment dialogue act units, and then to use features such as frequent sequence sets and non-text features to build a hierarchical classifier that automatically analyzes the dialogue act units.

Dialogue acts have been little studied for Chinese. The invention analyzes them effectively based on their characteristics, laying a foundation for subsequent work that relies on dialogue acts;

Compared with dialogue act work for English, the invention targets intelligent question-answering systems: the English label set is adapted to obtain a label set suited to Chinese, and an efficient hierarchical classification method is adopted so that it can be applied in practical intelligent question-answering systems.

With the technical scheme provided by the invention, dialogue acts can be analyzed automatically, effectively, and quickly, laying a foundation for achieving intelligent dialogue.

Brief description of the drawings

Fig. 1 shows the dialogue act unit segmentation flow of the invention;

Fig. 2 shows the flow of the hierarchical automatic dialogue act analyzer of the invention.

Detailed description of the embodiments

The invention is further described below with reference to the accompanying drawings.

In the dialogue act analysis method for an intelligent question-answering system according to the invention, a dialogue act label set suited to spoken Chinese is first compiled, as shown in Table 1 below.

Table 1

As Table 1 shows, the label set comprises 4 label categories:

Category 1: reflects the speaker's intention;

Category 2: reflects the speaker's response to the preceding utterance;

Category 3: reflects the role of the utterance in the conversation structure;

Category 4: other cases not covered by the three categories above.

Each label category is subdivided into finer labels. Each label has a distinct meaning, and the labels do not overlap. The first three categories cover the common dialogue acts expressed in language, and the small number of atypical dialogue acts are all assigned to the fourth category.

During annotation, it was observed that many sentences contain several dialogue acts. Therefore, a sentence is first segmented into dialogue act units before the subsequent dialogue act analysis is carried out.

In practice, segmenting a user utterance into dialogue act units comprises the following steps:

Step 1.1: split the user's utterance into basic units, using commas and spaces as separators, and extract the text features of each basic unit, which comprise the following 10 kinds:

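The length of the basic unit;

Whether it contains a verb;

Whether it contains a conjunction;

Whether it contains a numeral;

Whether it contains a pronoun;

Whether it contains a high-frequency stop word;

Whether it contains a high-frequency word that can form a complete sentence on its own;

Whether the unit pair formed by this basic unit and the preceding basic unit contains a conjunction pair;

Whether the unit pair formed by this basic unit and the preceding basic unit contains an identical word;

Whether the unit pair formed by this basic unit and the preceding basic unit contains similar words.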

Each of the above text features is represented by feature digits. The length of the basic unit is represented in three dimensions, and each of the other 9 text features is represented in one dimension. Specifically, if the answer to the question posed by a text feature is negative (the item does not occur), the feature is represented by 0; if the answer is affirmative (the item occurs), it is represented by 1. In addition, if the text feature is invalid (for example, when the basic unit is the first basic unit of the utterance, no unit pair can be formed with a preceding basic unit), the feature is represented by 2. The feature digits are concatenated in order into a feature digit string that represents the basic unit.

Step 1.2: use a linear-chain conditional random field as the sequence labeling algorithm to determine whether each basic unit is the beginning of a new dialogue act unit;
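The feature digit string described under step 1.1 (0 for absent, 1 for present, 2 for invalid, with the length occupying three positions) can be sketched as follows. The text does not say how the length is mapped onto its three dimensions, so the one-hot short/medium/long bucket used here is purely an assumption, as are the function name and argument layout.

```python
INVALID = 2   # digit used when a feature cannot be computed


def encode_unit(length_bucket, flags):
    """Build the 12-digit feature string for one basic unit (sketch).

    length_bucket: 0, 1, or 2, a hypothetical short/medium/long bucket that is
                   one-hot encoded into three digits (an assumption)
    flags:         the other 9 features in order; True, False, or None,
                   where None marks a feature that is not applicable
    """
    if length_bucket not in (0, 1, 2):
        raise ValueError("length_bucket must be 0, 1, or 2")
    if len(flags) != 9:
        raise ValueError("expected 9 one-dimensional features")
    length_part = [1 if i == length_bucket else 0 for i in range(3)]
    rest = [INVALID if flag is None else int(flag) for flag in flags]
    return "".join(str(digit) for digit in length_part + rest)
```

For the first basic unit of an utterance, the three features that depend on the preceding basic unit would be passed as `None` and therefore encoded as 2.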

After the dialogue act units have been segmented, a hierarchical classifier for analyzing dialogue act units is built according to the summarized characteristics of dialogue acts. The classification method of the hierarchical classifier comprises the following steps:

Step 2.3: determine whether the dialogue act unit can be assigned to one of the 3 labels <Q>, <QYN>, <R>. If it can, the unit is a question, and its specific type is then determined: first determine whether the unit can be assigned to label <QYN>; if not, use frequent sequence features and a bag-of-words model to determine whether it can be assigned to label <R>; if not, assign the unit to label <Q>. After this determination, exit the hierarchical classifier, return the label, and end classification. If the dialogue act unit cannot be assigned to any of these 3 labels, go to step 2.5;

Step 2.5: build an SVM classifier from the following features: the classification results of the dialogue acts already determined in the preceding context; the verbs and nouns contained in this dialogue act unit and in the dialogue act units labeled <Q> within the preceding 5 utterances; the distance between this dialogue act unit and the nearest dialogue act unit labeled <Q>; and whether this dialogue act unit and the dialogue act units labeled <Q> within the preceding 5 utterances contain a repeated word pair. Use the SVM classifier to determine whether the dialogue act unit is assigned to label <S> or label <AQ>, then exit the hierarchical classifier, return the label, and end classification.

The invention is further described below by way of a specific example.

Suppose that, in an ongoing dialogue, the current user utterance is: We will use it ourselves, RM pictures are charged according to their purpose, right?

The preceding context of the current user utterance is known to consist of the following 3 utterances:

Utterance 1

Customer service: Hello!

Utterance 2

Customer: I want to buy a picture.

Utterance 3

Customer service: What will you use the picture for?

Utterance 1 has been automatically recognized as opening a dialogue, utterance 2 as a statement, and utterance 3 as a question.

To analyze the current utterance, the dialogue act units are first segmented. The flow is shown in Fig. 1 and comprises the following steps:

1) The user's utterance is split at the comma into two basic units, so the set of basic units of this utterance is {(we will use it ourselves), (RM pictures are charged according to their purpose, right?)}. According to the text features selected in the invention, each basic unit is represented as a feature digit string made up of a string of digits: {(0 1002210220 1), (0 0100000010 1)};

2) A linear-chain conditional random field determines whether each of these 2 basic units is the beginning of a new dialogue act unit. The feature digit strings are the input to the linear-chain conditional random field and the decisions are its output. The result is that (we will use it ourselves) is the beginning of a dialogue act unit, and (RM pictures are charged according to their purpose, right?) is also the beginning of a dialogue act unit;

3) According to the result of the previous step, the utterance is divided into 2 dialogue act units, (we will use it ourselves) and (RM pictures are charged according to their purpose, right?), which gives the dialogue act unit chain (we will use it ourselves), (RM pictures are charged according to their purpose, right?).

Next, taking the dialogue act unit (we will use it ourselves) as an example, the hierarchical classifier is used to recognize the dialogue act automatically, in the following steps:

1) It is determined that (we will use it ourselves) does not belong to any of the 6 labels <CO>, <CC>, <CT>, <TH>, <D>, <AN>;

2) It is determined that (we will use it ourselves) does not belong to any of the 3 labels <SA>, <RA>, <AY>;

3) It is determined that (we will use it ourselves) does not belong to any of the 3 labels <Q>, <QYN>, <R>;

4) From the preceding dialogue information, the distance feature between (we will use it ourselves) and the preceding open question (what will you use the picture for) is computed as 1 (meaning the two sentences are 1 apart), and the two contain two repeated word pairs, (you, I) and (purpose, use). The SVM classifier therefore determines that the unit is an answer to the preceding open question, that is, label <AQ>. This classification information is returned, the hierarchical classifier exits, and processing continues with the next dialogue act unit.

Finally, the result of the analysis is: (we will use it ourselves) is an answer to an open question; (RM pictures are charged according to their purpose, right?) is a yes-no question.

The above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.

Claims

1. A dialogue act analysis method in an intelligent question-answering system, characterized in that: a dialogue act label set suited to spoken Chinese is first compiled; each user utterance is then segmented into dialogue act units; a hierarchical classifier for analyzing dialogue act units is then built; and each dialogue act unit is classified by the hierarchical classifier and finally assigned to one of the labels in the dialogue act label set; wherein the dialogue act label set suited to spoken Chinese is characterized in that it comprises 4 label categories, each containing its own labels, as follows:

<S>: states a belief or an event; referred to as a statement;

<QYN>: a yes-no question;

<R>: a question that expects the listener to perform an action or give a response; referred to as a request question;

<SA>: a response to the preceding statement;

<RA>: a response to the preceding request question;

<AY>: an affirmative answer to a yes-no question;

<AN>: a negative answer to a yes-no question;

<AQ>: an answer to an open question;

<D>: a polite response to a thanks sentence;

<CO>: opens a dialogue;

<CC>: closes a dialogue;

<CT>: continues a dialogue;

<U>: indicates uncertain information.

2. The method for segmenting dialogue act units according to claim 1, characterized in that: when segmenting dialogue act units, the following steps are carried out in order:

3. The dialogue act analysis method in an intelligent question-answering system according to claim 1, characterized in that: the extracted text features comprise the following 10 kinds:

The length of the basic unit;

Whether it contains a verb;

Whether it contains a conjunction;

Whether it contains a numeral;

Whether it contains a pronoun;

Whether it contains a high-frequency stop word;

4. The dialogue act analysis method in an intelligent question-answering system according to claim 1, characterized in that: the sequence labeling algorithm is a linear-chain conditional random field.

5. The dialogue act analysis method in an intelligent question-answering system according to claim 1, characterized in that: the classification method of the hierarchical classifier comprises the following steps:

Step 2.1: determine whether the dialogue act unit can be assigned to one of the 6 labels <CO>, <CC>, <CT>, <TH>, <D>, <AN>; if so, exit the hierarchical classifier, return the label, and end classification; if not, continue with step 2.2;

6. The dialogue act analysis method in an intelligent question-answering system according to claim 5, characterized in that: step 2.3 determines whether the dialogue act unit can be assigned to one of <Q>, <QYN>, <R> as follows: first determine whether the dialogue act unit can be assigned to label <QYN>; if not, use frequent sequence features and a bag-of-words model to determine whether it can be assigned to label <R>; if not, assign the dialogue act unit to label <Q>.

7. The dialogue act analysis method in an intelligent question-answering system according to claim 5, characterized in that: the SVM classifier in step 2.5 is built from the following features: the classification results of the dialogue acts already determined in the preceding context; the verbs and nouns contained in this dialogue act unit and in the dialogue act units labeled <Q> within the preceding 5 utterances; the distance between this dialogue act unit and the nearest dialogue act unit labeled <Q>; and whether this dialogue act unit and the dialogue act units labeled <Q> within the preceding 5 utterances contain a repeated word pair.