CN109241261A - User intention recognition method, device, mobile terminal and storage medium - Google Patents

User intention recognition method, device, mobile terminal and storage medium Download PDF

Info

Publication number
CN109241261A
CN109241261A
Authority
CN
China
Prior art keywords
user
keyword
text
vectorization
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811004715.9A
Other languages
Chinese (zh)
Inventor
徐乐乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd filed Critical Wuhan Douyu Network Technology Co Ltd
Priority to CN201811004715.9A priority Critical patent/CN109241261A/en
Publication of CN109241261A publication Critical patent/CN109241261A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention discloses a user intention recognition method comprising: obtaining the dialog text input by a user; preprocessing the dialog text to obtain the keywords in the dialog text; inputting the keywords into a sample training model to vectorize each keyword; and inputting the vectorized keywords into a Naive Bayes classifier to determine the user's intention. The invention also discloses a user intention recognition device, a mobile terminal and a storage medium. The invention solves the problem that, during human-computer dialog, a robot cannot provide an answer that matches the user's intention.

Description

User intention recognition method, device, mobile terminal and storage medium
Technical field
The present invention relates to the field of human-computer interaction, and more particularly to a user intention recognition method, device, mobile terminal and storage medium.
Background technique
With the rapid development of human-computer interaction, user intention recognition has become particularly important. At present, in the live-streaming domain, conversation content falls roughly into four categories: game, beauty, outdoor and anime (two-dimensional culture). When a user chats with a robot, the robot must first determine which category the user wants to talk about, and then generate a reply from that category, in order to give an answer that matches the user's intention. For example, when the user inputs "I like the young lady's song", the user intends to chat about the beauty category; without user intention recognition, the robot cannot determine which category the user wants to talk about, and may reply with game content or content from another category, failing to give an answer that matches the user's intention and reducing user satisfaction.
Summary of the invention
The main purpose of the present invention is to provide a user intention recognition method, device, mobile terminal and storage medium, aiming to solve the prior-art problem that, during human-computer dialog, a robot cannot provide an answer that matches the user's intention.
To achieve the above object, a first aspect of the embodiments of the present invention provides a user intention recognition method, comprising:
obtaining the dialog text input by a user;
preprocessing the dialog text to obtain the keywords in the dialog text;
inputting the keywords into a sample training model to vectorize each keyword;
inputting the vectorized keywords into a Naive Bayes classifier to determine the user's intention.
A second aspect of the embodiments of the present invention provides a user intention recognition device, comprising:
an obtaining module for obtaining the dialog text input by a user;
a preprocessing module for preprocessing the dialog text to obtain the keywords in the dialog text;
a first input module for inputting the keywords into a sample training model to vectorize each keyword;
a second input module for inputting the vectorized keywords into a Naive Bayes classifier to determine the user's intention.
A third aspect of the embodiments of the present invention provides a mobile terminal, comprising:
a memory, a processor and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the program, implements the user intention recognition method provided by the first aspect of the embodiments of the present invention.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the user intention recognition method provided by the first aspect of the embodiments of the present invention.
From the above embodiments it can be seen that the user intention recognition method, device, mobile terminal and storage medium provided by the present invention obtain the dialog text input by the user, preprocess it to obtain its keywords, input the keywords into a sample training model to vectorize each keyword, and then input the vectorized keywords into a Naive Bayes classifier to determine the user's intention, thereby solving the problem that, during human-computer dialog, a robot cannot provide an answer that matches the user's intention.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of the user intention recognition method provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of the user intention recognition method provided by the second embodiment of the present invention;
Fig. 3 is a structural diagram of the user intention recognition device provided by the third embodiment of the present invention;
Fig. 4 shows a hardware structure diagram of a mobile terminal.
Specific embodiment
In order to make the purpose, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a flowchart of the user intention recognition method provided by the first embodiment of the present invention. The method comprises:
S101, obtaining the dialog text input by the user;
In the field of human-computer dialog, if the user converses by voice, the voice data input by the user is obtained and converted into dialog text by a speech-to-text method for ease of processing. If the user converses by text, the dialog text input by the user is obtained directly and processed.
It should be noted that speech-to-text conversion is prior art and is not described in detail here.
S102, preprocessing the dialog text to obtain the keywords in the dialog text;
The dialog text is first segmented into words to obtain each token in the dialog text, and the stop words are then removed. Stop-word removal can be implemented with a preset stop-word list: when a stop word in the dialog text appears in the preset list, it is deleted, finally yielding the keywords of the dialog text.
Illustratively, for the dialog text "I like the young lady's song", word segmentation first yields the tokens: I / like / young lady / 的 / song. Since the possessive particle "的" is in the preset stop-word list, removing stop words leaves the keywords: I / like / young lady / song.
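The preprocessing above, word segmentation followed by stop-word removal, can be sketched as follows. This is a minimal illustration only: the stop-word list is a toy one, and the dialog text is assumed to be already segmented, since a real system would first run a dedicated Chinese segmenter such as jieba.

```python
# Minimal sketch of the preprocessing step (segmentation + stop-word
# removal). The stop-word list is a toy one, and the dialog text is
# assumed to be pre-segmented; a real system would segment it with a
# tool such as jieba first.

STOP_WORDS = {"的", "了", "吗"}  # toy preset stop-word list

def preprocess(tokens):
    """Drop stop words from a segmented dialog text, keeping token order."""
    return [t for t in tokens if t not in STOP_WORDS]

# "I like the young lady's song", segmented into tokens:
tokens = ["我", "喜欢", "小姐姐", "的", "歌声"]
keywords = preprocess(tokens)
# keywords == ["我", "喜欢", "小姐姐", "歌声"]
```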
S103, inputting the keywords into the sample training model to vectorize each keyword;
The sample training model is composed of a plurality of corpus texts and the user intention corresponding to each corpus text; the feature words in the corpus texts are vectorized by an improved TF-IDF algorithm.
TF-IDF (term frequency-inverse document frequency) is a common weighting technique for information retrieval and data mining. TF stands for term frequency, and IDF for inverse document frequency.
Specifically, the corpus texts are first segmented and stop words removed as in step S102 above, and the feature words in the corpus texts are then vectorized by the improved TF-IDF algorithm. Let D_i be a corpus text, D_i = (t_1, t_2, ..., t_m), where t_j is the j-th feature word after segmentation, 1 ≤ j ≤ m; w_ij denotes the weight of feature word t_j in corpus text D_i; tf_ij denotes the term frequency of t_j in D_i; n_j denotes the number of corpus texts in which t_j appears; and N denotes the total number of corpus texts.
Here log(tf_ij + 1.0) denotes the TF term (normalized by the text's total count in the worked example of the second embodiment), and log((n_j + 1)/(N + 1)) denotes the IDF term.
The improved TF-IDF algorithm enables the computation to behave normally in extreme cases: when tf_ij, n_j or N is 0, adding 1 smooths the value, keeping the logarithms defined and the algorithm meaningful.
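Reading the formula above together with the worked example in the second embodiment, the improved weight appears to be w_ij = log((tf_ij + 1)/(Σ_k tf_ik + m)) * log((n_j + 1)/(N + 1)), with the +1 terms providing the smoothing just described. The sketch below is that reading, an interpretation of the patent's description, not its verbatim implementation:

```python
import math

def improved_tfidf(tf, doc_tfs, n_j, n_docs):
    """Improved TF-IDF weight of one feature word in one corpus text.

    tf      -- raw count of the word in this text (tf_ij)
    doc_tfs -- raw counts of all m feature words in this text
    n_j     -- number of corpus texts containing the word
    n_docs  -- total number of corpus texts (N)

    Every count is incremented by 1 so the logarithms stay defined
    even when a count is 0 (the smoothing the patent describes).
    """
    tf_part = math.log((tf + 1) / (sum(doc_tfs) + len(doc_tfs)))
    idf_part = math.log((n_j + 1) / (n_docs + 1))
    return tf_part * idf_part

# Feature word "game" in corpus text D1 of the worked example:
# tf = 1, D1 counts [1,1,1,1,1,0,0,0], appears in 1 of 2 texts.
w = improved_tfidf(1, [1, 1, 1, 1, 1, 0, 0, 0], 1, 2)
# w == log(2/13) * log(2/3), matching the example's weight matrix
```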
S104, inputting the vectorized keywords into the Naive Bayes classifier to determine the user's intention.
Bayes' theorem is as follows:

P(Y|X) = P(X|Y) * P(Y) / P(X)

Here it is assumed that the feature words take their values independently of one another, i.e. there is no interaction between feature words. Y = {y1, y2, ..., yi} comprises i classes, and X = {x1, x2, ..., xn} comprises n independent features.
Since, by Bayes' theorem, the value of P(X) is fixed for a given input, it suffices to compare P(X|Y) * P(Y), i.e. to take y = max(P(X|yi) * P(yi)). Then, because the feature words are mutually independent, P(X|yi) = P(x1, x2, x3, ..., xn|yi) = P(x1|yi) * P(x2|yi) * ... * P(xn|yi).
Therefore,

P(yi|X) ∝ P(x1|yi) * P(x2|yi) * ... * P(xn|yi) * P(yi)

where P(yi) is the number of occurrences of the i-th class divided by the total number of occurrences of all classes, and P(xn|yi) is the probability that the n-th feature word appears under the i-th class.
In the TF-IDF matrix computed above, Sum(xn ∈ yi) is the sum of the weights of feature word xn under the i-th class, and Sum(yi) is the sum of the weights of all feature words under the i-th class, so that P(xn|yi) = Sum(xn ∈ yi) / Sum(yi).
Determining P(Y|X) then amounts to finding the class with the highest probability given the feature word set, that is:

y = max{ P(x1|yi) * P(x2|yi) * ... * P(xn|yi) * P(yi) }
Inputting the vectorized keywords into the above Naive Bayes classifier thus determines the user's intention.
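The decision rule above can be sketched as follows. The conditional probabilities and priors here are toy numbers for illustration only, not the patent's worked values:

```python
def classify(keywords, cond_prob, prior):
    """Return the intent y maximizing P(y) * prod_k P(x_k | y).

    cond_prob -- {intent: {keyword: P(keyword | intent)}}; a keyword
                 unseen for an intent falls back to 1.0 so it does not
                 sway the comparison (mirroring the add-one step the
                 patent applies to zero-valued entries).
    prior     -- {intent: P(intent)}
    """
    def score(y):
        s = prior[y]
        for k in keywords:
            s *= cond_prob[y].get(k, 1.0)
        return s
    return max(prior, key=score)

# Toy model with two intents; the numbers are illustrative only.
cond_prob = {
    "game":   {"young lady": 0.1, "song": 0.1},
    "beauty": {"young lady": 0.4, "song": 0.5},
}
prior = {"game": 0.5, "beauty": 0.5}
intent = classify(["I", "like", "young lady", "song"], cond_prob, prior)
# intent == "beauty"
```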
In the embodiment of the present invention, the dialog text input by the user is obtained and preprocessed to obtain its keywords; the keywords are input into the sample training model to vectorize each keyword, and the vectorized keywords are then input into the Naive Bayes classifier to determine the user's intention. This solves the problem that, during human-computer dialog, a robot cannot provide an answer that matches the user's intention.
Referring to Fig. 2, Fig. 2 is a flowchart of the user intention recognition method provided by the second embodiment of the present invention. The method comprises:
S201, obtaining the dialog text input by the user;
S202, preprocessing the dialog text to obtain the keywords in the dialog text;
Illustratively, for the dialog text "I like the young lady's song", word segmentation first yields the tokens: I / like / young lady / 的 / song. Since the possessive particle "的" is in the preset stop-word list, removing stop words leaves the keywords: I / like / young lady / song.
S203, inputting the keywords into the sample training model to vectorize each keyword;
In the sample training model, illustratively, take two corpus texts, denoted D1 and D2, so that N = 2. The user intentions corresponding to D1 and D2 are game and beauty respectively, and the sample formed by the two is shown in Table 1 below:
Table 1

Corpus text | User intention
D1          | game
D2          | beauty

After word segmentation and stop-word removal on the two corpus texts, we obtain (English glosses for the original Chinese tokens):
D1: game / live room / Jayce / entrance / ult
D2: live room / young lady / song / melodious / melodious
The full feature word set is therefore: game, live room, Jayce, entrance, ult, young lady, song, melodious — 8 feature words in total. Denote "game" as t1, "live room" as t2, "Jayce" as t3, "entrance" as t4, "ult" as t5, "young lady" as t6, "song" as t7 and "melodious" as t8.
Each feature word t_j is then projected onto the feature word set to obtain its term frequency tf_ij in each corpus text D_i.
Specifically, in corpus text D1: t1 occurs once, tf_11 = 1; t2 once, tf_12 = 1; t3 once, tf_13 = 1; t4 once, tf_14 = 1; t5 once, tf_15 = 1; t6 does not occur, tf_16 = 0; t7 does not occur, tf_17 = 0; t8 does not occur, tf_18 = 0. Thus D1 = [1, 1, 1, 1, 1, 0, 0, 0].
In corpus text D2: t1 does not occur, tf_21 = 0; t2 occurs once, tf_22 = 1; t3 does not occur, tf_23 = 0; t4 does not occur, tf_24 = 0; t5 does not occur, tf_25 = 0; t6 occurs once, tf_26 = 1; t7 occurs once, tf_27 = 1; t8 occurs twice, tf_28 = 2. Thus D2 = [0, 1, 0, 0, 0, 1, 1, 2].
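The two count vectors above can be reproduced by projecting each segmented corpus text onto the feature word set. The English glosses here stand in for the original Chinese feature words, purely for readability:

```python
# Build raw term-frequency vectors by projecting each segmented corpus
# text onto the shared feature word set, as in the worked example.
# English glosses stand in for the original Chinese feature words.

FEATURES = ["game", "live room", "Jayce", "entrance", "ult",
            "young lady", "song", "melodious"]

def tf_vector(tokens):
    return [tokens.count(f) for f in FEATURES]

d1 = tf_vector(["game", "live room", "Jayce", "entrance", "ult"])
d2 = tf_vector(["live room", "young lady", "song", "melodious", "melodious"])
# d1 == [1, 1, 1, 1, 1, 0, 0, 0]
# d2 == [0, 1, 0, 0, 0, 1, 1, 2]
```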
Then, following the TF part of the improved TF-IDF formula, every tf_ij is incremented by 1 and normalized by the text's total count, giving the term frequencies TF:
D1 = [log(2/13), log(2/13), log(2/13), log(2/13), log(2/13), log(1/13), log(1/13), log(1/13)]
D2 = [log(1/13), log(2/13), log(1/13), log(1/13), log(1/13), log(2/13), log(2/13), log(3/13)]
Further, in the feature word set, "game" occurs only in D1, so the number of corpus texts containing the feature word "game" is 1, i.e. n1 = 1. "Live room" occurs in both D1 and D2, so the number of corpus texts containing it is 2, i.e. n2 = 2. Likewise n3 = 1, n4 = 1, n5 = 1, n6 = 1, n7 = 1; "melodious" occurs in D2 (repeated occurrences are not counted here), so the number of corpus texts containing it is 1, i.e. n8 = 1.
Further, substituting n1 = 1, n2 = 2, n3 = n4 = n5 = n6 = n7 = n8 = 1 into the IDF part of the improved TF-IDF algorithm gives the inverse document frequencies IDF: [log(2/3), log(3/3), log(2/3), log(2/3), log(2/3), log(2/3), log(2/3), log(2/3)].
Therefore, based on the improved TF-IDF algorithm, the vectorized feature words of the corpus texts, i.e. the resulting TF-IDF weight matrix, are:
D1 = [log(2/13)*log(2/3), log(2/13)*log(3/3), log(2/13)*log(2/3), log(2/13)*log(2/3), log(2/13)*log(2/3), log(1/13)*log(2/3), log(1/13)*log(2/3), log(1/13)*log(2/3)]
D2 = [log(1/13)*log(2/3), log(2/13)*log(3/3), log(1/13)*log(2/3), log(1/13)*log(2/3), log(1/13)*log(2/3), log(2/13)*log(2/3), log(2/13)*log(2/3), log(3/13)*log(2/3)]
Further, each feature word in the TF-IDF matrix is projected onto the feature word set: if the feature word actually appears in the corresponding corpus text, its vectorized weight is kept; if it does not appear, its weight is set to 0. This yields the TF-IDF weight matrix:
D1 = [log(2/13)*log(2/3), log(2/13)*log(3/3), log(2/13)*log(2/3), log(2/13)*log(2/3), log(2/13)*log(2/3), 0, 0, 0]
D2 = [0, log(2/13)*log(3/3), 0, 0, 0, log(2/13)*log(2/3), log(2/13)*log(2/3), 2*log(3/13)*log(2/3)]
It should be noted that when a feature word does appear in the corpus text, its number of occurrences must also be taken into account; for example, "melodious" above occurs twice in D2, so its weight is recorded as 2*log(3/13)*log(2/3).
The above steps constitute the whole process realized by the sample training model.
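The whole training computation above can be collected into one small routine. This sketch follows my reading of the example's formulas — smoothed TF log((tf+1)/(Σtf+m)), smoothed IDF log((n_j+1)/(N+1)), zeroing of absent words, and the extra occurrence-count factor applied to repeated words — and reproduces the weight matrix for D1 and D2:

```python
import math

def tfidf_matrix(count_vectors):
    """TF-IDF weight matrix over a shared vocabulary, as in the example:
    TF = log((tf+1)/(sum(tf)+m)), IDF = log((n_j+1)/(N+1)); a word
    absent from a text gets weight 0, and a word occurring k > 1 times
    has its weight multiplied by k."""
    n_docs = len(count_vectors)
    m = len(count_vectors[0])
    # document frequency of each feature word
    df = [sum(1 for doc in count_vectors if doc[j] > 0) for j in range(m)]
    idf = [math.log((df[j] + 1) / (n_docs + 1)) for j in range(m)]
    matrix = []
    for doc in count_vectors:
        denom = sum(doc) + m  # add-one over all m feature words
        row = [doc[j] * math.log((doc[j] + 1) / denom) * idf[j]
               if doc[j] > 0 else 0.0
               for j in range(m)]
        matrix.append(row)
    return matrix

w = tfidf_matrix([[1, 1, 1, 1, 1, 0, 0, 0],
                  [0, 1, 0, 0, 0, 1, 1, 2]])
# w[0][0] == log(2/13)*log(2/3); w[1][7] == 2*log(3/13)*log(2/3)
```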
Illustratively, take the dialog text input by the user, "I like the young lady's song". After the same word segmentation and stop-word removal as above, the keywords obtained are: I / like / young lady / song.
Further, the above keywords are fed into the TF-IDF weight matrix to vectorize each keyword. Specifically, since the keywords "I" and "like" are not present in the matrix, they are recorded as 0. As there are two user intentions, two vectors are produced after input, whose elements are the conditional probabilities of each keyword given the user intention:
Game: [0, 0, log(1/13)*log(2/3), log(1/13)*log(2/3)]
Beauty: [0, 0, log(2/13)*log(2/3), log(2/13)*log(2/3)]
Further, to handle the cases where a conditional probability is 0, 1 is added to every conditional probability, giving:
Game: [1, 1, log(1/13)*log(2/3)+1, log(1/13)*log(2/3)+1]
Beauty: [1, 1, log(2/13)*log(2/3)+1, log(2/13)*log(2/3)+1]
S204, inputting the vectorized keywords into the Naive Bayes classifier to determine the user's intention.
With the keywords mutually independent, the Bayesian formula shown in the first embodiment of the present invention — the Naive Bayes classifier — is used to compute, for each user intention, the probability that the dialog text corresponds to that intention, yielding one intention probability value per user intention.
Specifically, P(game | I like the young lady's song) = P(I|game) * P(like|game) * P(young lady|game) * P(song|game) * P(game) = 1*1*(log(1/13)*log(2/3)+1)*(log(1/13)*log(2/3)+1)*1/2
P(beauty | I like the young lady's song) = P(I|beauty) * P(like|beauty) * P(young lady|beauty) * P(song|beauty) * P(beauty) = 1*1*(log(2/13)*log(2/3)+1)*(log(2/13)*log(2/3)+1)*1/2
Further, comparing the above intention probability values gives:
P(beauty | I like the young lady's song) > P(game | I like the young lady's song)
Finally, from the above result, the user intention with the largest probability value is beauty, so beauty is output as the user intention of the dialog text.
In the embodiment of the present invention, the dialog text input by the user is obtained and preprocessed to obtain its keywords; the keywords are input into the sample training model to vectorize each keyword, and the vectorized keywords are then input into the Naive Bayes classifier to determine the user's intention. This solves the problem that, during human-computer dialog, a robot cannot provide an answer that matches the user's intention.
Referring to Fig. 3, Fig. 3 is a structural diagram of the user intention recognition device provided by the third embodiment of the present invention. The device comprises:
an obtaining module 301, a preprocessing module 302, a first input module 303 and a second input module 304.
The obtaining module 301 is used for obtaining the dialog text input by the user.
The preprocessing module 302 is used for preprocessing the dialog text to obtain the keywords in the dialog text.
The first input module 303 is used for inputting the keywords into the sample training model to vectorize each keyword.
The second input module 304 is used for inputting the vectorized keywords into the Naive Bayes classifier to determine the user's intention.
In the embodiment of the present invention, the dialog text input by the user is obtained and preprocessed to obtain its keywords; the keywords are input into the sample training model to vectorize each keyword, and the vectorized keywords are then input into the Naive Bayes classifier to determine the user's intention. This solves the problem that, during human-computer dialog, a robot cannot provide an answer that matches the user's intention.
Fig. 4 is referred to, Fig. 4 shows a kind of hardware structure diagram of mobile terminal.
The mobile terminal described in this embodiment comprises:
a memory 41, a processor 42 and a computer program stored on the memory 41 and runnable on the processor. When executing the program, the processor implements the user intention recognition method described in the embodiment shown in Fig. 1.
Further, the mobile terminal also comprises:
at least one input device 43 and at least one output device 44.
The memory 41, processor 42, input device 43 and output device 44 are connected by a bus 45.
The input device 43 may specifically be a camera, a touch panel, a physical button, a mouse, etc. The output device 44 may specifically be a display screen.
The memory 41 may be a high-speed random access memory (RAM) or a non-volatile memory, such as a disk memory. The memory 41 is used for storing a set of executable program code, and the processor 42 is coupled with the memory 41.
Further, an embodiment of the present invention also provides a computer-readable storage medium, which may be provided in the mobile terminal of any of the above embodiments and may be the memory in the embodiment shown in Fig. 4. A computer program is stored on the computer-readable storage medium, and when executed by a processor the program implements the user intention recognition method described in the embodiment shown in Fig. 1. Further, the computer-readable storage medium may also be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc or any other medium that can store program code.
It should be noted that the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software function module.
If the integrated module is implemented as a software function module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention — in essence, the part that contributes beyond the prior art, or the technical solution as a whole — can be embodied in the form of a software product.
It should also be noted that, for simplicity, each of the above method embodiments is described as a series of combined actions; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, since according to the present invention certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of the other embodiments.
The above is a description of the user intention recognition method, device, mobile terminal and storage medium provided by the present invention. For those skilled in the art, the specific implementations and application scope may vary according to the ideas of the embodiments of the present invention; in summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A user intention recognition method, characterized in that the method comprises:
obtaining the dialog text input by a user;
preprocessing the dialog text to obtain the keywords in the dialog text;
inputting the keywords into a sample training model to vectorize each keyword;
inputting the vectorized keywords into a Naive Bayes classifier to determine the user's intention.
2. The method according to claim 1, characterized in that preprocessing the dialog text to obtain the keywords in the dialog text comprises:
performing word segmentation on the dialog text to obtain each token in the dialog text;
removing, according to a preset stop-word list, the stop words present among the tokens to obtain the keywords in the dialog text.
3. The method according to claim 1 or 2, characterized in that inputting the vectorized keywords into the Naive Bayes classifier to determine the user's intention comprises:
using the Naive Bayes classifier to compute, for each user intention, the probability that the dialog text corresponds to that intention, obtaining an intention probability value for each user intention;
comparing the intention probability values, and outputting the user intention with the largest probability value as the user intention of the dialog text.
4. The method according to claim 3, characterized in that the sample training model is trained on samples composed of a plurality of corpus texts and the user intention corresponding to each corpus text, and the sample training model is obtained by vectorizing the feature words in the corpus texts.
5. The method according to claim 4, characterized in that vectorizing the feature words in the corpus texts to obtain the sample training model comprises:
preprocessing the corpus texts to obtain the feature words in the corpus texts;
vectorizing each feature word through an improved TF-IDF algorithm to obtain the TF-IDF weight matrix for sample training;
where D_i is a corpus text, D_i = (t_1, t_2, ..., t_m), t_j is the j-th feature word after segmentation, 1 ≤ j ≤ m; w_ij denotes the weight of feature word t_j in corpus text D_i; tf_ij denotes the term frequency of t_j in D_i; n_j denotes the number of corpus texts in which t_j appears; and N denotes the total number of corpus texts.
6. The method according to claim 5, characterized in that inputting the keywords into the sample training model to vectorize each keyword comprises:
inputting the keywords into the TF-IDF weight matrix for sample training to vectorize each keyword.
7. The method according to claim 6, characterized in that when the value of any vectorized keyword is 0, one is added to the value of each vectorized keyword.
8. A user intention recognition device, characterized in that the device comprises:
an obtaining module for obtaining the dialog text input by a user;
a preprocessing module for preprocessing the dialog text to obtain the keywords in the dialog text;
a first input module for inputting the keywords into a sample training model to vectorize each keyword;
a second input module for inputting the vectorized keywords into a Naive Bayes classifier to determine the user's intention.
9. A mobile terminal, comprising a memory, a processor and a computer program stored on the memory and runnable on the processor, characterized in that when the processor executes the computer program, each step of the user intention recognition method according to any one of claims 1 to 7 is implemented.
10. A computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, each step of the user intention recognition method according to any one of claims 1 to 7 is implemented.
CN201811004715.9A 2018-08-30 2018-08-30 User intention recognition method, device, mobile terminal and storage medium Pending CN109241261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811004715.9A CN109241261A (en) 2018-08-30 2018-08-30 User intention recognition method, device, mobile terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811004715.9A CN109241261A (en) 2018-08-30 2018-08-30 User intention recognition method, device, mobile terminal and storage medium

Publications (1)

Publication Number Publication Date
CN109241261A true CN109241261A (en) 2019-01-18

Family

ID=65068985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811004715.9A Pending CN109241261A (en) 2018-08-30 2018-08-30 User intention recognition method, device, mobile terminal and storage medium

Country Status (1)

Country Link
CN (1) CN109241261A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140172899A1 (en) * 2012-12-14 2014-06-19 Microsoft Corporation Probability-based state modification for query dialogues
CN105389307A (en) * 2015-12-02 2016-03-09 上海智臻智能网络科技股份有限公司 Statement intention category identification method and apparatus
CN106599278A (en) * 2016-12-23 2017-04-26 北京奇虎科技有限公司 Identification method and device of application search intention
CN107346340A (en) * 2017-07-04 2017-11-14 北京奇艺世纪科技有限公司 A user intention recognition method and system
CN107608999A (en) * 2017-07-17 2018-01-19 南京邮电大学 A question classification method suitable for automatic question answering systems
CN107807968A (en) * 2017-10-13 2018-03-16 上海壹账通金融科技有限公司 Question and answer system, method and storage medium based on Bayesian network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840276A (en) * 2019-02-12 2019-06-04 北京健康有益科技有限公司 Intelligent dialogue method, apparatus and storage medium based on text intention recognition
CN110113422A (en) * 2019-05-10 2019-08-09 南京硅基智能科技有限公司 An intention recognition method and system for a cloud-based virtual mobile phone
CN110909543A (en) * 2019-11-15 2020-03-24 广州洪荒智能科技有限公司 Intention recognition method, device, equipment and medium
CN111324727A (en) * 2020-02-19 2020-06-23 百度在线网络技术(北京)有限公司 User intention recognition method, device, equipment and readable storage medium
US11646016B2 (en) 2020-02-19 2023-05-09 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing user intention, device, and readable storage medium
CN111798853A (en) * 2020-03-27 2020-10-20 北京京东尚科信息技术有限公司 Method, device, equipment and computer readable medium for speech recognition
CN111985206A (en) * 2020-07-17 2020-11-24 联想(北京)有限公司 Corpus understanding method and equipment

Similar Documents

Publication Publication Date Title
CN109241261A (en) User's intension recognizing method, device, mobile terminal and storage medium
AU2016256764B2 (en) Semantic natural language vector space for image captioning
GB2547068B (en) Semantic natural language vector space
US10049103B2 (en) Author personality trait recognition from short texts with a deep compositional learning approach
CN109923556B (en) Pointer Sentinel Hybrid Architecture
US9811765B2 (en) Image captioning with weak supervision
CN107273438B (en) Recommendation method, device, equipment and storage medium
WO2020177282A1 (en) Machine dialogue method and apparatus, computer device, and storage medium
RU2678716C1 (en) Use of autoencoders for learning text classifiers in natural language
US10354182B2 (en) Identifying relevant content items using a deep-structured neural network
US10635858B2 (en) Electronic message classification and delivery using a neural network architecture
US9875294B2 (en) Method and apparatus for classifying object based on social networking service, and storage medium
Zhao et al. Shap values for explaining cnn-based text classification models
CN110741364A (en) Determining a state of an automated assistant dialog
CN107112005A (en) Deep neural support vector machine
US10685012B2 (en) Generating feature embeddings from a co-occurrence matrix
CN110795542A (en) Dialogue method and related device and equipment
WO2019220113A1 (en) Device and method for natural language processing
WO2014073206A1 (en) Information-processing device and information-processing method
CN111967599B (en) Method, apparatus, electronic device and readable storage medium for training model
CN110390052A (en) Search recommendation method, CTR prediction model training method, device and equipment
CN113505198A (en) Keyword-driven generative dialogue reply method and apparatus, and electronic equipment
CN113392179A (en) Text labeling method and device, electronic equipment and storage medium
US11561964B2 (en) Intelligent reading support
CN110781666B (en) Natural language processing text modeling based on generative adversarial networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190118