Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in those embodiments may be combined with one another.
The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In the present invention, terms such as "module", "device", and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In particular, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program, and/or a computer. An application or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Elements may also communicate by way of signals carrying one or more data packets, for example a signal from data that interacts with another element in a local system or a distributed system, and/or interacts with other systems across a network such as the Internet.
Relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include" and "comprise" cover not only the elements listed but also other elements not explicitly listed, as well as elements inherent to the process, method, article, or device in question. Absent further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
In the embodiments of the present invention, the dialogue between a person and the human-machine dialogue system is treated as a Markov Decision Process (MDP). At each time step, the human-machine dialogue system is in state s, takes action a according to policy π(a | s), observes the user's reply o′, transitions to state s′, and receives reward r′.
An embodiment of the present invention provides a human-machine dialogue method applied to a human-machine dialogue system. As shown in Fig. 1, the method includes:
S11: using the current state of the human-machine dialogue system as the input of a first neural network to determine the active dialogue mode of the system toward the user, where the first neural network is a first Q neural network.
The current state is generated at least from current-topic information and from statistics on the knowledge points visited by the user and the human-machine dialogue system from the start of the dialogue up to the current moment. The current-topic information is topic vector information characterizing the current topic. The statistics on visited knowledge points include at least:
first vector information: the number of knowledge points in the current topic that the user has visited;
second vector information: the number of knowledge points in the current topic that the user has rejected;
third vector information: the number of knowledge points the user has rejected from the start of the dialogue up to the current moment;
fourth vector information: the maximum number of knowledge points the user has rejected consecutively from the start of the dialogue up to the current moment.
The statistics on visited knowledge points are determined by the human-machine dialogue system from how the user replies to the recommended knowledge points. Specifically, the system parses the user's reply with a semantic understanding module, and the parsing result falls into one of four types:
The user selects one of the knowledge points recommended by the system; the semantic representation is inform(series=id), for example "the first one", "the second one", "the last one";
The user answers a counter-question affirmatively or negatively; the semantic representation is deny() for a negative answer and affirm() for an affirmative one, for example "OK, please introduce it", "not interested", "I don't want to know";
The system made no recommendation or counter-question in the previous round, or the user ignores the system's recommendation or counter-question and asks about another knowledge point kt; the semantic representation is inform(kp=kt);
The user ends the dialogue; the semantic representation is bye().
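The four parsing results and the four visit statistics can be sketched as follows. This is a minimal illustration, not the patented implementation: the names Act and VisitStats are hypothetical, and the rules for when a knowledge point counts as visited, when the per-topic counters reset, and when a run of rejections ends are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Act(Enum):
    """Semantic representations produced by the semantic understanding module."""
    INFORM_SERIES = "inform(series=id)"  # user picks one recommended knowledge point
    AFFIRM = "affirm()"                  # user accepts the system's question
    DENY = "deny()"                      # user rejects the system's question
    INFORM_KP = "inform(kp=kt)"          # user asks about another knowledge point
    BYE = "bye()"                        # user ends the dialogue


@dataclass
class VisitStats:
    visited_in_topic: int = 0        # first vector information
    denied_in_topic: int = 0         # second vector information
    denied_total: int = 0            # third vector information
    max_consecutive_denied: int = 0  # fourth vector information
    streak: int = 0                  # helper: length of the current run of denials

    def update(self, act: Act, topic_changed: bool = False) -> None:
        # Assumption: per-topic counters restart when the current topic changes.
        if topic_changed:
            self.visited_in_topic = 0
            self.denied_in_topic = 0
        if act is Act.DENY:
            self.denied_in_topic += 1
            self.denied_total += 1
            self.streak += 1
            self.max_consecutive_denied = max(self.max_consecutive_denied, self.streak)
            return
        # Assumption: any non-deny reply ends the current run of denials.
        self.streak = 0
        if act in (Act.INFORM_SERIES, Act.AFFIRM, Act.INFORM_KP):
            self.visited_in_topic += 1

    def as_vector(self) -> List[int]:
        return [self.visited_in_topic, self.denied_in_topic,
                self.denied_total, self.max_consecutive_denied]
```

Calling update once per user turn keeps the four counters in step with the dialogue, and as_vector yields the part of the feature vector that is later combined with the topic vector.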
In this embodiment, the current state of the dialogue system is determined from the current topic together with the statistics on knowledge points visited during the dialogue so far between the current user and the human-machine dialogue system, and this state is fed to the first neural network to determine the active dialogue mode in which the system next takes the initiative. Each time the system initiates dialogue, the method can therefore take into account both the feedback from the dialogue already conducted and the current topic. The selected active dialogue mode is thus more targeted and closer to the user's current intent, is more readily accepted by the user, keeps the human-machine dialogue running smoothly, and improves the user's experience.
S12: determining the topics to be recommended according to the active dialogue mode.
The active dialogue mode includes at least a counter-question mode, a recommendation mode, and a mode that neither asks back nor recommends. Specifically:
- confirm(kp=kt): asks the user whether they are interested in knowledge point kt under dialogue topic t, for example "Are you interested in our far-field speech recognition technology?".
- Recommending n knowledge points to the user, for example: "You can also ask me: 1. Introduce AISpeech's typical smart-home products. 2. Which of your technologies does Tmall Genie use? 3. What hardware products does AISpeech have?". For simplicity, it can be assumed that these n knowledge points come from n topics, that is, one knowledge point is randomly sampled from each topic (it is of course also possible to select n knowledge points from m topics, where m is less than n).
- null: neither recommends nor asks back.
S13: using the current state and the feature vector of the topics to be recommended as the input of a second neural network to determine the recommendation probability of the topics to be recommended, where the second neural network is a second Q neural network.
When the active dialogue mode is the counter-question mode, the topic to be recommended is a single candidate topic, and the above determination includes: using the current state and the feature vector of the candidate topic as the input of the second neural network to determine the recommendation probability of the candidate topic.
When the active dialogue mode is the recommendation mode, the topics to be recommended include multiple candidate topics, and the above determination includes: using the current state and the feature vectors of the multiple candidate topics as the input of the second neural network to determine multiple recommendation probabilities corresponding to the multiple candidate topics.
S14: selecting, according to the recommendation probability values, knowledge points to be recommended from the topics to be recommended and presenting them to the user.
This embodiment determines the active dialogue mode toward the user from the current state of the human-machine dialogue system, so the dialogue the system initiates is more targeted and fits the current progress of the dialogue, which improves the user experience. Once the active dialogue mode is determined, the topics to be recommended for the active dialogue are determined accordingly. The pre-trained second neural network then determines the recommendation probability of each topic to be recommended from the current state of the system and the feature vector of that topic, and the knowledge points used to initiate the active dialogue are obtained from the corresponding topics according to those probabilities.
As shown in Fig. 2, in some embodiments of the invention, generating the current state at least from the current-topic information and the statistics on knowledge points visited by the user and the human-machine dialogue system from the start of the dialogue up to the current moment includes:
S21: generating a current system feature vector from the topic vector information and the first to fourth vector information;
S22: compressing the current system feature vector with a recurrent neural network to obtain the current state.
The topic vector information and the first to fourth vector information can be concatenated into one feature vector, denoted x. Historical information matters as well as the information above: a recurrent neural network (for example an RNN or LSTM) can compress all x from the start of the dialogue up to the present into a single state representation vector s. That is, the input to the network at each time step is the x of that time step, and the state corresponding to the last time step is s. By using a recurrent neural network, this embodiment lets topic selection take historical information into account, so that each topic jump is more personalized and accurate.
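The compression of the per-turn feature vectors x into one state vector s can be illustrated with a plain Elman-style recurrent cell. This is only a sketch under stated assumptions: the class name ElmanRNN, the tanh activation, the random initialization, and the dimensions are all hypothetical choices, and the text equally allows an LSTM.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small random weights; a trained system would learn these instead.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ElmanRNN:
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b); the last h_t is the state s."""

    def __init__(self, input_dim: int, state_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_x = init_matrix(state_dim, input_dim, rng)
        self.w_h = init_matrix(state_dim, state_dim, rng)
        self.b = [0.0] * state_dim

    def step(self, h: Sequence[float], x: Sequence[float]) -> Vector:
        from_input = matvec(self.w_x, x)
        from_state = matvec(self.w_h, h)
        return [math.tanh(p + q + b)
                for p, q, b in zip(from_input, from_state, self.b)]

    def compress(self, xs: Sequence[Sequence[float]]) -> Vector:
        # Feed the feature vector of every turn in order; return the final state.
        h: Vector = [0.0] * len(self.b)
        for x in xs:
            h = self.step(h, x)
        return h
```

In online use only step needs to be called once per turn with the previous hidden state, which matches the description of taking the hidden representation at the current time step as the current state.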
As shown in Fig. 3, in some embodiments the human-machine dialogue method further includes:
S31: recording and storing the dialogue experience data of each round of dialogue to form an experience data pool, where the dialogue experience data include at least, for each round, the current state of the dialogue system, the action taken, the state of the dialogue system at the next moment, and the reward signal received by the dialogue system;
S32: training the first neural network and/or the second neural network at a predetermined period on the dialogue experience data in the experience data pool.
In the above embodiment, the reward signal can come from the following sources:
The user's reply to a recommendation or counter-question from the human-machine dialogue system:
If the user selects from the knowledge points recommended in the previous round, the system is given a positive reward; otherwise it is given a negative reward;
If the user answers affirmatively to the system's counter-question about a knowledge point, the system is given a positive reward; otherwise it is given a negative reward.
The specific reward values in these two cases can differ. In general, dialogue in the counter-question mode is more natural, so the absolute value of the (positive or negative) reward for the counter-question mode can be larger than that for the recommendation mode.
The system's own goal: if the system is goal-oriented, it receives a positive reward when it achieves its goal. The goal depends on the type of dialogue system, for example:
Publicity: conveying enough information to the user; each knowledge point introduced to the user earns a positive reward, and knowledge points of different topics can be assigned different rewards;
Shopping guide: a purchase-related action such as placing an order earns a larger positive reward;
Business: facilitating a business cooperation earns a larger positive reward;
Recruitment: obtaining a résumé earns a larger positive reward.
The reward signal is not limited to these two sources; the system designer can design it for the specific task.
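The two reward sources above can be combined in a small function such as the one below. It is a sketch only: the numeric values, the constant names, the string labels for modes and user acts, and the function name turn_reward are all assumptions chosen for illustration; the text fixes only the signs and the suggestion that the asking-back (confirm) mode may carry a larger magnitude than the recommendation mode.

```python
# Hypothetical magnitudes; the source states only that they may differ.
RECOMMEND_REWARD = 1.0
CONFIRM_REWARD = 2.0   # larger magnitude, since asking back is the more natural mode
GOAL_REWARD = 5.0      # e.g. an order placed, a cooperation agreed, a resume obtained


def turn_reward(previous_mode: str, user_act: str, goal_reached: bool = False) -> float:
    """Reward for one turn, given the system's previous active dialogue mode.

    previous_mode: "recommend", "confirm", or "null".
    user_act: "inform_series", "affirm", "deny", "inform_kp", or "bye".
    """
    reward = 0.0
    if previous_mode == "recommend":
        # Positive only if the user picked one of the recommended knowledge points.
        reward += RECOMMEND_REWARD if user_act == "inform_series" else -RECOMMEND_REWARD
    elif previous_mode == "confirm":
        # Positive only if the user answered the system's question affirmatively.
        reward += CONFIRM_REWARD if user_act == "affirm" else -CONFIRM_REWARD
    elif previous_mode != "null":
        raise ValueError(f"unknown mode: {previous_mode}")
    if goal_reached:
        reward += GOAL_REWARD
    return reward
```

Keeping the task goal as a separate additive term makes it straightforward to swap in a different goal definition for a publicity, shopping guide, business, or recruitment system.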
With the above reward signal, the policy can be optimized with a reinforcement learning algorithm, so that in each state the active dialogue mode and topic (knowledge point) selected by the policy maximize the cumulative reward obtained. In the embodiments of the present invention, the dialogue policy is represented by Q networks and optimized with DQN (Deep Q-Networks). Unlike ordinary DQN, the embodiments use three different policies: a main policy, a recommendation policy, and a counter-question policy. Each policy has its own Q network, and they can share one experience data pool D. At each parameter update, a batch of data is sampled from the experience data pool, and the parameters are then optimized according to the loss function (TD error). In the early stage of system optimization, if the selectable topics are not restricted, the topics selected by the policy are highly random and the result may be poor. To address this, the selectable topics can be restricted to a certain range in the early stage, such as the sibling topics and subtopics of the current topic.
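The TD-error loss mentioned above can be made concrete with the standard one-step DQN target. The sketch below assumes a discount factor gamma and a squared-error loss, neither of which is specified in the text, and the function names are hypothetical.

```python
from typing import Sequence


def td_target(reward: float, next_q_values: Sequence[float],
              gamma: float = 0.9, terminal: bool = False) -> float:
    """One-step target: r' + gamma * max_a' Q(s', a'), or r' at the end of a dialogue."""
    if terminal or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def td_loss(q_taken: float, target: float) -> float:
    """Squared TD error for the action actually taken."""
    return 0.5 * (target - q_taken) ** 2


def batch_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean TD loss over one sampled batch."""
    losses = [td_loss(q, y) for q, y in zip(q_taken, targets)]
    return sum(losses) / len(losses)
```

For example, with reward 1.0, next-state Q values [0.5, 2.0], and gamma 0.5, the target is 1.0 + 0.5 × 2.0 = 2.0; if the network currently predicts 1.0 for the action taken, the loss is 0.5.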
The dialogue policy decides how to recommend to the user and how to ask back, and consists of three main steps:
Step 1: determine the active dialogue mode. The main policy πm(am | s) decides which mode to use: recommend, counter-question, or neither.
Fig. 4 is a schematic structural diagram of the Q neural network of the main policy in the embodiments of the present invention. The input of the Q neural network is the current state s, and the output layer has 3 dimensions, corresponding to the Q values obtained by selecting each of the 3 modes. The probability corresponding to each active dialogue mode am is then:
$$\pi_m(a_m \mid s) = \frac{\exp\left(Q(s, a_m)/\tau\right)}{\sum_{a'_m} \exp\left(Q(s, a'_m)/\tau\right)}$$
In the above formula, τ is a hyperparameter that controls the degree of policy exploration: the larger τ is, the more uniform the above probabilities are and the more the system explores. In general, τ starts large and is then gradually decreased. At each decision, one active dialogue mode is sampled according to the above probabilities.
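Sampling an active dialogue mode from three Q values with a temperature τ can be sketched as a Boltzmann (softmax) policy. The mode labels and function names below are hypothetical, and the Q values would in practice come from the main policy's Q network.

```python
import math
import random
from typing import List, Sequence

MODES = ("recommend", "confirm", "null")


def boltzmann_probs(q_values: Sequence[float], tau: float) -> List[float]:
    """Softmax over Q values with temperature tau (larger tau = flatter distribution)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    top = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - top) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_mode(q_values: Sequence[float], tau: float, rng: random.Random) -> str:
    """Draw one of the three active dialogue modes according to the probabilities."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(MODES, weights=probs, k=1)[0]
```

Starting with a large tau and decreasing it over training moves the policy from near-uniform exploration toward choosing the mode with the highest Q value.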
Step 2: perform topic inference, that is, infer the topics the user may be interested in next and select one or more candidate topics. If the active dialogue mode sampled in step 1 is counter-question, one topic needs to be selected; if it is recommend, n topics need to be selected.
Fig. 5 is a schematic structural diagram of the Q neural network used to determine the Q value of selecting topic t in the current state. The network includes two branch networks. The input of one branch is the current state s, which after several network layers yields a state representation φ(s). The input of the other branch is the vectorized representation et of topic t, which after several network layers yields another vector representation of the same dimension as φ(s), written here as ψ(et). The Q value Q(s, t) of selecting topic t in the current state is then the inner product of these two vectors, that is:
$$Q(s, t) = \phi(s)^{\top} \psi(e_t)$$
For each possible candidate topic, the corresponding Q value is first computed with the above formula, and the probability of selecting each topic is then computed according to the following formula:
$$\pi(t \mid s) = \frac{\exp\left(Q(s, t)/\tau\right)}{\sum_{t'} \exp\left(Q(s, t')/\tau\right)}$$
If the active dialogue mode is counter-question, one topic is sampled according to the above probabilities; if it is the recommendation mode, n topics are sampled according to the above probabilities.
Step 3: randomly select one knowledge point from each sampled topic as the counter-question or recommended knowledge point.
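Steps 2 and 3 can be sketched together: a two-branch network scores each candidate topic by an inner product, topics are sampled from a softmax over those scores, and one knowledge point is drawn at random from each sampled topic. Everything named below is hypothetical, each branch is reduced to a single tanh layer for brevity, and sampling without replacement is an assumption made so that a recommendation list never repeats a topic.

```python
import math
import random
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def dense_tanh(weights: Matrix, v: Sequence[float]) -> List[float]:
    # One layer standing in for the "several network layers" of a branch.
    return [math.tanh(sum(w * x for w, x in zip(row, v))) for row in weights]


def q_value(w_state: Matrix, w_topic: Matrix,
            state: Sequence[float], topic_vec: Sequence[float]) -> float:
    """Inner product of the state branch output and the topic branch output."""
    state_repr = dense_tanh(w_state, state)
    topic_repr = dense_tanh(w_topic, topic_vec)
    return sum(a * b for a, b in zip(state_repr, topic_repr))


def softmax(values: Sequence[float], tau: float = 1.0) -> List[float]:
    top = max(values)
    exps = [math.exp((v - top) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_topics(q_by_topic: Dict[str, float], n: int,
                  rng: random.Random, tau: float = 1.0) -> List[str]:
    """Sample up to n distinct topics, renormalizing after each draw."""
    remaining = dict(q_by_topic)
    chosen: List[str] = []
    for _ in range(min(n, len(remaining))):
        names = list(remaining)
        probs = softmax([remaining[t] for t in names], tau)
        pick = rng.choices(names, weights=probs, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def pick_knowledge_points(topics: Sequence[str],
                          knowledge_base: Dict[str, List[str]],
                          rng: random.Random) -> List[str]:
    """Step 3: one uniformly random knowledge point per sampled topic."""
    return [rng.choice(knowledge_base[t]) for t in topics]
```

With n = 1 the same functions cover the asking-back mode, and with n > 1 they produce the recommendation list.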
Fig. 6 is a flow chart of an embodiment of the human-machine dialogue method of the present invention, which includes the following steps:
Step 1: receive the user's question or reply;
Step 2: perform semantic understanding on the user's input, with four possible parsing results: the user selects one of the knowledge points recommended by the system; the user answers the system's counter-question affirmatively or negatively; the user asks about another knowledge point kt; the user wants to exit the dialogue;
Step 3: if the user does not want to exit, update the current topic according to the semantic parsing result;
Step 4: determine the main content of the reply to the user. If the user selected a knowledge point from the previous round's recommendations, or answered affirmatively to the system's counter-question about a knowledge point, provide the content of that knowledge point directly; if the user raised a new question, look up the content of the corresponding knowledge point in the knowledge base;
Step 5: update the dialogue state. Extract the current topic and dialogue features as the input of the recurrent neural network at the current time step, and take the hidden representation at the current time step as the vectorized representation of the current state;
Step 6: the main policy decides the active dialogue mode: recommend, counter-question, or neither;
Step 7: determine the specific active dialogue content according to the active dialogue mode selected in step 6. In the recommendation mode, select n topics according to the recommendation policy and sample one knowledge point from each topic to form the recommendation list; in the counter-question mode, select one topic according to the counter-question policy and select one knowledge point from that topic as the counter-question content; if the mode is neither, the active dialogue content of the current round is empty;
Step 8: present the reply content from step 4 together with the active dialogue content from step 7 to the user, and return to step 1.
The above steps are the online service flow of the whole system and do not involve policy training. During online service, the system stores experience data in an experience data pool. Each experience includes the current state of the human-machine dialogue system, the action taken (the selected active dialogue mode, for example the recommendation mode or the counter-question mode), the state of the system at the next moment, and the reward received. The training process can be decoupled from the online service and carried out offline. At regular intervals, the training service samples a batch of data from the experience data pool and optimizes the policy parameters according to the DQN loss function (TD error). After the parameters are updated, the latest parameters are pushed to the online service.
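The experience data pool shared by the online service and the offline training service can be sketched as a bounded buffer with uniform batch sampling. The class and field names are hypothetical, and both the fixed capacity (oldest experiences are dropped first) and uniform sampling are assumptions; the text does not say how the pool is bounded or sampled.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence


@dataclass(frozen=True)
class Experience:
    state: Sequence[float]       # state of the dialogue system at this turn
    action: str                  # selected active dialogue mode, topic, etc.
    reward: float                # reward received
    next_state: Sequence[float]  # state of the dialogue system at the next moment


class ExperiencePool:
    def __init__(self, capacity: int = 10000, seed: int = 0) -> None:
        # deque(maxlen=...) silently discards the oldest item when full.
        self._items: Deque[Experience] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, experience: Experience) -> None:
        """Called by the online service after every turn."""
        self._items.append(experience)

    def __len__(self) -> int:
        return len(self._items)

    def sample(self, batch_size: int) -> List[Experience]:
        """Called by the training service; returns at most batch_size experiences."""
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)
```

Because the online service only appends and the training service only samples, the two can run on separate schedules, which is what allows training to be decoupled from serving.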
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.
As shown in Fig. 7, an embodiment of the present invention also provides a human-machine dialogue system 700, which includes:
an active dialogue mode determination program module 710, configured to use the current state of the human-machine dialogue system as the input of the first neural network to determine the active dialogue mode of the system toward the user;
a to-be-recommended topic determination program module 720, configured to determine the topics to be recommended according to the active dialogue mode;
a recommendation probability determination program module 730, configured to use the current state and the feature vector of the topics to be recommended as the input of the second neural network to determine the recommendation probability of the topics to be recommended;
a recommended knowledge point selection program module 740, configured to select, according to the recommendation probability values, knowledge points to be recommended from the topics to be recommended and present them to the user.
In some embodiments, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing one or more programs that include executable instructions. The executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above human-machine dialogue methods of the present invention.
In some embodiments, an embodiment of the present invention also provides a computer program product. The computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to perform any of the above human-machine dialogue methods.
In some embodiments, an embodiment of the present invention also provides an electronic device, which includes at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the human-machine dialogue method.
In some embodiments, an embodiment of the present invention also provides a storage medium on which a computer program is stored, where the program, when executed by a processor, performs the steps of the human-machine dialogue method.
The human-machine dialogue system of the above embodiments can be used to execute the human-machine dialogue method of the embodiments of the present invention and accordingly achieves the technical effects of that method, which are not repeated here. In the embodiments of the present invention, the related functional modules can be implemented by a hardware processor.
Fig. 8 is a schematic diagram of the hardware structure of an electronic device for executing the human-machine dialogue method according to another embodiment of the present application. As shown in Fig. 8, the device includes:
one or more processors 810 and a memory 820, with one processor 810 taken as an example in Fig. 8.
The device for executing the human-machine dialogue method may further include an input device 830 and an output device 840.
The processor 810, the memory 820, the input device 830, and the output device 840 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 8.
The memory 820, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the human-machine dialogue method in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 820, the processor 810 executes the various functional applications and data processing of the server, that is, implements the human-machine dialogue method of the above method embodiments.
The memory 820 may include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function; the data storage area can store data created through use of the human-machine dialogue apparatus, and the like. In addition, the memory 820 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 820 optionally includes memory located remotely from the processor 810, and such remote memory can be connected to the human-machine dialogue apparatus through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 830 can receive input numeric or character information and generate signals related to the user settings and function control of the human-machine dialogue apparatus. The output device 840 may include a display device such as a display screen.
The one or more modules are stored in the memory 820 and, when executed by the one or more processors 810, perform the human-machine dialogue method of any of the above method embodiments.
The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to the method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capability, and their main purpose is to provide voice and data communication. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capability, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction capability.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present application and do not limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.