Detailed Description of the Embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments can be combined with one another.
The present invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. The present invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
In the present invention, terms such as "module", "device" and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program and/or a computer. An application program or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Elements may also communicate by way of local and/or remote processes according to a signal having one or more data packets, for example a signal carrying data from one element that interacts with another element in a local system or a distributed system, and/or that interacts with other systems by way of the signal across a network such as the Internet.
Finally, relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element introduced by the phrase "including ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
In the embodiments of the present invention, the dialogue between a person and the human-machine dialogue system is regarded as a Markov Decision Process (MDP). At each time step, the human-machine dialogue system is in state s, takes action a according to strategy π(a | s), observes the user's reply o', transitions to state s', and receives reward r'.
An embodiment of the present invention provides a human-machine dialogue method applied to a human-machine dialogue system. As shown in Fig. 1, the human-machine dialogue method includes:
S11: using the current state of the human-machine dialogue system as the input of a first neural network to determine the active dialogue mode of the human-machine dialogue system toward the user, the first neural network being a first Q neural network.
The current state is generated based at least on current topic information and on statistical information about the knowledge points accessed by the user and the human-machine dialogue system from the start of the dialogue up to the current time. The current topic information is topic vector information characterizing the current topic. The statistical information about accessed knowledge points includes at least:
first vector information on the number of knowledge points accessed by the user in the current topic;
second vector information on the number of knowledge points rejected by the user in the current topic;
third vector information on the number of knowledge points rejected by the user from the start of the dialogue up to the current time;
fourth vector information on the maximum number of knowledge points rejected consecutively by the user from the start of the dialogue up to the current time.
The statistical information about accessed knowledge points is determined by the human-machine dialogue system from the user's replies to the recommended knowledge points. Specifically:
The human-machine dialogue system parses the user's reply with a semantic understanding module. There are four types of parse result:
The user selects one of the knowledge points recommended by the system. The semantic representation is inform(series=id), for example "the first one", "the second one", "the last one";
The user gives an affirmative or negative answer to a counter-question. The semantic representation is deny() for a negative answer and affirm() for an affirmative answer, for example "OK, tell me about it", "not interested", "I don't want to know";
The system made no recommendation or counter-question in the previous turn, or the user ignores the system's recommendation or counter-question and asks about a new knowledge point k_t. The semantic representation is inform(kp=k_t);
The user ends the dialogue. The semantic representation is bye().
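By way of a non-limiting illustration, the four statistics listed above could be maintained from the parsed reply types roughly as follows. This is a minimal sketch under stated assumptions: the class name AccessStats, the string labels for the reply types, the choice to treat only deny() as a rejection, and the reset of the per-topic counts on a topic change are all hypothetical and are not specified by the embodiment.

```python
from dataclasses import dataclass


@dataclass
class AccessStats:
    """Running counts behind the first to fourth vector information."""
    accessed_in_topic: int = 0         # first: knowledge points accessed in the current topic
    rejected_in_topic: int = 0         # second: knowledge points rejected in the current topic
    rejected_total: int = 0            # third: knowledge points rejected since the dialogue began
    max_consecutive_rejected: int = 0  # fourth: longest run of consecutive rejections so far
    _current_run: int = 0              # helper: length of the rejection run in progress

    def update(self, act: str, topic_changed: bool = False) -> None:
        """Update the counts from one parsed user reply.

        `act` is a hypothetical label: "inform_series", "affirm", "deny",
        "inform_kp" or "bye".
        """
        if topic_changed:
            # Assumption: per-topic counts restart when the current topic changes.
            self.accessed_in_topic = 0
            self.rejected_in_topic = 0
        if act in ("inform_series", "affirm", "inform_kp"):
            self.accessed_in_topic += 1
            self._current_run = 0  # an accepted or requested knowledge point ends the run
        elif act == "deny":
            self.rejected_in_topic += 1
            self.rejected_total += 1
            self._current_run += 1
            self.max_consecutive_rejected = max(
                self.max_consecutive_rejected, self._current_run
            )

    def as_vector(self) -> list:
        """Return the four statistics in first-to-fourth order."""
        return [
            self.accessed_in_topic,
            self.rejected_in_topic,
            self.rejected_total,
            self.max_consecutive_rejected,
        ]
```

The vector returned by as_vector() would then be concatenated with the topic vector information when the current state is generated.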
In this embodiment, the current state of the dialogue system is determined from the current topic together with the statistical information about the knowledge points accessed during the dialogue the current user has already had with the human-machine dialogue system. This state is input to the first neural network to determine the active dialogue mode in which the system next initiates dialogue. Each time the method of this embodiment actively initiates dialogue, it can therefore take into account both the feedback from the dialogue so far and the current topic. The active dialogue mode selected each time is thus more targeted and closer to the user's current intent, so that the active dialogue initiated by the system is more readily accepted by the user, the dialogue proceeds smoothly and pleasantly, and the user's human-machine dialogue experience is improved.
S12: determining the topic to be recommended according to the active dialogue mode.
The active dialogue mode includes at least a counter-question dialogue mode, a recommendation dialogue mode, and a mode that neither asks a counter-question nor recommends. Specifically:
- confirm(kp=k_t): asking the user whether he or she is interested in knowledge point k_t under dialogue topic t, for example "Are you interested in our far-field speech recognition technology?".
- Recommendation: recommending n knowledge points to the user, for example:
"You can also ask me: 1. Introduce AISpeech's typical products for the smart home. 2. Which of your technologies does Tmall Genie use? 3. What hardware products does AISpeech have?". For simplicity, it can be assumed that these n knowledge points come from n topics, that is, one knowledge point is randomly sampled from each topic (of course, n knowledge points may also be selected from m topics, where m is less than n).
- null: neither recommending nor asking a counter-question.
S13: using the current state and the feature vector of the topic to be recommended as the input of a second neural network to determine the recommendation probability of the topic to be recommended, the second neural network being a second Q neural network.
When the active dialogue mode is the counter-question dialogue mode, the topic to be recommended is one candidate topic, and determining the recommendation probability of the topic to be recommended includes:
using the current state and the feature vector of the candidate topic as the input of the second neural network to determine the recommendation probability of the candidate topic.
When the active dialogue mode is the recommendation dialogue mode, the topic to be recommended includes a plurality of candidate topics, and determining the recommendation probability of the topic to be recommended includes:
using the current state and the feature vectors of the plurality of candidate topics as the input of the second neural network to determine a plurality of recommendation probabilities corresponding to the plurality of candidate topics.
S14: selecting, according to the recommendation probability, a knowledge point to be recommended from the topic to be recommended and presenting it to the user.
This embodiment determines the active dialogue mode toward the user according to the current state of the human-machine dialogue system, which ensures that the dialogue actively initiated by the system is more targeted and matches the current progress of the dialogue, thereby improving the user experience. After the active dialogue mode has been determined, the corresponding topic to be recommended for the active dialogue is determined. The pre-trained second neural network then determines the recommendation probability of the topic to be recommended from the current state of the system and the feature vector of that topic, so that the knowledge point used to initiate the active dialogue can be obtained from the corresponding topic according to the recommendation probability.
As shown in Fig. 2, in some embodiments of the present invention, generating the current state based at least on the current topic information and on the statistical information about the knowledge points accessed by the user and the human-machine dialogue system from the start of the dialogue up to the current time includes:
S21: generating a current system feature vector based on the topic vector information and the first to fourth vector information;
S22: compressing the current system feature vector with a recurrent neural network to obtain the current state.
The topic vector information and the first to fourth vector information can be concatenated into one feature vector, denoted x. Besides this information, historical information is also important. A recurrent neural network (for example an RNN or an LSTM) can be used to compress all x from the start of the dialogue up to now into one state representation vector s: the input of the network at each time step is the x of that time step, and the state at the last time step is s. By using a recurrent neural network, this embodiment takes historical information into account when selecting a topic, so that each topic jump is more personalized and accurate.
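The compression of the per-turn feature vectors x into a state vector s can be sketched as follows. This is only an illustration under assumptions: a plain Elman-style recurrent cell with a tanh activation stands in for whatever RNN/LSTM is actually used, the weights are random rather than trained, and the dimensions (a 2-dimensional topic vector plus the four statistics) are hypothetical.

```python
import math
import random


def rnn_step(x, h, w_xh, w_hh, b):
    """One recurrent step: h' = tanh(W_xh x + W_hh h + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[j], x))
            + sum(w * hi for w, hi in zip(w_hh[j], h))
            + b[j]
        )
        for j in range(len(b))
    ]


def encode_history(xs, w_xh, w_hh, b):
    """Fold the sequence of per-turn feature vectors x into one state vector s."""
    h = [0.0] * len(b)
    for x in xs:  # the input at each time step is that step's x
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h  # the hidden state after the last time step is the state s


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
input_dim, state_dim = 6, 4  # hypothetical sizes: 2-dim topic vector + 4 statistics
w_xh = random_matrix(state_dim, input_dim, rng)
w_hh = random_matrix(state_dim, state_dim, rng)
b = [0.0] * state_dim

# Two dialogue turns: [topic vector (2 values), first..fourth statistics]
turns = [
    [0.3, -0.1, 1, 0, 0, 0],
    [0.3, -0.1, 1, 1, 1, 1],
]
s = encode_history(turns, w_xh, w_hh, b)
```

Because s depends on every x seen so far, two dialogues that end on the same topic but with different rejection histories generally map to different states.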
As shown in Fig. 3, in some embodiments, the human-machine dialogue method further includes:
S31: recording and storing dialogue experience data in each dialogue turn to form an experience data pool, the dialogue experience data including at least, for each turn, the current state of the dialogue system, the action taken, the state of the dialogue system at the next time step, and the reward signal received by the dialogue system;
S32: training the first neural network and/or the second neural network at a predetermined period based on the dialogue experience data in the experience data pool.
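A minimal sketch of such an experience data pool is given below, assuming a fixed-capacity buffer that discards the oldest entries and uniform random sampling of batches. The names Experience and ExperiencePool, the capacity, and the eviction policy are illustrative assumptions; the embodiment only requires that the four items per turn be stored and later sampled.

```python
import random
from collections import deque, namedtuple

# One dialogue turn: current state, action taken, reward received, next state.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])


class ExperiencePool:
    """Fixed-capacity pool of dialogue experience (oldest entries are dropped first)."""

    def __init__(self, capacity=10000, seed=None):
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        """Record the experience of one dialogue turn."""
        self._buffer.append(Experience(state, action, reward, next_state))

    def sample(self, batch_size):
        """Return up to batch_size experiences drawn uniformly without replacement."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)
```

A training job could call sample() at each predetermined period, while the online service only ever calls add().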
In the above embodiments, the reward signal can come from the following aspects:
The user's reply to the system's recommendation or counter-question:
If the user selects one of the knowledge points recommended in the previous turn, the system can be given a positive reward; otherwise the system is given a negative reward;
If the user answers affirmatively to the knowledge point in the system's counter-question, the system is given a positive reward; otherwise the system is given a negative reward.
The specific reward values in these two cases can differ. Dialogue in the counter-question mode is usually more natural, so the absolute value of the (positive or negative) reward in the counter-question mode can be larger than in the recommendation mode.
The system's own goal: if the system is goal-oriented, it obtains a positive incentive when it achieves its goal. The goal of the system differs according to the type of dialogue system, for example:
Publicity: presenting enough information to the user. Each knowledge point introduced to the user earns a positive reward, and different rewards can be set for knowledge points of different topics;
Shopping guidance: each purchase-related action brought about earns a larger positive reward;
Business: each business cooperation brought about earns a larger positive reward;
Recruitment: each resume obtained earns a larger positive reward.
The reward signal is not limited to the above two aspects; the designer of the system can design it specifically according to the task.
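As an illustration of the two reward sources described above, a reward function might be sketched as follows. All numeric values, the mode and reply labels, and the single goal_reached flag are placeholder assumptions; the only property taken from the description is that the counter-question rewards have a larger absolute value than the recommendation rewards.

```python
# (active dialogue mode, did the user accept?) -> reward. Placeholder values.
REWARDS = {
    ("recommend", True): 1.0,
    ("recommend", False): -1.0,
    ("confirm", True): 2.0,    # counter-question mode: larger absolute value
    ("confirm", False): -2.0,
}
GOAL_BONUS = 5.0  # hypothetical larger reward when the system's own goal is reached


def reward(mode, user_act, goal_reached=False):
    """Reward for one turn.

    mode:      mode used in the previous turn: "recommend", "confirm" or "null"
    user_act:  parsed reply label, e.g. "inform_series", "affirm", "deny", "inform_kp"
    """
    r = 0.0
    if mode == "recommend":
        # Positive only if the user picked one of the recommended knowledge points.
        r += REWARDS[("recommend", user_act == "inform_series")]
    elif mode == "confirm":
        # Positive only if the user answered the counter-question affirmatively.
        r += REWARDS[("confirm", user_act == "affirm")]
    if goal_reached:
        r += GOAL_BONUS
    return r
```

For a publicity-type system, GOAL_BONUS could instead be looked up per topic, since different rewards can be set for knowledge points of different topics.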
With the above reward signals, the strategy can be optimized with a reinforcement learning algorithm, so that in each state the active dialogue mode and the topic (knowledge point) selected according to the strategy maximize the cumulative reward obtained. In the embodiments of the present invention, the dialogue strategy is represented by Q networks and optimized with DQN (Deep Q-Networks). Unlike ordinary DQN, the embodiments of the present invention use three different strategies: a main strategy, a recommendation strategy and a counter-question strategy. Each strategy has its own Q network, and they can share one experience data pool D. At each parameter update, a batch of data is sampled from the experience data pool, and the parameters are then optimized according to the loss function (TD error). In the early stage of system optimization, if the selectable topics are not restricted, the topics selected according to the strategy are highly random and the effect may not be ideal. To solve this problem, the selectable topics can be limited to a certain range in the early stage of system optimization, for example the sibling topics and subtopics of the current topic.
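The TD-error loss mentioned above can be illustrated with the standard one-step Q-learning target. This sketch makes several assumptions that the description does not state: a discount factor gamma, a separate target network for bootstrapping, a done flag marking the end of a dialogue, and a mean-squared form of the loss. The function and parameter names are hypothetical.

```python
def td_loss(batch, q_values, q_target_values, gamma=0.95):
    """Mean squared one-step TD error over a sampled batch.

    batch:            list of (state, action_index, reward, next_state, done)
    q_values:         callable, state -> list of Q values from the network being trained
    q_target_values:  callable, state -> list of Q values used for bootstrapping
    gamma:            discount factor (assumed; not given in the description)
    """
    total = 0.0
    for state, action, reward, next_state, done in batch:
        # No bootstrapping past the end of the dialogue.
        bootstrap = 0.0 if done else gamma * max(q_target_values(next_state))
        td_error = reward + bootstrap - q_values(state)[action]
        total += td_error ** 2
    return total / len(batch)
```

In a real system the loss would be minimized by gradient descent on the parameters of the Q network; the function above only computes the quantity being minimized.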
The dialogue strategy decides how to recommend to the user and how to ask counter-questions. It is divided into three main steps:
Step 1: Determine the active dialogue mode. The main strategy π_m(a_m | s) determines which of the three modes to use: recommend, ask a counter-question, or neither.
Fig. 4 is a schematic structural diagram of the Q neural network representing the main strategy in an embodiment of the present invention. The input of the Q neural network is the current state s, and the output layer has 3 dimensions, corresponding respectively to the Q values obtained by selecting each of the 3 modes. The probability corresponding to each active dialogue mode a_m is then:
$$\pi_m(a_m \mid s) = \frac{\exp\left(Q_m(s, a_m)/\tau\right)}{\sum_{a'_m} \exp\left(Q_m(s, a'_m)/\tau\right)}$$
where Q_m(s, a_m) is the output of the network for mode a_m. In the above formula, τ is a hyperparameter that controls the degree of exploration of the strategy: the larger τ is, the more uniform the above probabilities are and the more the system explores. Usually τ starts large and is then gradually reduced. At each decision, an active dialogue mode is sampled according to the above probabilities.
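The sampling of an active dialogue mode from the three Q values can be sketched as a softmax with temperature τ. The mode labels and function names below are hypothetical, and the Q values are passed in directly rather than computed by a trained network.

```python
import math
import random

MODES = ("recommend", "confirm", "null")  # hypothetical labels for the three modes


def boltzmann(q_values, tau):
    """Softmax over Q values with temperature tau (larger tau -> more uniform)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def sample_mode(q_values, tau, rng=random):
    """Sample one active dialogue mode according to the softmax probabilities."""
    probs = boltzmann(q_values, tau)
    return rng.choices(MODES, weights=probs, k=1)[0]
```

With a large τ the three modes are chosen almost uniformly, which corresponds to heavy exploration early in training; as τ is reduced, the mode with the highest Q value dominates.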
Step 2: Perform topic inference. Infer the topics the user may be interested in next, that is, select one or more possible topics. If the active dialogue mode sampled in Step 1 is counter-question, one topic needs to be selected; if the active dialogue mode sampled in Step 1 is recommendation, n topics need to be selected.
Fig. 5 is a schematic structural diagram of the Q neural network used to determine the Q value of selecting topic t in the current state. The network includes two branch networks. The input of one branch is the current state s, which after several network layers yields a state representation φ(s). The input of the other branch is the vectorized representation e_t of topic t, which after several network layers yields another vector of the same dimension as φ(s), written here as ψ(e_t). The Q value Q(s, t) of selecting topic t in the current state is the inner product of these two vectors, i.e.:
$$Q(s, t) = \phi(s)^{\top} \psi(e_t)$$
For each possible candidate topic, the corresponding Q value is first computed with the above formula, and the probability of selecting each topic is then computed according to the following formula, with τ again a temperature hyperparameter:
$$\pi(t \mid s) = \frac{\exp\left(Q(s, t)/\tau\right)}{\sum_{t'} \exp\left(Q(s, t')/\tau\right)}$$
If the active dialogue mode is counter-question, one topic is sampled according to the above probabilities; if it is recommendation, n topics are sampled according to the above probabilities.
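The two-branch scoring and the subsequent topic sampling can be sketched as follows. Several points are assumptions made for illustration only: each branch is reduced to a single tanh layer, the weights are supplied by the caller rather than trained, the probabilities use a softmax with temperature tau, and the n topics are drawn without replacement so that they are distinct.

```python
import math
import random


def linear(weights, vec):
    """Matrix-vector product for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def q_value(state, topic_emb, w_state, w_topic):
    """Q(s, t) as the inner product of the two branch outputs."""
    phi = [math.tanh(v) for v in linear(w_state, state)]      # state branch: phi(s)
    psi = [math.tanh(v) for v in linear(w_topic, topic_emb)]  # topic branch, same dimension
    return sum(p * q for p, q in zip(phi, psi))


def softmax(q_values, tau=1.0):
    """Selection probabilities over candidate topics."""
    m = max(q_values)
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def sample_topics(topics, probs, n, rng):
    """Sample up to n distinct topics in proportion to probs (assumed: no repeats)."""
    topics, probs = list(topics), list(probs)
    chosen = []
    for _ in range(min(n, len(topics))):
        i = rng.choices(range(len(topics)), weights=probs, k=1)[0]
        chosen.append(topics.pop(i))
        probs.pop(i)  # remaining weights are renormalized implicitly by choices()
    return chosen
```

For the counter-question mode, sample_topics would be called with n = 1; for the recommendation mode, with the desired number n of recommended topics.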
Step 3: Randomly select one knowledge point from each sampled topic as the counter-question or recommended knowledge point.
Fig. 6 is a flowchart of an embodiment of the human-machine dialogue method of the present invention, which includes the following steps:
Step 1: Receive the user's question or reply;
Step 2: Perform semantic understanding on the user's input. There are four types of parse result: the user selects one of the knowledge points recommended by the system; the user gives an affirmative or negative answer to the system's counter-question; the user asks about a new knowledge point k_t; the user wants to exit the dialogue;
Step 3: If the user does not want to exit, update the current topic according to the semantic parse result;
Step 4: Determine the main content of the reply to the user. If the user selected a knowledge point from the previous turn's recommendations, or answered affirmatively to the knowledge point in the system's counter-question, the content of that knowledge point is provided directly; if the user asked a new question, the corresponding knowledge point content is queried from the knowledge base;
Step 5: Update the dialogue state. Extract the current topic and the dialogue features as the input of the recurrent neural network at the current time step, and take the hidden representation at the current time step as the vectorized representation of the current state;
Step 6: The main strategy decides the active dialogue mode: recommend, ask a counter-question, or neither;
Step 7: Determine the specific active dialogue content according to the active dialogue mode selected in Step 6. In the recommendation mode, n topics are selected according to the recommendation strategy, and one knowledge point is then sampled from each topic to form the recommendation list. In the counter-question mode, one topic is selected according to the counter-question strategy, and one knowledge point is then selected from that topic as the counter-question content. If the mode is neither recommending nor asking a counter-question, the active dialogue content of the current turn is empty;
Step 8: Present the reply content from Step 4 and the active dialogue content from Step 7 to the user, and return to Step 1.
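Step 7 of the above flow can be sketched as a small function. This is an illustration only: the mode labels, the pick_topics callable (which stands in for the recommendation strategy or counter-question strategy), and the dictionary-based knowledge base are all hypothetical.

```python
import random


def compose_active_content(mode, pick_topics, knowledge_base, n=3, rng=random):
    """Build the active dialogue content for the current turn.

    mode:            "recommend", "confirm" (counter-question) or "null"
    pick_topics:     callable k -> list of k topic names; stands in for the
                     recommendation / counter-question strategy
    knowledge_base:  dict mapping topic name -> list of knowledge points
    """
    if mode == "null":
        return []  # neither recommend nor ask: active content is empty
    k = n if mode == "recommend" else 1
    # One knowledge point is drawn at random from each selected topic.
    return [rng.choice(knowledge_base[t]) for t in pick_topics(k)]
```

The returned list would be appended to the main reply from Step 4 before both are presented to the user in Step 8.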
The above steps are the online service process of the whole system and do not involve the training of the strategy. During online service, the system stores experience data in an experience data pool. Each experience includes the current state of the human-machine dialogue system, the action taken (the selected active dialogue mode, for example the recommendation dialogue mode or the counter-question dialogue mode), the state of the human-machine dialogue system at the next time step, and the reward received. The training process can be decoupled from the online service process and carried out offline. At regular intervals, the training service samples a batch of data from the experience data pool and then optimizes the strategy parameters according to the DQN loss function (TD error). After the parameters are updated, the latest parameters are pushed to the online service.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention certain steps can be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
As shown in Fig. 7, an embodiment of the present invention also provides a human-machine dialogue system 700, which includes:
an active dialogue mode determining program module 710, configured to use the current state of the human-machine dialogue system as the input of the first neural network to determine the active dialogue mode of the human-machine dialogue system toward the user;
a to-be-recommended topic determining program module 720, configured to determine the topic to be recommended according to the active dialogue mode;
a recommendation probability determining program module 730, configured to use the current state and the feature vector of the topic to be recommended as the input of the second neural network to determine the recommendation probability of the topic to be recommended;
a recommended knowledge point selecting program module 740, configured to select, according to the recommendation probability, a knowledge point to be recommended from the topic to be recommended and present it to the user.
In some embodiments, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing one or more programs that include execution instructions. The execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server or a network device) to perform any of the above human-machine dialogue methods of the present invention.
In some embodiments, an embodiment of the present invention also provides a computer program product, which includes a computer program stored on a non-volatile computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, cause the computer to perform any of the above human-machine dialogue methods.
In some embodiments, an embodiment of the present invention also provides an electronic device, which includes at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the human-machine dialogue method.
In some embodiments, an embodiment of the present invention also provides a storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, carries out the steps of the human-machine dialogue method.
The human-machine dialogue system of the above embodiments of the present invention can be used to perform the human-machine dialogue method of the embodiments of the present invention, and accordingly achieves the technical effects achieved by that method, which are not repeated here. In the embodiments of the present invention, the relevant functional modules can be implemented by a hardware processor.
Fig. 8 is a schematic diagram of the hardware structure of an electronic device that performs the human-machine dialogue method according to another embodiment of the present application. As shown in Fig. 8, the device includes:
one or more processors 810 and a memory 820. One processor 810 is taken as an example in Fig. 8.
The device that performs the human-machine dialogue method may further include an input device 830 and an output device 840.
The processor 810, the memory 820, the input device 830 and the output device 840 can be connected by a bus or in other ways. Connection by a bus is taken as an example in Fig. 8.
The memory 820, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the human-machine dialogue method in the embodiments of the present application. By running the non-volatile software programs, instructions and modules stored in the memory 820, the processor 810 executes the various functional applications and data processing of the server, that is, implements the human-machine dialogue method of the above method embodiments.
The memory 820 may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created through use of the human-machine dialogue apparatus, and so on. In addition, the memory 820 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 820 optionally includes memories located remotely from the processor 810, and these remote memories can be connected to the human-machine dialogue apparatus through a network. Examples of such networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 830 can receive input numeric or character information and generate signals related to the user settings and function control of the human-machine dialogue apparatus. The output device 840 may include a display device such as a display screen.
The one or more modules are stored in the memory 820 and, when executed by the one or more processors 810, perform the human-machine dialogue method of any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: devices of this kind are characterized by mobile communication functions, with voice and data communication as their main purpose. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: devices of this kind belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID and UMPC devices, such as the iPad.
(3) Portable entertainment devices: devices of this kind can display and play multimedia content. They include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capability, stability, reliability, security, scalability, manageability and the like are higher.
(5) Other electronic devices with data interaction functions.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.