CN108766429A - Voice interaction method and device - Google Patents

Voice interaction method and device

Info

Publication number
CN108766429A
CN108766429A (application CN201810568760.0A)
Authority
CN
China
Prior art keywords
voice
voice information
target word
prompt tone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810568760.0A
Other languages
Chinese (zh)
Other versions
CN108766429B (en)
Inventor
路华
黄世维
黄硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810568760.0A
Publication of CN108766429A
Application granted
Publication of CN108766429B
Legal status: Active


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L 2013/021 - Overlap-add techniques

Abstract

Embodiments of the present application disclose a voice interaction method and device. One specific implementation of the method includes: extracting first voice information that contains a target-word speech segment; superimposing a prompt tone on the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed, the prompt tone indicating that the content currently being announced is a target word; in response to collecting second voice information fed back by the user, matching the second voice information against the target word; and, in response to determining that the second voice information matches the target word, outputting, as speech, third voice information associated with the target word. This embodiment improves the efficiency of voice interaction.

Description

Voice interaction method and device
Technical field
Embodiments of the present application relate to the field of computer technology, and in particular to a voice interaction method and device.
Background technology
With the development of computer technology, voice interaction products have become increasingly diverse. In purely voice-driven products, the user's expression is not constrained by a graphical interface and the degree of freedom is high, so it is usually necessary to limit the answers the user can give. In a purely voice-driven interaction environment, it is therefore particularly important to inform the user of those limitations efficiently and at low cost.
One existing approach gives the user corresponding prompts through a graphical interface; after reading the explanation or tutorial, the user learns which voice instructions can be used. Another existing approach informs the user of the available voice instructions by voice output.
Summary of the invention
Embodiments of the present application propose a voice interaction method and device.
In a first aspect, embodiments of the present application provide a voice interaction method, including: extracting first voice information that contains a target-word speech segment; superimposing a prompt tone on the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed, where the prompt tone indicates that the content currently being announced is a target word; in response to collecting second voice information fed back by the user, matching the second voice information against the target word; and, in response to determining that the second voice information matches the target word, outputting, as speech, third voice information associated with the target word.
In some embodiments, superimposing the prompt tone on the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed includes: superimposing a pulse-type prompt tone at the start of the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed, where the prompt tone ends before the target-word speech segment ends.
In some embodiments, superimposing the prompt tone on the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed includes: superimposing a sustained prompt tone at the start of the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed, where the prompt tone ends when the target-word speech segment ends.
In some embodiments, outputting, as speech, third voice information associated with the target word in response to determining that the second voice information matches the target word includes: in response to determining that the second voice information matches the target word, determining the type of the first voice information, determining, based on the type of the first voice information, the third voice information associated with the target word, and outputting the third voice information as speech.
In some embodiments, determining, based on the type of the first voice information, the third voice information associated with the target word and outputting the third voice information as speech includes: in response to determining that the type of the first voice information is the news broadcast type, generating an information search request containing the target word; sending the information search request to a server and receiving the search result returned by the server; and taking the voice information corresponding to the search result as the third voice information and outputting the third voice information as speech.
In some embodiments, determining, based on the type of the first voice information, the third voice information associated with the target word and outputting the third voice information as speech includes: in response to determining that the type of the first voice information is the service query type, generating a service query request containing the target word; sending the service query request to a server and receiving the query result returned by the server; and taking the voice information corresponding to the query result as the third voice information and outputting the third voice information as speech.
In some embodiments, determining, based on the type of the first voice information, the third voice information associated with the target word and outputting the third voice information as speech includes: in response to determining that the type of the first voice information is the information confirmation type, generating a jump instruction indicating a jump to a preset next piece of voice information, and determining the next piece of voice information as the third voice information.
In some embodiments, the volume of the prompt tone is lower than the volume of the target-word speech segment.
In a second aspect, embodiments of the present application provide a voice interaction device, including: an extraction unit configured to extract first voice information that contains a target-word speech segment; a first output unit configured to superimpose a prompt tone on the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, the prompt tone indicating that the content currently being announced is a target word; a matching unit configured to, in response to collecting second voice information fed back by the user, match the second voice information against the target word; and a second output unit configured to, in response to determining that the second voice information matches the target word, output, as speech, third voice information associated with the target word.
In some embodiments, the first output unit is further configured to superimpose a pulse-type prompt tone at the start of the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, where the prompt tone ends before the target-word speech segment ends.
In some embodiments, the first output unit is further configured to superimpose a sustained prompt tone at the start of the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, where the prompt tone ends when the target-word speech segment ends.
In some embodiments, the matching unit is further configured to, in response to determining that the second voice information matches the target word, determine the type of the first voice information, determine, based on the type of the first voice information, the third voice information associated with the target word, and output the third voice information as speech.
In some embodiments, the matching unit includes: a first generation module configured to, in response to determining that the type of the first voice information is the news broadcast type, generate an information search request containing the target word; a first sending module configured to send the information search request to a server and receive the search result returned by the server; and a first output module configured to take the voice information corresponding to the search result as the third voice information and output the third voice information as speech.
In some embodiments, the matching unit includes: a second generation module configured to, in response to determining that the type of the first voice information is the service query type, generate a service query request containing the target word; a second sending module configured to send the service query request to a server and receive the query result returned by the server; and a second output module configured to take the voice information corresponding to the query result as the third voice information and output the third voice information as speech.
In some embodiments, the matching unit includes: a third generation module configured to, in response to determining that the type of the first voice information is the information confirmation type, generate a jump instruction indicating a jump to a preset next piece of voice information and determine the next piece of voice information as the third voice information.
In some embodiments, the volume of the prompt tone is lower than the volume of the target-word speech segment.
In a third aspect, embodiments of the present application provide a terminal device, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the voice interaction method.
In a fourth aspect, embodiments of the present application provide a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method of any embodiment of the voice interaction method.
The voice interaction method and device provided by the embodiments of the present application extract first voice information containing a target-word speech segment, superimpose a prompt tone on the target-word speech segment, and output, as speech, the first voice information with the prompt tone superimposed; then, when second voice information fed back by the user is collected, the third voice information to be output as speech is determined based on matching the second voice information against the target word, and finally the third voice information is output as speech. There is thus no need to inform the user of the available voice instructions through a graphical interface or a spoken explanation, and no need for the user to spend extra time reading or listening to instructions or tutorials: superimposing a prompt tone is enough to indicate which voice instructions can be issued, which improves the efficiency of voice interaction.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read in conjunction with the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture in which the present application may be applied;
Fig. 2 is a flowchart of one embodiment of the voice interaction method according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the voice interaction method according to the present application;
Fig. 4 is a flowchart of another embodiment of the voice interaction method according to the present application;
Fig. 5 is a schematic structural diagram of one embodiment of the voice interaction device according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the terminal device of the embodiments of the present application.
Detailed description of the embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention and are not a limitation of that invention. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in those embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 in which the voice interaction method or voice interaction device of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
Users may use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104, so as to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as voice interaction applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices that support voice interaction, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above; they may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
The server 105 may be a server that provides various services, for example a background server that supports the voice interaction applications installed on the terminal devices 101, 102, and 103. The background server may analyze and otherwise process received data such as information search requests and service query requests, and feed the processing results back to the terminal devices.
The server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
It should be noted that the voice interaction method provided by the embodiments of the present application is generally performed by the terminal devices 101, 102, and 103, and correspondingly, the voice interaction device is generally disposed in the terminal devices 101, 102, and 103.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the voice interaction method according to the present application is shown. The voice interaction method includes the following steps:
Step 201: extracting first voice information that contains a target-word speech segment.
In this embodiment, the execution body of the voice interaction method (for example, the terminal devices 101, 102, and 103 shown in Fig. 1) may extract the first voice information to be output as speech. The first voice information may contain a target-word speech segment.
The target-word speech segment may be the speech segment in the first voice information that consists of the speech into which a target word has been converted. The target word may be a word used to generate an instruction (for example, an information search instruction, a service query instruction, or a jump instruction). As an example, in the first voice information "Arsenal beat Chelsea 3:0", both "Arsenal" and "Chelsea" may serve as target words, and the speech segments corresponding to "Arsenal" and "Chelsea" are then target-word speech segments. After the user replies "Arsenal" by voice, the execution body may generate an information search instruction containing the character string "Arsenal"; after the user replies "Chelsea" by voice, the execution body may generate an information search instruction containing the character string "Chelsea".
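Purely as an illustration (not part of the claimed subject matter), the first voice information and its target-word speech segments could be represented by a simple data structure such as the following sketch, in which the field names and the example time offsets are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TargetWordSegment:
    word: str       # the target word, e.g. "Arsenal"
    start_s: float  # start of the corresponding speech segment (seconds)
    end_s: float    # end of the corresponding speech segment (seconds)

@dataclass
class FirstVoiceInfo:
    text: str                                   # text to be announced
    audio_path: str                             # synthesized speech for the text
    segments: List[TargetWordSegment] = field(default_factory=list)

# Example: "Arsenal beat Chelsea 3:0" with two target-word segments.
first_info = FirstVoiceInfo(
    text="Arsenal beat Chelsea 3:0",
    audio_path="announcement.wav",
    segments=[
        TargetWordSegment("Arsenal", 0.00, 0.55),
        TargetWordSegment("Chelsea", 1.30, 1.85),
    ],
)
```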
Step 202: superimposing a prompt tone on the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed.
In this embodiment, the execution body may superimpose a prompt tone on the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed. The prompt tone may be used to indicate that the content currently being announced is a target word. As an example, the prompt tone may be a pulse-type prompt tone (such as a "ding" or a "tap") whose volume gradually fades over time, or it may be a sustained prompt tone whose volume remains constant from the beginning to the end of the tone. It should be noted that the prompt tone may differ from the target-word speech segment in volume (for example, it may be quieter than the target-word speech segment), timbre (for example, a timbre people subjectively find softer), and so on, so as to reduce the interference with the user.
Here, superimposing the prompt tone on the target-word speech segment may mean superimposing the prompt tone at the start of the target-word speech segment, or superimposing it a preset time before the start of the target-word speech segment (for example, 0.1 second or 0.2 second before the start). It should be noted that the prompt tone may end when the target-word speech segment ends, or it may end before the target-word speech segment ends.
In some optional implementations of this embodiment, the execution body may superimpose a pulse-type prompt tone at the start of the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, where the prompt tone may end before the target-word speech segment ends.
In some optional implementations of this embodiment, the execution body may superimpose a sustained prompt tone at the start of the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, where the prompt tone may end when the target-word speech segment ends.
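As a minimal sketch only, assuming mono floating-point PCM samples and NumPy, and with all parameter values (tone frequency, gain, pulse length) chosen purely for illustration, superimposing a lower-volume prompt tone at or slightly before the start of a target-word speech segment could look roughly like this:

```python
import numpy as np

def overlay_prompt_tone(speech, sr, seg_start_s, seg_end_s,
                        sustained=False, lead_s=0.0, gain=0.3, freq_hz=880.0):
    """Mix a prompt tone into `speech` (float mono samples) over the target-word segment.

    sustained=False -> pulse-type tone that decays and ends before the segment ends;
    sustained=True  -> constant-volume tone lasting until the segment ends.
    `gain` keeps the tone quieter than the target-word speech segment.
    """
    start = max(0, int((seg_start_s - lead_s) * sr))  # optionally start slightly early
    end = int(seg_end_s * sr)
    if sustained:
        length = end - start
        envelope = np.ones(length)                    # constant volume until segment end
    else:
        length = min(end - start, int(0.2 * sr))      # short pulse, ends before segment end
        envelope = np.linspace(1.0, 0.0, length)      # volume gradually fades
    t = np.arange(length) / sr
    tone = gain * envelope * np.sin(2 * np.pi * freq_hz * t)
    out = speech.copy()
    out[start:start + length] += tone.astype(speech.dtype)
    return np.clip(out, -1.0, 1.0)
```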
Step 203: in response to collecting second voice information fed back by the user, matching the second voice information against the target word.
In this embodiment, in response to collecting second voice information fed back by the user, the execution body may match the second voice information against the target word. This may specifically be performed according to the following steps:
First, after outputting, as speech, the first voice information with the prompt tone superimposed, the execution body may use an installed microphone to capture the sound signal within a preset duration.
Second, the execution body may process the captured sound signal to obtain voice information, and take that voice information as the second voice information fed back by the user. It should be noted that the execution body may process the captured sound signal in various ways. As an example, the sound signal may first be high-pass filtered to eliminate (or attenuate) interfering sound in the signal. Then, various echo-cancellation methods may be applied to the filtered signal, yielding a signal with the echo removed. Finally, automatic gain control may be applied to the echo-cancelled signal to amplify it, obtaining voice information that is taken as the second voice information fed back by the user.
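The processing chain just described (high-pass filtering, echo cancellation, automatic gain control) might be outlined as follows. This is only an illustrative sketch that assumes SciPy is available; the echo-cancellation step is a placeholder, since the embodiment does not mandate a particular algorithm:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_captured_audio(signal, sr, echo_reference=None):
    # 1. High-pass filter to suppress low-frequency interference.
    b, a = butter(4, 100.0 / (sr / 2), btype="highpass")
    filtered = filtfilt(b, a, signal)

    # 2. Echo cancellation: subtract the device's own playback if a reference
    #    signal is available (placeholder; any echo-cancellation method may be used).
    if echo_reference is not None:
        n = min(len(filtered), len(echo_reference))
        filtered = filtered[:n] - echo_reference[:n]

    # 3. Automatic gain control: scale so the peak sits at a target level.
    peak = max(np.max(np.abs(filtered)), 1e-9)
    return filtered * (0.9 / peak)
```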
Third, the execution body may use a pre-trained acoustic model to recognize the second voice information and obtain a speech recognition result (for example, the character string corresponding to the second voice information). Here, the acoustic model may be obtained by supervised training, using machine learning methods, on training samples consisting of a large number of pieces of voice information. Various models may be used to train the acoustic model, such as a hidden Markov model (HMM), a recurrent neural network (RNN), or a deep neural network (DNN), and a combination of multiple models may also be used.
Fourth, the execution body may match the speech recognition result against the target word. As an example, it may be determined whether the speech recognition result is identical to the target word; if so, it can be determined that the second voice information matches the target word, and otherwise that it does not. As another example, the execution body may determine whether the speech recognition result contains the target word; if so, it can be determined that the second voice information matches the target word, and otherwise that it does not.
It should be noted that the electronic device may also determine in other ways whether the second voice information matches the target word. For example, by comparing the second voice information with the target-word speech segment, it may determine whether the second voice information is acoustically close to the target-word speech segment; if so, it can be determined that the second voice information matches the target word.
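A minimal sketch of the text-level matching step, assuming the speech recognizer has already returned a plain string, could mirror the two example strategies above (exact match or containment):

```python
def match_target_word(recognition_result: str, target_word: str,
                      require_exact: bool = False) -> bool:
    """Return True if the user's reply matches the target word.

    require_exact=True  -> the whole recognition result must equal the target word;
    require_exact=False -> it is enough that the result contains the target word.
    """
    result = recognition_result.strip()
    if require_exact:
        return result == target_word
    return target_word in result
```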
Step 204: in response to determining that the second voice information matches the target word, outputting, as speech, third voice information associated with the target word.
In this embodiment, in response to determining that the second voice information matches the target word, the execution body may output, as speech, third voice information associated with the target word. Here, a preset instruction associated with the target word (for example, an information search instruction indicating that an information search should be performed with the target word as the search term) may first be executed, and the execution result (for example, the information found by the search) may be determined as the third voice information associated with the target word.
It should be noted that, in response to determining that the speech recognition result does not match the target word, a preset piece of voice information used to prompt the user to send voice information again may be determined as the third voice information.
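Putting steps 201 to 204 together, a deliberately simplified and purely illustrative control loop on the terminal device might read as follows. It reuses the FirstVoiceInfo sketch above with an assumed audio_with_prompt_tones field, and `speak`, `record`, `recognize`, and `run_instruction` stand for the device's text-to-speech, microphone capture, speech recognition, and instruction execution facilities, which are assumptions rather than specified interfaces:

```python
def voice_interaction_round(first_info, speak, record, recognize, run_instruction,
                            listen_seconds=5.0):
    # Steps 201-202: announce the first voice information with prompt tones
    # already superimposed on its target-word speech segments.
    speak(first_info.audio_with_prompt_tones)

    # Step 203: capture the user's reply and recognize it.
    second_info = recognize(record(listen_seconds))

    # Step 204: if the reply matches a target word, execute the instruction
    # associated with that word and announce the result as the third voice information.
    for segment in first_info.segments:
        if segment.word in second_info:
            third_info = run_instruction(segment.word)
            speak(third_info)
            return
    # Otherwise, announce a preset re-prompt asking the user to speak again.
    speak("Sorry, please say that again.")
```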
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the voice interaction method according to this embodiment. In the application scenario of Fig. 3, a user holds a terminal device 301 and interacts with it by voice.
First, the terminal device 301 extracts the first voice information "Arsenal beat Chelsea 3:0", which contains the target-word speech segments "Arsenal" and "Chelsea", with a prompt tone superimposed on each of those target-word speech segments, and then outputs, as speech, the first voice information with the prompt tones superimposed. After hearing the first voice information announced by the terminal device 301, the user learns that "Arsenal" and "Chelsea" can be asked about, and says the second voice information "Arsenal". The terminal device 301 then searches for the target word "Arsenal", converts the recommended information it finds about Arsenal into speech, and announces it as the third voice information.
The method provided by the above embodiment of the present application extracts first voice information containing a target-word speech segment, superimposes a prompt tone on the target-word speech segment, and outputs, as speech, the first voice information with the prompt tone superimposed; then, when second voice information fed back by the user is collected, the third voice information to be output as speech is determined based on matching the second voice information against the target word, and finally the third voice information is output as speech. There is thus no need to inform the user of the available voice instructions through a graphical interface or by voice, and no need for the user to spend extra time reading or listening to instructions or tutorials: superimposing a prompt tone is enough to indicate which voice instructions can be issued, which improves the efficiency and flexibility of voice interaction.
With further reference to Fig. 4, a flow 400 of another embodiment of the voice interaction method is shown. The flow 400 of the voice interaction method includes the following steps:
Step 401: extracting first voice information that contains a target-word speech segment.
In this embodiment, the execution body of the voice interaction method (for example, the terminal devices 101, 102, and 103 shown in Fig. 1) may extract first voice information containing a target-word speech segment. The target-word speech segment may be the speech segment in the first voice information that consists of the speech into which a target word has been converted, and the target word may be a word used to generate an instruction (for example, an information search instruction, a service query instruction, or a jump instruction).
Step 402: superimposing a sustained prompt tone at the start of the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed.
In this embodiment, the execution body may superimpose a sustained prompt tone at the start of the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, where the prompt tone ends when the target-word speech segment ends. Here, the volume of the prompt tone may be lower than the volume of the target-word speech segment.
Step 403: in response to collecting second voice information fed back by the user, matching the second voice information against the target word.
In this embodiment, after outputting, as speech, the first voice information with the prompt tone superimposed, the execution body may use an installed microphone to capture the sound signal within a preset duration. It may then process the captured sound signal to obtain voice information and take that voice information as the second voice information fed back by the user. A pre-trained acoustic model may then be used to recognize the second voice information and obtain a speech recognition result. Finally, the speech recognition result may be matched against the target word. As an example, it may be determined whether the speech recognition result is identical to the target word; if so, it can be determined that the second voice information matches the target word, and otherwise that it does not.
Step 404: in response to determining that the second voice information matches the target word, determining the type of the first voice information.
In this embodiment, in response to determining that the second voice information matches the target word, the type of the first voice information is determined. Here, the type of the first voice information may include, but is not limited to, multiple types such as the news broadcast type, the service query type, and the information confirmation type, and different types of voice information may correspond to different pieces of third voice information associated with the target word.
Step 405: determining, based on the type of the first voice information, third voice information associated with the target word, and outputting the third voice information as speech.
In this embodiment, the execution body may determine, based on the type of the first voice information, the third voice information associated with the target word and output the third voice information as speech. Here, different types of voice information may correspond to different pieces of third voice information associated with the target word.
In some optional implementations of this embodiment, in response to determining that the type of the first voice information is the news broadcast type, the execution body may first generate an information search request containing the target word. It may then send the information search request to a server (for example, the server 105 shown in Fig. 1) and receive the search result returned by the server. Afterwards, the voice information corresponding to the search result may be taken as the third voice information and output as speech. As an example, the execution body outputs, as speech, the first voice information "Arsenal beat Chelsea 3:0", in which "Arsenal" and "Chelsea" are target-word speech segments with prompt tones superimposed. In response to determining that the second voice information replied by the user contains the target-word speech segment "Arsenal", the execution body may send the server an information search request containing the target word "Arsenal" to search for information related to "Arsenal" (for example, recommended information about Arsenal). The execution body may convert the search result into speech and output the converted speech, which is the third voice information.
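As an illustrative sketch only of this news-broadcast case: the endpoint URL, the query parameter, and the JSON response shape below are all assumptions, and the standard `requests` library is assumed to be available on the execution body:

```python
import requests

def handle_news_target(target_word: str,
                       search_url: str = "https://example.com/search") -> str:
    """Search the target word on a server and return the text to be synthesized
    as the third voice information (hypothetical endpoint and response format)."""
    response = requests.get(search_url, params={"q": target_word}, timeout=5)
    response.raise_for_status()
    results = response.json().get("results", [])
    if not results:
        return f"No information about {target_word} was found."
    # Use the top search result as the text of the third voice information.
    return results[0]["summary"]
```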
In some optional implementations of this embodiment, in response to determining that the type of the first voice information is the service query type, the execution body may first generate a service query request containing the target word. It may then send the service query request to a server and receive the query result returned by the server. Afterwards, the voice information corresponding to the query result may be taken as the third voice information and output as speech. As an example, the execution body outputs, as speech, the first voice information "You can ask about your balance and other information", in which "balance" and "other" are target-word speech segments with prompt tones superimposed. In response to determining that the second voice information replied by the user contains the target-word speech segment "balance", the execution body may send the server a service query request containing the target word "balance" to query the balance. The execution body may convert the query result into speech and output the converted speech, which is the third voice information.
In some optional implementations of this embodiment, in response to determining that the type of the first voice information is the information confirmation type, the execution body may generate a jump instruction indicating a jump to a preset next piece of voice information, and may determine that next piece of voice information as the third voice information. As an example, the execution body outputs, as speech, the first voice information "Your destination is meeting room No. 5, please confirm", in which "destination" and "confirm" are target-word speech segments with prompt tones superimposed. In response to determining that the second voice information replied by the user contains the target-word speech segment "confirm", the execution body may generate a jump instruction indicating a jump to the preset next piece of voice information "Starting navigation for you now" and determine that voice information as the third voice information. In response to determining that the second voice information replied by the user contains the target-word speech segment "destination", the execution body may generate a jump instruction indicating a jump to the preset next piece of voice information "Please enter the destination again" and determine that voice information as the third voice information. It should be noted that voice information of the information confirmation type need not contain the word "confirm". For example, for "Hello, we serve Chinese food and hamburgers here, and the drinks are cola and orange juice", the type of the voice information can also be regarded as the information confirmation type. Taking this voice information as an example, "Chinese food", "hamburgers", "cola", and "orange juice" are target-word speech segments with prompt tones superimposed. In response to determining that the second voice information replied by the user contains any one of the target-word speech segments (for example, "Chinese food"), the execution body may generate a jump instruction for jumping to the preset next piece of voice information corresponding to "Chinese food", namely "For Chinese food we have steamed buns, dumplings, and rice; please choose", and determine that voice information as the third voice information.
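The type-dependent choice of the third voice information described in these optional implementations could be sketched, purely for illustration, as a small dispatch function; the handler callables and the table of preset next messages are assumptions, not part of the claimed scheme:

```python
def build_third_voice_info(first_info_type, target_word,
                           handle_news, handle_query, next_messages):
    """Choose the third voice information according to the type of the first voice information."""
    if first_info_type == "news_broadcast":
        return handle_news(target_word)       # e.g. search the target word on a server
    if first_info_type == "service_query":
        return handle_query(target_word)      # e.g. query the balance on a server
    if first_info_type == "information_confirmation":
        # Jump to the preset next voice information associated with the target word.
        return next_messages[target_word]
    raise ValueError(f"unknown first voice information type: {first_info_type}")

# Example table for the information-confirmation case described above:
next_messages = {
    "confirm": "Starting navigation for you now.",
    "destination": "Please enter the destination again.",
}
```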
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the voice interaction method in this embodiment highlights the step of determining, for different types of first voice information, the third voice information to be output as speech. The scheme described in this embodiment therefore uses a prompt tone superimposed on the target-word speech segment to indicate which voice instructions the user can issue, which improves the efficiency and flexibility of voice interaction. At the same time, this approach can support multiple rounds of voice interaction without requiring the user to read explanations or tutorials during the interaction, and without having to announce the rules by which the user sends instructions, which further improves the flexibility and efficiency of voice interaction.
With further reference to Fig. 5, as an implementation of the methods shown in the figures above, the present application provides an embodiment of a voice interaction device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied to various electronic devices.
As shown in Fig. 5, the voice interaction device 500 described in this embodiment includes: an extraction unit 501 configured to extract first voice information that contains a target-word speech segment; a first output unit 502 configured to superimpose a prompt tone on the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, the prompt tone indicating that the content currently being announced is a target word; a matching unit 503 configured to, in response to collecting second voice information fed back by the user, match the second voice information against the target word; and a second output unit 504 configured to, in response to determining that the second voice information matches the target word, output, as speech, third voice information associated with the target word.
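Mirroring the unit structure just described, a purely illustrative composition of the device (with hypothetical unit implementations, not the claimed device itself) could look like:

```python
class VoiceInteractionDevice:
    """Sketch of the device of Fig. 5: four cooperating units (illustrative only)."""

    def __init__(self, extraction_unit, first_output_unit, matching_unit, second_output_unit):
        self.extraction_unit = extraction_unit        # extracts the first voice information
        self.first_output_unit = first_output_unit    # superimposes the prompt tone and speaks
        self.matching_unit = matching_unit            # matches the user's reply to the target word
        self.second_output_unit = second_output_unit  # speaks the third voice information

    def run(self):
        first_info = self.extraction_unit.extract()
        self.first_output_unit.output(first_info)
        reply = self.matching_unit.collect_reply()
        for segment in first_info.segments:
            if self.matching_unit.matches(reply, segment.word):
                self.second_output_unit.output(segment.word)
                return
```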
In some embodiments, the first output unit 502 may be further configured to superimpose a pulse-type prompt tone at the start of the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, where the prompt tone ends before the target-word speech segment ends.
In some embodiments, the first output unit 502 may be further configured to superimpose a sustained prompt tone at the start of the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, where the prompt tone ends when the target-word speech segment ends.
In some embodiments, the matching unit 503 may be further configured to, in response to determining that the second voice information matches the target word, determine the type of the first voice information, determine, based on the type of the first voice information, the third voice information associated with the target word, and output the third voice information as speech.
In some embodiments, the matching unit 503 may include a first generation module, a first sending module, and a first output module (not shown). The first generation module may be configured to, in response to determining that the type of the first voice information is the news broadcast type, generate an information search request containing the target word. The first sending module may be configured to send the information search request to a server and receive the search result returned by the server. The first output module may be configured to take the voice information corresponding to the search result as the third voice information and output the third voice information as speech.
In some embodiments, the matching unit 503 may include a second generation module, a second sending module, and a second output module (not shown). The second generation module may be configured to, in response to determining that the type of the first voice information is the service query type, generate a service query request containing the target word. The second sending module may be configured to send the service query request to a server and receive the query result returned by the server. The second output module may be configured to take the voice information corresponding to the query result as the third voice information and output the third voice information as speech.
In some embodiments, the matching unit 503 may include a third generation module (not shown). The third generation module may be configured to, in response to determining that the type of the first voice information is the information confirmation type, generate a jump instruction indicating a jump to a preset next piece of voice information and determine that next piece of voice information as the third voice information.
In some embodiments, the volume of the prompt tone is lower than the volume of the target-word speech segment.
In the device provided by the above embodiment of the present application, the extraction unit 501 extracts first voice information containing a target-word speech segment, the first output unit 502 then superimposes a prompt tone on the target-word speech segment and outputs, as speech, the first voice information with the prompt tone superimposed, the matching unit 503 matches second voice information fed back by the user against the target word when that second voice information is collected, and, if they match, the second output unit 504 outputs, as speech, third voice information associated with the target word. There is thus no need to inform the user of the available voice instructions through a graphical interface or by voice, and no need for the user to spend extra time reading or listening to instructions or tutorials: superimposing a prompt tone is enough to indicate which voice instructions can be issued, which improves the efficiency and flexibility of voice interaction.
Referring now to Fig. 6, a schematic structural diagram of a computer system 600 suitable for implementing the terminal device of the embodiments of the present application is shown. The terminal device shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604, and an input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a touch screen, a touch pad, and the like; an output section 607 including, for example, a liquid crystal display (LCD) and a loudspeaker; a storage section 608; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a semiconductor memory, is mounted on the driver 610 as needed, so that the computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609 and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may for example be described as: a processor including an extraction unit, a first output unit, a matching unit, and a second output unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the extraction unit may also be described as "a unit that extracts first voice information containing a target-word speech segment".
As another aspect, the present application also provides a computer-readable medium, which may be included in the device described in the above embodiments or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: extract first voice information that contains a target-word speech segment; superimpose a prompt tone on the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, the prompt tone indicating that the content currently being announced is a target word; in response to collecting second voice information fed back by the user, match the second voice information against the target word; and, in response to determining that the second voice information matches the target word, output, as speech, third voice information associated with the target word.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present application.

Claims (18)

1. A voice interaction method, comprising:
extracting first voice information that contains a target-word speech segment;
superimposing a prompt tone on the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed, wherein the prompt tone is used to indicate that the content currently being announced is a target word;
in response to collecting second voice information fed back by a user, matching the second voice information against the target word; and
in response to determining that the second voice information matches the target word, outputting, as speech, third voice information associated with the target word.
2. The voice interaction method according to claim 1, wherein superimposing the prompt tone on the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed comprises:
superimposing a pulse-type prompt tone at the start of the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed, wherein the prompt tone ends before the target-word speech segment ends.
3. The voice interaction method according to claim 1, wherein superimposing the prompt tone on the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed comprises:
superimposing a sustained prompt tone at the start of the target-word speech segment and outputting, as speech, the first voice information with the prompt tone superimposed, wherein the prompt tone ends when the target-word speech segment ends.
4. The voice interaction method according to claim 1, wherein outputting, as speech, third voice information associated with the target word in response to determining that the second voice information matches the target word comprises:
in response to determining that the second voice information matches the target word, determining the type of the first voice information, determining, based on the type of the first voice information, the third voice information associated with the target word, and outputting the third voice information as speech.
5. The voice interaction method according to claim 4, wherein determining, based on the type of the first voice information, the third voice information associated with the target word and outputting the third voice information as speech comprises:
in response to determining that the type of the first voice information is the news broadcast type, generating an information search request containing the target word;
sending the information search request to a server and receiving the search result returned by the server; and
taking the voice information corresponding to the search result as the third voice information and outputting the third voice information as speech.
6. The voice interaction method according to claim 4, wherein determining, based on the type of the first voice information, the third voice information associated with the target word and outputting the third voice information as speech comprises:
in response to determining that the type of the first voice information is the service query type, generating a service query request containing the target word;
sending the service query request to a server and receiving the query result returned by the server; and
taking the voice information corresponding to the query result as the third voice information and outputting the third voice information as speech.
7. The voice interaction method according to claim 4, wherein determining, based on the type of the first voice information, the third voice information associated with the target word and outputting the third voice information as speech comprises:
in response to determining that the type of the first voice information is the information confirmation type, generating a jump instruction indicating a jump to a preset next piece of voice information, and determining the next piece of voice information as the third voice information.
8. The voice interaction method according to any one of claims 1-7, wherein the volume of the prompt tone is lower than the volume of the target-word speech segment.
9. A voice interaction device, comprising:
an extraction unit configured to extract first voice information that contains a target-word speech segment;
a first output unit configured to superimpose a prompt tone on the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, wherein the prompt tone is used to indicate that the content currently being announced is a target word;
a matching unit configured to, in response to collecting second voice information fed back by a user, match the second voice information against the target word; and
a second output unit configured to, in response to determining that the second voice information matches the target word, output, as speech, third voice information associated with the target word.
10. The voice interaction device according to claim 9, wherein the first output unit is further configured to:
superimpose a pulse-type prompt tone at the start of the target-word speech segment and output, as speech, the first voice information with the prompt tone superimposed, wherein the prompt tone ends before the target-word speech segment ends.
11. The voice interaction device according to claim 9, wherein the first output unit is further configured to:
superimpose a continuous prompt tone at the beginning of the target word sound bite, and perform voice output of the first voice messaging superimposed with the prompt tone, wherein the prompt tone ends when the target word sound bite ends.
12. The voice interaction device according to claim 9, wherein the matching unit is further configured to:
in response to determining that the second voice messaging matches the target word, determine a type of the first voice messaging, determine, based on the type of the first voice messaging, third voice messaging associated with the target word, and perform voice output of the third voice messaging.
13. The voice interaction device according to claim 12, wherein the matching unit comprises:
a first generation module, configured to generate an information search request containing the target word in response to determining that the type of the first voice messaging is a news report type;
a first sending module, configured to send the information search request to a server and receive a search result returned by the server; and
a first output module, configured to take voice messaging corresponding to the search result as the third voice messaging and perform voice output of the third voice messaging.
14. The voice interaction device according to claim 12, wherein the matching unit comprises:
a second generation module, configured to generate a service inquiry request containing the target word in response to determining that the type of the first voice messaging is a service inquiry type;
a second sending module, configured to send the service inquiry request to a server and receive a query result returned by the server; and
a second output module, configured to take voice messaging corresponding to the query result as the third voice messaging and perform voice output of the third voice messaging.
15. The voice interaction device according to claim 12, wherein the matching unit comprises:
a third generation module, configured to, in response to determining that the type of the first voice messaging is an information validation type, generate a jump instruction indicating a jump to preset next voice messaging and determine the next voice messaging as the third voice messaging.
16. The voice interaction device according to any one of claims 9-15, wherein a volume of the prompt tone is lower than a volume of the target word sound bite.
17. A terminal device, comprising:
one or more processors; and
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
18. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
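Illustrative sketch (not part of the claims): a minimal example of the type-based dispatch described in claims 4-7 and 12-15. The helpers `search_server`, `query_server`, the dictionary-shaped requests and responses, and the substring match on recognized text are assumptions for illustration, not the patented implementation.

```python
from enum import Enum

class MessageType(Enum):
    NEWS_REPORT = "news_report"
    SERVICE_INQUIRY = "service_inquiry"
    INFO_VALIDATION = "info_validation"

def handle_user_feedback(first_type, target_word, second_voice_text,
                         search_server, query_server, next_voice_messaging):
    """Return the third voice messaging once the user's feedback matches the target word.

    first_type           : MessageType of the first voice messaging
    target_word          : the word the prompt tone marked for the user
    second_voice_text    : recognized text of the second voice messaging
    search_server        : callable taking an information search request (news report type)
    query_server         : callable taking a service inquiry request (service inquiry type)
    next_voice_messaging : preset messaging to jump to (information validation type)
    """
    if target_word not in second_voice_text:
        return None  # no match: nothing to output

    if first_type is MessageType.NEWS_REPORT:
        result = search_server({"query": target_word})   # information search request
        return result["voice_messaging"]
    if first_type is MessageType.SERVICE_INQUIRY:
        result = query_server({"service": target_word})  # service inquiry request
        return result["voice_messaging"]
    if first_type is MessageType.INFO_VALIDATION:
        return next_voice_messaging                       # target of the jump instruction
    raise ValueError(f"unknown first voice messaging type: {first_type}")
```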
CN201810568760.0A 2018-06-05 2018-06-05 Voice interaction method and device Active CN108766429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810568760.0A CN108766429B (en) 2018-06-05 2018-06-05 Voice interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810568760.0A CN108766429B (en) 2018-06-05 2018-06-05 Voice interaction method and device

Publications (2)

Publication Number Publication Date
CN108766429A true CN108766429A (en) 2018-11-06
CN108766429B CN108766429B (en) 2020-08-21

Family

ID=63999018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810568760.0A Active CN108766429B (en) 2018-06-05 2018-06-05 Voice interaction method and device

Country Status (1)

Country Link
CN (1) CN108766429B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110058916A (en) * 2019-04-23 2019-07-26 深圳创维数字技术有限公司 A kind of phonetic function jump method, device, equipment and computer storage medium
CN110265016A (en) * 2019-06-25 2019-09-20 百度在线网络技术(北京)有限公司 Voice interactive method and device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003219332A (en) * 2002-01-23 2003-07-31 Canon Inc Program reservation apparatus and method, and program
CN1934848A (en) * 2004-03-18 2007-03-21 索尼株式会社 Method and apparatus for voice interactive messaging
CN202759531U (en) * 2012-06-13 2013-02-27 青岛海尔电子有限公司 Television set volume control system
CN102810316A (en) * 2012-06-29 2012-12-05 宇龙计算机通信科技(深圳)有限公司 Method for adding background voice during conversation and communication terminal
CN103292437A (en) * 2013-06-17 2013-09-11 广东美的制冷设备有限公司 Voice interactive air conditioner and control method thereof
CN106055605A (en) * 2016-05-25 2016-10-26 深圳市童伴互动文化产业发展有限公司 Voice interaction control method and apparatus thereof
CN107707828A (en) * 2017-09-26 2018-02-16 维沃移动通信有限公司 A kind of method for processing video frequency and mobile terminal
CN107895578A (en) * 2017-11-15 2018-04-10 百度在线网络技术(北京)有限公司 Voice interactive method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HU YU, YAN JUN: "Intelligent Voice Interaction Technology and Its Standardization", Information Technology & Standardization *

Also Published As

Publication number Publication date
CN108766429B (en) 2020-08-21

Similar Documents

Publication Publication Date Title
EP3703049A1 (en) System for eatery ordering with mobile interface & point-of-sale terminal
CN108305626A (en) The sound control method and device of application program
CN103915095B (en) The method of speech recognition, interactive device, server and system
CN105955703B (en) Inquiry response dependent on state
CN107623614A (en) Method and apparatus for pushed information
CN108022586A (en) Method and apparatus for controlling the page
CN108769745A (en) Video broadcasting method and device
WO2009149340A1 (en) A system and method utilizing voice search to locate a product in stores from a phone
CN108337380A (en) Adjust automatically user interface is for hands-free interaction
CN108470034A (en) A kind of smart machine service providing method and system
AU2012238229A1 (en) Multi-modal customer care system
CN107863108A (en) Information output method and device
CN108156824A (en) Association centre's virtual assistant
CN108121800A (en) Information generating method and device based on artificial intelligence
CN110050303A (en) Speech-to-text conversion based on third-party agent content
CN109545193A (en) Method and apparatus for generating model
CN110265013A (en) The recognition methods of voice and device, computer equipment, storage medium
CN109739605A (en) The method and apparatus for generating information
CN109635094A (en) Method and apparatus for generating answer
CN109947911A (en) A kind of man-machine interaction method, calculates equipment and computer storage medium at device
CN108924218A (en) Method and apparatus for pushed information
CN110471733A (en) Information processing method and device
CN108494975A (en) Incoming call response method and equipment
CN109344330A (en) Information processing method and device
CN108766429A (en) Voice interactive method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant