CN109036413A

CN109036413A - Voice interaction method and terminal device

Info

Publication number: CN109036413A
Application number: CN201811087112.XA
Authority: CN
Inventors: 熊友军; 胡佳文; 黄高波; 张木森
Original assignee: Ubtech Robotics Corp
Current assignee: Ubtech Robotics Corp
Priority date: 2018-09-18
Filing date: 2018-09-18
Publication date: 2018-12-18

Abstract

The present invention relates to the field of computer technology and provides a voice interaction method and a terminal device. The method comprises: obtaining voice information input by a user in automatic interaction mode, and recognizing the voice information; determining whether the recognition result meets a preset mode switching condition; and, if the recognition result meets the preset mode switching condition, switching the voice interaction mode to manual interaction mode, wherein the voice interaction mode includes the automatic interaction mode and the manual interaction mode. When an interaction abnormality occurs in automatic interaction mode, the invention can switch to manual interaction mode in time. Manual interaction mode improves the accuracy with which the intention of the user's speech is analyzed, reduces abnormal situations during voice interaction, and improves the user experience.

Description

Voice interaction method and terminal device

Technical field

The present invention relates to the field of computer technology, and in particular to a voice interaction method and a terminal device.

Background technique

Voice interaction is an important function of terminal devices such as robots, mobile phones, smart speakers and in-vehicle terminals. Usually, voice interaction between a terminal device and a user is carried out through a cloud server. This kind of voice interaction is fully automatic and saves a great deal of manpower, which gives it a clear advantage for interactions such as content introductions and recitations. In some cases, however, the intention behind the user's question is analyzed inaccurately, so the voice interaction tends to go off track and the user experience is poor.

Summary of the invention

In view of this, embodiments of the present invention provide a voice interaction method and a terminal device, to solve the problem that current voice interaction methods analyze the intention of the user's speech inaccurately, causing the voice interaction to go off track.

A first aspect of the embodiments of the present invention provides a voice interaction method, comprising:

obtaining voice information input by a user in automatic interaction mode, and recognizing the voice information;

determining whether the recognition result meets a preset mode switching condition;

if the recognition result meets the preset mode switching condition, switching the voice interaction mode to manual interaction mode, wherein the voice interaction mode includes the automatic interaction mode and the manual interaction mode.

A second aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the voice interaction method of the first aspect when executing the computer program.

A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, wherein the voice interaction method of the first aspect is implemented when the computer program is executed by a processor.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects. By recognizing the voice information and determining whether the recognition result meets the preset mode switching condition, it can be judged whether the voice interaction mode needs to be switched. By switching the voice interaction mode to manual interaction mode once the recognition result meets the preset mode switching condition, the device can switch to manual interaction mode in time when an interaction abnormality occurs in automatic interaction mode. Manual interaction mode improves the accuracy with which the intention of the user's speech is analyzed, reduces abnormal situations during voice interaction, and improves the user experience.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is an implementation flow chart of the voice interaction method provided by an embodiment of the present invention;

Fig. 2 is an implementation flow chart of voice interaction mode switching in the voice interaction method provided by an embodiment of the present invention;

Fig. 3 is an implementation flow chart of recognizing user images in the voice interaction method provided by an embodiment of the present invention;

Fig. 4 is an implementation flow chart of manual interaction mode in the voice interaction method provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of an implementation example provided by an embodiment of the present invention;

Fig. 6 is a schematic diagram of the voice interaction apparatus provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of the terminal device provided by an embodiment of the present invention.

Detailed description of embodiments

In the following description, for purposes of illustration rather than limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention may also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present invention.

To illustrate the technical solutions of the present invention, specific embodiments are described below.

Fig. 1 is an implementation flow chart of the voice interaction method provided by an embodiment of the present invention, detailed as follows:

In S101, voice information input by a user is obtained in automatic interaction mode, and the voice information is recognized.

In this embodiment, the method may be executed by any terminal device capable of voice interaction with a user, such as a robot, mobile phone, smart speaker, in-vehicle terminal or computer; no limitation is imposed here. Automatic interaction mode is one voice interaction mode. In automatic interaction mode, the terminal device interacts with the user by automatically recognizing the voice information and replying to it.

The terminal device may recognize the voice information locally or have it recognized by a server; no limitation is imposed here. Recognizing the voice information may include performing speech recognition on the voice information to generate the corresponding text information, and then performing semantic recognition on the text information to obtain the semantic information corresponding to the voice information.
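The two-stage recognition described for S101 can be sketched as follows. This is only an illustration under stated assumptions: `RecognitionResult`, `recognize_voice` and the two callables are hypothetical names introduced here, and the callables stand in for whatever local or server-side recognizers an implementation actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecognitionResult:
    """Hypothetical container for the two outputs of recognition."""
    text_info: str       # text generated from the voice information
    semantic_info: str   # semantics derived from that text


def recognize_voice(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    text_to_semantics: Callable[[str], str],
) -> RecognitionResult:
    """Stage 1: voice -> text. Stage 2: text -> semantic information.

    Both recognizers are injected, so either one may run locally or
    call out to a server (an assumption of this sketch).
    """
    text_info = speech_to_text(audio)
    semantic_info = text_to_semantics(text_info)
    return RecognitionResult(text_info, semantic_info)
```

Passing the recognizers in as parameters keeps the sketch neutral on where recognition runs, which the description leaves open.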

In S102, it is determined whether the recognition result meets a preset mode switching condition.

In this embodiment, the preset mode switching condition is the condition used to judge whether the voice interaction mode needs to be switched. While the terminal device is interacting with the user by voice in automatic interaction mode, it can use the preset mode switching condition to judge whether to switch the voice interaction mode.

As an embodiment of the present invention, S102 may include:

searching whether the text information corresponding to the voice information contains a preset text sensitive word; and/or

searching whether the semantic information corresponding to the voice information contains a preset semantic sensitive word.

In this embodiment, the recognition result includes the text information corresponding to the voice information and/or the semantic information corresponding to the voice information. If the text information contains a preset text sensitive word, the recognition result is determined to meet the preset mode switching condition. Alternatively, if the semantic information contains a preset semantic sensitive word, the recognition result is determined to meet the preset mode switching condition. Alternatively, if the text information contains a preset text sensitive word and the semantic information contains a preset semantic sensitive word, the recognition result is determined to meet the preset mode switching condition.

Preset text sensitive words and preset semantic sensitive words are words and phrases indicating that the voice interaction is abnormal or that the user is dissatisfied with it. For example, preset text sensitive words may include phrases such as "you are so stupid", "you misunderstood" and "you don't understand", and preset semantic sensitive words may include phrases such as "don't know" and "don't understand". Both kinds of sensitive word can be set according to the actual voice interaction scenario; no limitation is imposed here. If the text information contains a preset text sensitive word and/or the semantic information contains a preset semantic sensitive word, a voice interaction abnormality has occurred or the user is dissatisfied with the interaction, and the voice interaction mode needs to be switched.

By using preset text sensitive words and preset semantic sensitive words, this embodiment can accurately judge whether the voice interaction mode needs to be switched, making the switching more accurate.
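The sensitive-word check of S102 might look like the sketch below. It is a minimal illustration, not the claimed implementation: the function names, the use of case-insensitive substring matching, and the English word lists (adapted from the example phrases above) are all assumptions made for this example.

```python
# Hypothetical word lists, adapted from the example phrases in the description.
PRESET_TEXT_SENSITIVE_WORDS = ("you are so stupid", "you misunderstood", "don't understand")
PRESET_SEMANTIC_SENSITIVE_WORDS = ("don't know", "don't understand")


def contains_any(content: str, words: tuple[str, ...]) -> bool:
    """Return True if any word occurs in content (case-insensitive substring match)."""
    lowered = content.lower()
    return any(word in lowered for word in words)


def meets_switch_condition(text_info: str = "", semantic_info: str = "") -> bool:
    """S102 sketch: the condition is met if the text information OR the
    semantic information contains a preset sensitive word."""
    return (
        contains_any(text_info, PRESET_TEXT_SENSITIVE_WORDS)
        or contains_any(semantic_info, PRESET_SEMANTIC_SENSITIVE_WORDS)
    )
```

Because the description allows the text check, the semantic check, or both, the two inputs default to empty strings so that either can be omitted.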

In S103, if the recognition result meets the preset mode switching condition, the voice interaction mode is switched to manual interaction mode; the voice interaction mode includes the automatic interaction mode and the manual interaction mode.

In this embodiment, manual interaction mode is one voice interaction mode. In manual interaction mode, an attendant conducts voice interaction with the user remotely through the terminal device. Using the monitoring terminal and the user's terminal device, the attendant remotely obtains the user's voice information and sends reply information through the monitoring terminal to the user's terminal device, which broadcasts the reply information as speech. The attendant is a staff member responsible for manual voice interaction.

If the recognition result meets the preset mode switching condition, a voice interaction abnormality has occurred or the user is dissatisfied with the interaction. The voice interaction mode can then be switched from automatic interaction mode to manual interaction mode: the voice information is sent to the monitoring terminal, and the attendant interprets the user's voice information and replies, overcoming the problem that the terminal device analyzes the intention of the user's speech inaccurately in automatic interaction mode.

By recognizing the voice information and determining whether the recognition result meets the preset mode switching condition, the embodiment of the present invention can judge whether the voice interaction mode needs to be switched. By switching to manual interaction mode once the recognition result meets the preset mode switching condition, the device can switch to manual interaction mode in time when an interaction abnormality occurs in automatic interaction mode. Manual interaction mode improves the accuracy with which the intention of the user's speech is analyzed, reduces abnormal situations during voice interaction, and improves the user experience.
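Taken together, S101 to S103 amount to one decision per user utterance. The sketch below shows that flow under stated assumptions: `InteractionMode`, `handle_voice_input` and the injected `recognize` and `meets_condition` callables are hypothetical names, and the switch happens immediately rather than after confirmation by an attendant.

```python
from enum import Enum
from typing import Callable


class InteractionMode(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


def handle_voice_input(
    mode: InteractionMode,
    audio: bytes,
    recognize: Callable[[bytes], str],
    meets_condition: Callable[[str], bool],
) -> InteractionMode:
    """Run S101-S103 once and return the mode to use for the next turn."""
    if mode is not InteractionMode.AUTOMATIC:
        return mode                        # the check applies only in automatic mode
    result = recognize(audio)              # S101: obtain and recognize the voice information
    if meets_condition(result):            # S102: test the preset mode switching condition
        return InteractionMode.MANUAL      # S103: switch to manual interaction mode
    return InteractionMode.AUTOMATIC
```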

As an embodiment of the present invention, as shown in Fig. 2, S103 may include:

In S201, if the recognition result meets the preset mode switching condition, a mode switching request is sent to the monitoring terminal.

In this embodiment, after the recognition result meets the preset mode switching condition, the terminal device sends a mode switching request to the monitoring terminal. The mode switching request asks the monitoring terminal for an instruction authorizing the voice interaction mode switch.

As an embodiment of the present invention, "sending a mode switching request to the monitoring terminal" in S201 may include:

sending the mode switching request and voice interaction process information to the monitoring terminal, wherein the voice interaction process information instructs the monitoring terminal to play the voice interaction process information, and the mode switching request instructs the monitoring terminal to return a first mode switching command after receiving the first mode switching command input by the attendant.

In this embodiment, the voice interaction process information is audio or video information recorded while the terminal device interacts with the user by voice, and it lets the attendant judge whether to switch the voice interaction mode. The voice interaction process information may be the historical interaction information recorded by the terminal device within a preset time period before it sends the mode switching request to the monitoring terminal, or it may be the information on the voice interaction between the terminal device and the user after the mode switching request has been sent.

For example, after sending the mode switching request to the monitoring terminal and before receiving the monitoring terminal's reply, the terminal device continues to interact with the user by voice in automatic interaction mode. The terminal device sends the voice interaction process information for the interaction that takes place after the request to the monitoring terminal, and the monitoring terminal plays it so that the attendant can judge from it whether a switch to manual interaction mode is needed. If the attendant judges that a switch is needed, the attendant can input a first mode switching command into the monitoring terminal, which returns the first mode switching command to the terminal device. If the attendant judges that no switch is needed, the attendant can input a request-rejection instruction into the monitoring terminal, which returns the request-rejection instruction to the terminal device.

By sending the voice interaction process information to the monitoring terminal, this embodiment gives the attendant reference information for judging whether to switch to manual interaction mode. With fuller reference information the attendant can make a more accurate judgment, which improves the accuracy of voice interaction mode switching.
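One way to keep the "historical interaction information within a preset time period" available for sending with the mode switching request is a time-windowed buffer. The sketch below is an assumption about how that could be done: `InteractionRecord`, `InteractionHistory` and the record fields are hypothetical, and real records would carry audio or video rather than strings.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class InteractionRecord:
    timestamp: float  # seconds on any monotonically increasing clock
    speaker: str      # "user" or "device" (hypothetical labels)
    content: str      # stand-in for the recorded audio or video


class InteractionHistory:
    """Keeps only the records that fall inside the preset time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._records: deque[InteractionRecord] = deque()

    def _drop_older_than(self, cutoff: float) -> None:
        # Records are appended in time order, so stale ones are at the left.
        while self._records and self._records[0].timestamp < cutoff:
            self._records.popleft()

    def add(self, record: InteractionRecord) -> None:
        self._records.append(record)
        self._drop_older_than(record.timestamp - self.window_seconds)

    def snapshot(self, now: float) -> list[InteractionRecord]:
        """Return the records to send along with a mode switching request."""
        self._drop_older_than(now - self.window_seconds)
        return list(self._records)
```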

In S202, after the first mode switching command sent by the monitoring terminal is received, the voice interaction mode is switched to manual interaction mode.

In this embodiment, the first mode switching command instructs the terminal device to switch the voice interaction mode from automatic interaction mode to manual interaction mode. If the terminal device receives the first mode switching command sent by the monitoring terminal, it switches the voice interaction mode to manual interaction mode. If the terminal device receives the request-rejection instruction sent by the monitoring terminal, it does not switch the voice interaction mode and continues to interact with the user by voice in automatic interaction mode.

By sending a mode switching request to the monitoring terminal, this embodiment lets the attendant at the monitoring terminal judge whether to switch to manual interaction mode. Having a person decide whether to switch improves the accuracy of voice interaction mode switching and makes the switching more appropriate, which improves the user experience.
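The request and reply exchange of S201 and S202 can be sketched from the terminal device's side as follows. The class and member names, the message dictionary, and the `request_pending` flag (used here to avoid sending duplicate requests) are assumptions of this sketch; the description does not specify a message format.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


class Instruction(Enum):
    FIRST_MODE_SWITCH = "first_mode_switch"  # attendant approves the switch
    REJECT_REQUEST = "reject_request"        # attendant declines the switch


class TerminalDevice:
    def __init__(self, send_to_monitor: Callable[[dict], None]) -> None:
        self.mode = Mode.AUTOMATIC
        self.request_pending = False
        self._send = send_to_monitor  # hypothetical transport to the monitoring terminal

    def on_condition_met(self) -> None:
        """S201: ask the monitoring terminal to authorize a switch.

        The device stays in automatic mode until a reply arrives."""
        if self.mode is Mode.AUTOMATIC and not self.request_pending:
            self._send({"type": "mode_switch_request"})
            self.request_pending = True

    def on_instruction(self, instruction: Instruction) -> None:
        """S202: switch only on a first mode switching command."""
        if not self.request_pending:
            return
        self.request_pending = False
        if instruction is Instruction.FIRST_MODE_SWITCH:
            self.mode = Mode.MANUAL
        # On REJECT_REQUEST the device simply remains in automatic mode.
```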

As an embodiment of the present invention, as shown in Fig. 3, the above method may further include:

In S301, user images are obtained in the automatic interaction mode, and face recognition is performed on the user images to obtain the user's emotion information.

In this embodiment, the terminal device may capture user images with an image acquisition device; a user image may be an image containing the user's facial region. Face recognition can be performed on the user images, and the user's emotion is identified from facial features to obtain the user's emotion information.

In S302, it is determined whether the emotion information meets the preset mode switching condition.

In this embodiment, determining whether the emotion information meets the preset mode switching condition may include determining whether the emotion information belongs to preset sensitive emotion information. Preset sensitive emotion information may be emotion information such as "dissatisfied", "impatient" or "unhappy", which indicates that the user is dissatisfied with the voice interaction and is having a poor experience. For example, a satisfaction threshold can be set: if the satisfaction value of the identified emotion information is lower than the preset satisfaction threshold, the emotion information is determined to belong to the preset sensitive emotion information, that is, the emotion information meets the preset mode switching condition.

In S303, if the emotion information meets the preset mode switching condition, the voice interaction mode is switched to the manual interaction mode.

In this embodiment, if the emotion information meets the preset mode switching condition, a voice interaction abnormality has occurred, or the user is dissatisfied with the interaction and the experience is poor. The voice interaction mode needs to be switched and can be changed from automatic interaction mode to the manual interaction mode. If the emotion information does not meet the preset mode switching condition, the voice interaction is normal, and the terminal device continues to interact with the user by voice in automatic interaction mode.

This embodiment performs face recognition on user images, identifies the user's emotion information, and decides from the user's emotion whether to switch the voice interaction mode. Combining speech recognition with image recognition allows the need for a switch to be judged from more than one angle, thereby improving the timeliness and accuracy of voice interaction mode switching.
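The emotion check of S302 could be sketched as below. Everything named here is hypothetical: the `EmotionInfo` fields, the 0 to 1 satisfaction scale, the threshold value of 0.4 and the decision to treat a sensitive label or a low satisfaction value as sufficient on its own are assumptions of this example; the face recognition step that would produce the emotion information is not shown.

```python
from dataclasses import dataclass

# Hypothetical values; the description only says that a threshold can be set.
SATISFACTION_THRESHOLD = 0.4
PRESET_SENSITIVE_EMOTIONS = frozenset({"dissatisfied", "impatient", "unhappy"})


@dataclass
class EmotionInfo:
    label: str           # emotion identified from facial features
    satisfaction: float  # assumed scale: 0.0 (very dissatisfied) to 1.0 (very satisfied)


def emotion_meets_switch_condition(
    emotion: EmotionInfo,
    threshold: float = SATISFACTION_THRESHOLD,
) -> bool:
    """S302 sketch: the emotion counts as preset sensitive emotion information
    if its label is in the preset set or its satisfaction value is below the threshold."""
    return emotion.label in PRESET_SENSITIVE_EMOTIONS or emotion.satisfaction < threshold
```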

As an embodiment of the present invention, as shown in Fig. 4, the above method may further include:

In S401, in the manual interaction mode, the voice information input by the user is sent to the monitoring terminal, so that the monitoring terminal plays the voice information and, after receiving the reply information that the attendant inputs for the voice information, returns the reply information.

In this embodiment, in manual interaction mode, the terminal device sends the voice information input by the user to the monitoring terminal. The monitoring terminal receives and plays the voice information so that the attendant hears it and can reply to it. The attendant inputs reply information for the voice information into the monitoring terminal; after receiving the reply information input by the attendant, the monitoring terminal returns it to the terminal device.

As an embodiment of the present invention, the above method may further include:

in the manual interaction mode, sending captured user images to the monitoring terminal.

In this embodiment, in manual interaction mode, the terminal device can capture both the user's voice information and user images and send both to the monitoring terminal, so that the attendant can learn the state of the user's voice interaction from sound and image together, better determine the user's intention, and give a more suitable reply. By sending user images to the monitoring terminal, this embodiment lets the attendant understand the state of the user's voice interaction more accurately and give a more suitable reply, which improves the user experience.

In S402, the reply information returned by the monitoring terminal is received and played as speech.

In this embodiment, the terminal device receives the reply information returned by the monitoring terminal and plays it as speech, so that the attendant interacts with the user by voice remotely.

Optionally, the speech played by the terminal device has the same intonation in automatic interaction mode and in manual interaction mode. For example, the speech in both modes can be played by the same voice playback device using the same playback settings. Keeping the intonation the same in both modes makes voice interaction mode switching smoother, so that the user does not notice the switch, and preserves the user's voice interaction experience.
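One turn of manual interaction, S401 followed by S402, can be sketched as a simple relay. The names `manual_interaction_turn`, `request_reply` and `FakeMonitorTerminal` are hypothetical, and the fake terminal only stands in for a real monitoring terminal so that the example can run on its own.

```python
from typing import Callable


class FakeMonitorTerminal:
    """Stand-in for a monitoring terminal: 'plays' the voice and returns a canned reply."""

    def __init__(self, attendant_reply: str) -> None:
        self.attendant_reply = attendant_reply
        self.played: list[bytes] = []

    def request_reply(self, user_voice: bytes) -> str:
        self.played.append(user_voice)  # the attendant hears the user's voice
        return self.attendant_reply     # the attendant's reply information


def manual_interaction_turn(
    user_voice: bytes,
    monitor_terminal: FakeMonitorTerminal,
    speak: Callable[[str], None],
) -> str:
    """S401: forward the user's voice and wait for the attendant's reply.
    S402: play the reply through the device's usual speech output."""
    reply = monitor_terminal.request_reply(user_voice)
    speak(reply)
    return reply
```

Routing the reply through the same `speak` callable that automatic mode would use is one way to keep the intonation identical in both modes, as the optional embodiment above suggests.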

As an embodiment of the present invention, the above method may further include:

in the manual interaction mode, if a second mode switching command sent by the monitoring terminal is received, switching the voice interaction mode to the automatic interaction mode.

In this embodiment, in manual interaction mode, the attendant can decide from the user's state during the voice interaction whether to switch from manual interaction mode back to automatic interaction mode. If the attendant chooses to switch back, the attendant can input a second mode switching command into the monitoring terminal, which sends the second mode switching command to the terminal device. After receiving the second mode switching command from the monitoring terminal, the terminal device switches the voice interaction mode from manual interaction mode to automatic interaction mode.

By switching the voice interaction mode back to automatic interaction mode according to the second mode switching command, this embodiment lets the attendant judge whether to switch back, so that the device returns to automatic interaction mode promptly when manual voice interaction is no longer needed, which keeps voice interaction efficient.

As an implementation example of the present invention, as shown in Fig. 5, the voice interaction system may include a terminal device, a server and a monitoring terminal. The terminal device may include a microphone array 51, a local voice extraction and recognition module 52, a local natural-semantics recognition module 53, a camera 54, a voice mode switching module 55, a voice broadcast module 56 and a speaker 57. The server may include a cloud voice extraction and recognition module 58 and a cloud natural-semantics recognition module 59.

In automatic interaction mode, the local voice extraction and recognition module 52 and/or the cloud voice extraction and recognition module 58 obtain the user's voice from the microphone array 51 through processing such as noise reduction and directional pickup, perform analog-to-digital conversion, and then recognize the voice locally and/or on the server to generate text. The local natural-semantics recognition module 53 and/or the cloud natural-semantics recognition module 59 perform semantic recognition on the generated text, predict the user's intention, and obtain the corresponding reply text. The voice broadcast module 56 converts the reply text into sound, which is played through the speaker 57.

In manual interaction mode, the monitoring terminal listens to the user's voice through the microphone array 51 on the terminal device and obtains video of the user through the camera 54 of the terminal device. The monitoring terminal obtains the attendant's reply information to the user's speech, converts the reply information into a text stream, and returns it to the terminal device. The voice broadcast module 56 of the terminal device broadcasts the text stream as speech.

In automatic interaction mode, the voice mode switching module 55 judges from the user's voice and/or image whether the preset mode switching condition is met. If it is met, the module sends a mode switching request to the monitoring terminal while the device continues to interact with the user by voice in automatic interaction mode. After receiving the mode switching request, the monitoring terminal obtains and plays the user's voice or image from the interaction. If the attendant judges that a mode switch is needed, a first mode switching command can be sent to the terminal device through the monitoring terminal, and the terminal device switches from automatic interaction mode to manual interaction mode. If the attendant judges that no mode switch is needed, a request-rejection instruction can be sent to the terminal device through the monitoring terminal, and the terminal device continues to interact with the user by voice in automatic interaction mode.

In manual interaction mode, if the attendant judges that a mode switch is needed, a second mode switching command can be sent to the terminal device through the monitoring terminal, and the terminal device switches from manual interaction mode to automatic interaction mode.
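The mode changes in this example reduce to a small transition table keyed by the current mode and the instruction received from the monitoring terminal. The sketch below is one possible encoding; the enum members and the choice to ignore instructions that do not apply to the current mode are assumptions of this example.

```python
from enum import Enum


class Mode(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


class Instruction(Enum):
    FIRST_MODE_SWITCH = "first_mode_switch"    # automatic -> manual
    REJECT_REQUEST = "reject_request"          # stay in automatic
    SECOND_MODE_SWITCH = "second_mode_switch"  # manual -> automatic


# (current mode, instruction from the monitoring terminal) -> next mode
TRANSITIONS = {
    (Mode.AUTOMATIC, Instruction.FIRST_MODE_SWITCH): Mode.MANUAL,
    (Mode.AUTOMATIC, Instruction.REJECT_REQUEST): Mode.AUTOMATIC,
    (Mode.MANUAL, Instruction.SECOND_MODE_SWITCH): Mode.AUTOMATIC,
}


def next_mode(mode: Mode, instruction: Instruction) -> Mode:
    """Return the mode after an instruction; unlisted pairs leave the mode unchanged."""
    return TRANSITIONS.get((mode, instruction), mode)
```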

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

Corresponding to the voice interaction method described in the foregoing embodiments, Fig. 6 shows a schematic diagram of the voice interaction apparatus provided by an embodiment of the present invention. For ease of description, only the parts related to this embodiment are shown.

Referring to Fig. 6, the apparatus includes a recognition module 61, a determination module 62 and a switching module 63.

The recognition module 61 is configured to obtain voice information input by a user in automatic interaction mode and to recognize the voice information;

the determination module 62 is configured to determine whether the recognition result meets a preset mode switching condition;

the switching module 63 is configured to switch the voice interaction mode to manual interaction mode if the recognition result meets the preset mode switching condition, wherein the voice interaction mode includes the automatic interaction mode and the manual interaction mode.

Optionally, the determination module 62 is used for:

Optionally, the switching module 63 is configured to:

send a mode switching request to the monitoring terminal if the recognition result meets the preset mode switching condition;

switch the voice interaction mode to manual interaction mode after receiving the first mode switching command sent by the monitoring terminal.

Optionally, the switching module 63 is used for:

Optionally, the apparatus further includes a processing module, and the processing module is configured to:

obtain user images in the automatic interaction mode, and perform face recognition on the user images to obtain the user's emotion information;

determine whether the emotion information meets the preset mode switching condition;

switch the voice interaction mode to the manual interaction mode if the emotion information meets the preset mode switching condition.

Optionally, the processing module is configured to:

in the manual interaction mode, send the voice information input by the user to the monitoring terminal, so that the monitoring terminal plays the voice information and, after receiving the reply information that the attendant inputs for the voice information, returns the reply information;

receive the reply information returned by the monitoring terminal and play the reply information as speech.

Optionally, the processing module is used for:

Fig. 7 is the schematic diagram for the terminal device that one embodiment of the invention provides.As shown in fig. 7, the terminal of the embodiment is set Standby 7 include: processor 70, memory 71 and are stored in the meter that can be run in the memory 71 and on the processor 70 Calculation machine program 72, such as program.The processor 70 realizes above-mentioned each embodiment of the method when executing the computer program 72 In step, such as step 101 shown in FIG. 1 is to 103.Alternatively, reality when the processor 70 executes the computer program 72 The function of each module/unit in existing above-mentioned each Installation practice, such as the function of module 61 to 63 shown in Fig. 6.

Illustratively, the computer program 72 can be divided into one or more module/units, it is one or Multiple module/units are stored in the memory 71, and are executed by the processor 70, to complete the present invention.Described one A or multiple module/units can be the series of computation machine program instruction section that can complete specific function, which is used for Implementation procedure of the computer program 72 in the terminal device 7 is described.

The terminal device 7 can be the calculating such as desktop PC, notebook, palm PC and cloud server and set It is standby.The terminal device may include, but be not limited only to, processor 70, memory 71.It will be understood by those skilled in the art that Fig. 7 The only example of terminal device 7 does not constitute the restriction to terminal device 7, may include than illustrating more or fewer portions Part perhaps combines certain components or different components, such as the terminal device can also include input-output equipment, net Network access device, bus, display etc..

Alleged processor 70 can be central processing unit (Central Processing Unit, CPU), can also be Other general processors, digital signal processor (Digital Signal Processor, DSP), specific integrated circuit (Application Specific Integrated Circuit, ASIC), ready-made programmable gate array (Field- Programmable Gate Array, FPGA) either other programmable logic device, discrete gate or transistor logic, Discrete hardware components etc..General processor can be microprocessor or the processor is also possible to any conventional processor Deng.

The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or internal memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the computer program and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or is about to be output.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is used only as an example. In practical applications, the above functions may be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the protection scope of this application. For the specific working process of the units and modules in the above system, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated module/unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above method embodiments of the present invention may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be increased or reduced as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.

The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims

1. A voice interaction method, characterized by comprising:

determining whether a recognition result meets a preset mode switching condition;

if the recognition result meets the preset mode switching condition, switching a voice interaction mode to a man-machine interaction mode, wherein the voice interaction mode includes the automatic interaction mode and the man-machine interaction mode.

2. The voice interaction method of claim 1, characterized in that the determining whether the recognition result meets the preset mode switching condition comprises:

3. The voice interaction method of claim 1, characterized in that the switching of the voice interaction mode to the man-machine interaction mode if the recognition result meets the preset mode switching condition comprises:

after receiving a first mode switching command sent by the monitor terminal, switching the voice interaction mode to the man-machine interaction mode.

4. The voice interaction method of claim 3, characterized in that the sending of the mode switching request to the monitor terminal comprises:

sending the mode switching request and voice interaction process information to the monitor terminal, wherein the voice interaction process information is used to instruct the monitor terminal to play the voice interaction process information, and the mode switching request is used to instruct the monitor terminal to return the first mode switching command after receiving the first mode switching command input by an attendant.

5. The voice interaction method of claim 1, characterized by further comprising:

obtaining a user image in the automatic interaction mode, and performing face recognition on the user image to obtain emotional information of the user;

if the emotional information meets the preset mode switching condition, switching the voice interaction mode to the man-machine interaction mode.

6. The voice interaction method of claim 1, characterized by further comprising:

in the man-machine interaction mode, sending the voice information input by the user to a monitor terminal, so that the monitor terminal plays the voice information and, after receiving reply information to the voice information input by an attendant, returns the reply information;

7. The voice interaction method of claim 6, characterized by further comprising:

8. The voice interaction method of any one of claims 1 to 7, characterized by further comprising:

in the man-machine interaction mode, if a second mode switching command sent by the monitor terminal is received, switching the voice interaction mode to the automatic interaction mode.

9. A terminal device, comprising a memory, a processor, and a computer program that is stored in the memory and can run on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 8.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.
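The exchange with the monitor terminal recited in claims 3, 4 and 8 can be illustrated with a rough Python sketch. It is not part of the claims. The classes `MonitorTerminal` and `Device`, the command strings, the synchronous call in place of network messaging, and the `approve` flag that stands in for the attendant's input are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mode(Enum):
    AUTOMATIC = "automatic"
    MAN_MACHINE = "man-machine"


@dataclass
class MonitorTerminal:
    """Hypothetical stand-in for the monitor terminal used by an attendant."""

    approve: bool = True  # assumed: whether the attendant confirms the switch
    received: list = field(default_factory=list)

    def request_switch(self, interaction_log: list) -> Optional[str]:
        # Receives the request together with the interaction process so far,
        # and returns the first mode switching command only on approval.
        self.received.append(list(interaction_log))
        return "FIRST_MODE_SWITCH" if self.approve else None


@dataclass
class Device:
    monitor: MonitorTerminal
    mode: Mode = Mode.AUTOMATIC
    log: list = field(default_factory=list)

    def on_user_input(self, recognition_result: str, condition_met: bool) -> None:
        self.log.append(recognition_result)
        if self.mode is Mode.AUTOMATIC and condition_met:
            command = self.monitor.request_switch(self.log)
            if command == "FIRST_MODE_SWITCH":
                self.mode = Mode.MAN_MACHINE

    def on_command(self, command: str) -> None:
        # The second command returns the device to the automatic mode.
        if self.mode is Mode.MAN_MACHINE and command == "SECOND_MODE_SWITCH":
            self.mode = Mode.AUTOMATIC
```

In this sketch the device switches modes only after the terminal returns a command, so a declined request leaves it in the automatic interaction mode.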