CN108510986A

CN108510986A - Voice interaction method, apparatus, electronic device, and computer-readable storage medium

Info

Publication number: CN108510986A
Application number: CN201810186219.3A
Authority: CN
Inventors: 孙鹏飞
Original assignee: University of Science and Technology Beijing USTB
Current assignee: University of Science and Technology Beijing USTB
Priority date: 2018-03-07
Filing date: 2018-03-07
Publication date: 2018-09-07

Abstract

Embodiments of the present disclosure provide a voice interaction method, apparatus, electronic device, and computer-readable storage medium. The method includes: in response to a preset event that activates voice interaction, outputting preset voice information; obtaining feedback information from the target object of the voice interaction regarding the preset voice information, the feedback information being non-voice information; and, when the feedback information of the target object satisfies a preset condition, outputting voice interaction information. In the embodiments, the intelligent voice output device actively initiates the voice interaction process based on a preset event, and outputs the subsequent, specific voice interaction information only when the user's feedback information satisfies the preset condition. This allows the intelligent voice output device to be used in more scenarios and to output voice only once it has determined that the user is in a voice interaction state, which avoids missing important voice information and improves the user experience.

Description

Voice interaction method, apparatus, electronic device, and computer-readable storage medium

Technical field

The present disclosure relates to the field of artificial intelligence, and in particular to a voice interaction method, apparatus, electronic device, and computer-readable storage medium.

Background

With the development of artificial intelligence technology, the performance of natural speech processing technology has improved greatly. Speech recognition is increasingly applied in intelligent voice output devices such as smart speakers, smartphones, smart tablets, and Internet of Things devices. Applying natural speech processing to human-computer interaction has become an essential path for more and more intelligent voice output devices, and natural voice interaction is becoming the new mode of human-computer interaction after the touch screen.

Summary of the invention

Embodiments of the present disclosure provide a voice interaction method, apparatus, electronic device, and computer-readable storage medium.

In a first aspect, an embodiment of the present disclosure provides a voice interaction method.

Specifically, the voice interaction method includes:

in response to a preset event that activates voice interaction, outputting preset voice information;

obtaining feedback information from the target object of the voice interaction regarding the preset voice information, the feedback information being non-voice information; and

when the feedback information of the target object satisfies a preset condition, outputting voice interaction information.

Optionally, outputting the preset voice information in response to the preset event that activates voice interaction includes at least one of the following:

in response to a preset time being reached, outputting the preset voice information;

in response to preset information being received, outputting the preset voice information;

in response to the target object being sensed within the voice interaction range, outputting the preset voice information.

Optionally, outputting the preset voice information in response to the target object being sensed within the voice interaction range includes:

obtaining first image data within the voice interaction range;

when the target object is identified according to the first image data, outputting the preset voice information.

Optionally, obtaining the feedback information from the target object of the voice interaction regarding the preset voice information includes:

obtaining second image data after the preset voice information is output;

determining, according to the second image data, whether the target object has received the preset voice information.

Optionally, determining, according to the second image data, whether the target object has received the preset voice information includes:

when it is determined according to the second image data that the target object is within the voice interaction range, determining that the target object has received the preset voice information; or

when it is determined that the facing direction given by the facial information of the target object in the second image data and the direction of the output device of the preset voice information are within a first preset error range of each other, determining that the target object has received the preset voice information.

Optionally, obtaining the feedback information from the target object of the voice interaction regarding the preset voice information further includes:

obtaining second image data after the preset voice information is output;

determining whether the target object has received the preset voice information by comparing the second image data with first image data obtained before the preset voice information is output.

Optionally, determining whether the target object has received the preset voice information by comparing the second image data with the first image data obtained before the preset voice information is output includes:

identifying a first face of the target object in the first image data and a second face of the target object in the second image data;

determining whether the target object has received the preset voice information by comparing the facial information of the first face and the second face.

Optionally, obtaining the feedback information from the target object of the voice interaction regarding the preset voice information further includes:

determining whether location information of the target object within the voice interaction range is received.

Optionally, the voice interaction method further includes:

when the feedback information of the target object does not satisfy the preset condition, resending the preset voice information.

Optionally, resending the preset voice information includes:

when it is determined that the target object is not within the voice interaction range, delaying the sending of the preset voice information; or

when it is determined that the target object is within the voice interaction range, sending the preset voice information at a higher volume.

In a second aspect, an embodiment of the present disclosure provides a voice interaction apparatus, including:

a first output module, configured to output preset voice information in response to a preset event that activates voice interaction;

a first acquisition module, configured to obtain feedback information from the target object of the voice interaction regarding the preset voice information, the feedback information being non-voice information; and

a second output module, configured to output voice interaction information when the feedback information of the target object satisfies a preset condition.

Optionally, the first output module includes at least one of the following:

a first response submodule, configured to output the preset voice information in response to a preset time being reached;

a second response submodule, configured to output the preset voice information in response to preset information being received;

a third response submodule, configured to output the preset voice information in response to the target object being sensed within the voice interaction range.

Optionally, the first output module includes:

a first acquisition submodule, configured to obtain first image data within the voice interaction range;

a first output submodule, configured to output the preset voice information when the target object is identified according to the first image data.

Optionally, the first acquisition module includes:

a second acquisition submodule, configured to obtain second image data after the preset voice information is output;

a first determination submodule, configured to determine, according to the second image data, whether the target object has received the preset voice information.

Optionally, the first determination submodule includes:

a second determination submodule, configured to determine that the target object has received the preset voice information when it is determined according to the second image data that the target object is within the voice interaction range; or

a third determination submodule, configured to determine that the target object has received the preset voice information when it is determined that the facing direction given by the facial information of the target object in the second image data and the direction of the output device of the preset voice information are within a first preset error range of each other.

Optionally, the first acquisition module further includes:

a third acquisition submodule, configured to obtain second image data after the preset voice information is output;

a fourth determination submodule, configured to determine whether the target object has received the preset voice information by comparing the second image data with first image data obtained before the preset voice information is output.

Optionally, the fourth determination submodule includes:

an identification submodule, configured to identify a first face of the target object in the first image data and a second face of the target object in the second image data;

a fifth determination submodule, configured to determine whether the target object has received the preset voice information by comparing the facial information of the first face and the second face.

Optionally, the first acquisition module further includes:

a sixth determination submodule, configured to determine whether location information of the target object within the voice interaction range is received.

Optionally, the voice interaction apparatus further includes:

a sending module, configured to resend the preset voice information when the feedback information of the target object does not satisfy the preset condition.

Optionally, the sending module includes:

a first sending submodule, configured to delay the sending of the preset voice information when it is determined that the target object is not within the voice interaction range; or

a second sending submodule, configured to send the preset voice information at a higher volume when it is determined that the target object is within the voice interaction range.

The above functions may be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.

In one possible design, the structure of the voice interaction apparatus includes a memory and a processor. The memory stores one or more computer instructions that support the voice interaction apparatus in executing the voice interaction method of the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The voice interaction apparatus may further include a communication interface for communication between the voice interaction apparatus and other devices or a communication network.

In a third aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method steps described in the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing the computer instructions used by the voice interaction apparatus, including the computer instructions involved in executing the voice interaction method of the first aspect.

The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:

In the embodiments of the present disclosure, a preset event inside the intelligent voice output device triggers the device to actively output preset voice information. After the preset voice information is output, the device obtains the target object's feedback information on it, and outputs the subsequent voice interaction information when the feedback information satisfies a preset condition. Because the intelligent voice output device actively initiates the voice interaction process based on a preset event, and outputs the subsequent, specific voice interaction information only when the user's feedback information satisfies the preset condition, the device can be used in more scenarios and can output voice only once it has determined that the user is in a voice interaction state. This avoids missing important voice information and improves the user experience.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.

Brief description of the drawings

Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:

Fig. 1 shows a flowchart of a voice interaction method according to an embodiment of the present disclosure;

Fig. 2 shows a flowchart of step S101 according to the embodiment shown in Fig. 1;

Fig. 3 shows a flowchart of step S102 according to the embodiment shown in Fig. 1;

Fig. 4 shows another flowchart of step S102 according to the embodiment shown in Fig. 1;

Fig. 5 shows a flowchart of step S402 according to the embodiment shown in Fig. 4;

Fig. 6 shows a structural block diagram of a voice interaction apparatus according to an embodiment of the present disclosure;

Fig. 7 is a schematic structural diagram of an electronic device suitable for implementing a voice interaction method according to an embodiment of the present disclosure.

Detailed description

Hereinafter, exemplary embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those skilled in the art can readily implement them. For clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.

In the present disclosure, it should be understood that terms such as "including" or "having" are intended to indicate the presence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts, or combinations thereof exist or are added.

It should also be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Fig. 1 shows a flowchart of a voice interaction method according to an embodiment of the present disclosure. As shown in Fig. 1, the voice interaction method includes the following steps S101-S103:

In step S101, in response to a preset event that activates voice interaction, preset voice information is output;

In step S102, feedback information from the target object of the voice interaction regarding the preset voice information is obtained, the feedback information being non-voice information;

In step S103, when the feedback information of the target object satisfies a preset condition, voice interaction information is output.
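
The flow of steps S101-S103 can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the disclosure: the class `VoiceInteraction` and the callbacks `speak`, `get_feedback`, and `condition` are hypothetical names introduced here, and how audio is played or feedback is sensed is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceInteraction:
    """Hypothetical wrapper around steps S101-S103; all names are illustrative."""
    speak: Callable[[str], None]       # assumed callback that outputs audio
    get_feedback: Callable[[], dict]   # assumed callback returning non-voice feedback
    condition: Callable[[dict], bool]  # the preset condition applied to the feedback

    def run(self, preset_voice: str, interaction_voice: str) -> bool:
        # S101: a preset event has fired, so output the preset (probe) voice information.
        self.speak(preset_voice)
        # S102: obtain the target object's non-voice feedback.
        feedback = self.get_feedback()
        # S103: output the voice interaction information only if the condition holds.
        if self.condition(feedback):
            self.speak(interaction_voice)
            return True
        return False
```

Returning a boolean lets the caller decide what to do when the condition is not met, for example resending the preset voice information.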

At present, natural voice interaction essentially enters the subsequent human-computer interaction process only after the user activates it by voice or by a physical button. Although this approach suits most scenarios, constraints on processing capability and power consumption mean that natural voice interaction cannot be activated autonomously. If the device does activate natural voice interaction on its own initiative, it cannot ensure that the voice interaction is accurately received by the user, and it then enters a confused state. For example, the intelligent voice output device repeatedly activates on its own initiative but enters an unknown state because it cannot receive feedback from the user. Alternatively, the intelligent voice output device activates on its own initiative and the user does not receive the information, but the device assumes by default that the information was received, so important information is missed.

To address the above problems, the embodiments of the present disclosure propose the above voice interaction method. When the intelligent voice output device actively triggers an interaction based on natural voice, it first sends a probe voice, that is, a piece of preset voice information set in advance. The preset voice information may include identification information of the target user, such as "Mr. Wang". The intelligent voice output device then obtains the target object's feedback information on the probe voice, which may be non-voice information, and decides which subsequent voice information to send based on the feedback information obtained. The intelligent voice output device may obtain the feedback information of the target object through an external sensor. After the probe voice is output, there are two possible cases. First, the target object has received the voice information, so the subsequent natural voice interaction should proceed. Second, the target object has not received the voice information, so the intelligent voice output device needs to suspend the subsequent natural voice interaction. The intelligent voice output device can therefore use external sensor data to judge whether the target object has received the voice information, and choose how to carry out the subsequent natural voice interaction according to that judgment. For example, the intelligent voice output device obtains an image of the target object through an image sensor and identifies the target object's attention from the image. When the attention satisfies the preset condition, for example when the target object turns its attention to the intelligent voice output device after hearing the probe voice, the target object is considered ready to interact, and the intelligent voice output device can send the subsequent voice interaction information. If the intelligent voice output device does not recognize that the target object's attention satisfies the preset condition, it can take other measures, such as replaying the probe voice, or playing the probe voice again after a period of time. The target object may be a specific person or object, or may be any person or object.

For example, a smartphone can determine, from information such as its internal schedule or the arrival of an e-mail, the timing for starting a natural voice interaction, the target object of the interaction, and the information content of the interaction. For instance, when a time marked in the schedule is reached, the smartphone initiates a natural voice interaction for the reminder. The interaction object is the person marked in the schedule or the owner of the phone, and the information of the natural voice interaction is the reminder message. The smartphone may first send the identification information of the target object, such as "Mr. Wang, hello". The smartphone may also combine internal information with external sensors to judge the timing, for example initiating the natural voice interaction when the time point marked in the schedule has arrived and the front camera captures the user operating the smartphone. As another example, a smart speaker may use its wireless distance detection sensors (RFID, Bluetooth, Wi-Fi) or its image sensor to recognize that the user has entered the smart speaker's interaction area, and then actively initiate a natural voice interaction that gives a weather reminder. Whichever approach is used, and whether or not an external sensor is used, the characteristic feature is that the intelligent voice output device actively initiates the natural voice interaction process when the user has not initiated one. Here, the intelligent voice output device actively initiating the natural voice interaction process is further defined to mean that, in a natural voice interaction process, the intelligent voice output device is the first to send a natural voice signal.

With the embodiments of the present disclosure, the intelligent voice output device can actively initiate a natural voice interaction according to preset settings, determine whether the interaction succeeded from the feedback information of the target object, and select the subsequent natural voice interaction mode accordingly.

In an optional implementation of this embodiment, step S101, that is, the step of outputting preset voice information in response to a preset event that activates voice interaction, further includes outputting the preset voice information in response to at least one of the preset events listed below.

In this optional implementation, the preset event may be an event set in advance in the intelligent voice output device for triggering voice interaction, and includes at least one of the following:

a preset time is reached, such as a calendar reminder time or an alarm time set by the user;

preset information is received, such as a new e-mail, an important e-mail, or a new message;

the target object is sensed within the voice interaction range.

The preset event can be set according to the usage scenario and is not limited here.
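
As a rough illustration of the three kinds of preset event listed above, the following sketch checks them in turn. The enum `PresetEvent`, the function `detect_event`, and its parameters are hypothetical names chosen for this example; the order in which events are checked is also an assumption, since the disclosure does not prescribe one.

```python
from enum import Enum, auto
from typing import Optional


class PresetEvent(Enum):
    TIME_REACHED = auto()     # e.g. a calendar reminder or alarm time
    INFO_RECEIVED = auto()    # e.g. a new e-mail or message
    TARGET_IN_RANGE = auto()  # target object sensed within the voice interaction range


def detect_event(now: float, reminder_at: Optional[float],
                 unread_messages: int, target_sensed: bool) -> Optional[PresetEvent]:
    """Return the first preset event that has fired, or None if none has."""
    if reminder_at is not None and now >= reminder_at:
        return PresetEvent.TIME_REACHED
    if unread_messages > 0:
        return PresetEvent.INFO_RECEIVED
    if target_sensed:
        return PresetEvent.TARGET_IN_RANGE
    return None
```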

In an optional implementation of this embodiment, as shown in Fig. 2, step S101, that is, the step of outputting preset voice information in response to a preset event that activates voice interaction, further includes the following steps S201-S202:

In step S201, first image data within the voice interaction range is obtained;

In step S202, when the target object is identified according to the first image data, the preset voice information is output.

In this optional implementation, before sending the preset voice information, that is, the probe voice, the intelligent voice output device first obtains first image data within its voice interaction range, and outputs the preset voice information only when the target object of the voice interaction is identified in the first image data. This approach is suited to actively initiating voice interaction with the target object when the target object appears within the voice interaction range. For example, according to the user's settings, when the user appears within the voice interaction range, the device may play a song for the user, or actively ask whether the user wants other linked electrical appliances turned on. The specifics can be set according to the application scenario.

In an optional implementation of this embodiment, as shown in Fig. 3, step S102, that is, the step of obtaining feedback information from the target object of the voice interaction regarding the preset voice information, further includes the following steps S301-S302:

In step S301, second image data is obtained after the preset voice information is output;

In step S302, whether the target object has received the preset voice information is determined according to the second image data.

In this optional implementation, after outputting the preset voice information, that is, the probe voice, the intelligent voice output device determines the feedback information of the target object from second image data obtained within the voice interaction range. For example, if the target object is identified in the second image data, the target object can be considered to have received the probe voice, and the device continues with the subsequent voice information. As another example, the second image data can be used to identify whether the target object is paying attention to the intelligent voice output device; if so, the target object can be considered to have received the probe voice, and the device continues with the subsequent voice information. The specifics can be set according to the actual application scenario.

In an optional implementation of this embodiment, step S302, that is, the step of determining according to the second image data whether the target object has received the preset voice information, further includes making the determination in either of the two ways described below.

In this optional implementation, whether the feedback information of the target object satisfies the preset condition can be determined in two ways. First, if the second image data obtained within the voice interaction range shows that the target object is in the second image, the target object can be considered to have received the probe voice, and the intelligent voice output device can go on to output the subsequent voice interaction information. Second, the facial information of the target object is identified from the second image data. This way requires not only that the target object be in the second image data, but also that the direction the target object is facing roughly coincide with the direction of the intelligent voice output device; only then is the target object considered to have received the probe voice, and the intelligent voice output device can go on to output the subsequent voice interaction information. The direction the target object is facing can be determined from the facial information, which includes but is not limited to the face contour, the face direction, and the pupil focal length. The first preset error range can be set according to whether the direction the target object is facing and the direction of the intelligent voice output device are roughly the same, and its size can be determined empirically.
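
The second way, comparing the facing direction with the device direction under a first preset error range, might be sketched as an angular tolerance check. This is a sketch under stated assumptions: it supposes that some upstream face analysis already yields a facing angle in degrees, and the function name `facing_device`, its parameters, and the 15-degree default are all hypothetical; the disclosure only says the range is chosen empirically.

```python
def facing_device(face_yaw_deg: float, device_bearing_deg: float,
                  tolerance_deg: float = 15.0) -> bool:
    """True if the facing direction is within the tolerance of the device direction.

    Both angles are measured in the same reference frame, in degrees.
    tolerance_deg stands in for the first preset error range (assumed value).
    """
    # Wrap the difference into [-180, 180) so that 350 and 5 degrees count as 15 apart.
    diff = (face_yaw_deg - device_bearing_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= tolerance_deg
```

Wrapping the difference avoids a false negative when the two angles straddle the 0/360 boundary.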

In an optional implementation of this embodiment, as shown in Fig. 4, step S102, that is, the step of obtaining feedback information from the target object of the voice interaction regarding the preset voice information, further includes the following steps S401-S402:

In step S401, second image data is obtained after the preset voice information is output;

In step S402, whether the target object has received the preset voice information is determined by comparing the second image data with first image data obtained before the preset voice information is output.

In this optional implementation, whether the target object has received the preset voice information is determined by comparing the differences between the first image data and the second image data obtained within the voice interaction range before and after the preset voice information is sent. For example, if the target object is absent from the first image data but appears in the second image data, the target object can be considered to have moved into the voice interaction range after hearing the preset voice information in order to receive the subsequent voice information. As another example, if both the first image data and the second image data include the target object, and the target object changes from not paying attention to the intelligent voice output device to paying attention to it, the target object can be considered to have heard the preset voice information and to be ready to receive the subsequent voice interaction information.
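
The two before/after examples in this implementation can be expressed as a small decision function. The sketch assumes that image analysis has already reduced each frame to two booleans; the `Observation` type and the function `received_by_comparison` are hypothetical names, and only the two cases described above are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical summary of one image: is the target there, and is it attending?"""
    target_present: bool
    attending: bool  # the target is paying attention to the device


def received_by_comparison(before: Observation, after: Observation) -> bool:
    """Decide whether the probe voice was received from the before/after change."""
    # Case 1: the target was absent before the probe voice and present after it.
    appeared = not before.target_present and after.target_present
    # Case 2: the target was present throughout and switched to attending.
    turned = (before.target_present and after.target_present
              and not before.attending and after.attending)
    return appeared or turned
```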

In an optional implementation of this embodiment, as shown in Fig. 5, step S402, that is, the step of determining whether the target object has received the preset voice information by comparing the second image data with the first image data obtained before the preset voice information is output, further includes the following steps S501-S502:

In step S501, a first face of the target object in the first image data and a second face of the target object in the second image data are identified;

In step S502, whether the target object has received the preset voice information is determined by comparing the facial information of the first face and the second face.

In this optional implementation, whether the target object has received the preset voice information is determined from the change in the target object's facial information, as shown in the image data, before and after the preset voice information (the probe voice) is sent. The image sensor obtains the first image data and the second image data from before and after the intelligent voice output device sends the probe voice, the change in state of the target object's facial information is identified from them, and whether the target object has received the preset voice information is then determined from that change. For example, if identifying the target object's face direction, face contour, pupil focal length, and so on shows that the target object's attention has shifted from not being on the intelligent voice output device to being on it, the target object can be considered to have received the preset voice information. In practice, training samples can be collected and a corresponding artificial intelligence model trained, so that the model identifies whether the target object has received the preset voice information from the target object's state change before and after the probe voice. This can improve the accuracy of judging the target object's feedback information.

In an optional implementation of this embodiment, step S102, that is, the step of obtaining feedback information from the target object of the voice interaction regarding the preset voice information, further includes determining whether location information of the target object within the voice interaction range is received.

In this optional implementation, the feedback information of the target object can also be determined by obtaining the location information of the target object. The location information can be determined by position sensors such as Wi-Fi devices, Bluetooth devices, ZigBee devices, and radar devices. For example, if the target object carries a Wi-Fi device, the intelligent voice output device uses the target object's Wi-Fi information to determine whether the target object is within the voice interaction range; if it is, the target object can be considered to have received the preset voice information. With this method, a position sensor determines whether a specific target object is within the voice interaction range, which is relatively simple to implement and relatively low in cost.
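
Once a position estimate is available, the range check itself can be very simple. The sketch below assumes that the position sensor already provides planar coordinates in metres and that the voice interaction range is a circle around the device; both are assumptions made for illustration, as are the names `within_voice_range`, `device_xy`, `target_xy`, and `radius_m`. How a position is derived from Wi-Fi or Bluetooth signals is outside the scope of this sketch.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def within_voice_range(device_xy: Point, target_xy: Point, radius_m: float) -> bool:
    """True if the target lies within radius_m metres of the device (assumed circular range)."""
    return math.dist(device_xy, target_xy) <= radius_m
```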

In an optional implementation of this embodiment, the voice interaction method further includes resending the preset voice information when the feedback information of the target object does not satisfy the preset condition.

In this optional implementation, if the feedback information of the target object does not satisfy the preset condition, that is, if it is judged that the target object has not received the preset voice information, the preset voice information can be resent. It can be resent immediately or after a period of time; the specifics can be set according to the actual situation and are not limited here.

In an optional realization method of the present embodiment, the step of the above-mentioned retransmission default voice messaging, packet It includes：

In the optional realization method, if it is determined that when target object is not within the scope of interactive voice, can postpone to send Default voice messaging when appearing within the scope of interactive voice so as to target object, sends preset voice messaging again；If target Object is within the scope of interactive voice, when without hearing default voice messaging, can improve volume and sending default voice letter again Breath, to cause the attention of target object.In this way, intelligent sound output equipment can not caused to enter chaotic shape In the case of state, it is ensured that target object can receive interactive voice information, prevent from omitting important information.
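The two resend cases described above can be summarized in a small decision function. This is a sketch under stated assumptions: the `ResendAction` names, the integer volume scale of 0 to 100 and the step of 20 are illustrative choices, not values given in the disclosure.

```python
from enum import Enum


class ResendAction(Enum):
    NONE = "none"                     # preset voice information was received
    DELAY_UNTIL_IN_RANGE = "delay"    # wait until the target comes into range
    RESEND_LOUDER = "resend_louder"   # in range but not heard: raise the volume


def choose_resend_action(in_range: bool, received: bool) -> ResendAction:
    """Pick the resend strategy from the two conditions named in the text."""
    if received:
        return ResendAction.NONE
    if not in_range:
        return ResendAction.DELAY_UNTIL_IN_RANGE
    return ResendAction.RESEND_LOUDER


def raised_volume(current: int, step: int = 20, maximum: int = 100) -> int:
    """Raise the volume by one step without exceeding the assumed maximum."""
    return min(maximum, current + step)
```

Keeping the decision separate from the volume adjustment makes it easy to change either policy, for example resending immediately versus after a delay, without touching the other.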

Exemplary application scenarios of the embodiments of the present disclosure are described in detail below through specific examples.

Embodiment one：

This embodiment gives an example of natural voice interaction based on a smart speaker. By retrieving locally stored schedule information, the smart speaker obtains an entry reading "remind me in the afternoon to prepare a birthday gift for my son". In one embodiment, the smart speaker finds the target object through a camera: for example, the smart camera on the smart speaker performs target recognition in the observation area and, by processing facial information, identifies a user as "owner user Mr. Wang", and the operating system gives the time as "3 p.m.". The natural language processing module then performs an association calculation on the schedule information and the element information, and concludes that a natural voice interaction process needs to be activated and that voice information is to be sent through the smart phone. In this embodiment, the intelligent terminal first sends the voice information "Good afternoon, Mr. Wang" and enters the state monitoring process for the target object, Mr. Wang. Before sending this voice information, the intelligent terminal continuously performs recognition of the target object's facial information through the smart camera, and the detection result is that no facial information of the target object is detected. After the voice information is sent, the target object turns around or turns his head toward the smart device; the intelligent terminal performs recognition of the target object's facial information, the detection result is that the facial information of the target object is found, and the face orientation of the target object is further evaluated. The intelligent terminal further detects that the face orientation of the target object is aligned with the intelligent terminal, that is, the face image captured by the image sensor is a frontal face. At this point, the intelligent terminal recognizes the change in the target object's posture and decides to enter the follow-up interaction process. Even without receiving any voice feedback from the target object, the intelligent terminal then issues the subsequent voice information "Remember to prepare a birthday gift for your son". In another case, the smart device does not detect any change in the target object's posture after sending the voice information, and therefore pauses the follow-up interaction process. Meanwhile, the intelligent terminal raises the volume, sends the information "Good afternoon, Mr. Wang" again, and re-enters the posture monitoring state for the target object. In yet another case, the smart device recognizes facial information of the target object after sending the voice information, but that facial information does not match the owner, Mr. Wang; the natural voice interaction process is then closed and the unfinished state of the reminder is stored in system memory.
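The three outcomes in this embodiment can be sketched as a single decision step taken after the greeting has been sent. The function name, the returned labels and the order in which the conditions are checked are assumptions made for illustration; the embodiment only describes the three cases, not how they are encoded.

```python
def next_step(face_found: bool, is_owner: bool, facing_device: bool) -> str:
    """Decide what the terminal does after sending the greeting.

    face_found    -- facial information was detected after the greeting
    is_owner      -- the detected face matches the owner (e.g. Mr. Wang)
    facing_device -- the detected face is a frontal face toward the device
    """
    if face_found and not is_owner:
        # A face is present but belongs to someone else: close the
        # interaction and keep the reminder marked as unfinished.
        return "close_and_store_unfinished"
    if face_found and facing_device:
        # The owner turned toward the device: continue without waiting
        # for any spoken reply.
        return "send_follow_up"
    # No posture change detected: pause, raise the volume, greet again.
    return "resend_greeting_louder"
```

Note that no branch depends on voice feedback, which reflects the point of the embodiment that the feedback information is non-voice information.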

Embodiment two：

This embodiment gives an example of natural voice interaction based on a smart phone. In this embodiment, the smart phone is in a screen-off sleep state and therefore cannot initiate information interaction with the user through the display. The smart phone system recognizes that an email has been received and that the email is marked as "urgent". At this time, the user is using a computer that is connected to the network through the phone, so the smart phone detects the user's presence from the network sharing state and actively initiates a natural voice interaction. The smart phone sends "Hello, Mr. Wang", and then starts the front camera to detect the user's state. After the smart phone detects the user's face image, it sends the further voice information "You have an urgent email, please check it". If the front camera of the smart phone does not detect the facial information of the target object within a predetermined time period, the subsequent voice interaction is terminated.
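The "detect within a predetermined time period, otherwise terminate" behaviour can be sketched as a polling loop with a deadline. The `detect_face` callback stands in for the front-camera face detection and is hypothetical, as are the polling interval and the function name.

```python
import time
from typing import Callable


def wait_for_face(detect_face: Callable[[], bool], timeout_s: float,
                  poll_s: float = 0.1) -> bool:
    """Poll detect_face until it returns True or timeout_s seconds elapse.

    Returns True if a face was detected in time (send the follow-up voice
    information), False otherwise (terminate the voice interaction).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if detect_face():
            return True
        time.sleep(poll_s)
    return False
```

A monotonic clock is used so that the deadline is not affected by changes to the system time while the phone is waiting.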

The following are apparatus embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure.

Fig. 6 shows a structural block diagram of a voice interaction apparatus according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in Fig. 6, the voice interaction apparatus includes a first output module 601, a first acquisition module 602 and a second output module 603:

The first output module 601 is configured to output preset voice information in response to a preset event that activates voice interaction;

The first acquisition module 602 is configured to obtain feedback information of the target object of the voice interaction on the preset voice information, the feedback information being non-voice information;

The second output module 603 is configured to output voice interaction information when the feedback information of the target object satisfies a preset condition.
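The cooperation of the three modules listed above can be sketched as follows. This is not the apparatus of Fig. 6 itself: the class name, the use of plain callables for the modules, and the dictionary form of the feedback information are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceInteractionSketch:
    output_voice: Callable[[str], None]          # speaker output (assumed shared)
    acquire_feedback: Callable[[], Dict]         # non-voice feedback source
    condition_met: Callable[[Dict], bool]        # the preset condition

    def run(self, preset_voice: str, interaction_voice: str) -> bool:
        # Role of the first output module 601: output the preset voice information.
        self.output_voice(preset_voice)
        # Role of the first acquisition module 602: obtain non-voice feedback.
        feedback = self.acquire_feedback()
        # Role of the second output module 603: output only if the condition holds.
        if self.condition_met(feedback):
            self.output_voice(interaction_voice)
            return True
        return False
```

For instance, passing a list's `append` method as `output_voice` records what would have been spoken, which is a convenient way to exercise the flow without audio hardware.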

In an optional implementation of this embodiment, the first output module includes at least one of the following:

In an optional implementation of this embodiment, the first output module 601 includes:

In an optional implementation of this embodiment, the first acquisition module 602 includes:

In an optional implementation of this embodiment, the first determination submodule includes:

In an optional implementation of this embodiment, the first acquisition module 602 further includes:

In an optional implementation of this embodiment, the fourth determination submodule includes:

In an optional implementation of this embodiment, the voice interaction apparatus further includes:

In an optional implementation of this embodiment, the sending module includes:

The above voice interaction apparatus corresponds to the voice interaction method described in the embodiments shown in Fig. 1 to Fig. 5 and the related embodiments. For details, refer to the above description of the voice interaction method, which is not repeated here.

Fig. 7 is a schematic structural diagram of an electronic device suitable for implementing the voice interaction method according to an embodiment of the present disclosure.

As shown in Fig. 7, the electronic device 700 includes a central processing unit (CPU) 701, which can perform the various processing in the embodiment shown in Fig. 1 above according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores the various programs and data required for the operation of the electronic device 700. The CPU 701, the ROM 702 and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD) or the like, and a loudspeaker and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.

In particular, according to embodiments of the present disclosure, the method described above with reference to Fig. 1 may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method of Fig. 1. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.

The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, and that module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units or modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The described units or modules may also be provided in a processor, and the names of these units or modules do not, under certain circumstances, constitute a limitation on the units or modules themselves.

In another aspect, the present disclosure further provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the apparatus described in the above embodiments, or it may exist on its own without being assembled into a device. The computer-readable storage medium stores one or more programs, and the programs are used by one or more processors to perform the methods described in the present disclosure.

The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims

1. A voice interaction method, characterized by comprising:

obtaining feedback information of a target object of the voice interaction on the preset voice information, the feedback information being non-voice information;

2. The voice interaction method according to claim 1, characterized in that outputting the preset voice information in response to the preset event that activates voice interaction comprises at least one of the following:

3. The voice interaction method according to claim 2, characterized in that outputting the preset voice information in response to sensing that the target object is within the voice interaction range comprises:

obtaining first image data within the voice interaction range;

4. The voice interaction method according to claim 1, characterized in that obtaining the feedback information of the target object of the voice interaction on the preset voice information comprises:

obtaining second image data after the preset voice information is output;

5. The voice interaction method according to claim 4, characterized in that determining, according to the second image data, whether the target object has received the preset voice information comprises:

when it is determined according to the second image data that the target object is within the voice interaction range, determining that the target object has received the preset voice information; or,

when it is determined that the orientation of the facial information of the target object in the second image data and the orientation of the output device of the preset voice information are within a first preset error range, determining that the target object has received the preset voice information.

6. The voice interaction method according to claim 1, characterized in that obtaining the feedback information of the target object of the voice interaction on the preset voice information further comprises:

obtaining second image data after the preset voice information is output;

determining whether the target object has received the preset voice information by comparing the second image data with first image data obtained before the preset voice information is output.

7. The voice interaction method according to claim 6, characterized in that determining whether the target object has received the preset voice information by comparing the second image data with the first image data obtained before the preset voice information is output comprises:

identifying a first face of the target object in the first image data and a second face of the target object in the second image data;

determining whether the target object has received the preset voice information by comparing the facial information of the first face and the second face.

8. A voice interaction apparatus, characterized by comprising:

a first acquisition module configured to obtain feedback information of a target object of the voice interaction on the preset voice information, the feedback information being non-voice information;

a second output module configured to output voice interaction information when the feedback information of the target object satisfies a preset condition.

9. An electronic device, characterized by comprising a memory and a processor, wherein

the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method steps of any one of claims 1 to 7.

10. A computer-readable storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed by a processor, implement the method steps of any one of claims 1 to 7.