CN108510986A - Voice interactive method, device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN108510986A (application number CN201810186219.3A)
- Authority
- CN
- China
- Prior art keywords
- voice
- target object
- voice messaging
- interactive
- default
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
- H04M1/72454—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
- H04M1/72451—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to schedules, e.g. using calendar applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
- H04M1/72457—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to geographic location
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72484—User interfaces specially adapted for cordless or mobile telephones wherein functions are triggered by incoming communication events
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/74—Details of telephonic subscriber devices with voice recognition means
Abstract
Embodiments of the present disclosure disclose a voice interaction method, apparatus, electronic device, and computer-readable storage medium. The method includes: outputting a preset voice message in response to a preset event that activates voice interaction; obtaining feedback information of a target object of the voice interaction on the preset voice message, the feedback information being non-voice information; and outputting voice interaction information when the feedback information of the target object meets a preset condition. In the embodiments, the intelligent voice output device actively initiates the voice interaction process based on a preset event, and outputs the subsequent, specific voice interaction information only when the user's feedback information meets the preset condition. This allows the intelligent voice output device to be used in more scenarios and to output speech once it has determined that the user is in a voice interaction state, which avoids missing important voice information and improves the user experience.
Description
Technical field
The present disclosure relates to the field of artificial intelligence, and in particular to a voice interaction method, apparatus, electronic device, and computer-readable storage medium.
Background
With the development of artificial intelligence technology, the performance of natural speech processing technology has improved greatly. Speech recognition is increasingly applied to intelligent voice output devices such as smart speakers, smartphones, smart tablets, and Internet of Things devices. Applying natural speech processing to human-computer interaction has become an essential path for more and more intelligent voice output devices, and natural speech interaction is becoming the new mode of human-computer interaction after the touch screen.
Summary of the invention
Embodiments of the present disclosure provide a voice interaction method, apparatus, electronic device, and computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a voice interaction method.
Specifically, the voice interaction method includes:
outputting a preset voice message in response to a preset event that activates voice interaction;
obtaining feedback information of a target object of the voice interaction on the preset voice message, the feedback information being non-voice information;
outputting voice interaction information when the feedback information of the target object meets a preset condition.
Optionally, outputting the preset voice message in response to the preset event that activates voice interaction includes at least one of the following:
outputting the preset voice message in response to a preset time being reached;
outputting the preset voice message in response to preset information being received;
outputting the preset voice message in response to sensing that the target object is within the voice interaction range.
Optionally, outputting the preset voice message in response to sensing that the target object is within the voice interaction range includes:
obtaining first image data within the voice interaction range;
outputting the preset voice message when the target object is recognized from the first image data.
Optionally, obtaining the feedback information of the target object of the voice interaction on the preset voice message includes:
obtaining second image data after the preset voice message is output;
determining, according to the second image data, whether the target object has received the preset voice message.
Optionally, determining, according to the second image data, whether the target object has received the preset voice message includes:
determining that the target object has received the preset voice message when it is determined from the second image data that the target object is within the voice interaction range; or
determining that the target object has received the preset voice message when the orientation of the target object's facial information in the second image data and the orientation of the device outputting the preset voice message are within a first preset error range.
Optionally, obtaining the feedback information of the target object of the voice interaction on the preset voice message further includes:
obtaining second image data after the preset voice message is output;
determining whether the target object has received the preset voice message by comparing the second image data with first image data obtained before the preset voice message was output.
Optionally, determining whether the target object has received the preset voice message by comparing the second image data with the first image data obtained before the preset voice message was output includes:
recognizing a first face of the target object in the first image data and a second face of the target object in the second image data;
determining whether the target object has received the preset voice message by comparing the facial information of the first face and the second face.
Optionally, obtaining the feedback information of the target object of the voice interaction on the preset voice message further includes:
determining whether position information indicating that the target object is within the voice interaction range has been received.
Optionally, the voice interaction method further includes:
resending the preset voice message when the feedback information of the target object does not meet the preset condition.
Optionally, resending the preset voice message includes:
sending the preset voice message after a delay when it is determined that the target object is not within the voice interaction range; or
sending the preset voice message at a higher volume when it is determined that the target object is within the voice interaction range.
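The two resend branches can be illustrated with a short sketch. It is not taken from the patent: the names `ResendAction` and `choose_resend`, the 30-second delay, and the volume step of 0.2 are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResendAction:
    delay_seconds: float  # how long to wait before resending
    volume: float         # playback volume in [0.0, 1.0]


def choose_resend(in_range: bool, volume: float,
                  delay_seconds: float = 30.0,
                  volume_step: float = 0.2) -> ResendAction:
    """Pick how to resend the preset voice message after failed feedback.

    The default delay and volume step are illustrative assumptions.
    """
    if not in_range:
        # Target is outside the voice interaction range: resend later,
        # at the same volume.
        return ResendAction(delay_seconds=delay_seconds, volume=volume)
    # Target is in range but the feedback did not meet the condition:
    # resend immediately and louder, capped at full volume.
    return ResendAction(delay_seconds=0.0,
                        volume=min(1.0, volume + volume_step))
```

Returning a description of the action, rather than playing audio directly, keeps the decision separate from whatever playback and timer facilities a particular device provides.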
In a second aspect, an embodiment of the present disclosure provides a voice interaction apparatus, including:
a first output module configured to output a preset voice message in response to a preset event that activates voice interaction;
a first acquisition module configured to obtain feedback information of a target object of the voice interaction on the preset voice message, the feedback information being non-voice information;
a second output module configured to output voice interaction information when the feedback information of the target object meets a preset condition.
Optionally, the first output module includes at least one of the following:
a first response submodule configured to output the preset voice message in response to a preset time being reached;
a second response submodule configured to output the preset voice message in response to preset information being received;
a third response submodule configured to output the preset voice message in response to sensing that the target object is within the voice interaction range.
Optionally, the first output module includes:
a first acquisition submodule configured to obtain first image data within the voice interaction range;
a first output submodule configured to output the preset voice message when the target object is recognized from the first image data.
Optionally, the first acquisition module includes:
a second acquisition submodule configured to obtain second image data after the preset voice message is output;
a first determination submodule configured to determine, according to the second image data, whether the target object has received the preset voice message.
Optionally, the first determination submodule includes:
a second determination submodule configured to determine that the target object has received the preset voice message when it is determined from the second image data that the target object is within the voice interaction range; or
a third determination submodule configured to determine that the target object has received the preset voice message when the orientation of the target object's facial information in the second image data and the orientation of the device outputting the preset voice message are within a first preset error range.
Optionally, the first acquisition module further includes:
a third acquisition submodule configured to obtain second image data after the preset voice message is output;
a fourth determination submodule configured to determine whether the target object has received the preset voice message by comparing the second image data with first image data obtained before the preset voice message was output.
Optionally, the fourth determination submodule includes:
a recognition submodule configured to recognize a first face of the target object in the first image data and a second face of the target object in the second image data;
a fifth determination submodule configured to determine whether the target object has received the preset voice message by comparing the facial information of the first face and the second face.
Optionally, the first acquisition module further includes:
a sixth determination submodule configured to determine whether position information indicating that the target object is within the voice interaction range has been received.
Optionally, the voice interaction apparatus further includes:
a sending module configured to resend the preset voice message when the feedback information of the target object does not meet the preset condition.
Optionally, the sending module includes:
a first sending submodule configured to send the preset voice message after a delay when it is determined that the target object is not within the voice interaction range; or
a second sending submodule configured to send the preset voice message at a higher volume when it is determined that the target object is within the voice interaction range.
The functions may be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the voice interaction apparatus includes a memory and a processor. The memory stores one or more computer instructions that support the voice interaction apparatus in executing the voice interaction method of the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The voice interaction apparatus may further include a communication interface for communication between the voice interaction apparatus and other devices or communication networks.
In a third aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory stores one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method steps of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing computer instructions used by the voice interaction apparatus, including the computer instructions for executing the voice interaction method of the first aspect.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
In the embodiments of the present disclosure, the intelligent voice output device actively outputs a preset voice message when triggered by an internal preset event, obtains the target object's feedback information on the preset voice message after outputting it, and outputs the subsequent voice interaction information when the feedback information meets a preset condition. Because the intelligent voice output device actively initiates the voice interaction process based on a preset event, and outputs the subsequent, specific voice interaction information only when the user's feedback information meets the preset condition, the device can be used in more scenarios and can output speech once it has determined that the user is in a voice interaction state, which avoids missing important voice information and improves the user experience.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Description of the drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
Fig. 1 shows a flowchart of a voice interaction method according to an embodiment of the present disclosure;
Fig. 2 shows a flowchart of step S101 according to the embodiment shown in Fig. 1;
Fig. 3 shows a flowchart of step S102 according to the embodiment shown in Fig. 1;
Fig. 4 shows another flowchart of step S102 according to the embodiment shown in Fig. 1;
Fig. 5 shows a flowchart of step S402 according to the embodiment shown in Fig. 4;
Fig. 6 shows a structural block diagram of a voice interaction apparatus according to an embodiment of the present disclosure;
Fig. 7 is a schematic structural diagram of an electronic device suitable for implementing the voice interaction method according to an embodiment of the present disclosure.
Detailed description
Hereinafter, exemplary embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. For clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.
In the present disclosure, it should be understood that terms such as "including" or "having" are intended to indicate the presence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts, or combinations thereof exist or are added.
It should also be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments can be combined with each other. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows a flowchart of a voice interaction method according to an embodiment of the present disclosure. As shown in Fig. 1, the voice interaction method includes the following steps S101-S103:
In step S101, a preset voice message is output in response to a preset event that activates voice interaction;
In step S102, feedback information of the target object of the voice interaction on the preset voice message is obtained, the feedback information being non-voice information;
In step S103, voice interaction information is output when the feedback information of the target object meets a preset condition.
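Steps S101-S103 can be summarized in a short sketch. It is an illustration only, not the patented implementation: `run_voice_interaction` and its callback parameters (`play`, `get_feedback`, `meets_condition`) are hypothetical names standing in for the device's audio output, its sensor-based feedback collection, and the preset condition.

```python
from typing import Callable


def run_voice_interaction(
    play: Callable[[str], None],
    get_feedback: Callable[[], dict],
    meets_condition: Callable[[dict], bool],
    preset_message: str,
    interaction_message: str,
) -> bool:
    """Run S101-S103 once; return True if the interaction message was output."""
    # S101: a preset event has fired, so output the preset voice message.
    play(preset_message)
    # S102: collect non-voice feedback, e.g. derived from an image sensor.
    feedback = get_feedback()
    # S103: output the actual voice interaction information only if the
    # feedback meets the preset condition.
    if meets_condition(feedback):
        play(interaction_message)
        return True
    return False
```

A caller that receives `False` can then decide what to do next, for example resending the preset voice message.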
At present, natural speech interaction is basically activated by the user through voice or a physical button, after which the subsequent human-computer interaction process begins. Although this approach suits most scenarios, processing capacity and power consumption constraints mean that natural speech interaction cannot be activated autonomously. If a natural speech device does activate itself, the intelligent voice output device cannot guarantee that the voice interaction is accurately received by the user, and it then enters a confused state. For example, the intelligent voice output device repeatedly activates itself but enters an indeterminate state because it cannot receive the user's feedback. Alternatively, the intelligent voice output device activates itself and the user does not receive the information, yet the device assumes by default that the information was received, so important information is missed.
To address the above problems, the embodiments of the present disclosure propose the above voice interaction method. When the intelligent voice output device actively triggers an interaction based on natural speech, it sends a probe voice, that is, a preset voice message set in advance, which may include identification information of the target user, such as "Mr. Wang". The intelligent voice output device then obtains the target object's feedback information on the probe voice, which may be non-voice information, and decides from the obtained feedback which voice information to send next. The intelligent voice output device may obtain the target object's feedback information through an external sensor. After the intelligent voice output device outputs the probe voice, there are two possibilities. First, the target object received the voice message, so the subsequent natural speech interaction should proceed. Second, the target object did not receive the voice message, so the intelligent voice output device needs to suspend the subsequent natural speech interaction. The intelligent voice output device can therefore judge from the external sensor data whether the target object received the voice message, and choose how to conduct the subsequent natural speech interaction according to that judgment. For example, the intelligent voice output device obtains an image of the target object through an image sensor and recognizes the target object's attention from the image. When the attention meets a preset condition, for example when the target object turns its attention to the intelligent voice output device after hearing the probe voice, the target object is considered ready for interaction, and the intelligent voice output device can send the subsequent voice interaction information. If the intelligent voice output device does not recognize that the target object's attention meets the preset condition, it can take other measures, such as replaying the probe voice, or playing the probe voice again after a period of time. The target object may be a specific person or object, or any person or object.
For example, a smartphone can use internal information such as its schedule or the arrival of e-mail to decide when to start a natural speech interaction, who the target object of the interaction is, and what information the interaction conveys. For instance, when a time marked in the schedule arrives, the smartphone initiates a natural speech interaction as a reminder. The interaction object is the person marked in the schedule entry or the phone's owner, and the information of the natural speech interaction is the reminder message for the scheduled item. In this case, the smartphone can first send the identification information of the target object, such as "Hello, Mr. Wang". The smartphone can also combine internal information with external sensors to choose the moment: for example, it initiates the natural speech interaction at a time point marked in the schedule when its front camera captures that the user is operating the smartphone. As another example, a smart speaker can recognize, from wireless distance detection sensor information (RFID, Bluetooth, Wi-Fi) or from a built-in image sensor, that the user has arrived in the smart speaker's voice interaction area, and then actively initiate a natural speech interaction that gives a weather reminder. Whichever approach is used, and whether or not external sensors are used, the defining feature is that the intelligent voice output device actively initiates the natural speech interaction process when the user has not initiated one. Here, the intelligent voice output device actively initiating the natural speech interaction process is further defined to mean that, within a natural speech interaction process, the intelligent voice output device is the first to emit a natural speech signal.
With the embodiments of the present disclosure, the intelligent voice output device can actively initiate a natural speech interaction according to preset settings, determine from the target object's feedback information whether this interaction succeeded, and select the subsequent mode of natural speech interaction accordingly.
In an optional implementation of this embodiment, step S101, that is, the step of outputting a preset voice message in response to a preset event that activates voice interaction, further includes at least one of the following:
outputting the preset voice message in response to a preset time being reached;
outputting the preset voice message in response to preset information being received;
outputting the preset voice message in response to sensing that the target object is within the voice interaction range.
In this optional implementation, the preset event is an event set in advance in the intelligent voice output device for triggering voice interaction, and includes at least one of the following:
a preset time is reached, such as a calendar reminder time or alarm time set by the user;
preset information is received, such as a new e-mail, an important e-mail, or a new message;
the target object is sensed within the voice interaction range.
The preset event can be configured according to the usage scenario and is not limited here.
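The three kinds of preset event can be modeled as a small enumeration with a check over the device state. The sketch below is only one possible arrangement: `PresetEvent`, `DeviceState`, `detect_preset_event`, the field names, and the order in which the events are checked are all assumptions, since the text does not specify how a device represents or prioritizes events.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum, auto
from typing import Optional


class PresetEvent(Enum):
    TIME_REACHED = auto()     # e.g. a calendar reminder or alarm time
    INFO_RECEIVED = auto()    # e.g. a new or important e-mail, a new message
    TARGET_IN_RANGE = auto()  # target object sensed in the interaction range


@dataclass
class DeviceState:
    now: datetime
    reminder_time: Optional[datetime] = None
    unread_important: int = 0
    target_sensed: bool = False


def detect_preset_event(state: DeviceState) -> Optional[PresetEvent]:
    """Return the first preset event that currently holds, or None.

    The check order (time, then information, then presence) is an
    illustrative assumption.
    """
    if state.reminder_time is not None and state.now >= state.reminder_time:
        return PresetEvent.TIME_REACHED
    if state.unread_important > 0:
        return PresetEvent.INFO_RECEIVED
    if state.target_sensed:
        return PresetEvent.TARGET_IN_RANGE
    return None
```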
In an optional implementation of this embodiment, as shown in Fig. 2, step S101, that is, the step of outputting a preset voice message in response to a preset event that activates voice interaction, further includes the following steps S201-S202:
In step S201, first image data within the voice interaction range is obtained;
In step S202, the preset voice message is output when the target object is recognized from the first image data.
In this optional implementation, before emitting the preset voice message (the probe voice), the intelligent voice output device first obtains first image data within its voice interaction range, and outputs the preset voice message only when the target object of the voice interaction is recognized in the first image data. This approach suits cases where voice interaction with the target object is actively initiated once the target object appears within the voice interaction range. For example, according to the user's settings, when the user appears within the voice interaction range, the device plays a song for the user, or actively asks whether the user wants other linked electrical appliances turned on. The behavior can be configured according to the application scenario.
In an optional implementation of this embodiment, as shown in Fig. 3, step S102, that is, the step of obtaining feedback information of the target object of the voice interaction on the preset voice message, further includes the following steps S301-S302:
In step S301, second image data is obtained after the preset voice message is output;
In step S302, whether the target object has received the preset voice message is determined according to the second image data.
In this optional implementation, after outputting the preset voice message (the probe voice), the intelligent voice output device determines the target object's feedback information from second image data obtained within the voice interaction range. For example, if the target object is recognized in the second image data, the target object can be considered to have received the probe voice, and the subsequent voice information can proceed. As another example, the second image data is used to recognize whether the target object is paying attention to the intelligent voice output device; if so, the target object can be considered to have received the probe voice, and the subsequent voice information can proceed. The criterion can be configured according to the actual application scenario.
In an optional implementation of this embodiment, step S302, that is, the step of determining according to the second image data whether the target object has received the preset voice message, further includes the following steps:
determining that the target object has received the preset voice message when it is determined from the second image data that the target object is within the voice interaction range; or
determining that the target object has received the preset voice message when the orientation of the target object's facial information in the second image data and the orientation of the device outputting the preset voice message are within a first preset error range.
In this optional implementation, whether the target object's feedback information meets the preset condition can be determined in two ways. In the first, the second image data is obtained within the voice interaction range; if the target object is found in the second image, the target object can be considered to have received the probe voice, and the intelligent voice output device can go on to output the subsequent voice interaction information. In the second, the target object's facial information is recognized from the second image data. This way requires not only that the target object appear in the second image data, but also that the direction the target object is facing be roughly consistent with the direction of the intelligent voice output device; only then is the target object considered to have received the probe voice, and the intelligent voice output device can go on to output the subsequent voice interaction information. The direction the target object is facing can be determined from the facial information, which includes but is not limited to the face contour, the face direction, and the pupil focal distance. The first preset error range can be set according to whether the target object's facing direction and the direction of the intelligent voice output device are roughly consistent, and its size can be determined empirically.
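The orientation check can be illustrated in two dimensions. The sketch below makes one possible interpretation concrete: it treats the target as facing the device when the angle between the face direction and the direction from the face to the device is within a threshold. The function names, the 2D simplification, and the 20-degree default for the first preset error range are assumptions; the text says only that the range is determined empirically.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def angle_between_deg(a: Vec2, b: Vec2) -> float:
    """Unsigned angle between two non-zero 2D vectors, in degrees."""
    dot = a[0] * b[0] + a[1] * b[1]
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def is_facing_device(face_pos: Vec2, face_dir: Vec2, device_pos: Vec2,
                     max_error_deg: float = 20.0) -> bool:
    """True if the face direction points at the device within the error range.

    max_error_deg stands in for the first preset error range; its default
    value is an illustrative assumption.
    """
    to_device = (device_pos[0] - face_pos[0], device_pos[1] - face_pos[1])
    return angle_between_deg(face_dir, to_device) <= max_error_deg
```

In practice the face direction would come from a head-pose or gaze estimator applied to the second image data; that step is outside the scope of this sketch.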
In an optional realization method of the present embodiment, as shown in figure 4, the step S102, that is, obtain the voice
The step of interactive target object is to the feedback information of the default voice messaging, further comprises the steps S401-S402:
In step S401, the second image data after the default voice messaging output is obtained;
In step S402, by comparing acquisition before second image data and the output default voice messaging
The first image data, determine whether the target object receives the default voice messaging.
In this optional implementation, whether the target object has received the preset voice information is determined from the difference between the first image data and the second image data, which are obtained within the voice interaction range before and after the preset voice information is sent. For example, if the target object is absent from the first image data but appears in the second image data, it can be considered that the target object moved into the voice interaction range after hearing the preset voice information in order to receive the subsequent voice information. As another example, if both the first image data and the second image data include the target object, and the target object changes from not paying attention to the smart voice output device to paying attention to it, it can be considered that the target object has heard the preset voice information and is ready to receive the subsequent voice interaction information.
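The before-and-after comparison can be sketched as follows. This minimal Python example assumes that an upstream recognizer has already reduced each image to two flags; the `Observation` type and its field names are hypothetical, and the function covers only the two example cases given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical summary of one image produced by an upstream recognizer."""
    target_present: bool    # the target object was found in the image
    attending_device: bool  # the target object is paying attention to the device


def received_by_comparison(first: Observation, second: Observation) -> bool:
    """Compare the image before output (first) with the image after output (second)."""
    # Case 1: the target object was absent before and appears afterwards.
    if not first.target_present and second.target_present:
        return True
    # Case 2: the target object is in both images and shifts attention to the device.
    if first.target_present and second.target_present:
        return not first.attending_device and second.attending_device
    return False
```

A target object that was already attending to the device before the output is treated here as showing no change; a real system might handle that case differently.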
In an optional implementation of this embodiment, as shown in Fig. 5, step S402, that is, the step of determining whether the target object has received the preset voice information by comparing the second image data with the first image data obtained before the preset voice information was output, further includes the following steps S501-S502:
In step S501, a first face of the target object in the first image data and a second face of the target object in the second image data are identified;
In step S502, whether the target object has received the preset voice information is determined by comparing the facial information of the first face and the second face.
In this optional implementation, whether the target object has received the preset voice information is determined from the change in the target object's facial information, as captured in the image data, before and after the detection voice. The image sensor obtains the first image data and the second image data before and after the smart voice output device sends the detection voice, the change in state of the target object's facial information is identified from them, and whether the target object has received the preset voice information is then determined based on that change. For example, if the face direction, face contour, pupil focal length, and so on of the target object show that the target object's attention has shifted from being away from the smart voice output device to being on it, the target object can be considered to have received the preset voice information. In practice, training samples can be collected and a corresponding artificial intelligence model trained, and the model can then identify, from the change in the target object's state before and after, whether the target object has received the preset voice information. This can improve the accuracy of judging the target object's feedback information.
In an optional implementation of this embodiment, step S102, that is, the step of obtaining the feedback information of the target object of the voice interaction on the preset voice information, further includes:
Determining whether location information indicating that the target object is within the voice interaction range has been received.
In this optional implementation, the feedback information of the target object can also be determined by obtaining the location information of the target object. The location information can be determined by position sensors such as Wi-Fi devices, Bluetooth devices, ZigBee devices, and radar devices. For example, if the target object carries a Wi-Fi device, the smart voice output device obtains the Wi-Fi information of the target object to determine whether the target object is within the voice interaction range; if it is, the target object can be considered to have received the preset voice information. This method uses a position sensor to determine whether a specific target object is within the voice interaction range, and it is relatively simple and inexpensive to implement.
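The position-based check can be sketched as a distance test. This minimal Python example assumes that some positioning layer (Wi-Fi, Bluetooth, ZigBee, or radar) has already produced the target object's coordinates in metres relative to the output device, and that the voice interaction range is a circle around the device. Both assumptions, the function name, and the 5 metre radius are hypothetical; the disclosure does not specify how the range is modelled.

```python
import math
from typing import Optional, Tuple


def received_by_position(
    position_m: Optional[Tuple[float, float]],
    interaction_radius_m: float = 5.0,  # hypothetical radius of the interaction range
) -> bool:
    """Return True when a reported position lies inside the voice interaction range."""
    if position_m is None:
        # No location information was received for the target object.
        return False
    x, y = position_m
    return math.hypot(x, y) <= interaction_radius_m
```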
In an optional implementation of this embodiment, the voice interaction method further includes:
When the feedback information of the target object does not meet the preset condition, resending the preset voice information.
In this optional implementation, if the feedback information of the target object does not meet the preset condition, that is, if it is determined that the target object has not received the preset voice information, the preset voice information can be resent. It can be resent immediately or after a period of time; the choice can be set according to the actual situation and is not limited here.
In an optional implementation of this embodiment, the above step of resending the preset voice information includes:
When it is determined that the target object is not within the voice interaction range, delaying the sending of the preset voice information; or,
When it is determined that the target object is within the voice interaction range, raising the volume and sending the preset voice information.
In this optional implementation, if it is determined that the target object is not within the voice interaction range, sending of the preset voice information can be delayed so that it is sent again when the target object appears within the range. If the target object is within the voice interaction range but has not heard the preset voice information, the volume can be raised and the preset voice information sent again to attract the target object's attention. In this way, without causing the smart voice output device to enter a confused state, it can be ensured that the target object receives the voice interaction information and that important information is not missed.
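The two resending branches can be sketched as a small planning function. This is a minimal Python sketch; the `ResendPlan` type, the 60 second delay, and the 0.2 volume step are hypothetical values chosen for illustration, since the disclosure leaves them to the actual situation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResendPlan:
    """Hypothetical description of how to resend the preset voice information."""
    delay_s: float  # how long to wait before resending
    volume: float   # playback volume between 0.0 and 1.0


def plan_resend(
    in_range: bool,
    current_volume: float,
    retry_delay_s: float = 60.0,  # hypothetical delay when the target is out of range
    volume_step: float = 0.2,     # hypothetical volume increase when in range
) -> ResendPlan:
    """Choose between a delayed resend and a louder immediate resend."""
    if not in_range:
        # Target not within the voice interaction range: delay, keep the volume.
        return ResendPlan(delay_s=retry_delay_s, volume=current_volume)
    # Target within range but did not react: resend now at a higher, capped volume.
    return ResendPlan(delay_s=0.0, volume=min(1.0, current_volume + volume_step))
```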
Exemplary application scenarios of the embodiments of the present disclosure are described in detail below through specific examples.
Embodiment one:
This embodiment gives an example of natural voice interaction based on a smart speaker. By retrieving locally stored schedule information, the smart speaker obtains the entry "Remind me in the afternoon to prepare a birthday gift for my son". In one embodiment, the smart speaker finds the target object through a camera: for example, the smart camera on the smart speaker performs target recognition in the observation area, processing of the facial information identifies a user as "owner user Mr. Wang", and the operating system gives the time as "3 p.m.". The natural language processing module then correlates the schedule information with this element information and concludes that a natural voice interaction process needs to be activated and that voice information is to be sent through the smartphone. In this embodiment, the smart terminal first sends the voice information "Good afternoon, Mr. Wang" and enters a state-monitoring process for the target object, Mr. Wang. Before sending the voice information, the smart terminal continuously runs recognition on the facial information of the target object through the smart camera, and the detection result is that no facial information of the target object is detected. After the voice information is sent, the target object turns around or turns their head toward the smart device. The smart terminal runs recognition on the facial information of the target object, finds the facial information of the target object, and further evaluates the face direction of the target object. The smart terminal then detects that the face of the target object is directed at the smart terminal, that is, the face image collected by the image sensor is a frontal face. At this point, the smart terminal recognizes the change in the target object's posture and decides to enter the follow-up interaction process. The smart terminal then sends the subsequent voice information "Remember to prepare a birthday gift for your son" even though it has not received any voice feedback from the target object. In another case, the smart device does not detect a change in the target object's posture after sending the voice information, and it suspends the follow-up interaction process. Meanwhile, the smart terminal raises the volume, sends "Good afternoon, Mr. Wang" again, and re-enters the posture-monitoring state for the target object. In yet another case, the smart device recognizes the facial information of a target object after sending the voice information, but that facial information does not match the owner, Mr. Wang. The device then closes the natural voice interaction process and stores the uncompleted state of the reminder in system memory.
Embodiment two:
This embodiment gives an example of natural voice interaction based on a smartphone. In this embodiment, the smartphone is in a screen-off sleep state, so it cannot initiate information exchange with the user through the display. The smartphone system recognizes that an email has been received and that the email is marked "urgent". At this time, the user is using a computer that is connected to the network through the phone, so the smartphone detects the user's presence from the network-sharing state and actively initiates a natural voice interaction. The smartphone sends "Hello, Mr. Wang" and then starts the front camera to monitor the user's state. When the smartphone detects the user's face image, it sends the further voice information "You have an urgent email, please check it". If the front camera of the smartphone does not detect the facial information of the target object within a predetermined time period, the subsequent voice interaction is terminated.
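The timeout behaviour at the end of Embodiment two can be sketched as a polling loop. This minimal Python example assumes a hypothetical `detect_face` callable that wraps the front camera and returns True when the target object's face is found; the polling interval and the injectable `clock` and `sleep` parameters are illustrative choices that also make the loop easy to test.

```python
import time
from typing import Callable


def wait_for_face(
    detect_face: Callable[[], bool],  # hypothetical wrapper around the front camera
    timeout_s: float,                 # the predetermined time period
    poll_interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the detector until a face is found or the time period has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if detect_face():
            return True   # continue with the follow-up voice information
        if clock() >= deadline:
            return False  # terminate the subsequent voice interaction
        sleep(poll_interval_s)
```

A caller would send the follow-up voice information only when `wait_for_face` returns True.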
The following are apparatus embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure.
Fig. 6 shows a structural block diagram of a voice interaction apparatus according to an embodiment of the present disclosure. The apparatus can be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in Fig. 6, the voice interaction apparatus includes a first output module 601, a first acquisition module 602, and a second output module 603:
The first output module 601 is configured to output preset voice information in response to a preset event that activates voice interaction;
The first acquisition module 602 is configured to obtain feedback information of the target object of the voice interaction on the preset voice information, the feedback information being non-voice information;
The second output module 603 is configured to output voice interaction information when the feedback information of the target object meets a preset condition.
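The cooperation of the three modules can be sketched as follows. This is a minimal Python sketch under stated assumptions: the class name, the injected callables, and the method name `on_preset_event` are hypothetical, and the feedback is left as an opaque non-voice value supplied by the caller.

```python
from typing import Callable


class VoiceInteractionApparatus:
    """Hypothetical composition of the two output modules and the acquisition module."""

    def __init__(
        self,
        speak: Callable[[str], None],             # plays voice information
        get_feedback: Callable[[], object],       # returns non-voice feedback
        meets_condition: Callable[[object], bool],
    ) -> None:
        self._speak = speak
        self._get_feedback = get_feedback
        self._meets_condition = meets_condition

    def on_preset_event(self, preset_voice: str, interaction_voice: str) -> bool:
        """Run one interaction; return True if the interaction voice was output."""
        self._speak(preset_voice)            # role of the first output module
        feedback = self._get_feedback()      # role of the first acquisition module
        if self._meets_condition(feedback):  # preset condition on the feedback
            self._speak(interaction_voice)   # role of the second output module
            return True
        return False
```

For example, with `speak` bound to a list's `append`, a feedback value of True, and `bool` as the condition, both utterances are recorded in order.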
In an optional implementation of this embodiment, the first output module includes at least one of the following:
A first response submodule, configured to output the preset voice information in response to a preset time being reached;
A second response submodule, configured to output the preset voice information in response to receiving preset information;
A third response submodule, configured to output the preset voice information in response to sensing that the target object is within the voice interaction range.
In an optional implementation of this embodiment, the first output module 601 includes:
A first acquisition submodule, configured to obtain first image data within the voice interaction range;
A first output submodule, configured to output the preset voice information when the target object is identified from the first image data.
In an optional implementation of this embodiment, the first acquisition module 602 includes:
A second acquisition submodule, configured to obtain second image data after the preset voice information is output;
A first determination submodule, configured to determine, according to the second image data, whether the target object has received the preset voice information.
In an optional implementation of this embodiment, the first determination submodule includes:
A second determination submodule, configured to determine that the target object has received the preset voice information when it is determined, according to the second image data, that the target object is within the voice interaction range; or,
A third determination submodule, configured to determine that the target object has received the preset voice information when it is determined that the orientation of the facial information of the target object in the second image data and the orientation of the output device of the preset voice information are within the first preset error range.
In an optional implementation of this embodiment, the first acquisition module 602 further includes:
A third acquisition submodule, configured to obtain second image data after the preset voice information is output;
A fourth determination submodule, configured to determine whether the target object has received the preset voice information by comparing the second image data with first image data obtained before the preset voice information was output.
In an optional implementation of this embodiment, the fourth determination submodule includes:
An identification submodule, configured to identify a first face of the target object in the first image data and a second face of the target object in the second image data;
A fifth determination submodule, configured to determine whether the target object has received the preset voice information by comparing the facial information of the first face and the second face.
In an optional implementation of this embodiment, the first acquisition module 602 further includes:
A sixth determination submodule, configured to determine whether location information indicating that the target object is within the voice interaction range has been received.
In an optional implementation of this embodiment, the voice interaction apparatus further includes:
A sending module, configured to resend the preset voice information when the feedback information of the target object does not meet the preset condition.
In an optional implementation of this embodiment, the sending module includes:
A first sending submodule, configured to delay the sending of the preset voice information when it is determined that the target object is not within the voice interaction range; or,
A second sending submodule, configured to raise the volume and send the preset voice information when it is determined that the target object is within the voice interaction range.
The above voice interaction apparatus corresponds to and is consistent with the voice interaction method described in the embodiments shown in Figs. 1 to 5 and the related embodiments. For details, refer to the above description of the voice interaction method; they are not repeated here.
Fig. 7 is a schematic structural diagram of an electronic device suitable for implementing the voice interaction method according to an embodiment of the present disclosure.
As shown in Fig. 7, the electronic device 700 includes a central processing unit (CPU) 701, which can perform the various processes of the embodiment shown in Fig. 1 above according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores the various programs and data required for the operation of the electronic device 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to embodiments of the present disclosure, the method described above with reference to Fig. 1 can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method of Fig. 1. In such embodiments, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram can represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes can occur in an order different from that marked in the drawings. For example, two boxes shown in succession can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure can be implemented in software or in hardware. The described units or modules can also be provided in a processor, and under certain circumstances the names of these units or modules do not constitute a limitation on the units or modules themselves.
In another aspect, the present disclosure further provides a computer-readable storage medium. The computer-readable storage medium can be the one included in the apparatus described in the above embodiments, or it can be a standalone computer-readable storage medium that is not assembled into a device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to execute the methods described in the present disclosure.
The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Claims (10)
1. A voice interaction method, characterized by comprising:
Outputting preset voice information in response to a preset event that activates voice interaction;
Obtaining feedback information of a target object of the voice interaction on the preset voice information, the feedback information being non-voice information;
Outputting voice interaction information when the feedback information of the target object meets a preset condition.
2. The voice interaction method according to claim 1, characterized in that outputting the preset voice information in response to the preset event that activates voice interaction comprises at least one of the following:
Outputting the preset voice information in response to a preset time being reached;
Outputting the preset voice information in response to receiving preset information;
Outputting the preset voice information in response to sensing that the target object is within the voice interaction range.
3. The voice interaction method according to claim 2, characterized in that outputting the preset voice information in response to sensing that the target object is within the voice interaction range comprises:
Obtaining first image data within the voice interaction range;
Outputting the preset voice information when the target object is identified from the first image data.
4. The voice interaction method according to claim 1, characterized in that obtaining the feedback information of the target object of the voice interaction on the preset voice information comprises:
Obtaining second image data after the preset voice information is output;
Determining, according to the second image data, whether the target object has received the preset voice information.
5. The voice interaction method according to claim 4, characterized in that determining, according to the second image data, whether the target object has received the preset voice information comprises:
Determining that the target object has received the preset voice information when it is determined, according to the second image data, that the target object is within the voice interaction range; or,
Determining that the target object has received the preset voice information when it is determined that the orientation of the facial information of the target object in the second image data and the orientation of the output device of the preset voice information are within a first preset error range.
6. The voice interaction method according to claim 1, characterized in that obtaining the feedback information of the target object of the voice interaction on the preset voice information further comprises:
Obtaining second image data after the preset voice information is output;
Determining whether the target object has received the preset voice information by comparing the second image data with first image data obtained before the preset voice information was output.
7. The voice interaction method according to claim 6, characterized in that determining whether the target object has received the preset voice information by comparing the second image data with the first image data obtained before the preset voice information was output comprises:
Identifying a first face of the target object in the first image data and a second face of the target object in the second image data;
Determining whether the target object has received the preset voice information by comparing the facial information of the first face and the second face.
8. A voice interaction apparatus, characterized by comprising:
A first output module, configured to output preset voice information in response to a preset event that activates voice interaction;
A first acquisition module, configured to obtain feedback information of a target object of the voice interaction on the preset voice information, the feedback information being non-voice information;
A second output module, configured to output voice interaction information when the feedback information of the target object meets a preset condition.
9. An electronic device, characterized by comprising a memory and a processor; wherein,
The memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method steps of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method steps of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810186219.3A CN108510986A (en) | 2018-03-07 | 2018-03-07 | Voice interactive method, device, electronic equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108510986A true CN108510986A (en) | 2018-09-07 |
Family
ID=63376233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810186219.3A Pending CN108510986A (en) | 2018-03-07 | 2018-03-07 | Voice interactive method, device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108510986A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109584877A (en) * | 2019-01-02 | 2019-04-05 | 百度在线网络技术(北京)有限公司 | Interactive voice control method and device |
CN110262767A (en) * | 2019-06-03 | 2019-09-20 | 清华大学 | Based on voice input Rouser, method and the medium close to mouth detection |
CN111309198A (en) * | 2018-12-11 | 2020-06-19 | 阿里巴巴集团控股有限公司 | Information output method and information output equipment |
CN111540383A (en) * | 2019-02-06 | 2020-08-14 | 丰田自动车株式会社 | Voice conversation device, control program, and control method thereof |
CN111625094A (en) * | 2020-05-25 | 2020-09-04 | 北京百度网讯科技有限公司 | Interaction method and device for intelligent rearview mirror, electronic equipment and storage medium |
CN111724772A (en) * | 2019-03-20 | 2020-09-29 | 阿里巴巴集团控股有限公司 | Interaction method and device of intelligent equipment and intelligent equipment |
CN112002317A (en) * | 2020-07-31 | 2020-11-27 | 北京小米松果电子有限公司 | Voice output method, device, storage medium and electronic equipment |
CN112866066A (en) * | 2021-01-07 | 2021-05-28 | 上海喜日电子科技有限公司 | Interaction method, device, system, electronic equipment and storage medium |
CN113674749A (en) * | 2021-08-26 | 2021-11-19 | 珠海格力电器股份有限公司 | Control method, control device, electronic equipment and storage medium |
US11205431B2 (en) | 2019-01-02 | 2021-12-21 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus and device for presenting state of voice interaction device, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060036478A1 (en) * | 2004-08-12 | 2006-02-16 | Vladimir Aleynikov | System, method and computer program for interactive voice recognition scheduler, reminder and messenger |
CN103869945A (en) * | 2012-12-14 | 2014-06-18 | 联想(北京)有限公司 | Information interaction method, information interaction device and electronic device |
CN104879882A (en) * | 2015-04-30 | 2015-09-02 | 广东美的制冷设备有限公司 | Method and system for controlling air conditioner |
CN106225174A (en) * | 2016-08-22 | 2016-12-14 | 珠海格力电器股份有限公司 | Air-conditioner control method and system and air-conditioner |
CN107480851A (en) * | 2017-06-29 | 2017-12-15 | 北京小豆儿机器人科技有限公司 | A kind of intelligent health management system based on endowment robot |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111309198A (en) * | 2018-12-11 | 2020-06-19 | 阿里巴巴集团控股有限公司 | Information output method and information output equipment |
CN109584877A (en) * | 2019-01-02 | 2019-04-05 | 百度在线网络技术(北京)有限公司 | Interactive voice control method and device |
US11205431B2 (en) | 2019-01-02 | 2021-12-21 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus and device for presenting state of voice interaction device, and storage medium |
CN111540383A (en) * | 2019-02-06 | 2020-08-14 | 丰田自动车株式会社 | Voice conversation device, control program, and control method thereof |
CN111724772A (en) * | 2019-03-20 | 2020-09-29 | 阿里巴巴集团控股有限公司 | Interaction method and device of intelligent equipment and intelligent equipment |
CN110262767A (en) * | 2019-06-03 | 2019-09-20 | 清华大学 | Based on voice input Rouser, method and the medium close to mouth detection |
CN110262767B (en) * | 2019-06-03 | 2022-03-11 | 交互未来(北京)科技有限公司 | Voice input wake-up apparatus, method, and medium based on near-mouth detection |
CN111625094A (en) * | 2020-05-25 | 2020-09-04 | 北京百度网讯科技有限公司 | Interaction method and device for intelligent rearview mirror, electronic equipment and storage medium |
CN111625094B (en) * | 2020-05-25 | 2023-07-14 | 阿波罗智联(北京)科技有限公司 | Interaction method and device of intelligent rearview mirror, electronic equipment and storage medium |
CN112002317A (en) * | 2020-07-31 | 2020-11-27 | 北京小米松果电子有限公司 | Voice output method, device, storage medium and electronic equipment |
CN112002317B (en) * | 2020-07-31 | 2023-11-14 | 北京小米松果电子有限公司 | Voice output method, device, storage medium and electronic equipment |
CN112866066A (en) * | 2021-01-07 | 2021-05-28 | 上海喜日电子科技有限公司 | Interaction method, device, system, electronic equipment and storage medium |
CN113674749A (en) * | 2021-08-26 | 2021-11-19 | 珠海格力电器股份有限公司 | Control method, control device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108510986A (en) | Voice interactive method, device, electronic equipment and computer readable storage medium | |
CN109002759A (en) | text recognition method, device, mobile terminal and storage medium | |
CN108701281A (en) | Contextual information engine | |
CN106330687B (en) | Message treatment method, apparatus and system | |
CN106600223A (en) | Schedule creation method and device | |
CN109067626A (en) | Report the method, apparatus and storage medium of information | |
DK201770413A1 (en) | Operational safety mode | |
CN106462229A (en) | A reminding method and a smart bracelet | |
CN109059945A (en) | Traffic information processing method, terminal device and computer readable storage medium | |
CN110418207A (en) | Information processing method, device and storage medium | |
KR20180081444A (en) | Apparatus and method for processing contents | |
CN107682524A (en) | Information display method and device, terminal and readable storage medium | |
CN105357641B (en) | Location update control method and user terminal | |
CN108536380A (en) | Screen control method, device and mobile terminal | |
CN111161559A (en) | Method and device for reminding station arrival of public transport means | |
CN113835570B (en) | Control method, device, equipment, storage medium and program for display screen in vehicle | |
CN108494851B (en) | Application program recommendation method, device and server | |
CN104601780A (en) | Method for controlling call recording | |
CN110069740A (en) | Weather reminding method, device and related product | |
CN113848747A (en) | Intelligent household equipment control method and device | |
CN109378001A (en) | Voice interaction method, device, electronic equipment and readable storage medium | |
CN108830552A (en) | Information prompting method and intelligent door equipment | |
CN107194221A (en) | Schedule synchronization method and device | |
CN112492518A (en) | Card determination method and device, electronic equipment and storage medium | |
CN106899687A (en) | Information prompting method and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180907 |