CN110187766A - Control method, apparatus, device and medium for a smart device - Google Patents
Control method, apparatus, device and medium for a smart device Download PDF Info
- Publication number
- CN110187766A (application number CN201910470773.9A)
- Authority
- CN
- China
- Prior art keywords
- smart device
- voice data
- interaction
- intention
- target object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 53
- 230000003993 interaction Effects 0.000 claims abstract description 173
- 230000001815 facial effect Effects 0.000 claims abstract description 153
- 238000004590 computer program Methods 0.000 claims description 19
- 238000012545 processing Methods 0.000 claims description 18
- 238000003860 storage Methods 0.000 claims description 9
- 230000002452 interceptive effect Effects 0.000 claims description 7
- 238000010586 diagram Methods 0.000 description 12
- 238000012512 characterization method Methods 0.000 description 7
- 238000004891 communication Methods 0.000 description 7
- 235000012054 meals Nutrition 0.000 description 6
- 238000004364 calculation method Methods 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 5
- 230000004044 response Effects 0.000 description 5
- 238000013473 artificial intelligence Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000011161 development Methods 0.000 description 2
- 230000003466 anticipated effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 238000005304 joining Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000005303 weighing Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a control method, apparatus, device and medium for a smart device, to improve the degree of intelligence of the smart device. The control method of the smart device comprises: obtaining voice data collected by the smart device; if attribute information of the voice data satisfies a preset condition, judging, according to a facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs; and controlling the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a control method, apparatus, device and medium for a smart device.
Background technique
With the continuous development of artificial intelligence, the robot industry that relies on it has also grown substantially; accordingly, service robots have been formally deployed and put into use in various fields.
Currently, a service robot interacts with a user as soon as it receives the user's voice data. In this mode, the voice data obtained by the service robot may have been uttered by the user while talking with other users, with no intent to interact with the robot, yet the robot still replies to the user; its degree of intelligence is therefore low.
Summary of the invention
Embodiments of the present invention provide a control method, apparatus, device and medium for a smart device, to improve the degree of intelligence of the smart device.
In a first aspect, an embodiment of the present invention provides a control method for a smart device, comprising:
obtaining voice data collected by the smart device;
if attribute information of the voice data satisfies a preset condition, judging, according to a facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs; and
controlling the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
In the control method provided by the embodiments of the present invention, whether the attribute information of the voice data satisfies the preset condition is judged first, giving a preliminary judgment of the user's interaction intention; voice input that does not warrant an interaction-intention judgment can thus be filtered out, reducing the consumption of the system's computing power. Once the attribute information satisfies the preset condition, the interaction intention of the target object is further judged according to the facial image, improving the accuracy of the intention judgment. The smart device is then controlled to perform a corresponding operation according to the judgment result (for example, replying when the interaction intention is strong and staying silent when it is weak), rather than replying to every piece of voice data obtained, as in the prior art. This improves the degree of intelligence of the smart device and the user's interactive experience.
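The two-stage structure described above can be sketched as follows; every function and field name here is hypothetical, chosen only to illustrate the flow, and not taken from the patent:

```python
def handle_voice(voice_attrs, face_image, judge_intention):
    """Two-stage control flow: a cheap check of the voice data's attribute
    information (preset business type, sound-source angle) runs first; the
    face-image intention judgment runs only when that check passes."""
    if not (voice_attrs.get("business_type_ok") or voice_attrs.get("angle_ok")):
        return None  # filtered out: no intention judgment, saving computing power
    return judge_intention(face_image, voice_attrs)
```

Voice data that fails the cheap attribute check never reaches the more expensive face-based judgment, which is exactly where the computing-power saving comes from.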
In a possible embodiment of the above method, the preset condition includes one or more of the following: the intention determined from the semantic recognition result of the voice data belongs to a preset business type; the angle between the sound source direction of the voice data and the orientation of the smart device lies within a preset angle interval.
By judging whether the intention determined from the semantic recognition result of the voice data belongs to a preset business type, the method can filter out voice input whose intention is only weakly related to the smart device's business, or respond directly to voice input whose intention is strongly related to that business without judging the interaction intention, reducing the consumption of the system's computing power.
By judging whether the angle between the sound source direction of the voice data and the orientation of the smart device lies within the preset angle interval, voice input outside that interval can be filtered out, reducing the consumption of computing power while avoiding interference from the voice data of other objects that have no interaction intention or only a weak one.
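As a minimal sketch of this angle filter, assuming the example interval [0°, 20°] ∪ [345°, 360°) given later in the detailed description:

```python
def angle_in_interval(angle_deg, lo=345.0, hi=20.0):
    """Check whether the sound-source angle (measured clockwise from the
    device's orientation, normalized to [0, 360)) lies in the preset angle
    interval. The interval may wrap around 0 degrees, e.g. [345, 360) + [0, 20]."""
    angle_deg %= 360.0
    if lo <= hi:
        return lo <= angle_deg <= hi
    return angle_deg >= lo or angle_deg <= hi
```

A source 15° off the device's facing direction passes the filter, while one at 90° is dropped without any interaction-intention judgment being performed.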
In a possible embodiment of the above method, judging the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device, comprises:
if the brightness of the facial region in the facial image is greater than a preset brightness threshold, determining the interaction intention of the target object according to the facial angle and lip motion features of the target object in the facial image; or
if the brightness of the facial region in the facial image is less than or equal to the preset brightness threshold, determining the interaction intention of the target object according to the facial angle of the target object in the facial image.
Since lighting affects the judgment of the target object's lip motion features in the facial image, when determining the target object's interaction intention, the facial angle and lip motion features are combined if the brightness of the facial region exceeds the preset brightness threshold, and the facial angle alone is used otherwise. This avoids the impact of poor lighting on lip motion detection and preserves the accuracy of the interaction-intention judgment result.
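The brightness-dependent branch can be sketched as follows; the score inputs and the threshold value are illustrative assumptions of ours, not values specified by the patent:

```python
def intention_from_face(face_brightness, facial_angle_score, lip_motion_score,
                        brightness_threshold=80.0):
    """In good light, combine the facial-angle and lip-motion evidence; in
    dim light, fall back to the facial angle alone, since lip-motion
    detection is unreliable there. Scores are assumed to lie in [0, 1]."""
    if face_brightness > brightness_threshold:
        return (facial_angle_score + lip_motion_score) / 2.0
    return facial_angle_score
```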
In a possible embodiment of the above method, judging the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device, further comprises:
determining the interaction intention of the target object according to the sound source direction of the voice data.
The sound source direction of the voice data can be used to determine whether the voice data was uttered by a target object within a set region around the smart device, and the target object's interaction intention can be judged on that basis. For example, when the smart device interacts with a target object in the set region, voice data whose sound source direction falls inside the set region can be taken to carry an interaction intention, or a strong one, while voice data whose sound source direction falls in other regions (outside the set region) carries none, or only a weak one. Therefore, when determining the target object's interaction intention, further taking the sound source direction of the voice data into account improves the accuracy of the judgment result.
In a possible embodiment of the above method, judging the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device, comprises:
determining, according to a correspondence between value ranges of interaction-intention judgment parameters and intention scores, the intention score corresponding to the value of each judgment parameter of the target object, the judgment parameters including at least one of facial angle, sound source direction of the voice data, and lip motion features; and
determining, according to the intention score of each judgment parameter of the target object, a confidence characterizing the probability that the target object has an interaction intention.
By computing the confidence of the probability that the target object has an interaction intention, the strength of that intention can be determined, and different responses can then be made according to its strength, improving the degree of intelligence of the smart device.
In a possible embodiment of the above method, controlling the smart device to perform a corresponding operation according to the judgment result of the interaction intention comprises:
when the confidence is less than or equal to a first preset threshold, deciding not to respond to the voice data; or
when the confidence is greater than the first preset threshold and less than or equal to a second preset threshold, controlling the smart device to output the reply to the voice data in written form, the second preset threshold being greater than the first; or
when the confidence is greater than the second preset threshold, controlling the smart device to output the reply to the voice data both by voice broadcast and in written form.
Because the interaction mode between the smart device and the target object is decided from the confidence of the target object's interaction intention, the degree of intelligence of the smart device is further improved, and needless disturbance of the target object by the smart device is avoided.
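A minimal sketch of the three response modes, using the example threshold values (0.7 and 0.9) mentioned later in the detailed description:

```python
FIRST_THRESHOLD, SECOND_THRESHOLD = 0.7, 0.9  # example values from the text

def pick_response(confidence):
    """Map the interaction-intention confidence to one of the three modes."""
    if confidence <= FIRST_THRESHOLD:
        return "no response"           # weak or absent interaction intention
    if confidence <= SECOND_THRESHOLD:
        return "text reply"            # possible but not strong intention
    return "voice + text reply"        # strong interaction intention
```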
In a possible embodiment, the method further includes: if the intention determined from the semantic recognition result of the voice data belongs to a non-preset business type, controlling the smart device to output the reply corresponding to the voice data.
In a possible embodiment of the above method, if the angle between the sound source direction of the voice data and the orientation of the smart device lies outside the preset angle interval, it is decided not to respond to the voice data.
In a second aspect, an embodiment of the present invention provides a control apparatus for a smart device, comprising:
an acquiring unit, configured to obtain the voice data collected by the smart device;
a processing unit, configured to judge, if the attribute information of the voice data satisfies the preset condition, the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device; and
a control unit, configured to control the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
In a third aspect, an embodiment of the present invention further provides a control device for a smart device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory; when the computer program instructions are executed by the processor, the control method for a smart device provided in the first aspect is implemented.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium having computer program instructions stored thereon; when the computer program instructions are executed by a processor, the control method for a smart device provided in the first aspect is implemented.
In a fifth aspect, an embodiment of the present invention further provides a computer program product comprising program code; when the program product runs on a computer device, the program code causes the computer device to execute the control method for a smart device provided in the first aspect.
Detailed description of the invention
The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments, they serve to explain the present invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a schematic flow diagram of a control method for a smart device provided by an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of the control of a smart device provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the principle of determining the angle between the sound source direction and the orientation of the smart device, provided by an embodiment of the present invention;
Fig. 4 is a schematic flow diagram of the control of a smart device provided by Embodiment 2 of the present invention;
Fig. 5 is a schematic flow diagram of the control of a smart device provided by Embodiment 3 of the present invention;
Fig. 6 is a structural schematic diagram of a control apparatus for a smart device provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of a control device for a smart device provided by an embodiment of the present invention.
Specific embodiment
The embodiments of the present application are described below with reference to the accompanying drawings. It should be understood that the embodiments described herein are only used to describe and explain the present application and are not intended to limit it.
Specific embodiments of the control method, apparatus, device and medium for a smart device provided by the embodiments of the present invention are described below with reference to the drawings.
It should be noted that the control method for a smart device provided by the embodiments of the present invention may be executed by the control apparatus of the smart device, or by an external device (for example, a server) that communicates with the smart device.
An embodiment of the present invention provides a control method for a smart device; as shown in Fig. 1, it may include the following steps:
Step 101: obtain the voice data collected by the smart device.
The smart device provided in the embodiments of the present invention may be an intelligent robot, or another intelligent terminal such as a smart speaker, mobile phone, or tablet computer; the embodiments of the present invention do not limit this.
Specifically, when obtaining the voice data collected by the smart device: if the control scheme provided by the embodiments of the present invention is executed by the controller or control center of the smart device, the voice data of the surrounding environment can be collected directly by the audio collection device in the smart device; if the control scheme is executed by an external device (such as a server) communicating with the smart device, the external device can obtain the voice data of the surrounding environment collected by the audio collection device in the smart device.
The audio collection device may include, but is not limited to, a microphone or microphone array configured in the smart device.
Specifically, the microphone or microphone array may collect the voice data of the surrounding environment in real time or periodically; a function switch may also be provided so that collection starts only after the switch is turned on; the embodiments of the present invention do not limit this.
Step 102: if the attribute information of the voice data satisfies the preset condition, judge, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs.
The preset condition includes one or more of the following: the intention determined from the semantic recognition result of the voice data belongs to a preset business type; the angle between the sound source direction of the voice data and the orientation of the smart device lies within a preset angle interval.
It should be noted that, in the embodiments of the present invention, the facial image is preferably one collected within the time period corresponding to the voice data.
Step 103: control the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
The control scheme for a smart device provided by the embodiments of the present invention can be divided into three embodiments according to the different preset conditions; each embodiment is described below.
In Embodiment 1, the preset condition is that the angle between the sound source direction of the voice data and the orientation of the smart device lies within a preset angle interval. This condition is used to judge whether the voice data comes from an object in a set region in front of the smart device (such as an object currently interacting with it, as opposed to other surrounding objects), so that only the voice data of objects in that set region is responded to, avoiding interference from voice data in other regions and, to a certain extent, from the voice data of objects with no interaction intention or only a weak one.
In this embodiment, a preliminary interaction-intention judgment is first made on the voice data. If the angle between the sound source direction of the voice data and the orientation of the smart device lies within the preset angle interval, the voice data comes from an object in the set region in front of the smart device, and the interaction-intention judgment then proceeds on the voice data. If the angle lies outside the preset angle interval, the voice data does not come from an object in the set region in front of the smart device, and it can be filtered out without any interaction-intention judgment, reducing the consumption of the system's computing power.
An embodiment of the present invention provides a control method for a smart device; as shown in Fig. 2, it may include the following steps:
Step 201: obtain the voice data collected by the smart device.
Step 202: determine the angle between the sound source direction of the voice data and the orientation of the smart device.
The angle between the sound source direction of the voice data and the orientation of the smart device can be calculated by taking the orientation of the smart device as the reference line and the smart device as the origin, with angles formed by clockwise rotation from the reference line taken as positive. As shown in Fig. 3, with the orientation 30 of the smart device as the reference line and clockwise rotation taken as positive: the angle between the sound source direction 31 of user A's voice data and the orientation of the smart device is angle a, for example 15°; the angle between the sound source direction 32 of user B's voice data and the orientation of the smart device is angle b, for example 335°.
Of course, in other embodiments of the present invention, the angle between the sound source direction of the voice data and the orientation of the smart device may also be calculated by taking the orientation of the smart device as the reference line and the smart device as the origin, with angles formed by clockwise rotation from the reference line taken as positive and angles formed by counter-clockwise rotation taken as negative. Still taking Fig. 3 as an example, with the orientation 30 as the reference line: the angle between the sound source direction 31 of user A's voice data and the orientation of the smart device is angle a, for example 15°, and the angle between the sound source direction 32 of user B's voice data and the orientation of the smart device is angle c, for example -25°.
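The two sign conventions describe the same geometry, and converting between them is a one-liner. This sketch reproduces the worked examples above (user A at 15°; user B at 335°, i.e. -25°):

```python
def to_signed(angle_deg):
    """Convert a clockwise-positive angle in [0, 360) to the signed
    convention, in which counter-clockwise rotation is negative."""
    angle_deg %= 360.0
    return angle_deg - 360.0 if angle_deg > 180.0 else angle_deg

assert to_signed(15.0) == 15.0    # user A: same value in both conventions
assert to_signed(335.0) == -25.0  # user B: angle b (335 deg) equals angle c (-25 deg)
```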
In a possible embodiment, when determining the angle between the sound source direction of the voice data and the orientation of the smart device, the human binaural effect can be simulated, or other auxiliary sensors can be used to determine the angle; the embodiments of the present invention do not limit this.
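One common way to simulate the binaural effect is to use the time-difference-of-arrival (TDOA) between two microphones. The sketch below is an assumption of ours rather than a method the patent specifies, and the microphone spacing is illustrative:

```python
import math

def doa_from_tdoa(delay_s, mic_distance_m=0.1, speed_of_sound_mps=343.0):
    """Estimate the direction of arrival, in degrees off the device's facing
    direction, from the arrival-time difference between two microphones."""
    x = delay_s * speed_of_sound_mps / mic_distance_m
    x = max(-1.0, min(1.0, x))  # clamp against measurement noise
    return math.degrees(math.asin(x))
```

A zero delay places the source straight ahead of the device; in a real microphone array, the delay itself would typically come from cross-correlating the two channels.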
Step 203: when the angle between the sound source direction of the voice data and the orientation of the smart device lies within the preset angle interval, judge, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs.
The preset angle interval can be configured according to the actual situation. For example, to guarantee the accuracy of the interaction-intention result, it can be set to [0°, 20°] and [345°, 360°], or [-15°, 20°]; to raise the interaction probability between the smart device and the user, it can also be set slightly wider, for example [0°, 25°] and [335°, 360°], or [-25°, 25°]; of course, other values are also possible, and the embodiments of the present invention do not limit this.
In a possible embodiment, when the angle between the sound source direction of the voice data and the orientation of the smart device lies outside the preset angle interval, it is decided not to respond to the voice data.
Specifically, when judging the interaction intention of the target object according to the facial image collected by the smart device: first, according to the correspondence between the value ranges of the interaction-intention judgment parameters and the intention scores, determine the intention score corresponding to the value of each judgment parameter of the target object, the judgment parameters including at least one of facial angle, sound source direction of the voice data, and lip motion features; then, according to the intention score of each judgment parameter of the target object, determine the confidence characterizing the probability that the target object has an interaction intention.
In specific implementation, the confidence may be determined as the mean of the intention scores corresponding to the judgment parameters of the target object.
In other embodiments of the present invention, the confidence may also be determined in ways other than the mean. For example, the intention scores of the judgment parameters may be weighted according to preset weight coefficients of the judgment parameters, and the confidence characterizing the probability that the target object has an interaction intention determined from the weighted result.
The embodiments of the present invention do not limit the manner of determining the confidence; any manner that can determine the strength of the target object's interaction intention from the intention scores is applicable.
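Both combination rules described above (the plain mean and the weighted mean of the per-parameter intention scores) can be sketched in a few lines; the score and weight values used in the test are illustrative:

```python
def intention_confidence(scores, weights=None):
    """Combine the intention scores of the judgment parameters (facial angle,
    sound source direction, lip motion) into one confidence value: the plain
    mean by default, or a weighted mean if weight coefficients are given."""
    if weights is None:
        return sum(scores) / len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```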
Step 204: control the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
In specific implementation, controlling the smart device to perform a corresponding operation includes the following three modes:
Mode 1: when the confidence is less than or equal to the first preset threshold, decide not to respond to the voice data.
Specifically, a confidence less than or equal to the first preset threshold indicates that the target object's interaction intention is weak or absent; therefore, to improve the degree of intelligence of the smart device, the voice data may be left unanswered, i.e., the smart device does not interact with the target object that uttered the voice data.
The first preset threshold can be configured according to the actual situation: for example, it can be set to 0.7; to raise the interaction probability between the smart device and the user, it can also be set slightly smaller, for example 0.6; of course, other values are also possible, and the embodiments of the present invention do not limit this.
Mode two: when it is determined that the confidence level is greater than the first preset threshold and less than or equal to a second preset threshold, controlling the smart machine to output the return information of the voice data in written form, where the second preset threshold is greater than the first preset threshold.
Specifically, a confidence level greater than the first preset threshold and less than or equal to the second preset threshold indicates that the target object may have an interaction intention, but the intention is not strong. Therefore, in order to guarantee the interactive experience of the user while reducing the disturbance of the smart machine to the target object, the smart machine may be controlled to output the return information of the voice data in written form so as to interact with the target object.
It should be noted that the second preset threshold may also be set according to the actual situation, for example to 0.8; in order to improve the intelligence of the smart machine, it may also be set slightly larger, for example to 0.9. It may of course be set to other values, which is not limited in the embodiment of the present invention.
Mode three: when it is determined that the confidence level is greater than the second preset threshold, controlling the smart machine to output the return information of the voice data both by voice broadcast and in written form.
Specifically, a confidence level greater than the second preset threshold indicates that the target object has a strong interaction intention. Therefore, in order to guarantee the interaction process with the target object, the smart machine may be controlled to output the return information of the voice data both by voice broadcast and in written form so as to interact with the target object.
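The three modes can be summarized in a small dispatch function (a sketch; the threshold values 0.7 and 0.9 are merely the example values mentioned above, and the mode labels are illustrative):

```python
def response_mode(conf, t1=0.7, t2=0.9):
    """Map the confidence level to one of the three response modes
    (t1 is the first preset threshold, t2 > t1 the second)."""
    if conf <= t1:
        return "no_response"           # mode one: do not respond
    if conf <= t2:
        return "text_reply"            # mode two: written return information
    return "voice_and_text_reply"      # mode three: voice broadcast and text
```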
The control method of the smart machine provided in this embodiment has been described above with reference to specific embodiments. The specific process of judging the interaction intention of the target object to which the facial image belongs in the embodiment of the present invention is described in detail below, likewise with reference to specific embodiments.
When step 203 is specifically executed, since light has a large influence on the recognition result of the facial image, the recognition result, and in particular the recognition of the lip motion feature, may be inaccurate under poor lighting conditions. If the lip motion feature is still taken into account in the judgment of the interaction intention under such conditions, the judgment result may deviate. Accordingly, when poor lighting is detected, the interaction intention judgment parameters may be adjusted to improve the accuracy of the judgment result and its robustness to light.
Specifically, the parameters used to determine the interaction intention of the target object (i.e., the interaction intention judgment parameters) may be selected according to the brightness of the face region in the facial image collected by the smart machine, as follows:
In one possible embodiment, if the brightness of the face region in the facial image is greater than a preset brightness threshold, the interaction intention of the target object is determined according to the facial angle and the lip motion feature of the target object in the facial image.
In another possible embodiment, if the brightness of the face region in the facial image is less than or equal to the preset brightness threshold, the interaction intention of the target object may be determined according to the facial angle of the target object in the facial image alone, so as to avoid lighting problems affecting the lip motion feature recognition result and thereby causing the interaction intention judgment result to deviate.
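A minimal sketch of this brightness-dependent parameter selection (the threshold of 70 candelas is the example value given later in the text; all names are illustrative):

```python
BRIGHTNESS_THRESHOLD = 70  # example preset brightness threshold (candelas)

def judgment_parameters(face_brightness):
    """Select the interaction intention judgment parameters according to
    the brightness of the face region in the collected facial image."""
    if face_brightness > BRIGHTNESS_THRESHOLD:
        # Good lighting: lip motion recognition is reliable, use both.
        return ["facial_angle", "lip_motion_feature"]
    # Poor lighting: drop the lip motion feature for robustness.
    return ["facial_angle"]
```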
Further, when the voice data collected by the smart machine is obtained, multiple pieces of voice data may be obtained, and their sound source directions may differ. If the smart machine interacts with objects in a specific region in front of it, the smart machine may determine that voice data whose sound source direction falls within the specific region carries an interaction intention, while voice data from other sound source directions does not. Based on this, when determining the interaction intention of the target object according to the facial angle and the lip motion feature of the target object in the facial image, the smart machine may additionally combine the sound source direction of the voice data. Similarly, when determining the interaction intention according to the facial angle of the target object in the facial image, the sound source direction of the voice data may also be combined.
The facial angle mentioned in the embodiment of the present invention is the angle between the face orientation of the target object in the facial image and the orientation of the smart machine. It may likewise be calculated by taking the orientation of the smart machine as the reference line and the smart machine as the origin, with angles measured clockwise from the reference line counted as positive. For details, reference may be made to the calculation of the angle between the sound source direction and the orientation of the smart machine, which is not repeated here.
In the embodiment of the present invention, the preset brightness threshold may be set according to the actual situation; for example, it may be set to 70 candelas, or to 80 candelas, or of course to other values. The specific value may be set according to actual experimental conditions, which is not limited in the embodiment of the present invention.
In specific implementation, when determining the lip motion feature of the target object in the facial image, whether the target object has a lip motion feature may be judged based on the lip features of the target object in some or all of N (N is a positive integer) consecutively collected frames of facial images, or based on the lip features of the target object in some or all of the facial images collected within a preset duration.
For example, if the lip features of the target object differ across 5 consecutively collected frames of facial images, it is determined that the target object in the facial images has a lip motion feature. For another example, if the lip features of the target object differ in 3 of 5 consecutively collected frames, it is determined that the target object in the facial images has a lip motion feature. For still another example, if the lip features of the target object in the facial images collected within a preset duration (e.g., 20 ms) differ, it is determined that the target object in the facial images has a lip motion feature.
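One possible reading of the frame-based judgment above (counting how many consecutive-frame pairs show a changed lip feature is an interpretation for illustration, not the patent's prescribed metric):

```python
def has_lip_motion(lip_features, min_changed_frames=3):
    """Judge lip motion from lip features of consecutive frames: the
    feature counts as present when the lip feature changes relative to
    the previous frame in at least `min_changed_frames` frames."""
    changed = sum(
        1 for prev, cur in zip(lip_features, lip_features[1:]) if prev != cur
    )
    return changed >= min_changed_frames
```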
Based on any of the above embodiments, in one possible embodiment, when judging the interaction intention of the target object to which the facial image belongs according to the facial image collected by the smart machine, the intention score value corresponding to the parameter value of each interaction intention judgment parameter of the target object is determined according to the correspondence between the parameter value ranges of the interaction intention judgment parameters and the intention score values, where the interaction intention judgment parameters include at least one of the facial angle, the sound source direction of the voice data, and the lip motion feature. The strength of the interaction intention of the target object can then be judged based on the obtained intention score values.
It should be noted that the numerical ranges of the interaction intention judgment parameters may be set according to the actual situation, which is not limited in the embodiment of the present invention.
The intention score value may be characterized as a fractional value (with a full score of 1.0) or as a natural number (with a full score of 100), which is likewise not limited in the embodiment of the present invention.
It should be noted that determining the intention score value corresponding to the parameter value of each interaction intention judgment parameter of the target object, according to the correspondence between the parameter value ranges of the interaction intention judgment parameters and the intention score values, specifically includes the following cases:
Case one: the interaction intention judgment parameters include the lip motion feature, and the intention score value corresponding to the lip motion feature of the target object is determined according to the correspondence between the parameter value ranges of the interaction intention judgment parameters and the intention score values.
In one possible embodiment, the number of image frames in which the lip motion feature is present in the collected facial images may first be determined; the intention score value corresponding to the lip motion feature of the target object is then determined according to the correspondence between the numerical ranges of that frame count and the intention score values.
For example, assume that the numerical range of the frame count of the lip motion feature in the facial images is [0, 10], and that the correspondence between the frame-count ranges and the intention score values is: a frame count in [6, 10] corresponds to an intention score value of 0.9, [4, 6) corresponds to 0.8, [2, 4) corresponds to 0.7, and [0, 2) corresponds to 0.6.
When it is determined that the number of frames in which the lip feature is present in the collected facial images is 5, the intention score value corresponding to the lip feature is determined to be 0.8; when that number is 8, the intention score value is determined to be 0.9.
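The example correspondence table maps directly to a chained-comparison lookup (an illustrative sketch using only the example values above):

```python
def lip_motion_score(frame_count):
    """Intention score value for the number of frames in which the lip
    motion feature is present, per the example correspondence:
    [6,10] -> 0.9, [4,6) -> 0.8, [2,4) -> 0.7, [0,2) -> 0.6."""
    if 6 <= frame_count <= 10:
        return 0.9
    if 4 <= frame_count < 6:
        return 0.8
    if 2 <= frame_count < 4:
        return 0.7
    return 0.6  # frame count in [0, 2)
```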
In another possible embodiment, in order to save calculation time, reduce the calculation amount, and quickly determine the intention score value corresponding to the lip motion feature, intention score values may be set only for the presence and absence of the lip motion feature in the facial image. For example, when it is determined that the target object in the facial image has a lip motion feature, the intention score value corresponding to the lip motion feature is determined as a first preset intention score value; when it is determined that the target object in the facial image has no lip motion feature, the intention score value is determined as a second preset intention score value.
The first preset intention score value may be set to 0.7, or of course to other values; the second preset intention score value may be set to 0 or to 0.1, which is not limited in the embodiment of the present invention.
Case two: if the interaction intention judgment parameters include the sound source direction of the voice data, then when determining the intention score value corresponding to the sound source direction according to the correspondence between the parameter value ranges of the interaction intention judgment parameters and the intention score values, the angle between the sound source direction of the voice data and the orientation of the smart machine may first be determined. The intention score value corresponding to that angle is then determined according to the correspondence between the numerical ranges of the angle and the intention score values, and that intention score value is taken as the intention score value corresponding to the sound source direction of the voice data.
In one example, assume that the numerical range of the angle between the sound source direction of the voice data and the orientation of the smart machine is [0°, 20°] and [340°, 360°], and that the correspondence between the angle ranges and the intention score values is: an angle in [0°, 5°] or [355°, 360°] corresponds to an intention score value of 0.9, (5°, 10°] or [350°, 355°) corresponds to 0.8, (10°, 15°] or [345°, 350°) corresponds to 0.7, and (15°, 20°] or [340°, 345°) corresponds to 0.6.
When the angle between the sound source direction of the voice data and the orientation of the smart machine is determined to be 18°, the intention score value corresponding to the sound source direction is determined to be 0.6; when the angle is 2°, the intention score value is determined to be 0.9; when the angle is 340°, the intention score value is determined to be 0.6.
Case three: if the interaction intention judgment parameters include the facial angle, the intention score value corresponding to the facial angle is determined according to the correspondence between the facial angle numerical ranges and the intention score values.
In one example, assume that the facial angle numerical range is [0°, 20°] and [340°, 360°], and that the correspondence between the facial angle ranges and the intention score values is: a facial angle in [0°, 5°] or [355°, 360°] corresponds to an intention score value of 0.9, (5°, 10°] or [350°, 355°) corresponds to 0.8, (10°, 15°] or [345°, 350°) corresponds to 0.7, and (15°, 20°] or [340°, 345°) corresponds to 0.6.
When the facial angle is determined to be 18°, the intention score value corresponding to the facial angle is determined to be 0.6; when it is 2°, the intention score value is 0.9; when it is 355°, the intention score value is 0.9; and when it is 340°, the intention score value is 0.6.
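Since the example correspondences in cases two and three are identical, a single lookup can serve both; folding the angle to its deviation from 0°/360° is an implementation device for this sketch, not part of this embodiment:

```python
def angle_score(angle_deg):
    """Intention score value for an angle (facial angle, or the angle
    between the sound source direction and the machine orientation),
    using the example ranges: [0,5]|[355,360] -> 0.9, down to
    (15,20]|[340,345) -> 0.6; outside [0,20]|[340,360] -> None."""
    # Treat e.g. 355 deg as a deviation of 5 deg from straight ahead.
    deviation = min(angle_deg % 360, 360 - angle_deg % 360)
    if deviation <= 5:
        return 0.9
    if deviation <= 10:
        return 0.8
    if deviation <= 15:
        return 0.7
    if deviation <= 20:
        return 0.6
    return None  # outside the defined numerical range
```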
In addition to determining the interaction intention of the target object based on the intention score values as described above, in another possible embodiment, in order to reduce the calculation amount, when judging the interaction intention of the target object to which the facial image belongs according to the facial image collected by the smart machine, whether the target object has an interaction intention may be judged according to whether each interaction intention judgment parameter falls within that parameter's numerical range, where the interaction intention judgment parameters include at least one of the facial angle, the sound source direction of the voice data, and the lip motion feature.
Further, if every interaction intention judgment parameter is within its numerical range, it is determined that the target object has an interaction intention; if any interaction intention judgment parameter is outside its numerical range, it is determined that the target object has no interaction intention.
For example, if the interaction intention judgment parameters include the facial angle and the sound source direction of the voice data, it is determined that the target object has an interaction intention when the facial angle is within the preset facial angle numerical range and the angle between the sound source direction of the voice data and the orientation of the smart machine is within the preset angle interval. For another example, if the interaction intention judgment parameters include the facial angle, the sound source direction of the voice data, and the lip motion feature, it is determined that the target object has an interaction intention when the facial angle is within the preset facial angle numerical range, the angle between the sound source direction of the voice data and the orientation of the smart machine is within the preset angle interval, and the number of frames with the lip motion feature in the collected facial images is within the preset lip feature frame-count numerical range.
The numerical ranges of the interaction intention judgment parameters may be set according to the actual application scenario. For example, assume that the facial angle numerical range is [0°, 20°] and [345°, 360°], the angle interval between the sound source direction of the voice data and the orientation of the smart machine is [0°, 25°] and [340°, 360°], and the frame-count numerical range of images with the lip motion feature is [2, 10].
In one example, assume that the facial angle is 15°, the angle between the sound source direction of the voice data and the orientation of the smart machine is 350°, and the number of frames containing the lip motion feature of the target object in the collected facial images is 5. The facial angle, the angle between the sound source direction and the orientation of the smart machine, and the frame count of images containing the lip motion feature are all within their corresponding numerical ranges or intervals, so it is determined that the target object has an interaction intention.
In another example, assume that the facial angle is 40°, the angle between the sound source direction of the voice data and the orientation of the smart machine is 350°, and the number of frames containing the lip motion feature of the target object in the collected facial images is 5. The angle between the sound source direction and the orientation of the smart machine and the frame count of images containing the lip motion feature are within their corresponding numerical ranges, but the facial angle exceeds its numerical range, so it is determined that the target object has no interaction intention.
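The all-parameters-in-range judgment of the two examples above can be sketched as follows (names and range constants follow the example values only):

```python
def in_ranges(value, ranges):
    """True when the value falls inside any of the given closed intervals."""
    return any(lo <= value <= hi for lo, hi in ranges)

FACE_RANGES = [(0, 20), (345, 360)]      # facial angle (degrees)
SOURCE_RANGES = [(0, 25), (340, 360)]    # sound source angle (degrees)
LIP_FRAME_RANGE = [(2, 10)]              # frames with lip motion feature

def has_interaction_intention(face_angle, source_angle, lip_frames):
    """Every interaction intention judgment parameter must lie inside
    its preset numerical range; a single miss means no intention."""
    return (in_ranges(face_angle, FACE_RANGES)
            and in_ranges(source_angle, SOURCE_RANGES)
            and in_ranges(lip_frames, LIP_FRAME_RANGE))
```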
It should be noted that when configuring the correspondence between each interaction intention judgment parameter and the intention score values, different numerical-range-to-score correspondences may be configured for different interaction intention judgment parameters, or the same correspondence may be configured for some or all of the parameters. For example, when determining the intention score values corresponding to the interaction intention judgment parameters, the facial angle and the sound source direction of the voice data may be configured with the same numerical-range-to-score correspondence.
In one example, assume that the correspondence between the numerical ranges and the intention score values is configured identically for the facial angle and the sound source direction of the voice data: numerical range [0°, 5°] and [355°, 360°], corresponding intention score value 0.9; numerical range (5°, 10°] and [350°, 355°), corresponding intention score value 0.8; numerical range (10°, 15°] and [345°, 350°), corresponding intention score value 0.7; numerical range (15°, 20°] and [340°, 345°), corresponding intention score value 0.6.
When it is determined that the facial angle is 18° and the angle between the sound source direction of the voice data and the smart machine is 16°, then according to the above correspondence, the intention score values corresponding to the facial angle and to the sound source direction of the voice data are both determined to be 0.6.
In one possible embodiment, when determining the interaction intention of the target object according to the interaction intention judgment parameters, the embodiment of the present invention may also combine the interaction intention judgment parameters and determine the intention score value corresponding to the combination.
For example, assume that the absolute value of the difference between the facial angle and the sound source direction of the voice data is taken as the object judgment parameter, with a numerical range of [0°, 30°].
Assume that the correspondence between the numerical ranges of the object judgment parameter and the intention score values is: an object judgment parameter in [0°, 5°] corresponds to an intention score value of 0.9, (5°, 10°] corresponds to 0.8, (10°, 20°] corresponds to 0.7, and (20°, 30°] corresponds to 0.6.
When it is determined that the facial angle is 18° and the angle between the sound source direction of the voice data and the smart machine is 5°, the object judgment parameter is determined to be 13°, and the corresponding intention score value can accordingly be determined to be 0.7. When the facial angle is 18° and the angle between the sound source direction of the voice data and the smart machine is 16°, the object judgment parameter is determined to be 2°, and the corresponding intention score value can accordingly be determined to be 0.9.
As to whether, when determining intention score values from the interaction intention judgment parameters, the score values are determined individually for each parameter or multiple parameters are combined and the score value corresponding to the combination is determined, the choice is flexible and is not limited in the embodiment of the present invention.
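The object judgment parameter and its example correspondence can be sketched as follows (illustrative names; the ranges are the example values above):

```python
def combined_score(face_angle, source_angle):
    """Object judgment parameter: absolute difference between the facial
    angle and the sound source angle, mapped per the example ranges
    [0,5] -> 0.9, (5,10] -> 0.8, (10,20] -> 0.7, (20,30] -> 0.6."""
    diff = abs(face_angle - source_angle)
    if diff <= 5:
        return 0.9
    if diff <= 10:
        return 0.8
    if diff <= 20:
        return 0.7
    if diff <= 30:
        return 0.6
    return None  # outside the parameter's numerical range [0, 30]
```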
Based on any of the above embodiments, in one possible embodiment, in order to improve the intelligence of the smart machine, after the angle between the sound source direction of the voice data and the orientation of the smart machine is determined to be within the preset angle interval, the intention of the target object determined based on the semantic recognition result of the voice data may further be judged; and when that intention belongs to a preset business type, the interaction intention of the target object to which the facial image belongs is judged according to the facial image collected by the smart machine.
In this manner, the judgment of the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart machine, is triggered only when both the angle between the sound source direction of the voice data and the orientation of the smart machine and the intention determined based on the semantic recognition result of the voice data satisfy their respective preset conditions. If either condition is not satisfied, the judgment is not triggered.
In specific implementation, when recognizing the semantics of the voice data and determining the business type of the voice data according to the semantic recognition result, natural language understanding (NLU) technology may be used to recognize the voice data; other methods may also be used to recognize the semantics of the voice data and determine the business type of the voice data according to the semantic recognition result.
In one possible embodiment, the preset business type may be a business type strongly correlated with the business of the smart machine. In this embodiment, when the angle between the sound source direction of the voice data and the orientation of the smart machine is within the preset angle interval and the intention determined based on the semantic recognition result of the voice data is a business type strongly correlated with the business of the smart machine, the judgment of the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart machine, is triggered. If the angle between the sound source direction of the voice data and the orientation of the smart machine is outside the preset angle interval, it is determined not to respond to the voice data. If the intention determined based on the semantic recognition result of the voice data is a business type weakly correlated with the business of the smart machine, it is determined not to respond to the voice data, or the smart machine is controlled to output preset return information, which may be business recommendation information or information informing the target object whether the business is within the business processing range.
In another possible embodiment, the preset business type may be a business type weakly correlated with the business of the smart machine. In this embodiment, when the angle between the sound source direction of the voice data and the orientation of the smart machine is within the preset angle interval and the intention determined based on the semantic recognition result of the voice data is a business type weakly correlated with the business of the smart machine, the judgment of the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart machine, is triggered. If the angle between the sound source direction of the voice data and the orientation of the smart machine is outside the preset angle interval, it is determined not to respond to the voice data. If the intention determined based on the semantic recognition result of the voice data is a business type strongly correlated with the business of the smart machine, the interaction intention of the target object is determined to be strong; there is then no need to further judge the interaction intention of the target object, and the voice data is responded to directly.
It should be noted that the business types strongly correlated with the business of the smart machine may be set according to actual needs. For example, they may be a question-and-answer type (e.g., "What time is it now"), a query type (e.g., "Could you provide the route to XX Road"), a business consultation type (e.g., "What is today's discount set meal"), a briefing type (e.g., "What is the temperature today"), or an operation instruction type (e.g., when the smart machine is used in a guiding scenario, the operation instruction type may be "Take me to the meeting room").
The business types weakly correlated with the business of the smart machine may likewise be set according to actual needs. For example, they may be a chat type (e.g., "Have you eaten"), a non-question-and-answer type (e.g., "I am XXX"), and so on.
Embodiment two: in this embodiment, whether the intention determined based on the semantic recognition result of the voice data belongs to the preset business type is taken as the preset condition, so as to judge whether the intention determined from the voice data falls within the business scope of the smart machine, thereby responding only to voice data whose interaction intention is within the business scope of the smart machine.
In this embodiment, with the intention determined based on the semantic recognition result of the voice data belonging to the preset business type as the preset condition, a preliminary interaction intention judgment is first performed on the voice data. If the intention determined by the semantic recognition result of the voice data belongs to the preset business type, this shows that the intention determined from the voice data is strongly correlated with the business of the smart machine, and the judgment of the interaction intention is then further performed on the voice data. If the intention belongs to a non-preset business type, this shows that the intention determined from the voice data is weakly correlated with the business of the smart machine; the voice data may then be filtered out without performing the interaction intention judgment, reducing the power consumption of the computing system.
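The two-condition trigger described above can be sketched as a simple gate (the business-type labels are illustrative stand-ins for the types listed above, not terms from this embodiment):

```python
PRESET_BUSINESS_TYPES = {"qa", "query", "consultation",
                         "briefing", "operation_instruction"}

def trigger_face_judgment(source_angle_in_interval, business_type):
    """Trigger the facial-image interaction intention judgment only
    when both the sound source angle condition and the semantic
    business-type condition hold; otherwise filter the voice data."""
    return source_angle_in_interval and business_type in PRESET_BUSINESS_TYPES
```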
The embodiment of the present invention further provides a control method of a smart machine. As shown in Fig. 4, the method may include the following steps:
Step 401: obtaining the voice data collected by the smart machine.
Step 402: determining the intention of the target object based on the semantic recognition result of the voice data.
In specific implementation, when recognizing the semantics of the voice data and determining the intention of the target object according to the semantic recognition result, natural language understanding technology may be used to recognize the voice data; other methods may also be used to recognize the semantics of the voice data and determine the intention of the target object according to the semantic recognition result.
Step 403: when it is determined that the intention determined based on the semantic recognition result of the voice data belongs to the preset business type, judging the interaction intention of the target object to which the facial image belongs according to the facial image collected by the smart machine.
The specific manner of judging the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart machine, is similar to that of embodiment one above and is not repeated here.
In the embodiment of the present invention, the facial image is preferably the facial image collected in the period corresponding to the obtained voice data.
It should be noted that the preset business type may be a business type strongly correlated with the business of the smart machine and may be set according to actual needs. For example, it may be a question-and-answer type (e.g., "What time is it now"), a query type (e.g., "Could you provide the route to XX Road"), a business consultation type (e.g., "What is today's discount set meal"), a briefing type (e.g., "What is the temperature today"), or an operation instruction type (e.g., when the smart machine is used in a guiding scenario, the operation instruction type may be "Take me to the meeting room").
Step 404: controlling, according to the judgment result of the interaction intention, the smart machine to execute a corresponding operation.
The specific implementation of step 404 is similar to step 204 in embodiment one; for details, reference may be made to the related description in embodiment one, which is not repeated here.
In a possible implementation, if it is determined that the business type determined from the semantic recognition result of the voice data is a non-preset business type, the smart device is controlled to output preset reply information, which may be business recommendation information or information notifying the target object of the range of business the device can handle.
The non-preset business type may be a business type weakly correlated with the business of the smart device, and may be configured according to actual needs. For example, business types weakly correlated with the smart device's business may be a chat type (for example, "Have you eaten?"), a non-question-and-answer type (for example, "I am XXX"), and the like.
Based on any of the above embodiments, in a possible implementation, in order to improve the intelligence of the smart device, after it is determined that the intention determined from the semantic recognition result of the voice data belongs to the preset business type, the angle between the sound source direction of the voice data and the orientation of the smart device may further be determined. When this angle is within a preset angle interval, the interaction intention of the target object to which the facial image belongs is judged according to the facial image collected by the smart device; when the angle is outside the preset angle interval, it is determined that the voice data is not to be responded to.
In this manner, only when the angle between the sound source direction of the voice data and the orientation of the smart device, and the intention determined from the semantic recognition result of the voice data, both satisfy their respective preset conditions is the judgment of the interaction intention of the target object, based on the facial image collected by the smart device, triggered. If either condition is not satisfied, this judgment is not triggered.
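The angle gate described above can be sketched as follows, assuming the sound source direction (for example, from a microphone-array direction-of-arrival estimate) and the device orientation are given as compass bearings in degrees. The 60° interval bound is an assumed example value, not a value fixed by the disclosure.

```python
# Sketch of the preset-angle-interval check between the voice data's
# sound source direction and the smart device's orientation.
def angle_between(bearing_a: float, bearing_b: float) -> float:
    """Smallest absolute angle between two bearings, in [0, 180] degrees."""
    diff = abs(bearing_a - bearing_b) % 360.0
    return min(diff, 360.0 - diff)

def within_angle_interval(source_deg: float, device_deg: float,
                          max_angle_deg: float = 60.0) -> bool:
    """True if the source lies within the preset angle interval of the device."""
    return angle_between(source_deg, device_deg) <= max_angle_deg
```

A source outside the interval is simply not responded to, so the downstream facial-image judgment is never triggered for speech arriving from well behind or beside the device.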
In Embodiment 3, the preset condition is that the intention determined from the semantic recognition result of the voice data belongs to the preset business type, so as to judge whether the intention determined from the voice data is strongly correlated with the business. Voice data strongly correlated with the business is thereby responded to directly, while for voice data not strongly correlated with the business, the interaction intention of the target object is further judged.
In the present embodiment, with the intention determined from the semantic recognition result of the voice data belonging to the preset business type as the preset condition, a preliminary interaction-intention judgment is first performed on the voice data. If the intention determined from the semantic recognition result of the voice data belongs to the preset business type, indicating that the intention of the voice data is strongly correlated with the business of the smart device, the voice data is responded to directly without the subsequent interaction-intention judgment, reducing the computing power consumed by the system. If the intention determined from the semantic recognition result of the voice data belongs to a non-preset business type, indicating that the intention of the voice data is weakly correlated with the business of the smart device, the interaction-intention judgment may be performed further to determine whether to respond to the voice data.
An embodiment of the present invention further provides a control method for a smart device; as shown in Fig. 5, it may include the following steps:
Step 501: obtain the voice data collected by the smart device.
Step 502: determine the intention of the target object based on the semantic recognition result of the voice data.
In specific implementation, when recognizing the semantics of the voice data and determining the intention of the target object from the semantic recognition result, natural language understanding technology may be used to recognize the voice data; other methods may also be used to recognize the semantics of the voice data and determine the intention of the target object from the semantic recognition result.
Step 503: when it is determined that the intention determined from the semantic recognition result of the voice data belongs to a preset business type, judge, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs.
The specific manner of judging the interaction intention of the target object to which the facial image belongs according to the facial image collected by the smart device is similar to that of Embodiment 1 above, and is not described again here.
In the embodiment of the present invention, the facial image is preferably a facial image collected during the period corresponding to the acquired voice data.
In the present embodiment, the preset business type is a business type weakly correlated with the business of the smart device, and may be configured according to actual needs. For example, business types weakly correlated with the smart device's business may be a chat type (for example, "Have you eaten?"), a non-question-and-answer type (for example, "I am XXX"), and the like.
Step 504: according to the judgment result of the interaction intention, control the smart device to perform a corresponding operation.
The specific implementation of step 504 is similar to step 204 in Embodiment 1; for details, refer to the related description in Embodiment 1, which is not repeated here.
In a possible implementation, if it is determined in step 503 that the business type determined from the semantic recognition result of the voice data is a non-preset business type, the smart device is controlled to respond directly to the voice data and to interact with the target object to which the voice data belongs.
In this embodiment, the non-preset business type may be a business type strongly correlated with the business of the smart device, and may be configured according to actual needs. For example, business types strongly correlated with the smart device's business may be a question-and-answer type (for example, "What time is it now?"), a consultation type (for example, "Could you provide the route to XX Road?"), a business consultation type (for example, "What is today's discounted set meal?"), a briefing type (for example, "What is the temperature today?"), or an operation instruction type (for example, when the smart device is used in a guidance scenario, an operation instruction may be "Take me to the meeting room").
Based on any of the above embodiments, in a possible implementation, in order to improve the intelligence of the smart device, after it is determined that the intention determined from the semantic recognition result of the voice data belongs to the preset business type, the angle between the sound source direction of the voice data and the orientation of the smart device may further be determined. When this angle is within a preset angle interval, the interaction intention of the target object to which the facial image belongs is judged according to the facial image collected by the smart device; when the angle is outside the preset angle interval, it is determined that the voice data is not to be responded to.
In this manner, only when the angle between the sound source direction of the voice data and the orientation of the smart device, and the intention determined from the semantic recognition result of the voice data, both satisfy their respective preset conditions is the judgment of the interaction intention of the target object, based on the facial image collected by the smart device, triggered. If either condition is not satisfied, this judgment is not triggered.
Based on the same inventive concept, an embodiment of the present invention further provides a control apparatus for a smart device.
As shown in Fig. 6, the control apparatus for a smart device provided by an embodiment of the present invention comprises:
an acquiring unit 601, configured to obtain the voice data collected by the smart device;
a processing unit 602, configured to judge, if the attribute information of the voice data satisfies a preset condition, the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device; and
a control unit 603, configured to control, according to the judgment result of the interaction intention, the smart device to perform a corresponding operation.
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the preset condition includes one or more of the following: the intention determined from the semantic recognition result of the voice data belongs to a preset business type; the angle between the sound source direction of the voice data and the orientation of the smart device is within a preset angle interval.
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is specifically configured to:
if the brightness of the face region in the facial image is greater than a preset brightness threshold, determine the interaction intention of the target object according to the facial angle and lip motion features of the target object in the facial image; or
if the brightness of the face region in the facial image is less than or equal to the preset brightness threshold, determine the interaction intention of the target object according to the facial angle of the target object in the facial image.
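The brightness-dependent branching of processing unit 602 can be sketched as follows. The normalized brightness threshold and the boolean features standing in for "facial angle indicates facing the device" and "lip motion detected" are illustrative assumptions.

```python
# Sketch of the brightness-gated judgment: lip motion features are only
# trusted when the face region is bright enough; in dim images the
# decision falls back to facial angle alone.
def judge_interaction_intent(face_brightness: float,
                             facing_device: bool,
                             lips_moving: bool,
                             brightness_threshold: float = 0.5) -> bool:
    if face_brightness > brightness_threshold:
        # Bright image: require both a frontal facial angle and lip motion.
        return facing_device and lips_moving
    # Dim image: lip motion is unreliable, so use facial angle only.
    return facing_device
```

The design rationale is that lip-motion detection degrades in low light, so requiring it there would wrongly suppress genuine interaction attempts.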
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is further configured to:
determine the interaction intention of the target object according to the sound source direction of the voice data.
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is specifically configured to:
determine, according to the correspondence between parameter value ranges of the interaction-intention judgment parameters and intention scores, the intention score corresponding to the parameter value of each interaction-intention judgment parameter of the target object, the interaction-intention judgment parameters including at least one of facial angle, sound source direction, and lip motion features; and
determine, according to the intention score corresponding to each interaction-intention judgment parameter of the target object, a confidence level characterizing the probability that the target object has an interaction intention.
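The score-table computation of processing unit 602 can be sketched as follows. The value ranges, scores, and the averaging rule are assumed examples: the disclosure only requires a correspondence from parameter-value ranges to intention scores and some combination of those scores into a confidence level.

```python
# Illustrative score-based confidence: each judgment parameter value is
# mapped to an intention score via a range table, and the per-parameter
# scores are combined (here, averaged) into a confidence in [0, 1].
def score_from_ranges(value, ranges):
    """ranges: list of ((low, high), score); low inclusive, high exclusive."""
    for (low, high), score in ranges:
        if low <= value < high:
            return score
    return 0.0

# Assumed example tables (angles in degrees from head-on / device axis).
FACE_ANGLE_SCORES = [((0.0, 15.0), 1.0), ((15.0, 45.0), 0.5), ((45.0, 180.0), 0.0)]
SOURCE_ANGLE_SCORES = [((0.0, 30.0), 1.0), ((30.0, 90.0), 0.4), ((90.0, 180.0), 0.0)]

def interaction_confidence(face_angle, source_angle, lip_motion_score):
    scores = [
        score_from_ranges(face_angle, FACE_ANGLE_SCORES),
        score_from_ranges(source_angle, SOURCE_ANGLE_SCORES),
        lip_motion_score,  # assumed already normalized to [0, 1]
    ]
    return sum(scores) / len(scores)
```

A weighted sum could equally be used; the essential structure is the range-to-score lookup followed by aggregation.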
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the control unit 603 is specifically configured to:
when it is determined that the confidence level is less than or equal to a first preset threshold, determine not to respond to the voice data; or
when it is determined that the confidence level is greater than the first preset threshold and less than or equal to a second preset threshold, control the smart device to output the reply information of the voice data in text form, the second preset threshold being greater than the first preset threshold; or
when it is determined that the confidence level is greater than the second preset threshold, control the smart device to output the reply information of the voice data in both voice broadcast and text form.
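The three-tier policy of control unit 603 can be sketched as follows; the two threshold values are assumptions, constrained only by the stated requirement that the second threshold be greater than the first.

```python
# Sketch of the tiered response policy: no response below the first
# threshold, text-only reply between the thresholds, and a combined
# voice-broadcast-plus-text reply above the second threshold.
def choose_response_mode(confidence: float,
                         first_threshold: float = 0.3,
                         second_threshold: float = 0.7) -> str:
    assert second_threshold > first_threshold
    if confidence <= first_threshold:
        return "no_response"
    if confidence <= second_threshold:
        return "text_only"
    return "voice_and_text"
```

The graded output avoids the device speaking aloud to someone who probably was not addressing it, while still surfacing a silent text reply in the ambiguous middle band.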
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is further configured to:
if the intention determined from the semantic recognition result of the voice data belongs to a non-preset business type, control the smart device to output the reply information corresponding to the voice data.
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is further configured to:
if the angle between the sound source direction of the voice data and the orientation of the smart device is outside the preset angle interval, determine not to respond to the voice data.
In addition, the control method and apparatus for a smart device of the embodiments of the present invention described in conjunction with Figs. 1-6 may be implemented by a control device of the smart device. Fig. 7 shows a schematic diagram of the hardware structure of the control device of the smart device provided by an embodiment of the present invention.
The control device of the smart device may include a processor 701 and a memory 702 storing computer program instructions.
Specifically, the processor 701 may include a central processing unit (CPU), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The memory 702 may include mass storage for data or instructions. By way of example and not limitation, the memory 702 may include a hard disk drive (Hard Disk Drive, HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of the above. Where appropriate, the memory 702 may include removable or non-removable (or fixed) media. Where appropriate, the memory 702 may be internal or external to the data processing device. In particular embodiments, the memory 702 is a non-volatile solid-state memory. In particular embodiments, the memory 702 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of the above.
The processor 701 implements the control method of any one of the smart devices in the above embodiments by reading and executing the computer program instructions stored in the memory 702.
In one example, the control device of the smart device may further include a communication interface 703 and a bus 710. As shown in Fig. 7, the processor 701, the memory 702, and the communication interface 703 are connected through the bus 710 and communicate with one another.
The communication interface 703 is mainly used to implement communication between the modules, apparatuses, units, and/or devices in the embodiments of the present invention.
The bus 710 includes hardware, software, or both, and couples the components of the control device of the smart device to one another. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a VESA Local Bus (VLB), another suitable bus, or a combination of two or more of the above. Where appropriate, the bus 710 may include one or more buses. Although specific buses are described and illustrated in the embodiments of the present invention, the present invention contemplates any suitable bus or interconnect.
The control device of the smart device may execute the control method for a smart device in the embodiments of the present invention based on the acquired voice data collected by the smart device, thereby implementing the control method and apparatus for a smart device described in conjunction with Figs. 1-5.
In addition, in conjunction with the control method for a smart device in the above embodiments, an embodiment of the present invention may provide a computer-readable storage medium for its implementation. Computer program instructions are stored on the computer-readable storage medium; when executed by a processor, the computer program instructions implement the control method of any one of the smart devices in the above embodiments.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include these modifications and variations.
Claims (10)
1. A control method for a smart device, characterized by comprising:
obtaining voice data collected by the smart device;
if attribute information of the voice data satisfies a preset condition, judging, according to a facial image collected by the smart device, an interaction intention of a target object to which the facial image belongs; and
controlling, according to a judgment result of the interaction intention, the smart device to perform a corresponding operation.
2. The method according to claim 1, wherein the preset condition comprises one or more of the following: an intention determined from a semantic recognition result of the voice data belongs to a preset business type; an angle between a sound source direction of the voice data and an orientation of the smart device is within a preset angle interval.
3. The method according to claim 1 or 2, wherein judging, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs comprises:
if a brightness of a face region in the facial image is greater than a preset brightness threshold, determining the interaction intention of the target object according to a facial angle and lip motion features of the target object in the facial image; or
if the brightness of the face region in the facial image is less than or equal to the preset brightness threshold, determining the interaction intention of the target object according to the facial angle of the target object in the facial image.
4. The method according to claim 3, wherein judging, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs further comprises:
determining the interaction intention of the target object according to the sound source direction of the voice data.
5. The method according to claim 1, wherein judging, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs comprises:
determining, according to a correspondence between parameter value ranges of interaction-intention judgment parameters and intention scores, an intention score corresponding to a parameter value of each interaction-intention judgment parameter of the target object, the interaction-intention judgment parameters comprising at least one of a facial angle, the sound source direction of the voice data, and lip motion features; and
determining, according to the intention score corresponding to each interaction-intention judgment parameter of the target object, a confidence level characterizing a probability that the target object has an interaction intention.
6. The method according to claim 5, wherein controlling, according to the judgment result of the interaction intention, the smart device to perform the corresponding operation comprises:
when it is determined that the confidence level is less than or equal to a first preset threshold, determining not to respond to the voice data; or
when it is determined that the confidence level is greater than the first preset threshold and less than or equal to a second preset threshold, controlling the smart device to output reply information of the voice data in text form, the second preset threshold being greater than the first preset threshold; or
when it is determined that the confidence level is greater than the second preset threshold, controlling the smart device to output the reply information of the voice data in both voice broadcast and text form.
7. The method according to claim 2, wherein the method further comprises:
if the intention determined from the semantic recognition result of the voice data belongs to a non-preset business type, controlling the smart device to output reply information corresponding to the voice data.
8. A control apparatus for a smart device, characterized by comprising:
an acquiring unit, configured to obtain voice data collected by the smart device;
a processing unit, configured to judge, if attribute information of the voice data satisfies a preset condition, an interaction intention of a target object to which a facial image belongs, according to the facial image collected by the smart device; and
a control unit, configured to control, according to a judgment result of the interaction intention, the smart device to perform a corresponding operation.
9. A control device for a smart device, characterized by comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, wherein the computer program instructions, when executed by the processor, implement the control method for a smart device according to any one of claims 1-7.
10. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the control method for a smart device according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910470773.9A CN110187766A (en) | 2019-05-31 | 2019-05-31 | A kind of control method of smart machine, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110187766A true CN110187766A (en) | 2019-08-30 |
Family
ID=67719448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910470773.9A Pending CN110187766A (en) | 2019-05-31 | 2019-05-31 | A kind of control method of smart machine, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110187766A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110730115A (en) * | 2019-09-11 | 2020-01-24 | 北京小米移动软件有限公司 | Voice control method and device, terminal and storage medium |
CN112071326A (en) * | 2020-09-07 | 2020-12-11 | 三星电子(中国)研发中心 | Sound effect processing method and device |
CN112489639A (en) * | 2020-11-26 | 2021-03-12 | 北京百度网讯科技有限公司 | Audio signal processing method, device, system, electronic equipment and readable medium |
CN112634872A (en) * | 2020-12-21 | 2021-04-09 | 北京声智科技有限公司 | Voice equipment awakening method and device |
CN112650489A (en) * | 2020-12-31 | 2021-04-13 | 北京猎户星空科技有限公司 | Service control method, device, computer equipment and storage medium |
CN114489326A (en) * | 2021-12-30 | 2022-05-13 | 南京七奇智能科技有限公司 | Crowd-oriented gesture control device and method driven by virtual human interaction attention |
CN116753800A (en) * | 2023-08-18 | 2023-09-15 | 青岛亨通建设有限公司 | Assembled building construction measuring device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159111A (en) * | 2015-08-24 | 2015-12-16 | 百度在线网络技术(北京)有限公司 | Artificial intelligence-based control method and control system for intelligent interaction equipment |
CN108733420A (en) * | 2018-03-21 | 2018-11-02 | 北京猎户星空科技有限公司 | Awakening method, device, smart machine and the storage medium of smart machine |
CN109508687A (en) * | 2018-11-26 | 2019-03-22 | 北京猎户星空科技有限公司 | Man-machine interaction control method, device, storage medium and smart machine |
Legal Events

Date | Code | Title | Description
---|---|---|---
2019-08-30 | PB01 | Publication | Application publication date: 20190830
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | |