CN110187766A - Control method, apparatus, device and medium for a smart device - Google Patents
Control method, apparatus, device and medium for a smart device Download PDF Info
- Publication number
- CN110187766A (application number CN201910470773.9A)
- Authority
- CN
- China
- Prior art keywords
- smart device
- voice data
- interaction
- intention
- target object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 53
- 230000003993 interaction Effects 0.000 claims abstract description 173
- 230000001815 facial effect Effects 0.000 claims abstract description 153
- 238000004590 computer program Methods 0.000 claims description 19
- 238000012545 processing Methods 0.000 claims description 18
- 238000003860 storage Methods 0.000 claims description 9
- 230000002452 interceptive effect Effects 0.000 claims description 7
- 238000010586 diagram Methods 0.000 description 12
- 238000012512 characterization method Methods 0.000 description 7
- 238000004891 communication Methods 0.000 description 7
- 235000012054 meals Nutrition 0.000 description 6
- 238000004364 calculation method Methods 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 5
- 230000004044 response Effects 0.000 description 5
- 238000013473 artificial intelligence Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000011161 development Methods 0.000 description 2
- 230000003466 anticipated effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 238000005304 joining Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000005303 weighing Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a control method, apparatus, device and medium for a smart device, to improve the degree of intelligence of the smart device. The control method of the smart device comprises: obtaining voice data collected by the smart device; if attribute information of the voice data satisfies a preset condition, judging, according to a facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs; and controlling the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a control method, apparatus, device and medium for a smart device.
Background technique
With the continuous development of artificial intelligence, the robot industry that relies on it has also grown substantially; accordingly, service robots have been formally deployed and put into use in various fields.
Currently, a service robot interacts with a user as soon as it receives the user's voice data. In this mode, the voice data obtained by the service robot may have been uttered by the user while talking with other users, with no intent to interact with the robot, yet the robot still replies to the user; its degree of intelligence is therefore low.
Summary of the invention
Embodiments of the present invention provide a control method, apparatus, device and medium for a smart device, to improve the degree of intelligence of the smart device.
In a first aspect, an embodiment of the present invention provides a control method for a smart device, comprising:
obtaining voice data collected by the smart device;
if attribute information of the voice data satisfies a preset condition, judging, according to a facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs; and
controlling the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
In the control method provided by the embodiments of the present invention, whether the attribute information of the voice data satisfies the preset condition is judged first, giving a preliminary judgment of the user's interaction intention; voice input that does not warrant an interaction-intention judgment can thus be filtered out, reducing the consumption of the system's computing power. Once the attribute information satisfies the preset condition, the interaction intention of the target object is further judged according to the facial image, improving the accuracy of the intention judgment. The smart device is then controlled to perform a corresponding operation according to the judgment result (for example, replying when the interaction intention is strong and staying silent when it is weak), rather than replying to every piece of voice data obtained, as in the prior art. This improves the degree of intelligence of the smart device and the user's interactive experience.
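The two-stage structure described above can be sketched as follows; every function and field name here is hypothetical, chosen only to illustrate the flow, and not taken from the patent:

```python
def handle_voice(voice_attrs, face_image, judge_intention):
    """Two-stage control flow: a cheap check of the voice data's attribute
    information (preset business type, sound-source angle) runs first; the
    face-image intention judgment runs only when that check passes."""
    if not (voice_attrs.get("business_type_ok") or voice_attrs.get("angle_ok")):
        return None  # filtered out: no intention judgment, saving computing power
    return judge_intention(face_image, voice_attrs)
```

Voice data that fails the cheap attribute check never reaches the more expensive face-based judgment, which is exactly where the computing-power saving comes from.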
In a possible embodiment of the above method, the preset condition includes one or more of the following: the intention determined from the semantic recognition result of the voice data belongs to a preset business type; the angle between the sound source direction of the voice data and the orientation of the smart device lies within a preset angle interval.
By judging whether the intention determined from the semantic recognition result of the voice data belongs to a preset business type, the method can filter out voice input whose intention is only weakly related to the smart device's business, or respond directly to voice input whose intention is strongly related to that business without judging the interaction intention, reducing the consumption of the system's computing power.
By judging whether the angle between the sound source direction of the voice data and the orientation of the smart device lies within the preset angle interval, voice input outside that interval can be filtered out, reducing the consumption of computing power while avoiding interference from the voice data of other objects that have no interaction intention or only a weak one.
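As a minimal sketch of this angle filter, assuming the example interval [0°, 20°] ∪ [345°, 360°) given later in the detailed description:

```python
def angle_in_interval(angle_deg, lo=345.0, hi=20.0):
    """Check whether the sound-source angle (measured clockwise from the
    device's orientation, normalized to [0, 360)) lies in the preset angle
    interval. The interval may wrap around 0 degrees, e.g. [345, 360) + [0, 20]."""
    angle_deg %= 360.0
    if lo <= hi:
        return lo <= angle_deg <= hi
    return angle_deg >= lo or angle_deg <= hi
```

A source 15° off the device's facing direction passes the filter, while one at 90° is dropped without any interaction-intention judgment being performed.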
In a possible embodiment of the above method, judging the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device, comprises:
if the brightness of the facial region in the facial image is greater than a preset brightness threshold, determining the interaction intention of the target object according to the facial angle and lip motion features of the target object in the facial image; or
if the brightness of the facial region in the facial image is less than or equal to the preset brightness threshold, determining the interaction intention of the target object according to the facial angle of the target object in the facial image.
Since lighting affects the judgment of the target object's lip motion features in the facial image, when determining the target object's interaction intention, the facial angle and lip motion features are combined if the brightness of the facial region exceeds the preset brightness threshold, and the facial angle alone is used otherwise. This avoids the impact of poor lighting on lip motion detection and preserves the accuracy of the interaction-intention judgment result.
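The brightness-dependent branch can be sketched as follows; the score inputs and the threshold value are illustrative assumptions of ours, not values specified by the patent:

```python
def intention_from_face(face_brightness, facial_angle_score, lip_motion_score,
                        brightness_threshold=80.0):
    """In good light, combine the facial-angle and lip-motion evidence; in
    dim light, fall back to the facial angle alone, since lip-motion
    detection is unreliable there. Scores are assumed to lie in [0, 1]."""
    if face_brightness > brightness_threshold:
        return (facial_angle_score + lip_motion_score) / 2.0
    return facial_angle_score
```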
In a possible embodiment of the above method, judging the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device, further comprises:
determining the interaction intention of the target object according to the sound source direction of the voice data.
The sound source direction of the voice data can be used to determine whether the voice data was uttered by a target object within a set region around the smart device, and the target object's interaction intention can be judged on that basis. For example, when the smart device interacts with a target object in the set region, voice data whose sound source direction falls inside the set region can be taken to carry an interaction intention, or a strong one, while voice data whose sound source direction falls in other regions (outside the set region) carries none, or only a weak one. Therefore, when determining the target object's interaction intention, further taking the sound source direction of the voice data into account improves the accuracy of the judgment result.
In a possible embodiment of the above method, judging the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device, comprises:
determining, according to a correspondence between value ranges of interaction-intention judgment parameters and intention scores, the intention score corresponding to the value of each judgment parameter of the target object, the judgment parameters including at least one of facial angle, sound source direction of the voice data, and lip motion features; and
determining, according to the intention score of each judgment parameter of the target object, a confidence characterizing the probability that the target object has an interaction intention.
By computing the confidence of the probability that the target object has an interaction intention, the strength of that intention can be determined, and different responses can then be made according to its strength, improving the degree of intelligence of the smart device.
In a possible embodiment of the above method, controlling the smart device to perform a corresponding operation according to the judgment result of the interaction intention comprises:
when the confidence is less than or equal to a first preset threshold, deciding not to respond to the voice data; or
when the confidence is greater than the first preset threshold and less than or equal to a second preset threshold, controlling the smart device to output the reply to the voice data in written form, the second preset threshold being greater than the first; or
when the confidence is greater than the second preset threshold, controlling the smart device to output the reply to the voice data both by voice broadcast and in written form.
Because the interaction mode between the smart device and the target object is decided from the confidence of the target object's interaction intention, the degree of intelligence of the smart device is further improved, and needless disturbance of the target object by the smart device is avoided.
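A minimal sketch of the three response modes, using the example threshold values (0.7 and 0.9) mentioned later in the detailed description:

```python
FIRST_THRESHOLD, SECOND_THRESHOLD = 0.7, 0.9  # example values from the text

def pick_response(confidence):
    """Map the interaction-intention confidence to one of the three modes."""
    if confidence <= FIRST_THRESHOLD:
        return "no response"           # weak or absent interaction intention
    if confidence <= SECOND_THRESHOLD:
        return "text reply"            # possible but not strong intention
    return "voice + text reply"        # strong interaction intention
```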
In a possible embodiment, the method further includes: if the intention determined from the semantic recognition result of the voice data belongs to a non-preset business type, controlling the smart device to output the reply corresponding to the voice data.
In a possible embodiment of the above method, if the angle between the sound source direction of the voice data and the orientation of the smart device lies outside the preset angle interval, it is decided not to respond to the voice data.
In a second aspect, an embodiment of the present invention provides a control apparatus for a smart device, comprising:
an acquiring unit, configured to obtain the voice data collected by the smart device;
a processing unit, configured to judge, if the attribute information of the voice data satisfies the preset condition, the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device; and
a control unit, configured to control the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
In a third aspect, an embodiment of the present invention further provides a control device for a smart device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory; when the computer program instructions are executed by the processor, the control method for a smart device provided in the first aspect is implemented.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium having computer program instructions stored thereon; when the computer program instructions are executed by a processor, the control method for a smart device provided in the first aspect is implemented.
In a fifth aspect, an embodiment of the present invention further provides a computer program product comprising program code; when the program product runs on a computer device, the program code causes the computer device to execute the control method for a smart device provided in the first aspect.
Detailed description of the invention
The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments, they serve to explain the present invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a schematic flow diagram of a control method for a smart device provided by an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of the control of a smart device provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the principle of determining the angle between the sound source direction and the orientation of the smart device, provided by an embodiment of the present invention;
Fig. 4 is a schematic flow diagram of the control of a smart device provided by Embodiment 2 of the present invention;
Fig. 5 is a schematic flow diagram of the control of a smart device provided by Embodiment 3 of the present invention;
Fig. 6 is a structural schematic diagram of a control apparatus for a smart device provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of a control device for a smart device provided by an embodiment of the present invention.
Specific embodiment
The embodiments of the present application are described below with reference to the accompanying drawings. It should be understood that the embodiments described herein are only used to describe and explain the present application and are not intended to limit it.
Specific embodiments of the control method, apparatus, device and medium for a smart device provided by the embodiments of the present invention are described below with reference to the drawings.
It should be noted that the control method for a smart device provided by the embodiments of the present invention may be executed by the control apparatus of the smart device, or by an external device (for example, a server) that communicates with the smart device.
An embodiment of the present invention provides a control method for a smart device; as shown in Fig. 1, it may include the following steps:
Step 101: obtain the voice data collected by the smart device.
The smart device provided in the embodiments of the present invention may be an intelligent robot, or another intelligent terminal such as a smart speaker, mobile phone, or tablet computer; the embodiments of the present invention do not limit this.
Specifically, when obtaining the voice data collected by the smart device: if the control scheme provided by the embodiments of the present invention is executed by the controller or control center of the smart device, the voice data of the surrounding environment can be collected directly by the audio collection device in the smart device; if the control scheme is executed by an external device (such as a server) communicating with the smart device, the external device can obtain the voice data of the surrounding environment collected by the audio collection device in the smart device.
The audio collection device may include, but is not limited to, a microphone or microphone array configured in the smart device.
Specifically, the microphone or microphone array may collect the voice data of the surrounding environment in real time or periodically; a function switch may also be provided so that collection starts only after the switch is turned on; the embodiments of the present invention do not limit this.
Step 102: if the attribute information of the voice data satisfies the preset condition, judge, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs.
The preset condition includes one or more of the following: the intention determined from the semantic recognition result of the voice data belongs to a preset business type; the angle between the sound source direction of the voice data and the orientation of the smart device lies within a preset angle interval.
It should be noted that, in the embodiments of the present invention, the facial image is preferably one collected within the time period corresponding to the voice data.
Step 103: control the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
The control scheme for a smart device provided by the embodiments of the present invention can be divided into three embodiments according to the different preset conditions; each embodiment is described below.
In Embodiment 1, the preset condition is that the angle between the sound source direction of the voice data and the orientation of the smart device lies within a preset angle interval. This condition is used to judge whether the voice data comes from an object in a set region in front of the smart device (such as an object currently interacting with it, as opposed to other surrounding objects), so that only the voice data of objects in that set region is responded to, avoiding interference from voice data in other regions and, to a certain extent, from the voice data of objects with no interaction intention or only a weak one.
In this embodiment, a preliminary interaction-intention judgment is first made on the voice data. If the angle between the sound source direction of the voice data and the orientation of the smart device lies within the preset angle interval, the voice data comes from an object in the set region in front of the smart device, and the interaction-intention judgment then proceeds on the voice data. If the angle lies outside the preset angle interval, the voice data does not come from an object in the set region in front of the smart device, and it can be filtered out without any interaction-intention judgment, reducing the consumption of the system's computing power.
An embodiment of the present invention provides a control method for a smart device; as shown in Fig. 2, it may include the following steps:
Step 201: obtain the voice data collected by the smart device.
Step 202: determine the angle between the sound source direction of the voice data and the orientation of the smart device.
The angle between the sound source direction of the voice data and the orientation of the smart device can be calculated by taking the orientation of the smart device as the reference line and the smart device as the origin, with angles formed by clockwise rotation from the reference line taken as positive. As shown in Fig. 3, with the orientation 30 of the smart device as the reference line and clockwise rotation taken as positive: the angle between the sound source direction 31 of user A's voice data and the orientation of the smart device is angle a, for example 15°; the angle between the sound source direction 32 of user B's voice data and the orientation of the smart device is angle b, for example 335°.
Of course, in other embodiments of the present invention, the angle between the sound source direction of the voice data and the orientation of the smart device may also be calculated by taking the orientation of the smart device as the reference line and the smart device as the origin, with angles formed by clockwise rotation from the reference line taken as positive and angles formed by counter-clockwise rotation taken as negative. Still taking Fig. 3 as an example, with the orientation 30 as the reference line: the angle between the sound source direction 31 of user A's voice data and the orientation of the smart device is angle a, for example 15°, and the angle between the sound source direction 32 of user B's voice data and the orientation of the smart device is angle c, for example -25°.
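The two sign conventions describe the same geometry, and converting between them is a one-liner. This sketch reproduces the worked examples above (user A at 15°; user B at 335°, i.e. -25°):

```python
def to_signed(angle_deg):
    """Convert a clockwise-positive angle in [0, 360) to the signed
    convention, in which counter-clockwise rotation is negative."""
    angle_deg %= 360.0
    return angle_deg - 360.0 if angle_deg > 180.0 else angle_deg

assert to_signed(15.0) == 15.0    # user A: same value in both conventions
assert to_signed(335.0) == -25.0  # user B: angle b (335 deg) equals angle c (-25 deg)
```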
In a possible embodiment, when determining the angle between the sound source direction of the voice data and the orientation of the smart device, the human binaural effect can be simulated, or other auxiliary sensors can be used to determine the angle; the embodiments of the present invention do not limit this.
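One common way to simulate the binaural effect is to use the time-difference-of-arrival (TDOA) between two microphones. The sketch below is an assumption of ours rather than a method the patent specifies, and the microphone spacing is illustrative:

```python
import math

def doa_from_tdoa(delay_s, mic_distance_m=0.1, speed_of_sound_mps=343.0):
    """Estimate the direction of arrival, in degrees off the device's facing
    direction, from the arrival-time difference between two microphones."""
    x = delay_s * speed_of_sound_mps / mic_distance_m
    x = max(-1.0, min(1.0, x))  # clamp against measurement noise
    return math.degrees(math.asin(x))
```

A zero delay places the source straight ahead of the device; in a real microphone array, the delay itself would typically come from cross-correlating the two channels.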
Step 203: when the angle between the sound source direction of the voice data and the orientation of the smart device lies within the preset angle interval, judge, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs.
The preset angle interval can be configured according to the actual situation. For example, to guarantee the accuracy of the interaction-intention result, it can be set to [0°, 20°] and [345°, 360°], or [-15°, 20°]; to raise the interaction probability between the smart device and the user, it can also be set slightly wider, for example [0°, 25°] and [335°, 360°], or [-25°, 25°]; of course, other values are also possible, and the embodiments of the present invention do not limit this.
In a possible embodiment, when the angle between the sound source direction of the voice data and the orientation of the smart device lies outside the preset angle interval, it is decided not to respond to the voice data.
Specifically, when judging the interaction intention of the target object according to the facial image collected by the smart device: first, according to the correspondence between the value ranges of the interaction-intention judgment parameters and the intention scores, determine the intention score corresponding to the value of each judgment parameter of the target object, the judgment parameters including at least one of facial angle, sound source direction of the voice data, and lip motion features; then, according to the intention score of each judgment parameter of the target object, determine the confidence characterizing the probability that the target object has an interaction intention.
In specific implementation, the confidence may be determined as the mean of the intention scores corresponding to the judgment parameters of the target object.
In other embodiments of the present invention, the confidence may also be determined in ways other than the mean. For example, the intention scores of the judgment parameters may be weighted according to preset weight coefficients of the judgment parameters, and the confidence characterizing the probability that the target object has an interaction intention determined from the weighted result.
The embodiments of the present invention do not limit the manner of determining the confidence; any manner that can determine the strength of the target object's interaction intention from the intention scores is applicable.
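Both combination rules described above (the plain mean and the weighted mean of the per-parameter intention scores) can be sketched in a few lines; the score and weight values used in the test are illustrative:

```python
def intention_confidence(scores, weights=None):
    """Combine the intention scores of the judgment parameters (facial angle,
    sound source direction, lip motion) into one confidence value: the plain
    mean by default, or a weighted mean if weight coefficients are given."""
    if weights is None:
        return sum(scores) / len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```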
Step 204: control the smart device to perform a corresponding operation according to the judgment result of the interaction intention.
In specific implementation, controlling the smart device to perform a corresponding operation includes the following three modes:
Mode 1: when the confidence is less than or equal to the first preset threshold, decide not to respond to the voice data.
Specifically, a confidence less than or equal to the first preset threshold indicates that the target object's interaction intention is weak or absent; therefore, to improve the degree of intelligence of the smart device, the voice data may be left unanswered, i.e., the smart device does not interact with the target object that uttered the voice data.
The first preset threshold can be configured according to the actual situation: for example, it can be set to 0.7; to raise the interaction probability between the smart device and the user, it can also be set slightly smaller, for example 0.6; of course, other values are also possible, and the embodiments of the present invention do not limit this.
Mode two: when it is determined that the confidence level is greater than the first preset threshold and less than or equal to a second preset threshold, controlling the smart machine to output the return information of the voice data in written form, where the second preset threshold is greater than the first preset threshold.
Specifically, a confidence level greater than the first preset threshold and less than or equal to the second preset threshold indicates that the target object may have an interaction intention, but the intention is not strong. Therefore, in order to guarantee the interactive experience of the user while reducing the disturbance of the smart machine to the target object, the smart machine may be controlled to output the return information of the voice data in written form so as to interact with the target object.
It should be noted that the second preset threshold may also be set according to the actual situation, for example to 0.8; in order to improve the intelligence of the smart machine, it may also be set slightly larger, for example to 0.9. It may of course be set to other values, which is not limited in the embodiment of the present invention.
Mode three: when it is determined that the confidence level is greater than the second preset threshold, controlling the smart machine to output the return information of the voice data both by voice broadcast and in written form.
Specifically, a confidence level greater than the second preset threshold indicates that the target object has a strong interaction intention. Therefore, in order to guarantee the interaction process with the target object, the smart machine may be controlled to output the return information of the voice data both by voice broadcast and in written form so as to interact with the target object.
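The three modes can be summarized in a small dispatch function (a sketch; the threshold values 0.7 and 0.9 are merely the example values mentioned above, and the mode labels are illustrative):

```python
def response_mode(conf, t1=0.7, t2=0.9):
    """Map the confidence level to one of the three response modes
    (t1 is the first preset threshold, t2 > t1 the second)."""
    if conf <= t1:
        return "no_response"           # mode one: do not respond
    if conf <= t2:
        return "text_reply"            # mode two: written return information
    return "voice_and_text_reply"      # mode three: voice broadcast and text
```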
The control method of the smart machine provided in this embodiment has been described above with reference to specific embodiments. The specific process of judging the interaction intention of the target object to which the facial image belongs in the embodiment of the present invention is described in detail below, likewise with reference to specific embodiments.
When step 203 is specifically executed, since light has a large influence on the recognition result of the facial image, the recognition result, and in particular the recognition of the lip motion feature, may be inaccurate under poor lighting conditions. If the lip motion feature is still taken into account in the judgment of the interaction intention under such conditions, the judgment result may deviate. Accordingly, when poor lighting is detected, the interaction intention judgment parameters may be adjusted to improve the accuracy of the judgment result and its robustness to light.
Specifically, the parameters used to determine the interaction intention of the target object (i.e., the interaction intention judgment parameters) may be selected according to the brightness of the face region in the facial image collected by the smart machine, as follows:
In one possible embodiment, if the brightness of the face region in the facial image is greater than a preset brightness threshold, the interaction intention of the target object is determined according to the facial angle and the lip motion feature of the target object in the facial image.
In another possible embodiment, if the brightness of the face region in the facial image is less than or equal to the preset brightness threshold, the interaction intention of the target object may be determined according to the facial angle of the target object in the facial image alone, so as to avoid lighting problems affecting the lip motion feature recognition result and thereby causing the interaction intention judgment result to deviate.
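A minimal sketch of this brightness-dependent parameter selection (the threshold of 70 candelas is the example value given later in the text; all names are illustrative):

```python
BRIGHTNESS_THRESHOLD = 70  # example preset brightness threshold (candelas)

def judgment_parameters(face_brightness):
    """Select the interaction intention judgment parameters according to
    the brightness of the face region in the collected facial image."""
    if face_brightness > BRIGHTNESS_THRESHOLD:
        # Good lighting: lip motion recognition is reliable, use both.
        return ["facial_angle", "lip_motion_feature"]
    # Poor lighting: drop the lip motion feature for robustness.
    return ["facial_angle"]
```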
Further, when the voice data collected by the smart machine is obtained, multiple pieces of voice data may be obtained, and their sound source directions may differ. If the smart machine interacts with objects in a specific region in front of it, the smart machine may determine that voice data whose sound source direction falls within the specific region carries an interaction intention, while voice data from other sound source directions does not. Based on this, when determining the interaction intention of the target object according to the facial angle and the lip motion feature of the target object in the facial image, the smart machine may additionally combine the sound source direction of the voice data. Similarly, when determining the interaction intention according to the facial angle of the target object in the facial image, the sound source direction of the voice data may also be combined.
The facial angle mentioned in the embodiment of the present invention is the angle between the face orientation of the target object in the facial image and the orientation of the smart machine. It may likewise be calculated by taking the orientation of the smart machine as the reference line and the smart machine as the origin, with angles measured clockwise from the reference line counted as positive. For details, reference may be made to the calculation of the angle between the sound source direction and the orientation of the smart machine, which is not repeated here.
In the embodiment of the present invention, the preset brightness threshold may be set according to the actual situation; for example, it may be set to 70 candelas, or to 80 candelas, or of course to other values. The specific value may be set according to actual experimental conditions, which is not limited in the embodiment of the present invention.
In specific implementation, when determining the lip motion feature of the target object in the facial image, whether the target object has a lip motion feature may be judged based on the lip features of the target object in some or all of N (N is a positive integer) consecutively collected frames of facial images, or based on the lip features of the target object in some or all of the facial images collected within a preset duration.
For example, if the lip features of the target object differ across 5 consecutively collected frames of facial images, it is determined that the target object in the facial images has a lip motion feature. For another example, if the lip features of the target object differ in 3 of 5 consecutively collected frames, it is determined that the target object in the facial images has a lip motion feature. For still another example, if the lip features of the target object in the facial images collected within a preset duration (e.g., 20 ms) differ, it is determined that the target object in the facial images has a lip motion feature.
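One possible reading of the frame-based judgment above (counting how many consecutive-frame pairs show a changed lip feature is an interpretation for illustration, not the patent's prescribed metric):

```python
def has_lip_motion(lip_features, min_changed_frames=3):
    """Judge lip motion from lip features of consecutive frames: the
    feature counts as present when the lip feature changes relative to
    the previous frame in at least `min_changed_frames` frames."""
    changed = sum(
        1 for prev, cur in zip(lip_features, lip_features[1:]) if prev != cur
    )
    return changed >= min_changed_frames
```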
Based on any of the above embodiments, in one possible embodiment, when judging the interaction intention of the target object to which the facial image belongs according to the facial image collected by the smart machine, the intention score value corresponding to the parameter value of each interaction intention judgment parameter of the target object is determined according to the correspondence between the parameter value ranges of the interaction intention judgment parameters and the intention score values, where the interaction intention judgment parameters include at least one of the facial angle, the sound source direction of the voice data, and the lip motion feature. The strength of the interaction intention of the target object can then be judged based on the obtained intention score values.
It should be noted that the numerical ranges of the interaction intention judgment parameters may be set according to the actual situation, which is not limited in the embodiment of the present invention.
The intention score value may be characterized as a fractional value (with a full score of 1.0) or as a natural number (with a full score of 100), which is likewise not limited in the embodiment of the present invention.
It should be noted that determining the intention score value corresponding to the parameter value of each interaction intention judgment parameter of the target object, according to the correspondence between the parameter value ranges of the interaction intention judgment parameters and the intention score values, specifically includes the following cases:
Case one: the interaction intention judgment parameters include the lip motion feature, and the intention score value corresponding to the lip motion feature of the target object is determined according to the correspondence between the parameter value ranges of the interaction intention judgment parameters and the intention score values.
In one possible embodiment, the number of image frames in which the lip motion feature is present in the collected facial images may first be determined; the intention score value corresponding to the lip motion feature of the target object is then determined according to the correspondence between the numerical ranges of that frame count and the intention score values.
For example, assume that the numerical range of the frame count of the lip motion feature in the facial images is [0, 10], and that the correspondence between the frame-count ranges and the intention score values is: a frame count in [6, 10] corresponds to an intention score value of 0.9, [4, 6) corresponds to 0.8, [2, 4) corresponds to 0.7, and [0, 2) corresponds to 0.6.
When it is determined that the number of frames in which the lip feature is present in the collected facial images is 5, the intention score value corresponding to the lip feature is determined to be 0.8; when that number is 8, the intention score value is determined to be 0.9.
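The example correspondence table maps directly to a chained-comparison lookup (an illustrative sketch using only the example values above):

```python
def lip_motion_score(frame_count):
    """Intention score value for the number of frames in which the lip
    motion feature is present, per the example correspondence:
    [6,10] -> 0.9, [4,6) -> 0.8, [2,4) -> 0.7, [0,2) -> 0.6."""
    if 6 <= frame_count <= 10:
        return 0.9
    if 4 <= frame_count < 6:
        return 0.8
    if 2 <= frame_count < 4:
        return 0.7
    return 0.6  # frame count in [0, 2)
```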
In another possible embodiment, in order to save calculation time, reduce the calculation amount, and quickly determine the intention score value corresponding to the lip motion feature, intention score values may be set only for the presence and absence of the lip motion feature in the facial image. For example, when it is determined that the target object in the facial image has a lip motion feature, the intention score value corresponding to the lip motion feature is determined as a first preset intention score value; when it is determined that the target object in the facial image has no lip motion feature, the intention score value is determined as a second preset intention score value.
The first preset intention score value may be set to 0.7, or of course to other values; the second preset intention score value may be set to 0 or to 0.1, which is not limited in the embodiment of the present invention.
Case two: if the interaction intention judgment parameters include the sound source direction of the voice data, then when determining the intention score value corresponding to the sound source direction according to the correspondence between the parameter value ranges of the interaction intention judgment parameters and the intention score values, the angle between the sound source direction of the voice data and the orientation of the smart machine may first be determined. The intention score value corresponding to that angle is then determined according to the correspondence between the numerical ranges of the angle and the intention score values, and that intention score value is taken as the intention score value corresponding to the sound source direction of the voice data.
In one example, assume that the numerical range of the angle between the sound source direction of the voice data and the orientation of the smart machine is [0°, 20°] and [340°, 360°], and that the correspondence between the angle ranges and the intention score values is: an angle in [0°, 5°] or [355°, 360°] corresponds to an intention score value of 0.9, (5°, 10°] or [350°, 355°) corresponds to 0.8, (10°, 15°] or [345°, 350°) corresponds to 0.7, and (15°, 20°] or [340°, 345°) corresponds to 0.6.
When the angle between the sound source direction of the voice data and the orientation of the smart machine is determined to be 18°, the intention score value corresponding to the sound source direction is determined to be 0.6; when the angle is 2°, the intention score value is determined to be 0.9; when the angle is 340°, the intention score value is determined to be 0.6.
Case three: if the interaction intention judgment parameters include the facial angle, the intention score value corresponding to the facial angle is determined according to the correspondence between the facial angle numerical ranges and the intention score values.
In one example, assume that the facial angle numerical range is [0°, 20°] and [340°, 360°], and that the correspondence between the facial angle ranges and the intention score values is: a facial angle in [0°, 5°] or [355°, 360°] corresponds to an intention score value of 0.9, (5°, 10°] or [350°, 355°) corresponds to 0.8, (10°, 15°] or [345°, 350°) corresponds to 0.7, and (15°, 20°] or [340°, 345°) corresponds to 0.6.
When the facial angle is determined to be 18°, the intention score value corresponding to the facial angle is determined to be 0.6; when it is 2°, the intention score value is 0.9; when it is 355°, the intention score value is 0.9; and when it is 340°, the intention score value is 0.6.
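Since the example correspondences in cases two and three are identical, a single lookup can serve both; folding the angle to its deviation from 0°/360° is an implementation device for this sketch, not part of this embodiment:

```python
def angle_score(angle_deg):
    """Intention score value for an angle (facial angle, or the angle
    between the sound source direction and the machine orientation),
    using the example ranges: [0,5]|[355,360] -> 0.9, down to
    (15,20]|[340,345) -> 0.6; outside [0,20]|[340,360] -> None."""
    # Treat e.g. 355 deg as a deviation of 5 deg from straight ahead.
    deviation = min(angle_deg % 360, 360 - angle_deg % 360)
    if deviation <= 5:
        return 0.9
    if deviation <= 10:
        return 0.8
    if deviation <= 15:
        return 0.7
    if deviation <= 20:
        return 0.6
    return None  # outside the defined numerical range
```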
In addition to determining the interaction intention of the target object based on the intention score values as described above, in another possible embodiment, in order to reduce the calculation amount, when judging the interaction intention of the target object to which the facial image belongs according to the facial image collected by the smart machine, whether the target object has an interaction intention may be judged according to whether each interaction intention judgment parameter falls within that parameter's numerical range, where the interaction intention judgment parameters include at least one of the facial angle, the sound source direction of the voice data, and the lip motion feature.
Further, if every interaction intention judgment parameter is within its numerical range, it is determined that the target object has an interaction intention; if any interaction intention judgment parameter is outside its numerical range, it is determined that the target object has no interaction intention.
For example, if the interaction intention judgment parameters include the facial angle and the sound source direction of the voice data, it is determined that the target object has an interaction intention when the facial angle is within the preset facial angle numerical range and the angle between the sound source direction of the voice data and the orientation of the smart machine is within the preset angle interval. For another example, if the interaction intention judgment parameters include the facial angle, the sound source direction of the voice data, and the lip motion feature, it is determined that the target object has an interaction intention when the facial angle is within the preset facial angle numerical range, the angle between the sound source direction of the voice data and the orientation of the smart machine is within the preset angle interval, and the number of frames with the lip motion feature in the collected facial images is within the preset lip feature frame-count numerical range.
The numerical ranges of the interaction intention judgment parameters may be set according to the actual application scenario. For example, assume that the facial angle numerical range is [0°, 20°] and [345°, 360°], the angle interval between the sound source direction of the voice data and the orientation of the smart machine is [0°, 25°] and [340°, 360°], and the frame-count numerical range of images with the lip motion feature is [2, 10].
In one example, assume that the facial angle is 15°, the angle between the sound source direction of the voice data and the orientation of the smart machine is 350°, and the number of frames containing the lip motion feature of the target object in the collected facial images is 5. The facial angle, the angle between the sound source direction and the orientation of the smart machine, and the frame count of images containing the lip motion feature are all within their corresponding numerical ranges or intervals, so it is determined that the target object has an interaction intention.
In another example, assume that the facial angle is 40°, the angle between the sound source direction of the voice data and the orientation of the smart machine is 350°, and the number of frames containing the lip motion feature of the target object in the collected facial images is 5. The angle between the sound source direction and the orientation of the smart machine and the frame count of images containing the lip motion feature are within their corresponding numerical ranges, but the facial angle exceeds its numerical range, so it is determined that the target object has no interaction intention.
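The all-parameters-in-range judgment of the two examples above can be sketched as follows (names and range constants follow the example values only):

```python
def in_ranges(value, ranges):
    """True when the value falls inside any of the given closed intervals."""
    return any(lo <= value <= hi for lo, hi in ranges)

FACE_RANGES = [(0, 20), (345, 360)]      # facial angle (degrees)
SOURCE_RANGES = [(0, 25), (340, 360)]    # sound source angle (degrees)
LIP_FRAME_RANGE = [(2, 10)]              # frames with lip motion feature

def has_interaction_intention(face_angle, source_angle, lip_frames):
    """Every interaction intention judgment parameter must lie inside
    its preset numerical range; a single miss means no intention."""
    return (in_ranges(face_angle, FACE_RANGES)
            and in_ranges(source_angle, SOURCE_RANGES)
            and in_ranges(lip_frames, LIP_FRAME_RANGE))
```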
It should be noted that when configuring the correspondence between each interaction intention judgment parameter and the intention score values, different numerical-range-to-score correspondences may be configured for different interaction intention judgment parameters, or the same correspondence may be configured for some or all of the parameters. For example, when determining the intention score values corresponding to the interaction intention judgment parameters, the facial angle and the sound source direction of the voice data may be configured with the same numerical-range-to-score correspondence.
In one example, assume that the correspondence between the numerical ranges and the intention score values is configured identically for the facial angle and the sound source direction of the voice data: numerical range [0°, 5°] and [355°, 360°], corresponding intention score value 0.9; numerical range (5°, 10°] and [350°, 355°), corresponding intention score value 0.8; numerical range (10°, 15°] and [345°, 350°), corresponding intention score value 0.7; numerical range (15°, 20°] and [340°, 345°), corresponding intention score value 0.6.
When it is determined that the facial angle is 18° and the angle between the sound source direction of the voice data and the smart machine is 16°, then according to the above correspondence, the intention score values corresponding to the facial angle and to the sound source direction of the voice data are both determined to be 0.6.
In one possible embodiment, when determining the interaction intention of the target object according to the interaction intention judgment parameters, the embodiment of the present invention may also combine the interaction intention judgment parameters and determine the intention score value corresponding to the combination.
For example, assume that the absolute value of the difference between the facial angle and the sound source direction of the voice data is taken as the object judgment parameter, with a numerical range of [0°, 30°].
Assume that the correspondence between the numerical ranges of the object judgment parameter and the intention score values is: an object judgment parameter in [0°, 5°] corresponds to an intention score value of 0.9, (5°, 10°] corresponds to 0.8, (10°, 20°] corresponds to 0.7, and (20°, 30°] corresponds to 0.6.
When it is determined that the facial angle is 18° and the angle between the sound source direction of the voice data and the smart machine is 5°, the object judgment parameter is determined to be 13°, and the corresponding intention score value can accordingly be determined to be 0.7. When the facial angle is 18° and the angle between the sound source direction of the voice data and the smart machine is 16°, the object judgment parameter is determined to be 2°, and the corresponding intention score value can accordingly be determined to be 0.9.
As to whether, when determining intention score values from the interaction intention judgment parameters, the score values are determined individually for each parameter or multiple parameters are combined and the score value corresponding to the combination is determined, the choice is flexible and is not limited in the embodiment of the present invention.
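The object judgment parameter and its example correspondence can be sketched as follows (illustrative names; the ranges are the example values above):

```python
def combined_score(face_angle, source_angle):
    """Object judgment parameter: absolute difference between the facial
    angle and the sound source angle, mapped per the example ranges
    [0,5] -> 0.9, (5,10] -> 0.8, (10,20] -> 0.7, (20,30] -> 0.6."""
    diff = abs(face_angle - source_angle)
    if diff <= 5:
        return 0.9
    if diff <= 10:
        return 0.8
    if diff <= 20:
        return 0.7
    if diff <= 30:
        return 0.6
    return None  # outside the parameter's numerical range [0, 30]
```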
Based on any of the above embodiments, in one possible embodiment, in order to improve the intelligence of the smart machine, after the angle between the sound source direction of the voice data and the orientation of the smart machine is determined to be within the preset angle interval, the intention of the target object determined based on the semantic recognition result of the voice data may further be judged; and when that intention belongs to a preset business type, the interaction intention of the target object to which the facial image belongs is judged according to the facial image collected by the smart machine.
In this manner, the judgment of the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart machine, is triggered only when both the angle between the sound source direction of the voice data and the orientation of the smart machine and the intention determined based on the semantic recognition result of the voice data satisfy their respective preset conditions. If either condition is not satisfied, the judgment is not triggered.
In specific implementation, when recognizing the semantics of the voice data and determining the business type of the voice data according to the semantic recognition result, natural language understanding (NLU) technology may be used to recognize the voice data; other methods may also be used to recognize the semantics of the voice data and determine the business type of the voice data according to the semantic recognition result.
In one possible embodiment, the preset business type may be a business type strongly correlated with the business of the smart machine. In this embodiment, when the angle between the sound source direction of the voice data and the orientation of the smart machine is within the preset angle interval and the intention determined based on the semantic recognition result of the voice data is a business type strongly correlated with the business of the smart machine, the judgment of the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart machine, is triggered. If the angle between the sound source direction of the voice data and the orientation of the smart machine is outside the preset angle interval, it is determined not to respond to the voice data. If the intention determined based on the semantic recognition result of the voice data is a business type weakly correlated with the business of the smart machine, it is determined not to respond to the voice data, or the smart machine is controlled to output preset return information, which may be business recommendation information or information informing the target object whether the business is within the business processing range.
In another possible embodiment, the preset business type may be a business type weakly correlated with the business of the smart machine. In this embodiment, when the angle between the sound source direction of the voice data and the orientation of the smart machine is within the preset angle interval and the intention determined based on the semantic recognition result of the voice data is a business type weakly correlated with the business of the smart machine, the judgment of the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart machine, is triggered. If the angle between the sound source direction of the voice data and the orientation of the smart machine is outside the preset angle interval, it is determined not to respond to the voice data. If the intention determined based on the semantic recognition result of the voice data is a business type strongly correlated with the business of the smart machine, the interaction intention of the target object is determined to be strong; there is then no need to further judge the interaction intention of the target object, and the voice data is responded to directly.
It should be noted that the business types strongly correlated with the business of the smart machine may be set according to actual needs. For example, they may be a question-and-answer type (e.g., "What time is it now"), a query type (e.g., "Could you provide the route to XX Road"), a business consultation type (e.g., "What is today's discount set meal"), a briefing type (e.g., "What is the temperature today"), or an operation instruction type (e.g., when the smart machine is used in a guiding scenario, the operation instruction type may be "Take me to the meeting room").
The business types weakly correlated with the business of the smart machine may likewise be set according to actual needs. For example, they may be a chat type (e.g., "Have you eaten"), a non-question-and-answer type (e.g., "I am XXX"), and so on.
Embodiment two: in this embodiment, whether the intention determined based on the semantic recognition result of the voice data belongs to the preset business type is taken as the preset condition, so as to judge whether the intention determined from the voice data falls within the business scope of the smart machine, thereby responding only to voice data whose interaction intention is within the business scope of the smart machine.
In this embodiment, with the intention determined based on the semantic recognition result of the voice data belonging to the preset business type as the preset condition, a preliminary interaction intention judgment is first performed on the voice data. If the intention determined by the semantic recognition result of the voice data belongs to the preset business type, this shows that the intention determined from the voice data is strongly correlated with the business of the smart machine, and the judgment of the interaction intention is then further performed on the voice data. If the intention belongs to a non-preset business type, this shows that the intention determined from the voice data is weakly correlated with the business of the smart machine; the voice data may then be filtered out without performing the interaction intention judgment, reducing the power consumption of the computing system.
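The two-condition trigger described above can be sketched as a simple gate (the business-type labels are illustrative stand-ins for the types listed above, not terms from this embodiment):

```python
PRESET_BUSINESS_TYPES = {"qa", "query", "consultation",
                         "briefing", "operation_instruction"}

def trigger_face_judgment(source_angle_in_interval, business_type):
    """Trigger the facial-image interaction intention judgment only
    when both the sound source angle condition and the semantic
    business-type condition hold; otherwise filter the voice data."""
    return source_angle_in_interval and business_type in PRESET_BUSINESS_TYPES
```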
The embodiment of the present invention further provides a control method of a smart machine. As shown in Fig. 4, the method may include the following steps:
Step 401: obtaining the voice data collected by the smart machine.
Step 402: determining the intention of the target object based on the semantic recognition result of the voice data.
In specific implementation, when recognizing the semantics of the voice data and determining the intention of the target object according to the semantic recognition result, natural language understanding technology may be used to recognize the voice data; other methods may also be used to recognize the semantics of the voice data and determine the intention of the target object according to the semantic recognition result.
Step 403: when it is determined that the intention determined based on the semantic recognition result of the voice data belongs to the preset business type, judging the interaction intention of the target object to which the facial image belongs according to the facial image collected by the smart machine.
The specific manner of judging the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart machine, is similar to that of embodiment one above and is not repeated here.
In the embodiment of the present invention, the facial image is preferably the facial image collected in the period corresponding to the obtained voice data.
It should be noted that the preset business type may be a business type strongly correlated with the business of the smart machine and may be set according to actual needs. For example, it may be a question-and-answer type (e.g., "What time is it now"), a query type (e.g., "Could you provide the route to XX Road"), a business consultation type (e.g., "What is today's discount set meal"), a briefing type (e.g., "What is the temperature today"), or an operation instruction type (e.g., when the smart machine is used in a guiding scenario, the operation instruction type may be "Take me to the meeting room").
Step 404: controlling, according to the judgment result of the interaction intention, the smart machine to execute a corresponding operation.
The specific implementation of step 404 is similar to step 204 in embodiment one; for details, reference may be made to the related description in embodiment one, which is not repeated here.
In a possible implementation, if it is determined that the business type determined from the semantic recognition result of the voice data is a non-preset business type, the smart device is controlled to output preset reply information, which may be business recommendation information or information notifying the target object of the range of business the device can handle.
The non-preset business type may be a business type weakly correlated with the business of the smart device, and may be configured according to actual needs. For example, business types weakly correlated with the smart device's business may be a chat type (for example, "Have you eaten?"), a non-question-and-answer type (for example, "I am XXX"), and the like.
Based on any of the above embodiments, in a possible implementation, in order to improve the intelligence of the smart device, after it is determined that the intention determined from the semantic recognition result of the voice data belongs to the preset business type, the angle between the sound source direction of the voice data and the orientation of the smart device may further be determined. When this angle is within a preset angle interval, the interaction intention of the target object to which the facial image belongs is judged according to the facial image collected by the smart device; when the angle is outside the preset angle interval, it is determined that the voice data is not to be responded to.
In this manner, only when the angle between the sound source direction of the voice data and the orientation of the smart device, and the intention determined from the semantic recognition result of the voice data, both satisfy their respective preset conditions is the judgment of the interaction intention of the target object, based on the facial image collected by the smart device, triggered. If either condition is not satisfied, this judgment is not triggered.
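The angle gate described above can be sketched as follows, assuming the sound source direction (for example, from a microphone-array direction-of-arrival estimate) and the device orientation are given as compass bearings in degrees. The 60° interval bound is an assumed example value, not a value fixed by the disclosure.

```python
# Sketch of the preset-angle-interval check between the voice data's
# sound source direction and the smart device's orientation.
def angle_between(bearing_a: float, bearing_b: float) -> float:
    """Smallest absolute angle between two bearings, in [0, 180] degrees."""
    diff = abs(bearing_a - bearing_b) % 360.0
    return min(diff, 360.0 - diff)

def within_angle_interval(source_deg: float, device_deg: float,
                          max_angle_deg: float = 60.0) -> bool:
    """True if the source lies within the preset angle interval of the device."""
    return angle_between(source_deg, device_deg) <= max_angle_deg
```

A source outside the interval is simply not responded to, so the downstream facial-image judgment is never triggered for speech arriving from well behind or beside the device.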
In Embodiment 3, the preset condition is that the intention determined from the semantic recognition result of the voice data belongs to the preset business type, so as to judge whether the intention determined from the voice data is strongly correlated with the business. Voice data strongly correlated with the business is thereby responded to directly, while for voice data not strongly correlated with the business, the interaction intention of the target object is further judged.
In the present embodiment, with the intention determined from the semantic recognition result of the voice data belonging to the preset business type as the preset condition, a preliminary interaction-intention judgment is first performed on the voice data. If the intention determined from the semantic recognition result of the voice data belongs to the preset business type, indicating that the intention of the voice data is strongly correlated with the business of the smart device, the voice data is responded to directly without the subsequent interaction-intention judgment, reducing the computing power consumed by the system. If the intention determined from the semantic recognition result of the voice data belongs to a non-preset business type, indicating that the intention of the voice data is weakly correlated with the business of the smart device, the interaction-intention judgment may be performed further to determine whether to respond to the voice data.
An embodiment of the present invention further provides a control method for a smart device; as shown in Fig. 5, it may include the following steps:
Step 501: obtain the voice data collected by the smart device.
Step 502: determine the intention of the target object based on the semantic recognition result of the voice data.
In specific implementation, when recognizing the semantics of the voice data and determining the intention of the target object from the semantic recognition result, natural language understanding technology may be used to recognize the voice data; other methods may also be used to recognize the semantics of the voice data and determine the intention of the target object from the semantic recognition result.
Step 503: when it is determined that the intention determined from the semantic recognition result of the voice data belongs to a preset business type, judge, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs.
The specific manner of judging the interaction intention of the target object to which the facial image belongs according to the facial image collected by the smart device is similar to that of Embodiment 1 above, and is not described again here.
In the embodiment of the present invention, the facial image is preferably a facial image collected during the period corresponding to the acquired voice data.
In the present embodiment, the preset business type is a business type weakly correlated with the business of the smart device, and may be configured according to actual needs. For example, business types weakly correlated with the smart device's business may be a chat type (for example, "Have you eaten?"), a non-question-and-answer type (for example, "I am XXX"), and the like.
Step 504: according to the judgment result of the interaction intention, control the smart device to perform a corresponding operation.
The specific implementation of step 504 is similar to step 204 in Embodiment 1; for details, refer to the related description in Embodiment 1, which is not repeated here.
In a possible implementation, if it is determined in step 503 that the business type determined from the semantic recognition result of the voice data is a non-preset business type, the smart device is controlled to respond directly to the voice data and to interact with the target object to which the voice data belongs.
In this embodiment, the non-preset business type may be a business type strongly correlated with the business of the smart device, and may be configured according to actual needs. For example, business types strongly correlated with the smart device's business may be a question-and-answer type (for example, "What time is it now?"), a consultation type (for example, "Could you provide the route to XX Road?"), a business consultation type (for example, "What is today's discounted set meal?"), a briefing type (for example, "What is the temperature today?"), or an operation instruction type (for example, when the smart device is used in a guidance scenario, an operation instruction may be "Take me to the meeting room").
Based on any of the above embodiments, in a possible implementation, in order to improve the intelligence of the smart device, after it is determined that the intention determined from the semantic recognition result of the voice data belongs to the preset business type, the angle between the sound source direction of the voice data and the orientation of the smart device may further be determined. When this angle is within a preset angle interval, the interaction intention of the target object to which the facial image belongs is judged according to the facial image collected by the smart device; when the angle is outside the preset angle interval, it is determined that the voice data is not to be responded to.
In this manner, only when the angle between the sound source direction of the voice data and the orientation of the smart device, and the intention determined from the semantic recognition result of the voice data, both satisfy their respective preset conditions is the judgment of the interaction intention of the target object, based on the facial image collected by the smart device, triggered. If either condition is not satisfied, this judgment is not triggered.
Based on the same inventive concept, an embodiment of the present invention further provides a control apparatus for a smart device.
As shown in Fig. 6, the control apparatus for a smart device provided by an embodiment of the present invention comprises:
an acquiring unit 601, configured to obtain the voice data collected by the smart device;
a processing unit 602, configured to judge, if the attribute information of the voice data satisfies a preset condition, the interaction intention of the target object to which the facial image belongs, according to the facial image collected by the smart device; and
a control unit 603, configured to control, according to the judgment result of the interaction intention, the smart device to perform a corresponding operation.
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the preset condition includes one or more of the following: the intention determined from the semantic recognition result of the voice data belongs to a preset business type; the angle between the sound source direction of the voice data and the orientation of the smart device is within a preset angle interval.
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is specifically configured to:
if the brightness of the face region in the facial image is greater than a preset brightness threshold, determine the interaction intention of the target object according to the facial angle and lip motion features of the target object in the facial image; or
if the brightness of the face region in the facial image is less than or equal to the preset brightness threshold, determine the interaction intention of the target object according to the facial angle of the target object in the facial image.
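The brightness-dependent branching of processing unit 602 can be sketched as follows. The normalized brightness threshold and the boolean features standing in for "facial angle indicates facing the device" and "lip motion detected" are illustrative assumptions.

```python
# Sketch of the brightness-gated judgment: lip motion features are only
# trusted when the face region is bright enough; in dim images the
# decision falls back to facial angle alone.
def judge_interaction_intent(face_brightness: float,
                             facing_device: bool,
                             lips_moving: bool,
                             brightness_threshold: float = 0.5) -> bool:
    if face_brightness > brightness_threshold:
        # Bright image: require both a frontal facial angle and lip motion.
        return facing_device and lips_moving
    # Dim image: lip motion is unreliable, so use facial angle only.
    return facing_device
```

The design rationale is that lip-motion detection degrades in low light, so requiring it there would wrongly suppress genuine interaction attempts.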
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is further configured to:
determine the interaction intention of the target object according to the sound source direction of the voice data.
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is specifically configured to:
determine, according to the correspondence between parameter value ranges of the interaction-intention judgment parameters and intention scores, the intention score corresponding to the parameter value of each interaction-intention judgment parameter of the target object, the interaction-intention judgment parameters including at least one of facial angle, sound source direction, and lip motion features; and
determine, according to the intention score corresponding to each interaction-intention judgment parameter of the target object, a confidence level characterizing the probability that the target object has an interaction intention.
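The score-table computation of processing unit 602 can be sketched as follows. The value ranges, scores, and the averaging rule are assumed examples: the disclosure only requires a correspondence from parameter-value ranges to intention scores and some combination of those scores into a confidence level.

```python
# Illustrative score-based confidence: each judgment parameter value is
# mapped to an intention score via a range table, and the per-parameter
# scores are combined (here, averaged) into a confidence in [0, 1].
def score_from_ranges(value, ranges):
    """ranges: list of ((low, high), score); low inclusive, high exclusive."""
    for (low, high), score in ranges:
        if low <= value < high:
            return score
    return 0.0

# Assumed example tables (angles in degrees from head-on / device axis).
FACE_ANGLE_SCORES = [((0.0, 15.0), 1.0), ((15.0, 45.0), 0.5), ((45.0, 180.0), 0.0)]
SOURCE_ANGLE_SCORES = [((0.0, 30.0), 1.0), ((30.0, 90.0), 0.4), ((90.0, 180.0), 0.0)]

def interaction_confidence(face_angle, source_angle, lip_motion_score):
    scores = [
        score_from_ranges(face_angle, FACE_ANGLE_SCORES),
        score_from_ranges(source_angle, SOURCE_ANGLE_SCORES),
        lip_motion_score,  # assumed already normalized to [0, 1]
    ]
    return sum(scores) / len(scores)
```

A weighted sum could equally be used; the essential structure is the range-to-score lookup followed by aggregation.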
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the control unit 603 is specifically configured to:
when it is determined that the confidence level is less than or equal to a first preset threshold, determine not to respond to the voice data; or
when it is determined that the confidence level is greater than the first preset threshold and less than or equal to a second preset threshold, control the smart device to output the reply information of the voice data in text form, the second preset threshold being greater than the first preset threshold; or
when it is determined that the confidence level is greater than the second preset threshold, control the smart device to output the reply information of the voice data in both voice broadcast and text form.
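The three-tier policy of control unit 603 can be sketched as follows; the two threshold values are assumptions, constrained only by the stated requirement that the second threshold be greater than the first.

```python
# Sketch of the tiered response policy: no response below the first
# threshold, text-only reply between the thresholds, and a combined
# voice-broadcast-plus-text reply above the second threshold.
def choose_response_mode(confidence: float,
                         first_threshold: float = 0.3,
                         second_threshold: float = 0.7) -> str:
    assert second_threshold > first_threshold
    if confidence <= first_threshold:
        return "no_response"
    if confidence <= second_threshold:
        return "text_only"
    return "voice_and_text"
```

The graded output avoids the device speaking aloud to someone who probably was not addressing it, while still surfacing a silent text reply in the ambiguous middle band.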
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is further configured to:
if the intention determined from the semantic recognition result of the voice data belongs to a non-preset business type, control the smart device to output the reply information corresponding to the voice data.
In a possible implementation of the above apparatus provided by an embodiment of the present invention, the processing unit 602 is further configured to:
if the angle between the sound source direction of the voice data and the orientation of the smart device is outside the preset angle interval, determine not to respond to the voice data.
In addition, the control method and apparatus for a smart device of the embodiments of the present invention described in conjunction with Figs. 1-6 may be implemented by a control device of the smart device. Fig. 7 shows a schematic diagram of the hardware structure of the control device of the smart device provided by an embodiment of the present invention.
The control device of the smart device may include a processor 701 and a memory 702 storing computer program instructions.
Specifically, the processor 701 may include a central processing unit (CPU), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The memory 702 may include mass storage for data or instructions. By way of example and not limitation, the memory 702 may include a hard disk drive (Hard Disk Drive, HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of the above. Where appropriate, the memory 702 may include removable or non-removable (or fixed) media. Where appropriate, the memory 702 may be internal or external to the data processing device. In particular embodiments, the memory 702 is a non-volatile solid-state memory. In particular embodiments, the memory 702 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of the above.
The processor 701 implements the control method of any one of the smart devices in the above embodiments by reading and executing the computer program instructions stored in the memory 702.
In one example, the control device of the smart device may further include a communication interface 703 and a bus 710. As shown in Fig. 7, the processor 701, the memory 702, and the communication interface 703 are connected through the bus 710 and communicate with one another.
The communication interface 703 is mainly used to implement communication between the modules, apparatuses, units, and/or devices in the embodiments of the present invention.
The bus 710 includes hardware, software, or both, and couples the components of the control device of the smart device to one another. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a VESA Local Bus (VLB), another suitable bus, or a combination of two or more of the above. Where appropriate, the bus 710 may include one or more buses. Although specific buses are described and illustrated in the embodiments of the present invention, the present invention contemplates any suitable bus or interconnect.
The control device of the smart device may execute the control method for a smart device in the embodiments of the present invention based on the acquired voice data collected by the smart device, thereby implementing the control method and apparatus for a smart device described in conjunction with Figs. 1-5.
In addition, in conjunction with the control method for a smart device in the above embodiments, an embodiment of the present invention may provide a computer-readable storage medium for its implementation. Computer program instructions are stored on the computer-readable storage medium; when executed by a processor, the computer program instructions implement the control method of any one of the smart devices in the above embodiments.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include these modifications and variations.
Claims (10)
1. A control method for a smart device, characterized by comprising:
obtaining voice data collected by the smart device;
if attribute information of the voice data satisfies a preset condition, judging, according to a facial image collected by the smart device, an interaction intention of a target object to which the facial image belongs; and
controlling, according to a judgment result of the interaction intention, the smart device to perform a corresponding operation.
2. The method according to claim 1, wherein the preset condition comprises one or more of the following: an intention determined from a semantic recognition result of the voice data belongs to a preset business type; an angle between a sound source direction of the voice data and an orientation of the smart device is within a preset angle interval.
3. The method according to claim 1 or 2, wherein judging, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs comprises:
if a brightness of a face region in the facial image is greater than a preset brightness threshold, determining the interaction intention of the target object according to a facial angle and lip motion features of the target object in the facial image; or
if the brightness of the face region in the facial image is less than or equal to the preset brightness threshold, determining the interaction intention of the target object according to the facial angle of the target object in the facial image.
4. The method according to claim 3, wherein judging, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs further comprises:
determining the interaction intention of the target object according to the sound source direction of the voice data.
5. The method according to claim 1, wherein judging, according to the facial image collected by the smart device, the interaction intention of the target object to which the facial image belongs comprises:
determining, according to a correspondence between parameter value ranges of interaction-intention judgment parameters and intention scores, an intention score corresponding to a parameter value of each interaction-intention judgment parameter of the target object, the interaction-intention judgment parameters comprising at least one of a facial angle, the sound source direction of the voice data, and lip motion features; and
determining, according to the intention score corresponding to each interaction-intention judgment parameter of the target object, a confidence level characterizing a probability that the target object has an interaction intention.
6. The method according to claim 5, wherein controlling, according to the judgment result of the interaction intention, the smart device to perform the corresponding operation comprises:
when it is determined that the confidence level is less than or equal to a first preset threshold, determining not to respond to the voice data; or
when it is determined that the confidence level is greater than the first preset threshold and less than or equal to a second preset threshold, controlling the smart device to output reply information of the voice data in text form, the second preset threshold being greater than the first preset threshold; or
when it is determined that the confidence level is greater than the second preset threshold, controlling the smart device to output the reply information of the voice data in both voice broadcast and text form.
7. The method according to claim 2, wherein the method further comprises:
if the intention determined from the semantic recognition result of the voice data belongs to a non-preset business type, controlling the smart device to output reply information corresponding to the voice data.
8. A control apparatus for a smart device, characterized by comprising:
an acquiring unit, configured to obtain voice data collected by the smart device;
a processing unit, configured to judge, if attribute information of the voice data satisfies a preset condition, an interaction intention of a target object to which a facial image belongs, according to the facial image collected by the smart device; and
a control unit, configured to control, according to a judgment result of the interaction intention, the smart device to perform a corresponding operation.
9. A control device for a smart device, characterized by comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, wherein the computer program instructions, when executed by the processor, implement the control method for a smart device according to any one of claims 1-7.
10. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the control method for a smart device according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910470773.9A CN110187766A (en) | 2019-05-31 | 2019-05-31 | A kind of control method of smart machine, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110187766A true CN110187766A (en) | 2019-08-30 |
Family
ID=67719448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910470773.9A Pending CN110187766A (en) | 2019-05-31 | 2019-05-31 | A kind of control method of smart machine, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110187766A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110730115A (en) * | 2019-09-11 | 2020-01-24 | 北京小米移动软件有限公司 | Voice control method and device, terminal and storage medium |
CN112071326A (en) * | 2020-09-07 | 2020-12-11 | 三星电子(中国)研发中心 | Sound effect processing method and device |
CN112489639A (en) * | 2020-11-26 | 2021-03-12 | 北京百度网讯科技有限公司 | Audio signal processing method, device, system, electronic equipment and readable medium |
CN112634872A (en) * | 2020-12-21 | 2021-04-09 | 北京声智科技有限公司 | Voice equipment awakening method and device |
CN112650489A (en) * | 2020-12-31 | 2021-04-13 | 北京猎户星空科技有限公司 | Service control method, device, computer equipment and storage medium |
CN114489326A (en) * | 2021-12-30 | 2022-05-13 | 南京七奇智能科技有限公司 | Crowd-oriented gesture control device and method driven by virtual human interaction attention |
CN116753800A (en) * | 2023-08-18 | 2023-09-15 | 青岛亨通建设有限公司 | Assembled building construction measuring device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159111A (en) * | 2015-08-24 | 2015-12-16 | 百度在线网络技术(北京)有限公司 | Artificial intelligence-based control method and control system for intelligent interaction equipment |
CN108733420A (en) * | 2018-03-21 | 2018-11-02 | 北京猎户星空科技有限公司 | Awakening method, device, smart machine and the storage medium of smart machine |
CN109508687A (en) * | 2018-11-26 | 2019-03-22 | 北京猎户星空科技有限公司 | Man-machine interaction control method, device, storage medium and smart machine |
Legal Events

Date | Code | Title | Description
---|---|---|---
2019-08-30 | PB01 | Publication | Application publication date: 20190830
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | |