CN108064006A - Intelligent sound box and control method for playing back - Google Patents
- Publication number
- CN108064006A CN108064006A CN201810142948.9A CN201810142948A CN108064006A CN 108064006 A CN108064006 A CN 108064006A CN 201810142948 A CN201810142948 A CN 201810142948A CN 108064006 A CN108064006 A CN 108064006A
- Authority
- CN
- China
- Prior art keywords
- gesture
- sound box
- intelligent sound
- profile
- gesture motion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/22—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired frequency characteristic only
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/04—Circuits for transducers, loudspeakers or microphones for correcting frequency response
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/01—Aspects of volume control, not necessarily automatic, in sound systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R27/00—Public address systems
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- Image Analysis (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present invention provides an intelligent sound box and a playback control method. The playback control method includes: the intelligent sound box performs human-body detection; when a human body is detected, the gesture motion of the human body is recognized; and the playback state of the intelligent sound box is adjusted according to the gesture motion. The method of the present invention adds an interaction mode for the intelligent sound box, so that the user can control it by gesture, improving the user experience.
Description
Technical field
The present invention relates to the field of intelligent sound boxes, and in particular to an intelligent sound box and a playback control method.
Background technology
An intelligent sound box is an upgraded speaker product: a tool with which home users access the Internet by voice, for example to request songs, shop online, or check the weather forecast. It can also control smart home devices, for example opening the curtains, setting the refrigerator temperature, or pre-heating the water heater.
Intelligent sound boxes represented by the Amazon Echo actually belong to intelligent voice technology, and their operation is controlled by voice commands. However, the background noise in a typical home environment is considerable, and this noise interferes with the correct recognition of voice commands, degrading the user experience. More interaction modes are therefore needed so that users can interact with the intelligent sound box conveniently and the user experience is improved.
Summary of the invention
The main object of the present invention is to provide an intelligent sound box and a playback control method that enhance the experience of users of the intelligent sound box.
The present invention provides a playback control method comprising the following steps:
the intelligent sound box performs human-body detection;
when a human body is detected, the gesture motion of the human body is recognized;
the playback state of the intelligent sound box is adjusted according to the gesture motion.
Preferably, the step of recognizing the gesture motion of the human body includes:
separating the gesture in each frame of gesture images of the detected human body from the background, and finding the gesture contour in each frame;
matching the gesture contours frame by frame against a preset start-gesture contour, and determining the first matching gesture contour as the start-gesture contour;
matching the gesture contours that follow the start-gesture contour, frame by frame, against a preset end-gesture contour, and determining the first matching gesture contour as the end-gesture contour;
determining the sequence of gesture contours that begins with the start-gesture contour and ends with the end-gesture contour as one recognized gesture motion.
Preferably, the step of adjusting the playback state of the intelligent sound box according to the gesture motion includes:
determining the control instruction corresponding to the gesture motion;
adjusting the playback state of the intelligent sound box according to the control instruction.
Preferably, the step of determining the control instruction corresponding to the gesture motion includes:
performing feature extraction on the gesture motion to obtain gesture motion features;
encoding the gesture motion features to obtain an encoding result;
determining the control instruction corresponding to the encoding result.
Preferably, the method further includes:
calculating the physical distance between the intelligent sound box and the human body;
adjusting the volume of the intelligent sound box according to the physical distance.
Preferably, the step of the intelligent sound box performing human-body detection includes:
the intelligent sound box performing human-body detection based on the histogram of oriented gradients.
Preferably, the step of the intelligent sound box performing human-body detection based on the histogram of oriented gradients includes:
performing first-order gradient calculation on the image in the detection window;
calculating the gradient orientation histogram of each cell in the image;
normalizing all the cells in each block of the image to obtain the gradient orientation histogram of the block;
normalizing all the blocks in the image to obtain the gradient orientation histogram of the detection window, and using the gradient orientation histogram of the detection window as the human-body feature vector.
In another aspect, the present invention also proposes an intelligent sound box, including:
a detection module for performing human-body detection;
a recognition module for recognizing, when a human body is detected, the gesture motion of the human body;
an adjustment module for adjusting the playback state of the intelligent sound box according to the gesture motion.
Preferably, the recognition module includes:
a separation unit for separating the gesture in each frame of gesture images of the detected human body from the background, and finding the gesture contour in each frame;
a start-gesture unit for matching the gesture contours frame by frame against a preset start-gesture contour, and determining the first matching gesture contour as the start-gesture contour;
an end-gesture unit for matching the gesture contours that follow the start-gesture contour, frame by frame, against a preset end-gesture contour, and determining the first matching gesture contour as the end-gesture contour;
a gesture-motion unit for determining the sequence of gesture contours that begins with the start-gesture contour and ends with the end-gesture contour as one recognized gesture motion.
Preferably, the adjustment module includes:
an instruction-determining unit for determining the control instruction corresponding to the gesture motion;
an adjustment unit for adjusting the playback state of the intelligent sound box according to the control instruction.
Preferably, the instruction-determining unit includes:
a feature-obtaining subunit for performing feature extraction on the gesture motion to obtain gesture motion features;
an encoding subunit for encoding the gesture motion features to obtain an encoding result;
an instruction-determining subunit for determining the control instruction corresponding to the encoding result.
Preferably, the intelligent sound box further includes:
a distance calculation module for calculating the physical distance between the intelligent sound box and the human body;
a volume adjustment module for adjusting the volume of the intelligent sound box according to the physical distance.
Preferably, the detection module includes:
a gradient detection unit for performing human-body detection based on the histogram of oriented gradients.
Preferably, the gradient detection unit includes:
a first-order gradient computation subunit for performing first-order gradient calculation on the image in the detection window;
a cell gradient subunit for calculating the gradient orientation histogram of each cell in the image;
a block gradient subunit for normalizing all the cells in each block of the image to obtain the gradient orientation histogram of the block;
a feature-vector generation subunit for normalizing all the blocks in the image to obtain the gradient orientation histogram of the detection window, and using the gradient orientation histogram of the detection window as the human-body feature vector.
The present invention also provides an intelligent sound box including a memory, a processor, and at least one application program stored in the memory and configured to be executed by the processor, the application program being configured to perform the above playback control method.
The present invention provides an intelligent sound box and a playback control method. The playback control method includes: the intelligent sound box performs human-body detection; when a human body is detected, the gesture motion of the human body is recognized; and the playback state of the intelligent sound box is adjusted according to the gesture motion. The method of the present invention adds an interaction mode for the intelligent sound box, so that the user can control it by gesture, improving the user experience.
Description of the drawings
Fig. 1 is a flow diagram of one embodiment of the playback control method of the present invention;
Fig. 2 is a structural diagram of one embodiment of the intelligent sound box of the present invention.
The realization of the object, the functions, and the advantages of the present invention will be further described with reference to the accompanying drawings and the embodiments.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
With reference to Fig. 1, an embodiment of the present invention proposes a playback control method comprising the following steps:
S10, the intelligent sound box performs human-body detection;
S20, when a human body is detected, recognizing the gesture motion of the human body;
S30, adjusting the playback state of the intelligent sound box according to the gesture motion.
In this embodiment, a depth sensor is installed on the intelligent sound box. Depth sensors fall into two classes: passive stereo cameras and active depth cameras. A passive stereo camera observes the scene with two or more cameras and estimates the depth of the scene from the disparity (displacement) of features between the multiple camera views. An active depth camera projects invisible infrared light onto the scene and estimates the depth of the scene from the reflected information. In one application scenario, user A stands at some position relative to the intelligent sound box and makes gesture instructions, such as a start-playback instruction, toward its depth sensor; after the intelligent sound box recognizes the meaning of user A's gesture instruction, it plays sound.
In step S10, the intelligent sound box performs human-body detection through the depth sensor. Detection can be based on image features such as the histogram of oriented gradients (HOG), the scale-invariant feature transform (SIFT), local binary patterns (LBP), or Haar features.
In step S20, when the intelligent sound box detects a human body, the gesture motion of the human body is recognized. Specifically, a segment of video data containing a gesture is acquired through the depth sensor, which here acts as a video recorder. The video data can be acquired according to a preset rule; for example, when the depth sensor observes that the user makes a large gesture motion, that segment of video data is determined to be video data containing a gesture.
The video data is decomposed into a sequence of frames, the background in each frame is separated from the gesture, and the gesture contour in each frame is found. The start frame and end frame of the gesture motion are determined by a preset rule, and the gesture contours between the start frame and the end frame are determined as the gesture motion. That is, a gesture motion consists of the gesture contours of multiple frames.
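The per-frame step just described can be sketched as follows: the gesture is separated from the background of a depth frame by thresholding, and only its contour pixels are kept. The depth range and 4-neighbourhood contour rule are illustrative assumptions; the patent does not fix a particular segmentation method.

```python
def segment_gesture(frame, near=500, far=900):
    """Return a binary mask of pixels whose depth lies in the gesture range.

    `frame` is a 2-D list of depth values (e.g. millimetres); the near/far
    thresholds are hypothetical values for a hand in front of the body.
    """
    return [[1 if near <= d <= far else 0 for d in row] for row in frame]

def gesture_contour(mask):
    """Return the set of foreground pixels that touch the background
    (4-neighbourhood) or the image border, i.e. the gesture contour."""
    h, w = len(mask), len(mask[0])
    contour = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                # image border or background neighbour -> contour pixel
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    contour.add((x, y))
                    break
    return contour
```

Applied frame by frame to the decomposed video, this yields the per-frame gesture contours used in the following steps.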
In step S30, after the gesture motion is obtained, feature extraction is performed on it to obtain gesture motion features; the features are recognized to obtain a recognition result, and finally a control instruction is generated from the recognition result.
The intelligent sound box adjusts its playback state according to the control instruction. For example, if the control instruction obtained is a start-playback instruction, the intelligent sound box starts playing sound; if it is a stop-playback instruction, the intelligent sound box stops playing sound.
Optionally, step S20 includes:
separating the gesture in each frame of gesture images of the detected human body from the background, and finding the gesture contour in each frame;
matching the gesture contours frame by frame against a preset start-gesture contour, and determining the first matching gesture contour as the start-gesture contour;
matching the gesture contours that follow the start-gesture contour, frame by frame, against a preset end-gesture contour, and determining the first matching gesture contour as the end-gesture contour;
determining the sequence of gesture contours that begins with the start-gesture contour and ends with the end-gesture contour as one recognized gesture motion.
In this embodiment, the intelligent sound box stores the preset start-gesture contours and preset end-gesture contours corresponding to the different control instructions. Each gesture contour of the video data is first matched frame by frame against the preset start-gesture contour, and the contour of the first matching frame is determined as the start-gesture contour. The gesture contours after that frame are matched frame by frame against the preset end-gesture contour, and the contour of the first matching frame is determined as the end-gesture contour. The sequence of gesture contours that begins with the start-gesture contour and ends with the end-gesture contour is then determined as the gesture motion. The obtained gesture motion can be used to recognize the meaning the gesture carries and, in turn, to generate the corresponding control instruction.
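A minimal sketch of this segmentation rule: scan the per-frame contours, take the first frame matching the preset start contour and the first later frame matching the preset end contour, and return the span between them. The similarity measure (intersection-over-union of pixel sets) and the threshold are illustrative assumptions; the patent does not fix a particular matcher.

```python
def contour_similarity(a, b):
    """IoU of two contours represented as sets of (x, y) pixels."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def segment_motion(contours, start_ref, end_ref, threshold=0.6):
    """Return the contours from the start gesture to the end gesture,
    or None if no complete motion is found."""
    start = next((i for i, c in enumerate(contours)
                  if contour_similarity(c, start_ref) >= threshold), None)
    if start is None:
        return None
    end = next((i for i in range(start + 1, len(contours))
                if contour_similarity(contours[i], end_ref) >= threshold), None)
    if end is None:
        return None
    return contours[start:end + 1]
```

Returning None when either boundary gesture is missing mirrors the text's requirement that a motion be delimited by both a matched start contour and a matched end contour.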
Optionally, step S30 includes:
determining the control instruction corresponding to the gesture motion;
adjusting the playback state of the intelligent sound box according to the control instruction.
In this embodiment, a storage chip on the intelligent sound box prestores the control instructions corresponding to multiple different gesture motions. For example, it can be specified that the gesture motion "wave upward" corresponds to the "increase volume" instruction, "wave downward" to the "decrease volume" instruction, "wave sideways" to the "stop playback" instruction, and "clap both hands" to the "start playback" instruction. When the intelligent sound box determines that the gesture motion made by the user corresponds to the start-playback instruction, it starts playing accordingly; the content played can be music or news. Likewise, when the intelligent sound box determines that the gesture motion made by the user corresponds to the stop-playback instruction, it stops playing the sound content, so that the user is no longer disturbed by it.
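The prestored mapping described above can be sketched as a lookup from recognized gesture to control instruction, plus the resulting playback-state change. The gesture and instruction names, the volume step, and the 0-100 range are illustrative assumptions.

```python
# Hypothetical gesture-to-instruction table, following the examples in the text.
GESTURE_TO_INSTRUCTION = {
    "wave_up": "increase_volume",
    "wave_down": "decrease_volume",
    "wave_sideways": "stop_playback",
    "clap_both_hands": "start_playback",
}

def apply_instruction(state, gesture):
    """Return a new (playing, volume) playback state after the gesture."""
    playing, volume = state
    instruction = GESTURE_TO_INSTRUCTION.get(gesture)
    if instruction == "start_playback":
        playing = True
    elif instruction == "stop_playback":
        playing = False
    elif instruction == "increase_volume":
        volume = min(100, volume + 10)
    elif instruction == "decrease_volume":
        volume = max(0, volume - 10)
    # unrecognized gestures leave the playback state unchanged
    return playing, volume
```

Keeping the table in storage, as the text describes, means new gesture-instruction pairs can be added without changing the recognition pipeline.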
Optionally, the step of determining the control instruction corresponding to the gesture motion includes:
performing feature extraction on the gesture motion to obtain gesture motion features;
encoding the gesture motion features to obtain an encoding result;
determining the control instruction corresponding to the encoding result.
In this embodiment, the gesture motion features are the ordered set of the contour features of each frame. To obtain the gesture motion features, the feature values of each contour in each frame must be calculated. Specifically, for the extracted gesture contours, the contour feature values of each contour are computed; the contour feature values of each contour include the region histogram, the moments, and the centroid displacement distance of the contour.
The extracted gesture motion features are then encoded using 8 reference direction vectors, and the encoding result is calculated. The 8 reference directions are the eight directions obtained by dividing 360 degrees equally.
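The 8-reference-direction encoding can be sketched as a chain code: each displacement of the contour centroid between consecutive frames is quantized to the nearest of eight directions dividing 360 degrees equally. Using only the centroid trajectory is an illustrative simplification of the per-contour features named above.

```python
import math

def quantize_direction(dx, dy):
    """Map a displacement vector to one of 8 equal sectors
    (0 = direction of +x, numbered counter-clockwise)."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # shift by half a sector so each reference direction sits mid-sector
    return int((angle + math.pi / 8) // (math.pi / 4)) % 8

def encode_trajectory(centroids):
    """Chain-code a list of (x, y) centroids, skipping zero displacements."""
    code = []
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        if (x1, y1) != (x0, y0):
            code.append(quantize_direction(x1 - x0, y1 - y0))
    return code
```

The resulting symbol sequence is a compact per-gesture code suitable as input to the template matching described next.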
The encoding result can be calculated with the DTW (dynamic time warping) algorithm. In the DTW algorithm, each gesture stored in the template library becomes a reference template, expressed as {T(1), T(2), ..., T(m), ..., T(M)}. The input gesture to be recognized is the test template, expressed as {S(1), S(2), ..., S(n), ..., S(N)}. Plotting the frame numbers m = 1, ..., M of the reference template on the vertical axis and the frame numbers n = 1, ..., N of the test template on the horizontal axis forms a grid, in which each intersection point (n, m) represents the pairing of some frame of the test template with some frame of the reference template.
The DTW algorithm can then be reduced to finding a path through the lattice points of this grid. Suppose the path passes through the lattice points (n1, m1), ..., (ni, mi), ..., (nN, mN) in order, where (n1, m1) = (1, 1) and (nN, mN) = (N, M). The path can be described by a function mi = f(ni), with ni = i for i = 1, 2, ..., N, f(1) = 1, and f(N) = M. So that the path is not too steep, its slope can be constrained to the range 0-2: if the path passes through the lattice point (ni, mi), its previous node can only be one of the three points (ni-1, mi), (ni-1, mi-1), or (ni-1, mi-2). The cumulative distance of the path is D[(ni, mi)] = d[S(ni), T(mi)] + D[(ni-1, m')], where the predecessor term D[(ni-1, m')] is determined by the following formula:
D[(ni-1, m')] = min{ D[(ni-1, mi)], D[(ni-1, mi-1)], D[(ni-1, mi-2)] }.
Finally, the control instruction corresponding to the encoding result is determined. The obtained encoding result is compared with preset encoding data, and the control instruction corresponding to the closest preset encoding data is output. To reduce the false detection rate, a proximity threshold can also be set: if the matching degree between the obtained encoding result and the preset encoding data is too low, no control instruction is output.
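The DTW recurrence above can be sketched directly: the cumulative distance is D(i, j) = d(S(i), T(j)) + min over the three allowed predecessors (i-1, j), (i-1, j-1), (i-1, j-2), with the path pinned to (1, 1) and (N, M). Treating frames as chain-code symbols with a 0/1 mismatch cost d() is an illustrative assumption.

```python
def dtw_distance(test, template):
    """Cumulative DTW distance between a test sequence S and a template T,
    with the slope constraint from the text (predecessors (i-1, j),
    (i-1, j-1), (i-1, j-2))."""
    INF = float("inf")
    n_len, m_len = len(test), len(template)
    cost = lambda i, j: 0 if test[i] == template[j] else 1
    D = [[INF] * m_len for _ in range(n_len)]
    D[0][0] = cost(0, 0)                     # the path starts at (1, 1)
    for i in range(1, n_len):
        for j in range(m_len):
            prev = min(D[i - 1][j],
                       D[i - 1][j - 1] if j >= 1 else INF,
                       D[i - 1][j - 2] if j >= 2 else INF)
            if prev < INF:
                D[i][j] = prev + cost(i, j)
    return D[n_len - 1][m_len - 1]           # the path must end at (N, M)

def best_match(test, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))
```

In line with the proximity threshold mentioned above, a caller could refuse to emit any control instruction when even the best distance exceeds a preset bound.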
Optionally, the playback control method further includes:
calculating the physical distance between the intelligent sound box and the human body;
adjusting the volume of the intelligent sound box according to the physical distance.
In this embodiment, the distance between the intelligent sound box and the user can be calculated directly by the active depth camera, and the volume is then adjusted according to the distance between the user and the intelligent sound box, so that the volume after adjustment reaches a preset value. For example, if the user hears a volume of 50 decibels at 5 meters from the intelligent sound box, then at 10 meters the volume of the intelligent sound box must be raised for the user to still hear 50 decibels. Since, indoors, distance and volume form a definite correspondence, the volume of the intelligent sound box can be adjusted according to that correspondence, so that the volume the user hears is the same at different locations. The preset value here can be the volume the user hears at 5 meters, or the factory-default volume at some physical distance.
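One concrete form of the distance-volume correspondence is the free-field point-source law: the level heard falls by 20*log10(d / d_ref) dB, so holding the user's perceived level constant means raising the source output by the same amount. Real rooms add reflections, so treating the free-field law as the stored correspondence is an assumption for illustration.

```python
import math

def required_output_db(target_db, distance_m, ref_distance_m=5.0):
    """Source level (dB) needed so a listener at distance_m hears target_db,
    given that target_db is calibrated at ref_distance_m (free-field law)."""
    return target_db + 20 * math.log10(distance_m / ref_distance_m)
```

For the example in the text, moving from 5 m to 10 m doubles the distance, so the output must rise by about 6 dB for the user to keep hearing 50 dB.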
Optionally, step S10 includes:
the intelligent sound box performing human-body detection based on the histogram of oriented gradients.
In this embodiment, the intelligent sound box can perform human-body detection based on the histogram of oriented gradients (HOG).
The histogram of oriented gradients is a local descriptor similar to the scale-invariant feature transform: it forms the human-body features by computing histograms of gradient orientation over local regions. Unlike the scale-invariant feature transform, which performs feature extraction at keypoints and is therefore a sparse description method, the histogram of oriented gradients is a dense description method.
The histogram-of-oriented-gradients description method has the following advantages: it represents the structural features of edges (gradients) and can therefore describe local shape information; the quantization of the position and orientation space suppresses, to a certain extent, the influence of translation and rotation; and the normalization over local regions partially offsets the influence of illumination. The embodiment of the present invention therefore preferably performs human-body detection based on the histogram of oriented gradients.
Optionally, the step of the intelligent sound box performing human-body detection based on the histogram of oriented gradients includes:
performing first-order gradient calculation on the image in the detection window;
calculating the gradient orientation histogram of each cell in the image;
normalizing all the cells in each block of the image to obtain the gradient orientation histogram of the block;
normalizing all the blocks in the image to obtain the gradient orientation histogram of the detection window, and using the gradient orientation histogram of the detection window as the human-body feature vector.
In this embodiment, first-order gradient calculation is first performed on the image in the detection window. Specifically, a detection window of standardized size (for example 64x128) is taken as input, and the gradients in the horizontal and vertical directions of the image in the detection window are calculated with the first-order (one-dimensional) Sobel operator [-1, 0, 1].
The benefit of using a single window as the classifier input is that the classifier is consistent with respect to the position and scale of the target. For an input image to be detected, the detection window must be moved along the horizontal and vertical directions, while the image is also zoomed over multiple scales, so that human bodies at different scales can be detected.
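The first-order gradient step can be sketched with the one-dimensional [-1, 0, 1] operator: horizontal and vertical central differences per pixel, from which gradient magnitude and orientation follow. Border pixels fall back to their nearest valid neighbour, an illustrative boundary choice.

```python
import math

def gradients(img):
    """Return (gx, gy, magnitude, orientation) maps for a 2-D grayscale image,
    using the 1-D [-1, 0, 1] operator in each direction."""
    h, w = len(img), len(img[0])
    gx = [[img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
           for x in range(w)] for y in range(h)]
    gy = [[img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
           for x in range(w)] for y in range(h)]
    mag = [[math.hypot(gx[y][x], gy[y][x]) for x in range(w)] for y in range(h)]
    # unsigned orientation folded into [0, pi), as used for the cell histograms
    ori = [[math.atan2(gy[y][x], gx[y][x]) % math.pi for x in range(w)]
           for y in range(h)]
    return gx, gy, mag, ori
```

In a full detector, this computation runs once per detection window while the window slides over the (multi-scale) input image.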
Then, the gradient orientation histogram of each cell in the image is calculated. Specifically, gradient orientation histograms are obtained by dense computation over a grid of cells and blocks: the image is divided into several cells, each cell consists of multiple pixels, and a block consists of several adjacent cells.
In this embodiment, the gradient of each pixel in the image is first calculated, and then the gradient orientation histogram of all the pixels in each cell, i.e. the gradient orientation histogram of the cell, is accumulated. When accumulating the gradient orientation histogram of a cell, the range [0, π] is first divided into multiple bins; then a weighted vote is cast according to the gradient orientation of each pixel in the cell, yielding the gradient orientation histogram of all the pixels in the cell.
When casting weighted votes, the weight of each pixel is preferably the gradient magnitude of that pixel. To eliminate aliasing, trilinear interpolation is preferably used for the weighted voting.
Traversing every cell in the image yields the gradient orientation histograms of the cells of the image.
All cells in described image in each block are normalized, it is straight to obtain described piece of gradient direction
Fang Tu.In block, the gradient orientation histogram of the cell in the block is normalized, to eliminate the influence of illumination,
So as to obtain the gradient orientation histogram of the block.Each block in traversing graph picture, the gradient direction for obtaining each block in image are straight
Fang Tu.
All pieces in described image are normalized, obtain the gradient orientation histogram of the detection window,
And using the gradient orientation histogram of the detection window as characteristics of human body's vector.By the detection window obtained after each piece of normalization
Gradient orientation histogram, form characteristics of human body vector, so as to fulfill human testing.
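The remaining steps can be sketched end to end: per-pixel orientations are voted, weighted by magnitude, into 9-bin cell histograms over [0, π), and the cell histograms of each block are L2-normalized and concatenated into the detection-window feature vector. The cell size, block size, and 9 bins follow the common HOG convention, an assumption the text does not fix; the nearest-bin vote replaces the trilinear interpolation mentioned above for brevity.

```python
import math

def cell_histogram(mag, ori, x0, y0, cell=4, bins=9):
    """Magnitude-weighted orientation histogram of one cell (nearest-bin vote)."""
    hist = [0.0] * bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            b = int(ori[y][x] / math.pi * bins) % bins
            hist[b] += mag[y][x]
    return hist

def hog_vector(mag, ori, cell=4, block=2):
    """Concatenate the L2-normalized block histograms of the whole window."""
    h, w = len(mag), len(mag[0])
    cells = [[cell_histogram(mag, ori, x, y, cell)
              for x in range(0, w, cell)] for y in range(0, h, cell)]
    feature = []
    # blocks of `block` x `block` cells, stepping one cell at a time
    for by in range(len(cells) - block + 1):
        for bx in range(len(cells[0]) - block + 1):
            blk = [v for dy in range(block) for dx in range(block)
                   for v in cells[by + dy][bx + dx]]
            norm = math.sqrt(sum(v * v for v in blk)) or 1.0
            feature.extend(v / norm for v in blk)
    return feature
```

The resulting vector is what the text calls the human-body feature vector; a classifier trained on such vectors then decides whether the window contains a person.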
Since the histogram of oriented gradients is a dense computation, the amount of calculation is large. To reduce the computation and increase detection speed, one may choose to calculate the gradient orientation histogram only in key regions that have an obvious human-body contour, thereby achieving the purpose of reducing the dimensionality.
The present invention provides a playback control method including: the intelligent sound box performs human-body detection; when a human body is detected, the gesture motion of the human body is recognized; and the playback state of the intelligent sound box is adjusted according to the gesture motion. The method of the present invention adds an interaction mode for the intelligent sound box, so that the user can control it by gesture, improving the user experience.
With reference to Fig. 2, an embodiment of the present invention also proposes an intelligent sound box, including:
a detection module 10 for performing human-body detection;
a recognition module 20 for recognizing, when a human body is detected, the gesture motion of the human body;
an adjustment module 30 for adjusting the playback state of the intelligent sound box according to the gesture motion.
In this embodiment, a depth sensor is installed on the intelligent sound box. Depth sensors fall into two classes: passive stereo cameras and active depth cameras. A passive stereo camera observes the scene with two or more cameras and estimates the depth of the scene from the disparity (displacement) of features between the multiple camera views. An active depth camera projects invisible infrared light onto the scene and estimates the depth of the scene from the reflected information. In one application scenario, user A stands at some position relative to the intelligent sound box and makes gesture instructions, such as a start-playback instruction, toward its depth sensor; after the intelligent sound box recognizes the meaning of user A's gesture instruction, it plays sound.
In the detection module 10, the intelligent sound box performs human-body detection through the depth sensor. Detection can be based on image features such as the histogram of oriented gradients (HOG), the scale-invariant feature transform (SIFT), local binary patterns (LBP), or Haar features.
In the recognition module 20, when the intelligent sound box detects a human body, the gesture motion of the human body is recognized. Specifically, a segment of video data containing a gesture is acquired through the depth sensor, which here acts as a video recorder. The video data can be acquired according to a preset rule; for example, when the depth sensor observes that the user makes a large gesture motion, that segment of video data is determined to be video data containing a gesture.
The video data is decomposed into a sequence of frames, the background in each frame is separated from the gesture, and the gesture contour in each frame is found. The start frame and end frame of the gesture motion are determined by a preset rule, and the gesture contours between the start frame and the end frame are determined as the gesture motion. That is, a gesture motion consists of the gesture contours of multiple frames.
In adjustment module 30, after the gesture motion is obtained, feature extraction is performed on it to obtain gesture motion features; the features are recognized to obtain a recognition result, and a control instruction is finally generated from the recognition result. The intelligent sound box adjusts its playback state according to the control instruction: if the control instruction obtained is a start-playback instruction, the intelligent sound box starts playing sound; if it is a stop-playback instruction, the intelligent sound box stops playing sound.
Optionally, identification module 20 includes:
A separation unit, for separating the gesture from the background in each frame of gesture images of the detected human body, and finding the gesture profile in each frame of gesture images;
A start-gesture unit, for matching the gesture profiles frame by frame against a preset start gesture profile, and determining the first matching gesture profile as the start gesture profile;
An end-gesture unit, for matching the gesture profiles that follow the start gesture profile in sequence, frame by frame, against a preset end gesture profile, and determining the first matching gesture profile as the end gesture profile;
A gesture-motion unit, for determining the gesture sequence that begins with the start gesture profile and ends with the end gesture profile as one recognized gesture motion.
In the present embodiment, the intelligent sound box stores preset start gesture profiles and preset end gesture profiles corresponding to the different control instructions. Each gesture profile of the video data is first matched, frame by frame, against the preset start gesture profile, and the first matching frame is determined as the start gesture profile. The gesture profiles after that frame are then matched, frame by frame, against the preset end gesture profile, and the first matching frame is determined as the end gesture profile. The gesture profile sequence that begins with the start gesture profile and ends with the end gesture profile is determined as the gesture motion. The gesture motion thus obtained can be used to recognize the meaning the gesture carries and, in turn, to generate the corresponding control instruction.
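The frame-by-frame matching just described can be sketched as below. This is an illustration only: `matches_start` and `matches_end` stand in for whatever profile-comparison test the intelligent sound box actually applies.

```python
def segment_gesture(frames, matches_start, matches_end):
    """Cut one gesture motion out of a sequence of per-frame gesture
    profiles: from the first frame matching the preset start profile
    to the first later frame matching the preset end profile."""
    start = next((i for i, f in enumerate(frames) if matches_start(f)), None)
    if start is None:
        return None
    end = next((i for i in range(start + 1, len(frames))
                if matches_end(frames[i])), None)
    if end is None:
        return None
    return frames[start:end + 1]

# Toy run: profiles are labels; 'S' matches the start template, 'E' the end.
seq = ['idle', 'S', 'mid1', 'mid2', 'E', 'idle']
print(segment_gesture(seq, lambda f: f == 'S', lambda f: f == 'E'))
```

Frames before the start match and after the end match are discarded, mirroring the text's rule that only the profiles between the two matched frames form the gesture motion.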
Optionally, adjustment module 30 includes:
An instruction-determining unit, for determining the control instruction corresponding to the gesture motion;
An adjustment unit, for adjusting the playback state of the intelligent sound box according to the control instruction.
In the present embodiment, a storage chip on the intelligent sound box prestores the control instructions corresponding to multiple different gesture motions. For example, it can be specified that the gesture "wave upward" corresponds to a "raise volume" instruction, "wave downward" to a "lower volume" instruction, "wave" to a "stop playback" instruction, and "clap both hands" to a "start playback" instruction. When the intelligent sound box determines that the gesture made by the user corresponds to the start-playback instruction, it starts playback accordingly; the content played can be music or news. Likewise, when the intelligent sound box determines that the gesture corresponds to the stop-playback instruction, it stops playing the sound content, sparing the user the disturbance of the content that was playing.
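The prestored gesture-to-instruction mapping can be pictured as a simple lookup table. The gesture and instruction names below are hypothetical labels for the gestures listed above, not identifiers from the patent:

```python
# Hypothetical labels mirroring the gestures named in the text; the real
# table is whatever is prestored on the speaker's storage chip.
GESTURE_COMMANDS = {
    "wave_up": "raise_volume",
    "wave_down": "lower_volume",
    "wave": "stop_playback",
    "clap_both_hands": "start_playback",
}

def command_for(gesture, match_degree=1.0, threshold=0.8):
    """Look up the control instruction; emit nothing when the match is
    too uncertain, to keep the false-detection rate down."""
    if match_degree < threshold:
        return None
    return GESTURE_COMMANDS.get(gesture)

print(command_for("clap_both_hands"))  # start_playback
```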
Optionally, the instruction-determining unit includes:
A feature-obtaining subunit, for performing feature extraction on the gesture motion to obtain gesture motion features;
A coding subunit, for encoding the gesture motion features to obtain a coding result;
An instruction-determining subunit, for determining the control instruction corresponding to the coding result.
In the present embodiment, the gesture motion features are the ordered set of the contour features of every frame. To obtain them, the feature value of each profile in every frame must be calculated. Specifically, for the extracted gesture profiles, the contour feature value of each profile is calculated; the contour feature values include the region histogram, the moments, and the centroid displacement distance of each profile.
Then, the extracted gesture motion features are encoded using 8 reference direction vectors, and the coding result is calculated. The 8 reference directions are the eight directions obtained by dividing 360 degrees equally.
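Quantizing a motion vector into one of the eight equal 45-degree sectors can be sketched as follows. The convention of code 0 pointing right (east) with codes increasing counter-clockwise is an assumption; the patent does not fix an orientation convention:

```python
import math

def direction_code(dx, dy):
    """Quantize a motion vector into one of 8 reference directions:
    360 degrees split into eight 45-degree sectors, code 0 pointing
    east and codes increasing counter-clockwise."""
    angle = math.atan2(dy, dx) % (2 * math.pi)        # fold into [0, 2*pi)
    return int((angle + math.pi / 8) // (math.pi / 4)) % 8

# Right, up, left and down map to codes 0, 2, 4 and 6.
print([direction_code(1, 0), direction_code(0, 1),
       direction_code(-1, 0), direction_code(0, -1)])  # [0, 2, 4, 6]
```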
The coding result can be calculated with the DTW (dynamic time warping) algorithm. In the DTW algorithm, each gesture stored in the template library becomes a reference template, expressed as {T(1), T(2), ..., T(m), ..., T(M)}. The input gesture to be recognized is the test template, expressed as {S(1), S(2), ..., S(n), ..., S(N)}. Marking the frame numbers m = 1 to M of the reference template on the vertical axis and the frame numbers n = 1 to N of the test template on the horizontal axis forms a grid, in which each intersection point (n, m) represents the alignment of a frame of the test template with a frame of the reference template.
The DTW algorithm can thus be reduced to finding a path through the lattice points of this grid. Suppose the lattice points the path passes through are, in order, (n1, m1), ..., (ni, mi), ..., (nN, mN), where (n1, m1) = (1, 1) and (nN, mN) = (N, M). The path can be described by a function mi = f(ni), where ni = i, i = 1, 2, ..., N, f(1) = 1 and f(N) = M. To keep the path from tilting too steeply, its slope can be constrained to the range 0-2: if the path passes through lattice point (ni, mi), its previous node can only be one of the following three: (ni-1, mi), (ni-1, mi-1) or (ni-1, mi-2). The accumulated distance of the path is D[(ni, mi)] = d[S(ni), T(mi)] + D[(ni-1, mi-1)], where the previous accumulated distance D[(ni-1, mi-1)] is taken as the minimum over the three admissible predecessors:
D[(ni-1, mi-1)] = min{D[(ni-1, mi)], D[(ni-1, mi-1)], D[(ni-1, mi-2)]}.
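The recurrence above, with the three admissible predecessors (ni-1, mi), (ni-1, mi-1) and (ni-1, mi-2), can be implemented as a small dynamic program. A sketch assuming scalar frame features and an absolute-difference local distance d:

```python
def dtw_distance(test, template, dist=lambda a, b: abs(a - b)):
    """DTW distance between a test template S(1..N) and a reference
    template T(1..M), with the slope constraint from the text: node
    (n, m) is reached from (n-1, m), (n-1, m-1) or (n-1, m-2)."""
    INF = float("inf")
    N, M = len(test), len(template)
    D = [[INF] * M for _ in range(N)]
    D[0][0] = dist(test[0], template[0])              # path starts at (1, 1)
    for n in range(1, N):
        for m in range(M):
            prev = min(D[n - 1][m],
                       D[n - 1][m - 1] if m >= 1 else INF,
                       D[n - 1][m - 2] if m >= 2 else INF)
            if prev < INF:
                D[n][m] = prev + dist(test[n], template[m])
    return D[N - 1][M - 1]                            # path ends at (N, M)

# Aligning a time-stretched copy of a sequence costs nothing.
print(dtw_distance([1, 2, 2, 3], [1, 2, 3]))  # 0
```

At recognition time, the test template would be scored against every stored reference template and the gesture with the smallest DTW distance selected (subject to the proximity threshold discussed in the text).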
Finally, the control instruction corresponding to the coding result is determined. The coding result obtained is compared against preset coding data, and the control instruction of the closest preset coding data is output. To reduce the false-detection rate, a proximity threshold can also be set: if the matching degree between the obtained coding result and the preset coding data is too low, no control instruction is output.
Optionally, the intelligent sound box further includes:
A distance calculation module, for calculating the physical distance between the intelligent sound box and the human body;
A volume adjustment module, for adjusting the volume of the intelligent sound box according to the physical distance.
In the present embodiment, the distance between the intelligent sound box and the user can be calculated directly by the active depth camera, and the volume is then adjusted according to that distance. For example, when the user is 5 meters from the intelligent sound box and hears a volume of 50 decibels, then at 10 meters, for the user to still hear 50 decibels, the volume of the intelligent sound box must be raised. Since the scene is indoors, distance and volume follow a certain correspondence, and the volume of the intelligent sound box can be adjusted according to that correspondence so that the volume the user hears is the same at different locations. The preset value here can be the volume the user hears at 5 meters, or the factory-default volume at some physical distance.
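Under a free-field inverse-square assumption, keeping the perceived level constant requires about 20·log10(d/d0) dB of extra gain when the listener moves from the reference distance d0 to distance d. Indoors the real distance-volume correspondence is flatter because of reflections, as the text notes, so this formula is only a first approximation of that correspondence:

```python
import math

def volume_gain_db(distance_m, ref_distance_m=5.0):
    """Extra gain (dB) needed for the listener to hear the reference
    level at distance_m, under free-field inverse-square falloff."""
    return 20 * math.log10(distance_m / ref_distance_m)

# Moving from the 5 m reference to 10 m costs about 6 dB of gain.
print(round(volume_gain_db(10.0), 2))  # 6.02
```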
Optionally, detection module 10 includes:
A gradient detection unit, for performing human body detection based on the gradient orientation histogram.
In the present embodiment, the intelligent sound box can perform human body detection based on the gradient orientation histogram (Histogram of Oriented Gradients, HOG).
The gradient orientation histogram is a local descriptor similar to the scale-invariant feature transform: it forms the human body features by computing histograms of gradient orientations over local regions. It differs from the scale-invariant feature transform in that the latter extracts features at keypoints and is therefore a sparse description method, whereas the gradient orientation histogram is a dense description method.
The gradient orientation histogram description method has the following advantages: it represents edge (gradient) structure and can therefore describe local shape information; the quantization of the position and orientation space suppresses, to a certain extent, the influence of translation and rotation; and normalization within local regions partially offsets the influence of illumination. The embodiment of the present invention therefore preferably performs human body detection based on the gradient orientation histogram.
Optionally, the gradient detection unit includes:
A first-order gradient calculation subunit, for performing a first-order gradient calculation on the image in the detection window;
A cell gradient subunit, for calculating the gradient orientation histograms of the cells in the image;
A block gradient subunit, for normalizing all the cells within each block of the image to obtain the gradient orientation histogram of the block;
A feature-vector generation subunit, for normalizing all the blocks in the image to obtain the gradient orientation histogram of the detection window, and taking the gradient orientation histogram of the detection window as the human body feature vector.
In the present embodiment, a first-order gradient calculation is first performed on the image in the detection window. Specifically, a detection window (Detection Window) of standardized size (e.g., 64x128) is taken as input, and the gradients in the horizontal and vertical directions of the image in the detection window are calculated with the first-order, one-dimensional derivative mask [-1, 0, 1].
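The gradient computation with the 1-D mask [-1, 0, 1] can be sketched in pure Python as below (borders replicated; a real implementation would operate on image arrays):

```python
import math

def gradients(img):
    """Per-pixel gradient magnitude and unsigned orientation in [0, pi),
    using the 1-D derivative mask [-1, 0, 1]; img is a 2-D list of
    grayscale values and borders are replicated."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx) % math.pi
    return mag, ang

# A vertical step edge produces a purely horizontal gradient.
mag, ang = gradients([[0, 0, 10, 10]] * 3)
print(mag[1][1], ang[1][1])  # 10.0 0.0
```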
The benefit of using a single fixed window as the classifier input is that the classifier is consistent with respect to the position and scale of the target. For an input image to be detected, the detection window must therefore be moved both horizontally and vertically, while the image is scaled over multiple levels so that human bodies at different scales can be detected.
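The horizontal/vertical window sweep combined with multi-scale resizing can be sketched as a window generator over an image pyramid. The 1.2 scale step and 8-pixel stride are common choices, not values fixed by the patent:

```python
def sliding_windows(img_w, img_h, win_w=64, win_h=128, stride=8, scale=1.2):
    """Yield (x, y, s) detection-window origins over an image pyramid.
    The window stays win_w x win_h while the image shrinks by `scale`
    per level, so larger bodies are caught at coarser levels."""
    s = 1.0
    while img_w / s >= win_w and img_h / s >= win_h:
        w, h = int(img_w / s), int(img_h / s)
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                yield x, y, s
        s *= scale

# Count the windows scanned for a 128x256 input image.
print(sum(1 for _ in sliding_windows(128, 256)))
```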
Then, the gradient orientation histograms of the cells in the image are calculated. Specifically, the gradient orientation histogram is obtained by dense computation over a grid of cells (Cell) and blocks (Block): the image is divided into several cells, each cell is composed of multiple pixels, and a block is composed of several adjacent cells.
In this embodiment, the gradient of each pixel in the image is first calculated, and then the gradient orientation histogram of all the pixels within each cell, i.e., the gradient orientation histogram of that cell, is accumulated. When accumulating a cell's gradient orientation histogram, the orientation range [0, π] is first divided into multiple bins for that cell, and each pixel in the cell then casts a weighted vote according to its gradient orientation, yielding the gradient orientation histogram of all the pixels in the cell.
When casting the weighted votes, the weight of each pixel is preferably the gradient magnitude of that pixel. To reduce aliasing, trilinear interpolation (Trilinear Interpolation) is preferably used for the weighted voting.
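The weighted vote for one cell can be sketched as follows, using 9 bins over [0, π) and the gradient magnitude as the vote weight (the trilinear interpolation the text prefers is omitted for brevity):

```python
import math

def cell_histogram(magnitudes, angles, bins=9):
    """Orientation histogram of one cell: [0, pi) is split into `bins`
    equal sectors and each pixel votes with its gradient magnitude."""
    hist = [0.0] * bins
    width = math.pi / bins
    for m, a in zip(magnitudes, angles):
        hist[int(a / width) % bins] += m
    return hist

# Two pixels: a strong horizontal gradient and a weaker one at ~25 degrees.
print(cell_histogram([5.0, 1.0], [0.0, math.radians(25)]))
```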
Traversing every cell in the image yields the gradient orientation histograms of all the cells in the image.
All the cells within each block of the image are then normalized to obtain the gradient orientation histogram of the block. Within a block, the gradient orientation histograms of the cells in the block are normalized to eliminate the influence of illumination, thereby obtaining the gradient orientation histogram of the block. Traversing every block in the image yields the gradient orientation histograms of all the blocks in the image.
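Block normalization is typically an L2 normalization of the block's concatenated cell histograms; a sketch follows (the L2 scheme is an assumption, since the patent does not name the norm):

```python
import math

def l2_normalize(block_hist, eps=1e-6):
    """L2-normalize one block's concatenated cell histograms, damping
    illumination and contrast changes; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in block_hist) + eps * eps)
    return [v / norm for v in block_hist]

# Scaling all inputs (e.g. a brighter scene) leaves the output unchanged.
print(l2_normalize([3.0, 4.0]))
```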
All the blocks in the image are normalized to obtain the gradient orientation histogram of the detection window, which is taken as the human body feature vector. The gradient orientation histograms obtained after normalizing each block are assembled into the human body feature vector of the detection window, thereby accomplishing human body detection.
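As a concreteness check on the resulting vector size: with the common (assumed, not patent-specified) parameters of 8x8-pixel cells, 2x2-cell blocks, an 8-pixel block stride and 9 orientation bins, a 64x128 window yields 7x15 block positions and a 3780-dimensional human body feature vector:

```python
def hog_dims(win_w=64, win_h=128, cell=8, block_cells=2, stride=8, bins=9):
    """Length of the final HOG vector for one detection window (common
    Dalal-Triggs parameters; the patent does not fix these values)."""
    block_px = block_cells * cell
    blocks_x = (win_w - block_px) // stride + 1   # 7 for a 64-px width
    blocks_y = (win_h - block_px) // stride + 1   # 15 for a 128-px height
    return blocks_x * blocks_y * block_cells * block_cells * bins

print(hog_dims())  # 3780
```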
Since the gradient orientation histogram is computed densely, the amount of calculation is large. To reduce the calculation and raise the detection speed, one may consider computing the gradient orientation histogram only in key regions with an obvious human body contour, thereby reducing the dimensionality.
The present invention provides an intelligent sound box that performs human body detection; identifies, when a human body is detected, the gesture motion of that human body; and adjusts its playback state according to the gesture motion. The intelligent sound box provided by the invention adds an interactive mode, allowing the user to control the intelligent sound box by gesture and improving the user experience.
The invention also provides an intelligent sound box including a memory, a processor, and at least one application program that is stored in the memory and configured to be executed by the processor, the application program being configured to perform the above playback control method.
In the embodiments of the present invention, the processor included in the intelligent sound box also has the following functions:
performing human body detection;
identifying, when a human body is detected, the gesture motion of the human body;
adjusting the playback state of the intelligent sound box according to the gesture motion.
The foregoing is merely embodiments of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may be variously modified and varied. Any modifications, equivalent replacements, and improvements made within the spirit and principles of the invention shall fall within the scope of the claims of the present invention.
Claims (10)
1. A playback control method, which is characterized in that it comprises the following steps:
an intelligent sound box performs human body detection;
when a human body is detected, a gesture motion of the human body is identified;
a playback state of the intelligent sound box is adjusted according to the gesture motion.
2. The playback control method according to claim 1, which is characterized in that the step of identifying the gesture motion of the human body includes:
separating the gesture from the background in each frame of gesture images of the detected human body, and finding the gesture profile in each frame of gesture images;
matching the gesture profiles frame by frame against a preset start gesture profile, and determining the first matching gesture profile as a start gesture profile;
matching the gesture profiles that follow the start gesture profile in sequence, frame by frame, against a preset end gesture profile, and determining the first matching gesture profile as an end gesture profile;
determining the gesture profile sequence that begins with the start gesture profile and ends with the end gesture profile as one recognized gesture motion.
3. The playback control method according to claim 1, which is characterized in that the step of adjusting the playback state of the intelligent sound box according to the gesture motion includes:
determining a control instruction corresponding to the gesture motion;
adjusting the playback state of the intelligent sound box according to the control instruction.
4. The playback control method according to claim 3, which is characterized in that the step of determining the control instruction corresponding to the gesture motion includes:
performing feature extraction on the gesture motion to obtain gesture motion features;
encoding the gesture motion features to obtain a coding result;
determining a control instruction corresponding to the coding result.
5. The playback control method according to any one of claims 1 to 4, which is characterized in that the method further includes:
calculating a physical distance between the intelligent sound box and the human body;
adjusting a volume of the intelligent sound box according to the physical distance.
6. An intelligent sound box, which is characterized in that it includes:
a detection module, for performing human body detection;
an identification module, for identifying a gesture motion of the human body when a human body is detected;
an adjustment module, for adjusting a playback state of the intelligent sound box according to the gesture motion.
7. The intelligent sound box according to claim 6, which is characterized in that the identification module includes:
a separation unit, for separating the gesture from the background in each frame of gesture images of the detected human body, and finding the gesture profile in each frame of gesture images;
a start-gesture unit, for matching the gesture profiles frame by frame against a preset start gesture profile, and determining the first matching gesture profile as a start gesture profile;
an end-gesture unit, for matching the gesture profiles that follow the start gesture profile in sequence, frame by frame, against a preset end gesture profile, and determining the first matching gesture profile as an end gesture profile;
a gesture-motion unit, for determining the gesture sequence that begins with the start gesture profile and ends with the end gesture profile as one recognized gesture motion.
8. The intelligent sound box according to claim 6, which is characterized in that the adjustment module includes:
an instruction-determining unit, for determining a control instruction corresponding to the gesture motion;
an adjustment unit, for adjusting the playback state of the intelligent sound box according to the control instruction.
9. The intelligent sound box according to claim 8, which is characterized in that the instruction-determining unit includes:
a feature-obtaining subunit, for performing feature extraction on the gesture motion to obtain gesture motion features;
a coding subunit, for encoding the gesture motion features to obtain a coding result;
an instruction-determining subunit, for determining a control instruction corresponding to the coding result.
10. The intelligent sound box according to any one of claims 6 to 9, which is characterized in that it further includes:
a distance calculation module, for calculating a physical distance between the intelligent sound box and the human body;
a volume adjustment module, for adjusting a volume of the intelligent sound box according to the physical distance.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810142948.9A CN108064006A (en) | 2018-02-11 | 2018-02-11 | Intelligent sound box and control method for playing back |
PCT/CN2018/077458 WO2019153382A1 (en) | 2018-02-11 | 2018-02-27 | Intelligent speaker and playing control method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810142948.9A CN108064006A (en) | 2018-02-11 | 2018-02-11 | Intelligent sound box and control method for playing back |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108064006A true CN108064006A (en) | 2018-05-22 |
Family
ID=62134459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810142948.9A Pending CN108064006A (en) | 2018-02-11 | 2018-02-11 | Intelligent sound box and control method for playing back |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108064006A (en) |
WO (1) | WO2019153382A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113659950B (en) * | 2021-08-13 | 2024-03-22 | 深圳市百匠科技有限公司 | High-fidelity multipurpose sound control method, system, device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101763515A (en) * | 2009-09-23 | 2010-06-30 | 中国科学院自动化研究所 | Real-time gesture interaction method based on computer vision |
CN103092332A (en) * | 2011-11-08 | 2013-05-08 | 苏州中茵泰格科技有限公司 | Digital image interactive method and system of television |
CN103458288A (en) * | 2013-09-02 | 2013-12-18 | 湖南华凯创意展览服务有限公司 | Gesture sensing method, gesture sensing device and audio/video playing system |
CN103679154A (en) * | 2013-12-26 | 2014-03-26 | 中国科学院自动化研究所 | Three-dimensional gesture action recognition method based on depth images |
CN105744434A (en) * | 2016-02-25 | 2016-07-06 | 深圳市广懋创新科技有限公司 | Intelligent loudspeaker box control method and system based on gesture recognition |
CN106358120A (en) * | 2016-09-23 | 2017-01-25 | 成都创慧科达科技有限公司 | Audio play device with various regulation methods |
2018
- 2018-02-11 CN CN201810142948.9A patent/CN108064006A/en active Pending
- 2018-02-27 WO PCT/CN2018/077458 patent/WO2019153382A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
LI, Kai: "An Improved DTW Dynamic Gesture Recognition Method", Journal of Chinese Computer Systems (《小型微型计算机系统》) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111242149A (en) * | 2018-11-28 | 2020-06-05 | 珠海格力电器股份有限公司 | Smart home control method and device, storage medium, processor and smart home |
CN111182381A (en) * | 2019-10-10 | 2020-05-19 | 广东小天才科技有限公司 | Camera control method of intelligent sound box, intelligent sound box and storage medium |
CN111182381B (en) * | 2019-10-10 | 2021-08-20 | 广东小天才科技有限公司 | Camera control method of intelligent sound box, intelligent sound box and storage medium |
CN112992796A (en) * | 2021-02-09 | 2021-06-18 | 深圳市众芯诺科技有限公司 | Intelligent visual sound box chip |
CN113311939A (en) * | 2021-04-01 | 2021-08-27 | 江苏理工学院 | Intelligent sound box control system based on gesture recognition |
Also Published As
Publication number | Publication date |
---|---|
WO2019153382A1 (en) | 2019-08-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108064006A (en) | Intelligent sound box and control method for playing back | |
US11450146B2 (en) | Gesture recognition method, apparatus, and device | |
US20230077355A1 (en) | Tracker assisted image capture | |
JP5366824B2 (en) | Method and system for converting 2D video to 3D video | |
CN108197618B (en) | Method and device for generating human face detection model | |
US11042991B2 (en) | Determining multiple camera positions from multiple videos | |
CN109858563B (en) | Self-supervision characterization learning method and device based on transformation recognition | |
KR20150110697A (en) | Systems and methods for tracking and detecting a target object | |
JP2007328746A (en) | Apparatus and method for tracking object | |
JP2011134114A (en) | Pattern recognition method and pattern recognition apparatus | |
WO2021031954A1 (en) | Object quantity determination method and apparatus, and storage medium and electronic device | |
CN107169417B (en) | RGBD image collaborative saliency detection method based on multi-core enhancement and saliency fusion | |
CN107169503B (en) | Indoor scene classification method and device | |
CN111914878A (en) | Feature point tracking training and tracking method and device, electronic equipment and storage medium | |
JP2017228224A (en) | Information processing device, information processing method, and program | |
CN103105924A (en) | Man-machine interaction method and device | |
JP2020071875A (en) | Deep learning model used for image recognition, and apparatus and method for training the model | |
CN109697385A (en) | A kind of method for tracking target and device | |
KR102102164B1 (en) | Method, apparatus and computer program for pre-processing video | |
CN112149557A (en) | Person identity tracking method and system based on face recognition | |
CN109711267A (en) | A kind of pedestrian identifies again, pedestrian movement's orbit generation method and device | |
KR20140068746A (en) | Method, system and computer-readable recording media for motion recognition | |
Cordea et al. | Real-time 2 (1/2)-D head pose recovery for model-based video-coding | |
CN116611491A (en) | Training method and device of target detection model, electronic equipment and storage medium | |
CN110197123A (en) | A kind of human posture recognition method based on Mask R-CNN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180522 |