CN108064006A - Intelligent sound box and control method for playing back - Google Patents

Intelligent sound box and control method for playing back

Publication number
CN108064006A
Authority
CN
China
Prior art keywords
gesture
sound box
intelligent sound
profile
gesture motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810142948.9A
Other languages
Chinese (zh)
Inventor
王声平
张立新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Water World Co Ltd
Original Assignee
Shenzhen Water World Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Water World Co Ltd
Priority to CN201810142948.9A (patent CN108064006A)
Priority to PCT/CN2018/077458 (patent WO2019153382A1)
Publication of CN108064006A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/22 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired frequency characteristic only
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/04 Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/01 Aspects of volume control, not necessarily automatic, in sound systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R27/00 Public address systems

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention provides an intelligent sound box and a control method for playing back. The control method for playing back includes: the intelligent sound box performs human body detection; when a human body is detected, the gesture motion of the human body is identified; and the broadcast state of the intelligent sound box is adjusted according to the gesture motion. The method provided by the invention adds a further way of interacting with the intelligent sound box, so that the user can control the intelligent sound box by gesture, which improves the user experience.

Description

Intelligent sound box and control method for playing back
Technical field
The present invention relates to the field of intelligent sound boxes, and in particular to an intelligent sound box and a control method for playing back.
Background art
An intelligent sound box is an upgraded speaker: a tool with which household consumers access the Internet by voice, for example to request songs, shop online, or check the weather forecast. It can also control smart home devices, for example opening the curtains, setting the refrigerator temperature, or heating the water heater in advance.
Intelligent sound boxes, of which the Amazon Echo is representative, are in essence intelligent voice technology: operating them requires voice instructions. However, background noise in existing home environments is relatively loud, and this noise can interfere with the correct recognition of voice instructions and degrade the user experience. It is therefore necessary to offer additional ways for the user to interact with the intelligent sound box, so as to improve the user experience.
Summary of the invention
The main object of the present invention is to provide an intelligent sound box and a control method for playing back that improve the experience of using the intelligent sound box.
The present invention provides a control method for playing back, comprising the following steps:
The intelligent sound box performs human body detection;
When a human body is detected, the gesture motion of the human body is identified;
The broadcast state of the intelligent sound box is adjusted according to the gesture motion.
Preferably, the step of identifying the gesture motion of the human body includes:
Separating the gesture from the background in each frame of gesture image of the detected human body, and finding the gesture profile in each frame of gesture image;
Matching the gesture profiles frame by frame against a preset start gesture profile, and determining the first gesture profile that matches as the start gesture profile;
Matching the gesture profiles that follow the start gesture profile in time frame by frame against a preset end gesture profile, and determining the first gesture profile that matches as the end gesture profile;
Determining the gesture motion that begins with the start gesture profile and ends with the end gesture profile as one recognized group of gesture motion.
Preferably, the step of adjusting the broadcast state of the intelligent sound box according to the gesture motion includes:
Determining the control instruction corresponding to the gesture motion;
Adjusting the broadcast state of the intelligent sound box according to the control instruction.
Preferably, the step of determining the control instruction corresponding to the gesture motion includes:
Performing feature extraction on the gesture motion to obtain gesture motion features;
Encoding the gesture motion features to obtain a coding result;
Determining the control instruction corresponding to the coding result.
Preferably, the method further includes:
Calculating the physical distance between the intelligent sound box and the human body;
Adjusting the volume of the intelligent sound box according to the physical distance.
Preferably, the step in which the intelligent sound box performs human body detection includes:
The intelligent sound box performs human body detection based on a gradient orientation histogram.
Preferably, the step in which the intelligent sound box performs human body detection based on a gradient orientation histogram includes:
Performing first-order gradient calculation on the image in a detection window;
Calculating the gradient orientation histogram of each cell in the image;
Normalizing all cells within each block of the image to obtain the gradient orientation histogram of the block;
Normalizing all blocks in the image to obtain the gradient orientation histogram of the detection window, and using the gradient orientation histogram of the detection window as the human body feature vector.
In another aspect, the present invention also proposes an intelligent sound box, including:
A detection module, configured to perform human body detection;
An identification module, configured to identify the gesture motion of the human body when a human body is detected;
An adjustment module, configured to adjust the broadcast state of the intelligent sound box according to the gesture motion.
Preferably, the identification module includes:
A separation unit, configured to separate the gesture from the background in each frame of gesture image of the detected human body, and to find the gesture profile in each frame of gesture image;
A start gesture unit, configured to match the gesture profiles frame by frame against a preset start gesture profile, and to determine the first gesture profile that matches as the start gesture profile;
An end gesture unit, configured to match the gesture profiles that follow the start gesture profile in time frame by frame against a preset end gesture profile, and to determine the first gesture profile that matches as the end gesture profile;
A gesture motion unit, configured to determine the gesture motion that begins with the start gesture profile and ends with the end gesture profile as one recognized group of gesture motion.
Preferably, the adjustment module includes:
An instruction determining unit, configured to determine the control instruction corresponding to the gesture motion;
An adjustment unit, configured to adjust the broadcast state of the intelligent sound box according to the control instruction.
Preferably, the instruction determining unit includes:
A feature obtaining subunit, configured to perform feature extraction on the gesture motion to obtain gesture motion features;
A coding subunit, configured to encode the gesture motion features to obtain a coding result;
An instruction determining subunit, configured to determine the control instruction corresponding to the coding result.
Preferably, the intelligent sound box further includes:
A distance calculation module, configured to calculate the physical distance between the intelligent sound box and the human body;
A volume adjustment module, configured to adjust the volume of the intelligent sound box according to the physical distance.
Preferably, the detection module includes:
A gradient detection unit, configured to perform human body detection based on a gradient orientation histogram.
Preferably, the gradient detection unit includes:
A first-order gradient calculation subunit, configured to perform first-order gradient calculation on the image in a detection window;
A cell gradient subunit, configured to calculate the gradient orientation histogram of each cell in the image;
A block gradient subunit, configured to normalize all cells within each block of the image to obtain the gradient orientation histogram of the block;
A feature vector generation subunit, configured to normalize all blocks in the image to obtain the gradient orientation histogram of the detection window, and to use the gradient orientation histogram of the detection window as the human body feature vector.
The invention also provides an intelligent sound box including a memory, a processor, and at least one application program that is stored in the memory and configured to be executed by the processor, the application program being configured to perform the above control method for playing back.
The present invention provides an intelligent sound box and a control method for playing back. The control method for playing back includes: the intelligent sound box performs human body detection; when a human body is detected, the gesture motion of the human body is identified; and the broadcast state of the intelligent sound box is adjusted according to the gesture motion. The method provided by the invention adds a further way of interacting with the intelligent sound box, so that the user can control the intelligent sound box by gesture, which improves the user experience.
Description of the drawings
Fig. 1 is a flow diagram of one embodiment of the control method for playing back of the present invention;
Fig. 2 is a structure diagram of one embodiment of the intelligent sound box of the present invention.
The realization of the object, the functional features, and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Specific embodiment
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
With reference to Fig. 1, an embodiment of the present invention proposes a control method for playing back, comprising the following steps:
S10, the intelligent sound box performs human body detection;
S20, when a human body is detected, the gesture motion of the human body is identified;
S30, the broadcast state of the intelligent sound box is adjusted according to the gesture motion.
In this embodiment, a depth sensor is installed on the intelligent sound box. Depth sensors fall into two classes: passive stereo cameras and active depth cameras. A passive stereo camera observes the scene with two or more cameras and estimates the depth of the scene from the disparity (displacement) between features in the multiple camera views. An active depth camera projects invisible infrared light onto the scene and estimates the depth of the scene from the reflected information. In one application scenario, user A stands at a certain position relative to the intelligent sound box and makes gesture instructions toward its depth sensor, such as an instruction to start playing; after the intelligent sound box recognizes the meaning of user A's gesture instruction, it plays sound.
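As a rough illustration of the passive stereo principle mentioned above, the textbook relation for a rectified two-camera rig is depth = focal length x baseline / disparity. The sketch below assumes that relation; the function name and parameters are hypothetical and are not taken from the patent.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters of a feature seen by two rectified cameras.

    Assumption (not from the patent): ideal pinhole cameras with parallel
    optical axes, focal length in pixels, baseline in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px
```

A larger disparity means the feature is closer to the cameras, which is why nearby hands are easier to range than distant ones.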
In step S10, the intelligent sound box performs human body detection through the depth sensor. Human body detection can be based on image features such as the gradient orientation histogram (Histogram of oriented gradient, HOG), the scale-invariant feature transform (Scale-invariant feature transform, SIFT), local binary patterns (Local Binary Pattern, LBP), and Haar features.
In step S20, when the intelligent sound box detects a human body, it identifies the gesture motion of that human body. Specifically, a group of video data containing the gesture is obtained through the depth sensor, which acts as a video recorder here. The video data can be obtained according to a preset rule. For example, when the depth sensor observes that the user makes a large gesture motion, that segment of video data is determined to be video data containing a gesture.
The video data is parsed into multiple consecutive frames, the background in each image is separated from the gesture, and the gesture profile in each frame is found. The start frame and end frame of the gesture motion are determined according to a preset rule, and the gesture profiles between the start frame and the end frame are determined as the gesture motion. That is, a gesture motion comprises the gesture profiles of multiple frames.
In step S30, after the gesture motion is obtained, feature extraction is performed on it to obtain gesture motion features; the gesture motion features are recognized to obtain a recognition result; and finally a control instruction is generated according to the recognition result.
The intelligent sound box adjusts its broadcast state according to the control instruction. If the control instruction obtained is a start-playing instruction, the intelligent sound box starts playing sound; if it is a stop-playing instruction, the intelligent sound box stops playing sound.
Optionally, step S20 includes:
Separating the gesture from the background in each frame of gesture image of the detected human body, and finding the gesture profile in each frame of gesture image;
Matching the gesture profiles frame by frame against a preset start gesture profile, and determining the first gesture profile that matches as the start gesture profile;
Matching the gesture profiles that follow the start gesture profile in time frame by frame against a preset end gesture profile, and determining the first gesture profile that matches as the end gesture profile;
Determining the gesture motion that begins with the start gesture profile and ends with the end gesture profile as one recognized group of gesture motion.
In this embodiment, the intelligent sound box stores the preset start gesture profile and the preset end gesture profile corresponding to each control instruction. Each gesture profile of the video data is first matched frame by frame against the preset start gesture profile, and the gesture profile of the first matching frame is determined as the start gesture profile. The gesture profiles after that frame are then matched frame by frame against the preset end gesture profile, and the gesture profile of the first matching frame is determined as the end gesture profile. The sequence of gesture profiles that begins with the start gesture profile and ends with the end gesture profile is then determined as the gesture motion. The gesture motion obtained can be used to recognize the meaning of the gesture and thus to generate the corresponding control instruction.
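The start/end matching described above can be sketched as a scan over the per-frame gesture profiles (contours). This is a minimal sketch: the patent does not say how a profile is compared with a preset profile, so the two matcher callbacks `is_start` and `is_end` are hypothetical placeholders supplied by the caller.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Contour = TypeVar("Contour")


def segment_gesture(
    contours: Sequence[Contour],
    is_start: Callable[[Contour], bool],
    is_end: Callable[[Contour], bool],
) -> Optional[List[Contour]]:
    """Return the contours from the first start match to the first later end match."""
    # First frame whose contour matches the preset start profile.
    start = next((i for i, c in enumerate(contours) if is_start(c)), None)
    if start is None:
        return None
    # First frame after the start frame whose contour matches the preset end profile.
    for j in range(start + 1, len(contours)):
        if is_end(contours[j]):
            return list(contours[start : j + 1])
    return None  # start seen but no end yet: no complete gesture motion
```

Returning `None` when no end profile follows the start profile lets a caller keep buffering frames instead of acting on an incomplete gesture.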
Optionally, step S30 includes:
Determining the control instruction corresponding to the gesture motion;
Adjusting the broadcast state of the intelligent sound box according to the control instruction.
In this embodiment, a memory chip on the intelligent sound box stores in advance the control instructions corresponding to multiple different gesture motions. For example, it can be stipulated that the gesture motion "wave upward" corresponds to the "raise volume" instruction, "wave downward" corresponds to the "lower volume" instruction, "wave" corresponds to the "stop playing" instruction, and "clap both hands" corresponds to the "start playing" instruction. When the intelligent sound box determines that the gesture motion made by the user corresponds to the start-playing instruction, it plays according to that instruction; the content played can be music or news. Likewise, when the intelligent sound box determines that the gesture motion made by the user corresponds to the end-playing instruction, it stops playing the sound content, so that the user is no longer disturbed by the sound content that was playing.
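The stored correspondence between gesture motions and control instructions can be pictured as a lookup table. The sketch below uses the four example gestures from the paragraph above; the identifier names (`Command`, `GESTURE_TO_COMMAND`, `command_for`) and the string labels are illustrative assumptions, not part of the patent.

```python
from enum import Enum
from typing import Optional


class Command(Enum):
    VOLUME_UP = "raise volume"
    VOLUME_DOWN = "lower volume"
    STOP = "stop playing"
    PLAY = "start playing"


# Hypothetical labels for the recognized gesture motions.
GESTURE_TO_COMMAND = {
    "wave_up": Command.VOLUME_UP,
    "wave_down": Command.VOLUME_DOWN,
    "wave": Command.STOP,
    "clap": Command.PLAY,
}


def command_for(gesture: str) -> Optional[Command]:
    """Look up the control instruction for a gesture label; None if unknown."""
    return GESTURE_TO_COMMAND.get(gesture)
```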
Optionally, the step of determining the control instruction corresponding to the gesture motion includes:
Performing feature extraction on the gesture motion to obtain gesture motion features;
Encoding the gesture motion features to obtain a coding result;
Determining the control instruction corresponding to the coding result.
In this embodiment, the gesture motion feature is the ordered set of the contour features of each frame. To obtain the gesture motion feature, the feature value of each contour in each frame must be calculated. Specifically, for the extracted gesture profiles, the contour feature value of each contour in the gesture profile is calculated. The contour feature values of each contour include its region histogram, its moments, and the earth mover's distance.
Then the extracted gesture motion features are encoded using 8 reference direction vectors, and the coding result is calculated. The 8 reference directions are the eight directions that divide 360 degrees equally.
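One common way to realize an 8-direction code is to quantize the frame-to-frame displacement of a tracked point into 45-degree sectors. The sketch below assumes the tracked point is the centroid of the gesture profile and uses mathematical axes (y pointing up); the patent does not specify either choice, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def direction_code(dx: float, dy: float) -> int:
    """Quantize a displacement into one of 8 directions (0 = +x, counter-clockwise)."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Shift by half a sector so that each code is centered on its reference direction.
    return int((angle + math.pi / 8) // (math.pi / 4)) % 8


def encode_trajectory(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Encode consecutive displacements of a point track as direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) != (x1, y1):  # skip frames with no movement
            codes.append(direction_code(x1 - x0, y1 - y0))
    return codes
```

For example, a track that moves right, up, left, and down in turn is encoded as the codes 0, 2, 4, 6. With image coordinates, where y grows downward, the sign of `dy` would need to be flipped to keep the same convention.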
The coding result can be calculated using the DTW (dynamic time warping) algorithm. In the DTW algorithm, each gesture stored in the template library serves as a sample template, expressed as $\{T(1), T(2), \ldots, T(m), \ldots, T(M)\}$. The input gesture to be recognized is the test template, expressed as $\{S(1), S(2), \ldots, S(n), \ldots, S(N)\}$. The frame numbers $n = 1, \ldots, N$ of the test template are marked on the horizontal axis and the frame numbers $m = 1, \ldots, M$ of the sample template on the vertical axis. Drawing lines through these frame-number coordinates forms a grid, and each intersection $(n, m)$ in the grid represents the pairing of one frame of the test template with one frame of the sample template.
The DTW algorithm reduces to finding a path through lattice points of this grid. Suppose the path passes in order through the lattice points $(n_1, m_1), \ldots, (n_i, m_i), \ldots, (n_N, m_N)$, where $(n_1, m_1) = (1, 1)$ and $(n_N, m_N) = (N, M)$. The path can be described by a function $m_i = f(n_i)$, where $n_i = i$, $i = 1, 2, \ldots, N$, $f(1) = 1$, and $f(N) = M$. So that the path does not become too steep, its slope can be constrained to the range 0 to 2: if the path passes through the lattice point $(n_i, m_i)$, the previous lattice point can only be one of $(n_i - 1, m_i)$, $(n_i - 1, m_i - 1)$, or $(n_i - 1, m_i - 2)$. The cumulative distance of the path is
$$D[(n_i, m_i)] = d[S(n_i), T(m_i)] + D[(n_{i-1}, m_{i-1})],$$
where $(n_{i-1}, m_{i-1})$ is determined by
$$D[(n_{i-1}, m_{i-1})] = \min\{D[(n_i - 1, m_i)],\ D[(n_i - 1, m_i - 1)],\ D[(n_i - 1, m_i - 2)]\}.$$
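The recurrence above can be sketched as a small dynamic program. This is a minimal sketch, assuming non-empty sequences and a caller-supplied frame distance (absolute difference by default, which is an illustrative choice); indices are 0-based in the code, whereas the text counts frames from 1.

```python
import math
from typing import Callable, Sequence


def dtw_distance(
    test: Sequence[float],
    template: Sequence[float],
    dist: Callable[[float, float], float] = lambda a, b: abs(a - b),
) -> float:
    """Cumulative DTW distance with predecessors (n-1, m), (n-1, m-1), (n-1, m-2)."""
    if not test or not template:
        raise ValueError("both sequences must be non-empty")
    n_frames, m_frames = len(test), len(template)
    # D[n][m] = cumulative distance of the best path ending at lattice point (n, m).
    D = [[math.inf] * m_frames for _ in range(n_frames)]
    D[0][0] = dist(test[0], template[0])  # the path must start at the first frames
    for n in range(1, n_frames):
        for m in range(m_frames):
            # Slice covers template indices m-2, m-1, m (clipped at 0).
            best_prev = min(D[n - 1][max(m - 2, 0) : m + 1])
            D[n][m] = dist(test[n], template[m]) + best_prev
    return D[n_frames - 1][m_frames - 1]  # the path must end at the last frames
```

Because every step advances the test template by exactly one frame and the sample template by at most two, the result is infinite when the sample template is more than roughly twice as long as the test template; that is a direct consequence of the slope constraint.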
Finally, the control instruction corresponding to the coding result is determined. The coding result obtained is compared with the preset coded data, and the control instruction corresponding to the closest preset coded data is output. To reduce the false detection rate, a proximity threshold can also be set: if the degree of match between the coding result obtained and the preset coded data is too low, no control instruction is output.
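The nearest-template lookup with a rejection threshold can be sketched as follows. The distance function is passed in by the caller; the simple `mismatch` count shown here is only a stand-in for whatever matching measure is actually used (for example a DTW distance), and all names are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence


def mismatch(a: Sequence[int], b: Sequence[int]) -> float:
    """Illustrative distance: differing positions plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def match_command(
    code: Sequence[int],
    templates: Dict[str, Sequence[int]],
    distance: Callable[[Sequence[int], Sequence[int]], float],
    max_distance: float,
) -> Optional[str]:
    """Return the command of the closest preset code, or None if nothing is close enough."""
    best_cmd, best_d = None, float("inf")
    for cmd, preset in templates.items():
        d = distance(code, preset)
        if d < best_d:
            best_cmd, best_d = cmd, d
    # Reject weak matches so that stray movements do not trigger an instruction.
    return best_cmd if best_d <= max_distance else None
```

Choosing `max_distance` trades missed gestures against false triggers; the patent leaves the value open.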
Optionally, the control method for playing back further includes:
Calculating the physical distance between the intelligent sound box and the human body;
Adjusting the volume of the intelligent sound box according to the physical distance.
In this embodiment, the distance between the intelligent sound box and the user can be calculated directly by the active depth camera, and the volume is then adjusted according to that distance so that the volume the user hears reaches a preset value. For example, if the volume the user hears at 5 meters from the intelligent sound box is 50 decibels, then at 10 meters the volume of the intelligent sound box must be raised for the user to still hear 50 decibels. Because distance and volume have a definite correspondence indoors, the volume of the intelligent sound box can be adjusted according to that correspondence so that the user hears the same volume at different positions. The preset value here can be the volume the user hears at 5 meters, or a factory-default volume at a given physical distance.
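The patent does not give the distance-to-volume correspondence. As one possible stand-in, the sketch below assumes the free-field inverse-distance law for a point source, under which the level at the listener falls by about 6 dB per doubling of distance. Real rooms reverberate, so an actual device would more likely use a measured calibration table; the function name and the 5 meter reference are illustrative.

```python
import math


def required_gain_db(distance_m: float, reference_m: float = 5.0) -> float:
    """Extra output level (dB) needed so the listener hears the reference loudness.

    Assumption (not from the patent): free-field point source, so the level at
    the listener drops by 20*log10(distance / reference) relative to the
    level heard at the reference distance.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(distance_m / reference_m)
```

Under this assumption, moving from 5 meters to 10 meters calls for roughly 6 dB more output, and moving closer than the reference distance gives a negative value, meaning the volume should be lowered.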
Optionally, step S10 includes:
The intelligent sound box performs human body detection based on a gradient orientation histogram.
In this embodiment, the intelligent sound box can perform human body detection based on the gradient orientation histogram (Histogram of oriented gradient, HOG).
The gradient orientation histogram is a local descriptor similar to the scale-invariant feature transform; it forms the human body feature by computing histograms of gradient orientation over local regions. Unlike the scale-invariant feature transform, which extracts features at key points and is therefore a sparse description method, the gradient orientation histogram is a dense description method.
The gradient orientation histogram description method has the following advantages: it represents the structure of edges (gradients) and can therefore describe local shape information; quantizing the position and orientation space suppresses, to some extent, the effects of translation and rotation; and normalization within local regions partially offsets the effect of illumination. The embodiment of the present invention therefore preferably performs human body detection based on the gradient orientation histogram.
Optionally, the step in which the intelligent sound box performs human body detection based on a gradient orientation histogram includes:
Performing first-order gradient calculation on the image in a detection window;
Calculating the gradient orientation histogram of each cell in the image;
Normalizing all cells within each block of the image to obtain the gradient orientation histogram of the block;
Normalizing all blocks in the image to obtain the gradient orientation histogram of the detection window, and using the gradient orientation histogram of the detection window as the human body feature vector.
In this embodiment, first-order gradient calculation is first performed on the image in the detection window. Specifically, a detection window (Detection Window) of standardized size (such as 64x128) is taken as input, and the gradients of the image in the detection window in the horizontal and vertical directions are calculated with the first-order (one-dimensional) Sobel operator [-1, 0, 1].
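The gradient step can be sketched in plain Python by applying the mask [-1, 0, 1] along rows and columns of a grayscale image stored as a list of rows. Border handling is not described in the patent; the sketch replicates edge pixels, which is an assumption, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gradients(img: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[List[float]]]:
    """Per-pixel gradient magnitude and unsigned orientation in [0, pi)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Mask [-1, 0, 1]: right neighbor minus left neighbor (edges replicated).
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            # Same mask applied vertically: lower neighbor minus upper neighbor.
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx) % math.pi
    return mag, ang
```

Folding the orientation into [0, pi) treats a dark-to-light edge and a light-to-dark edge as the same direction, which matches the [0, π] range used for the cell histograms.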
The benefit of using a single window as the classifier input is that the classifier is invariant to the position and scale of the target. For an input image to be detected, the detection window must be moved along the horizontal and vertical directions, and the image must also be scaled at multiple scales so that human bodies of different sizes can be detected.
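The window scan over positions and scales can be sketched as a generator of window placements. The stride of 8 pixels and the scale step of 1.2 are illustrative assumptions, as is the function name; only the 64x128 window size comes from the text.

```python
from typing import Iterator, Tuple


def sliding_windows(
    img_w: int,
    img_h: int,
    win_w: int = 64,
    win_h: int = 128,
    stride: int = 8,
    scale_step: float = 1.2,
) -> Iterator[Tuple[float, int, int]]:
    """Yield (scale, x, y) for every window position on each shrunken copy of the image."""
    scale = 1.0
    # Keep shrinking the image until the fixed-size window no longer fits.
    while img_w / scale >= win_w and img_h / scale >= win_h:
        scaled_w, scaled_h = int(img_w / scale), int(img_h / scale)
        for y in range(0, scaled_h - win_h + 1, stride):
            for x in range(0, scaled_w - win_w + 1, stride):
                yield scale, x, y
        scale *= scale_step
```

A window at `(scale, x, y)` covers the region starting at about `(x * scale, y * scale)` in the original image, so larger scales correspond to larger (closer) human bodies.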
Then the gradient orientation histogram of each cell in the image is calculated. Specifically, the gradient orientation histogram is computed densely over a grid of cells (Cell) and blocks (Block). The image is divided into a number of cells, each consisting of multiple pixels, and a block consists of several adjacent cells.
In this embodiment, the gradient of each pixel in the image is calculated first, and then the gradient orientation histogram of all pixels in each cell, that is, the gradient orientation histogram of the cell, is compiled. When compiling the histogram of a cell, [0, π] is first divided into multiple intervals for the cell, and weighted voting is then performed according to the gradient orientation of each pixel in the cell, which yields the gradient orientation histogram of all pixels in the cell.
In the weighted voting, the weight of each pixel is preferably the gradient magnitude of that pixel. To reduce aliasing, trilinear interpolation (Trilinear Interpolation) is preferably used for the weighted voting.
Each cell in the image is traversed to obtain the gradient orientation histogram of every cell in the image.
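The weighted voting for one cell can be sketched as below. For brevity the sketch interpolates only between the two nearest orientation bins; the trilinear interpolation preferred in the text would additionally spread each vote across neighboring cells in x and y. The choice of 9 bins and the function name are assumptions.

```python
import math
from typing import List, Sequence


def cell_histogram(mags: Sequence[float], angs: Sequence[float], bins: int = 9) -> List[float]:
    """Magnitude-weighted orientation histogram for the pixels of one cell.

    mags, angs: gradient magnitude and unsigned orientation in [0, pi) per pixel.
    """
    hist = [0.0] * bins
    width = math.pi / bins
    for m, a in zip(mags, angs):
        pos = a / width - 0.5  # position measured from the center of bin 0
        lo = math.floor(pos)
        frac = pos - lo
        # Split the vote between the two nearest bin centers; orientation wraps at pi.
        hist[lo % bins] += m * (1.0 - frac)
        hist[(lo + 1) % bins] += m * frac
    return hist
```

The total of the histogram equals the sum of the gradient magnitudes in the cell, so strong edges dominate the descriptor while flat regions contribute little.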
All cells within each block of the image are normalized to obtain the gradient orientation histogram of the block. Within a block, the gradient orientation histograms of its cells are normalized to eliminate the effect of illumination, which yields the gradient orientation histogram of the block. Each block in the image is traversed to obtain the gradient orientation histogram of every block in the image.
All blocks in the image are normalized to obtain the gradient orientation histogram of the detection window, which is used as the human body feature vector. The gradient orientation histograms obtained after normalizing each block together form the human body feature vector, with which human body detection is performed.
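Block normalization and assembly of the window feature vector can be sketched as follows. The patent does not name the normalization scheme; the sketch assumes plain L2 normalization with a small constant to avoid division by zero, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_block(cell_hists: Sequence[Sequence[float]], eps: float = 1e-6) -> List[float]:
    """Concatenate the cell histograms of one block and L2-normalize the result."""
    v = [x for hist in cell_hists for x in hist]
    norm = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / norm for x in v]


def window_descriptor(blocks: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Concatenate the normalized histograms of all blocks into one feature vector."""
    out: List[float] = []
    for cell_hists in blocks:
        out.extend(normalize_block(cell_hists))
    return out
```

The resulting vector would then be passed to a classifier that decides whether the window contains a human body; the patent does not specify which classifier is used.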
Because the gradient orientation histogram is computed densely, the amount of computation is large. To reduce the computation and increase detection speed, the gradient orientation histogram can be computed only in key regions that have an obvious human body contour, thereby reducing the dimensionality.
The present invention provides a control method for playing back, including: the intelligent sound box performs human body detection; when a human body is detected, the gesture motion of the human body is identified; and the broadcast state of the intelligent sound box is adjusted according to the gesture motion. The method provided by the invention adds a further way of interacting with the intelligent sound box, so that the user can control the intelligent sound box by gesture, which improves the user experience.
With reference to Fig. 2, the embodiment of the present invention also proposed a kind of intelligent sound box, including:
Detection module 10, for carrying out human testing;
Identification module 20, for when detecting human body, identifying the gesture motion of the human body;
Module 30 is adjusted, for adjusting the broadcast state of the intelligent sound box according to the gesture motion.
In the present embodiment, depth transducer is installed on intelligent sound box.Depth transducer is divided into two classes:Passive type cubic phase Machine and active depth camera.Passive type stereoscopic camera observes scene using two or more cameras, and uses these Difference (displacement) in multiple views of camera between feature estimates the depth of scene.Active depth camera is to scene simulation Sightless infrared light, and according to the information reflected, estimate the depth of scene.In an application scenarios, user's first station exists With intelligent sound box certain position, some gesture instructions are made to the depth transducer of intelligent sound box, such as open play instruction, intelligence After speaker identifies the meaning of user's first gesture instruction, sound is played.
In detection module 10, intelligent sound box carries out human testing by depth transducer.It can be based on gradient direction Nogata Scheme (Histogram of oriented gradient, HOG), scale invariant feature conversion (Scale-invariant Feature transform, SIFT), local binary patterns (Local Binary Pattern, LBP), the characteristics of image such as HARR Carry out human testing.
In identification module 20, when intelligent sound box detects human body, the gesture motion of the human body is identified.Particular by depth It spends sensor and obtains one group of video data for including gesture.Depth transducer plays the role of making video recording here.It can be by default Rule obtains video data.Such as, when depth transducer monitors that user has larger gesture motion, by this section of video data It is determined as including the video data of gesture.
It is the continuous image of multiframe by above-mentioned Digital video resolution, the background in image is separated with gesture, and finds out every Gesture profile in two field picture.The start frame and end frame of gesture motion are determined by default rule.By start frame and end frame Between gesture profile be determined as gesture motion.That is, gesture motion includes the gesture profile of multiple image.
It adjusts in module 30, after obtaining gesture motion, feature extraction is carried out to gesture motion, it is special to obtain gesture motion Sign, is identified gesture motion characteristic, obtains recognition result, finally generates control instruction according to recognition result.
Intelligent sound box adjusts broadcast state according to control instruction.Control instruction such as acquisition is to commence play out instruction, then intelligence Energy speaker commences play out sound;If the control instruction obtained is stops play instruction, intelligent sound box stops playing sound.
Optionally, identification module 20 includes:
A separation unit, for separating the gesture from the background in each frame of gesture image of the detected human body, and finding the gesture profile in each frame of gesture image;
A start gesture unit, for matching the gesture profiles frame by frame against a preset start gesture profile, and determining the first matching gesture profile to be the start gesture profile;
An end gesture unit, for matching the gesture profiles that follow the start gesture profile in time order frame by frame against a preset end gesture profile, and determining the first matching gesture profile to be the end gesture profile;
A gesture motion unit, for determining the gesture motion that begins with the start gesture profile and ends with the end gesture profile to be one recognized group of gesture motion.
In the present embodiment, the intelligent sound box stores a preset start gesture profile and a preset end gesture profile for each control instruction. Each gesture profile in the video data is first matched frame by frame against the preset start gesture profile, and the gesture profile of the first matching frame is determined to be the start gesture profile. The gesture profiles after that frame are then matched frame by frame against the preset end gesture profile, and the gesture profile of the first matching frame is determined to be the end gesture profile. The sequence of gesture profiles that begins with the start gesture profile and ends with the end gesture profile is then determined to be the gesture motion. The gesture motion so obtained can be used to recognize the meaning of the gesture and, in turn, to generate the corresponding control instruction.
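The two-pass matching described above can be sketched as a search for the first frame that matches the start template, followed by a search among later frames for the first match of the end template. How two profiles are compared is not specified in the text, so the comparison is passed in as a hypothetical `matches` callable.

```python
from typing import Callable, Optional, Sequence, Tuple


def find_gesture_span(
    profiles: Sequence,
    start_template,
    end_template,
    matches: Callable[[object, object], bool],
) -> Optional[Tuple[int, int]]:
    """Return (start_index, end_index) of the gesture motion, or None if incomplete."""
    # First frame whose profile matches the preset start gesture profile.
    start = next((i for i, p in enumerate(profiles) if matches(p, start_template)), None)
    if start is None:
        return None
    # First later frame whose profile matches the preset end gesture profile.
    end = next(
        (i for i in range(start + 1, len(profiles)) if matches(profiles[i], end_template)),
        None,
    )
    if end is None:
        return None
    return start, end


# Toy example: profiles are labels and matching is plain equality.
frames = ["idle", "open_palm", "moving", "moving", "fist", "idle"]
span = find_gesture_span(frames, "open_palm", "fist", lambda a, b: a == b)
if span is not None:
    gesture_motion = frames[span[0]: span[1] + 1]
```

In a real system `matches` would compare contour shapes with some tolerance rather than test equality.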
Optionally, adjustment module 30 includes:
An instruction determination unit, for determining the control instruction corresponding to the gesture motion;
An adjustment unit, for adjusting the playback state of the intelligent sound box according to the control instruction.
In the present embodiment, a storage chip on the intelligent sound box stores in advance the control instructions corresponding to multiple different gesture motions. For example, it may be specified that the gesture motion "wave upward" corresponds to the "raise volume" instruction, "wave downward" to the "lower volume" instruction, "wave" to the "stop playback" instruction, and "clap both hands" to the "start playback" instruction. When the intelligent sound box determines that the gesture motion made by the user corresponds to the start-playback instruction, it plays according to that instruction; the content played may be music or news. Likewise, when the intelligent sound box determines that the gesture motion made by the user corresponds to the end-playback instruction, it stops playing the sound content according to that instruction, so that the user is no longer disturbed by the sound content that was playing.
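The stored correspondence between gesture motions and control instructions can be pictured as a lookup table. The following is a minimal sketch; the gesture labels, the 10-point volume step, and the `Speaker` class are illustrative assumptions, not details given in the text.

```python
# Hypothetical labels for the four example gestures and their instructions.
GESTURE_TO_INSTRUCTION = {
    "wave_up": "volume_up",
    "wave_down": "volume_down",
    "wave": "stop",
    "clap": "play",
}


class Speaker:
    """Toy playback state: whether sound is playing and at what volume (0-100)."""

    def __init__(self, volume: int = 50):
        self.playing = False
        self.volume = volume

    def apply_gesture(self, gesture: str):
        """Look up the instruction for a gesture and adjust the playback state."""
        instruction = GESTURE_TO_INSTRUCTION.get(gesture)
        if instruction == "play":
            self.playing = True
        elif instruction == "stop":
            self.playing = False
        elif instruction == "volume_up":
            self.volume = min(100, self.volume + 10)  # step size is an assumption
        elif instruction == "volume_down":
            self.volume = max(0, self.volume - 10)
        return instruction  # None when the gesture is not in the table


speaker = Speaker()
speaker.apply_gesture("clap")     # starts playback
speaker.apply_gesture("wave_up")  # raises the volume
```

An unknown gesture leaves the state untouched, which mirrors the idea that no instruction is issued when nothing matches.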
Optionally, the instruction determination unit includes:
A feature obtaining subunit, for performing feature extraction on the gesture motion to obtain gesture motion features;
A coding subunit, for encoding the gesture motion features to obtain a coding result;
An instruction determination subunit, for determining the control instruction corresponding to the coding result.
In the present embodiment, the gesture motion feature is the ordered set of the contour features of each frame. To obtain the gesture motion feature, the feature value of each contour in each frame must be calculated. Specifically, for the extracted gesture profile, the contour feature value of each contour in that gesture profile is calculated. The contour feature values of each contour include its region histogram, its moments, and the earth mover's distance.
Then, the extracted gesture motion features are encoded using 8 reference direction vectors, and the coding result is calculated. The 8 reference directions are the eight directions that divide 360 degrees equally, that is, directions 45 degrees apart.
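One way to picture the 8-direction encoding is to quantize the displacement of the hand between consecutive frames into the nearest of eight directions spaced 45 degrees apart. The text does not say which quantity is quantized, so treating it as the motion of a tracked point (for example, the contour centroid) is an assumption, as are the function names.

```python
import math


def direction_code(dx: float, dy: float) -> int:
    """Index 0..7 of the reference direction nearest to the vector (dx, dy).

    Code 0 points along +x, and codes increase in 45-degree steps toward +y.
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Shift by half a sector so each code covers +/- 22.5 degrees around its direction.
    return int((angle + math.pi / 8) // (math.pi / 4)) % 8


def encode_trajectory(points):
    """Direction codes for each step between consecutive, distinct points."""
    return [
        direction_code(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if (x1, y1) != (x0, y0)
    ]


codes = encode_trajectory([(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)])
```

In image coordinates the y axis usually points downward, so code 2 would correspond to downward motion on screen.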
The coding result may be calculated using the dynamic time warping (DTW) algorithm. In the DTW algorithm, each gesture stored in the template library serves as a sample template, expressed as $T = \{T(1), T(2), \ldots, T(m), \ldots, T(M)\}$. The input gesture to be recognized is the test template, expressed as $S = \{S(1), S(2), \ldots, S(n), \ldots, S(N)\}$. The frame numbers $n = 1, \ldots, N$ of the test template are marked on the horizontal axis and the frame numbers $m = 1, \ldots, M$ of the sample template on the vertical axis. Drawing horizontal and vertical lines through these integer coordinates forms a grid, and each intersection $(n, m)$ in the grid represents the pairing of frame $n$ of the test template with frame $m$ of the sample template.
The DTW algorithm reduces to finding a path through lattice points of this grid. Suppose the lattice points the path passes through are, in order, $(n_1, m_1), \ldots, (n_i, m_i), \ldots, (n_N, m_N)$, where $(n_1, m_1) = (1, 1)$ and $(n_N, m_N) = (N, M)$. The path can be described by a function $m_i = f(n_i)$, where $n_i = i$, $i = 1, 2, \ldots, N$, $f(1) = 1$, and $f(N) = M$. To keep the path from tilting too steeply, its slope is constrained to the range 0 to 2: if the path passes through lattice point $(n_i, m_i)$, the previous lattice point can only be one of $(n_i - 1, m_i)$, $(n_i - 1, m_i - 1)$, or $(n_i - 1, m_i - 2)$. The cumulative distance of the path is
$$D[(n_i, m_i)] = d[S(n_i), T(m_i)] + D[(n_{i-1}, m_{i-1})],$$
where $d[S(n_i), T(m_i)]$ is the distance between frame $n_i$ of the test template and frame $m_i$ of the sample template, and $D[(n_{i-1}, m_{i-1})]$ is determined by
$$D[(n_{i-1}, m_{i-1})] = \min\{D[(n_i - 1, m_i)],\ D[(n_i - 1, m_i - 1)],\ D[(n_i - 1, m_i - 2)]\}.$$
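The recurrence can be sketched as a small dynamic program. This is a minimal illustration that assumes scalar frame features and an absolute-difference local distance; the text does not specify the local distance, and the function name is hypothetical. Indices are zero-based in the code, so lattice point (1, 1) is `D[0][0]`.

```python
import math


def dtw_distance(test, template, dist=lambda a, b: abs(a - b)):
    """Cumulative DTW distance from (1, 1) to (N, M) with predecessor set
    {(n-1, m), (n-1, m-1), (n-1, m-2)}, i.e. path slope limited to 0..2.

    Returns math.inf when no admissible path reaches (N, M).
    """
    if not test or not template:
        raise ValueError("both sequences must be non-empty")
    n_frames, m_frames = len(test), len(template)
    D = [[math.inf] * m_frames for _ in range(n_frames)]
    D[0][0] = dist(test[0], template[0])  # the path must start at (1, 1)
    for n in range(1, n_frames):
        for m in range(m_frames):
            best = min(
                D[n - 1][m],
                D[n - 1][m - 1] if m >= 1 else math.inf,
                D[n - 1][m - 2] if m >= 2 else math.inf,
            )
            if best < math.inf:
                D[n][m] = dist(test[n], template[m]) + best
    return D[n_frames - 1][m_frames - 1]


print(dtw_distance([1, 1, 2, 3], [1, 2, 3]))  # a slowed-down copy still aligns at zero cost
```

Because every test frame advances by exactly one step, a test sequence shorter than about half the template cannot reach (N, M), and the function then returns infinity. For direction codes, a circular distance such as `min(abs(a - b), 8 - abs(a - b))` may be a more natural choice of `dist`.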
Finally, the control instruction corresponding to the coding result is determined. The obtained coding result is compared with preset coded data, and the control instruction corresponding to the closest preset coded data is output. To reduce the false detection rate, a closeness threshold may also be set: if the obtained coding result matches the preset coded data too poorly, no control instruction is output.
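The nearest-template decision with a rejection threshold can be sketched as below. The sketch assumes the comparison has already produced one distance per candidate instruction (smaller meaning closer); the function name and the threshold value are illustrative.

```python
def pick_instruction(distances, max_distance):
    """Return the instruction whose preset code is closest, or None if even the
    best match is farther than max_distance.

    distances -- mapping from instruction name to distance from the input code
    """
    if not distances:
        return None
    instruction, best = min(distances.items(), key=lambda item: item[1])
    return instruction if best <= max_distance else None


print(pick_instruction({"play": 0.4, "stop": 2.0}, max_distance=1.0))
print(pick_instruction({"play": 1.4, "stop": 2.0}, max_distance=1.0))
```

Lowering `max_distance` trades missed gestures for fewer false detections.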
Optionally, the intelligent sound box further includes:
A distance calculation module, for calculating the physical distance between the intelligent sound box and the human body;
A volume adjustment module, for adjusting the volume of the intelligent sound box according to the physical distance.
In the present embodiment, the distance between the intelligent sound box and the user can be calculated directly by the active depth camera, and the volume is then adjusted according to that distance. For example, suppose that when the user is 5 meters from the intelligent sound box the volume heard is 50 decibels; when the user is 10 meters away, the volume of the intelligent sound box must be raised for the user still to hear 50 decibels. Because distance and volume follow a certain correspondence indoors, the volume of the intelligent sound box can be adjusted according to that correspondence so that the user hears the same volume at different positions. The preset value here, that is, the volume the user is meant to hear, may be the volume heard at 5 meters or the volume at a factory-default physical distance.
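The text leaves the distance-to-volume correspondence unspecified. As one possible illustration, the sketch below assumes free-field spreading from a point source, where the level drops by 20·log10(d/d_ref) decibels with distance; real rooms add reflections, so a measured correspondence would likely differ. The function name and the 5 meter reference are assumptions.

```python
import math


def gain_offset_db(distance_m: float, reference_m: float = 5.0) -> float:
    """Extra output level (dB) needed at distance_m so that the listener hears
    the same level as at reference_m, under a free-field point-source model."""
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(distance_m / reference_m)


# Moving from 5 m to 10 m calls for roughly 6 dB more output under this model.
print(round(gain_offset_db(10.0), 2))
```

A negative result means the output can be lowered, for instance when the user steps closer than the reference distance.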
Optionally, the detection module 10 includes:
A gradient detection unit, for performing human body detection based on the histogram of oriented gradients.
In the present embodiment, the intelligent sound box may perform human body detection based on the histogram of oriented gradients (HOG).
The histogram of oriented gradients is a local descriptor similar to the scale-invariant feature transform (SIFT): it forms the human body feature by computing histograms of gradient orientation over local regions. The difference is that SIFT extracts features at keypoints and is therefore a sparse description method, whereas the histogram of oriented gradients is a dense description method.
The histogram-of-oriented-gradients description method has the following advantages: it represents the structure of edges (gradients) and can therefore describe local shape information; quantizing position and orientation space suppresses, to some extent, the effects of translation and rotation; and normalization within local regions partially cancels the effect of illumination. The embodiment of the present invention therefore preferably performs human body detection based on the histogram of oriented gradients.
Optionally, the gradient detection unit includes:
A first-order gradient calculation subunit, for performing first-order gradient calculation on the image in the detection window;
A cell gradient subunit, for calculating the histogram of oriented gradients of each cell in the image;
A block gradient subunit, for normalizing all cells within each block of the image to obtain the histogram of oriented gradients of the block;
A feature vector generation subunit, for normalizing all blocks in the image to obtain the histogram of oriented gradients of the detection window, and taking the histogram of oriented gradients of the detection window as the human body feature vector.
In the present embodiment, first-order gradient calculation is first performed on the image in the detection window. Specifically, a detection window (Detection Window) of standardized size (for example, 64x128) is taken as input, and the gradients of the image in the detection window in the horizontal and vertical directions are calculated with the first-order (one-dimensional) Sobel operator [-1, 0, 1].
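Applying the [-1, 0, 1] mask horizontally and vertically amounts to taking the difference between the two neighbors of each pixel. The following sketch works on a grayscale image stored as a list of rows; clamping at the image border and the function name are assumptions, since the text does not describe border handling.

```python
import math


def gradients(img):
    """Per-pixel gradient magnitude and unsigned orientation in [0, pi).

    gx and gy are the responses of the 1-D mask [-1, 0, 1] applied along x and y.
    Border pixels reuse the edge value (clamping), which is an assumption.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx) % math.pi  # fold opposite directions together
    return mag, ang


# A vertical edge: dark on the left, bright on the right.
edge = [[0, 0, 10, 10] for _ in range(3)]
magnitude, orientation = gradients(edge)
```

Folding the orientation into [0, π) treats a dark-to-bright edge and a bright-to-dark edge alike, which matches the [0, π] orientation range used for the cell histograms.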
The benefit of using a single window as the classifier input is that the classifier is invariant to the position and scale of the target. For an input image to be examined, the detection window must be moved along the horizontal and vertical directions, and the image must be scaled at multiple scales in order to detect human bodies of different sizes.
Then, the histogram of oriented gradients of each cell in the image is calculated. The histogram of oriented gradients is obtained by dense computation over a grid of cells (Cell) and blocks (Block): the image is divided into a number of cells, each cell consists of multiple pixels, and each block consists of several adjacent cells.
In this embodiment, the gradient of each pixel in the image is calculated first, and then the histogram of oriented gradients of all pixels in each cell, that is, of the cell, is accumulated. When accumulating the histogram of a cell, the range [0, π] is first divided into multiple intervals for the cell, and weighted votes are then cast according to the gradient orientation of each pixel in the cell, yielding the histogram of oriented gradients of all pixels in that cell.
In the weighted voting, the weight of each pixel is preferably the gradient magnitude of that pixel. To eliminate aliasing, trilinear interpolation (Trilinear Interpolation) is preferably used for the weighted voting.
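Magnitude-weighted voting can be sketched for a single cell as follows. For brevity this sketch interpolates only along the orientation axis, splitting each vote between the two nearest bin centers; full trilinear interpolation would also share votes between neighboring cells in x and y. The choice of 9 bins and the function name are assumptions.

```python
import math


def cell_histogram(magnitudes, orientations, bins=9):
    """Orientation histogram of one cell.

    magnitudes, orientations -- flat sequences over the pixels of the cell,
    with orientations in [0, pi). Each pixel votes with its gradient magnitude,
    split linearly between the two nearest bin centers (wrapping at pi).
    """
    hist = [0.0] * bins
    width = math.pi / bins
    for m, a in zip(magnitudes, orientations):
        pos = (a % math.pi) / width - 0.5  # position measured in bin centers
        lower = math.floor(pos)
        frac = pos - lower
        hist[lower % bins] += m * (1.0 - frac)
        hist[(lower + 1) % bins] += m * frac
    return hist


# One pixel exactly at the center of bin 0, one halfway between the last and first bins.
h = cell_histogram([2.0, 1.0], [math.pi / 18, 0.0])
```

Splitting votes this way keeps the histogram from jumping when an orientation crosses a bin boundary, which is the aliasing the interpolation is meant to suppress.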
Every cell in the image is traversed to obtain the histogram of oriented gradients of each cell in the image.
Next, all cells within each block of the image are normalized to obtain the histogram of oriented gradients of the block. Within a block, the histograms of oriented gradients of its cells are normalized to eliminate the effect of illumination, which yields the histogram of oriented gradients of the block. Every block in the image is traversed to obtain the histogram of oriented gradients of each block.
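Block normalization can be sketched by concatenating the histograms of the cells in a block and dividing by a norm of the result. The text does not name the norm; the L2 norm with a small epsilon used here is an assumption, as is the function name.

```python
import math


def normalize_block(cell_histograms, eps=1e-6):
    """Concatenate the cell histograms of one block and scale to unit L2 norm.

    eps keeps the division well-defined for an all-zero (flat) block.
    """
    vector = [value for hist in cell_histograms for value in hist]
    norm = math.sqrt(sum(v * v for v in vector) + eps * eps)
    return [v / norm for v in vector]


dim = normalize_block([[3.0, 0.0], [0.0, 4.0]])
bright = normalize_block([[30.0, 0.0], [0.0, 40.0]])  # same scene, ten times the contrast
```

The two results are nearly identical, which illustrates why normalization reduces sensitivity to illumination and contrast changes.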
Finally, all blocks in the image are normalized to obtain the histogram of oriented gradients of the detection window, which is taken as the human body feature vector. The histogram of oriented gradients of the detection window obtained after normalizing every block forms the human body feature vector, with which human body detection is carried out.
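To give a sense of the size of the resulting feature vector, the sketch below counts its elements for a 64x128 detection window. Only the window size appears in the text; the 8-pixel cells, 2x2-cell blocks, one-cell block stride, and 9 orientation bins are commonly used HOG settings and are assumptions here.

```python
def hog_length(win_w=64, win_h=128, cell=8, block_cells=2, stride_cells=1, bins=9):
    """Number of values in the window descriptor: blocks * cells per block * bins."""
    cells_x, cells_y = win_w // cell, win_h // cell
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    return blocks_x * blocks_y * block_cells * block_cells * bins


print(hog_length())  # 7 x 15 blocks, each with 4 cells of 9 bins
```

With overlapping blocks each cell contributes to several blocks, which is one reason the dense descriptor is costly to compute.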
Because the histogram of oriented gradients is computed densely, the amount of computation is large. To reduce computation and raise detection speed, the histogram of oriented gradients may be computed only in key regions with a distinct human body outline, thereby reducing dimensionality.
The present invention provides an intelligent sound box that performs human body detection; when a human body is detected, identifies the gesture motion of the human body; and adjusts its playback state according to the gesture motion. The intelligent sound box provided by the invention adds a way of interacting with an intelligent sound box, so that the user can control it by gesture, which improves the user experience.
The invention also proposes an intelligent sound box comprising a memory, a processor, and at least one application program that is stored in the memory and configured to be executed by the processor, the application program being configured to perform the above playback control method.
In embodiments of the present invention, the processor included in the intelligent sound box also has the following functions:
Performing human body detection;
When a human body is detected, identifying the gesture motion of the human body;
Adjusting the playback state of the intelligent sound box according to the gesture motion.
The foregoing is merely embodiments of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may be variously modified and varied. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall be included within the scope of the claims of the present invention.

Claims (10)

1. A playback control method, characterized by comprising the following steps:
An intelligent sound box performs human body detection;
When a human body is detected, the gesture motion of the human body is identified;
The playback state of the intelligent sound box is adjusted according to the gesture motion.
2. The playback control method according to claim 1, characterized in that the step of identifying the gesture motion of the human body comprises:
Separating the gesture from the background in each frame of gesture image of the detected human body, and finding the gesture profile in each frame of gesture image;
Matching the gesture profiles frame by frame against a preset start gesture profile, and determining the first matching gesture profile to be the start gesture profile;
Matching the gesture profiles that follow the start gesture profile in time order frame by frame against a preset end gesture profile, and determining the first matching gesture profile to be the end gesture profile;
Determining the gesture motion that begins with the start gesture profile and ends with the end gesture profile to be one recognized group of gesture motion.
3. The playback control method according to claim 1, characterized in that the step of adjusting the playback state of the intelligent sound box according to the gesture motion comprises:
Determining the control instruction corresponding to the gesture motion;
Adjusting the playback state of the intelligent sound box according to the control instruction.
4. The playback control method according to claim 3, characterized in that the step of determining the control instruction corresponding to the gesture motion comprises:
Performing feature extraction on the gesture motion to obtain gesture motion features;
Encoding the gesture motion features to obtain a coding result;
Determining the control instruction corresponding to the coding result.
5. The playback control method according to any one of claims 1 to 4, characterized in that the method further comprises:
Calculating the physical distance between the intelligent sound box and the human body;
Adjusting the volume of the intelligent sound box according to the physical distance.
6. An intelligent sound box, characterized by comprising:
A detection module, for performing human body detection;
An identification module, for identifying the gesture motion of the human body when a human body is detected;
An adjustment module, for adjusting the playback state of the intelligent sound box according to the gesture motion.
7. The intelligent sound box according to claim 6, characterized in that the identification module comprises:
A separation unit, for separating the gesture from the background in each frame of gesture image of the detected human body, and finding the gesture profile in each frame of gesture image;
A start gesture unit, for matching the gesture profiles frame by frame against a preset start gesture profile, and determining the first matching gesture profile to be the start gesture profile;
An end gesture unit, for matching the gesture profiles that follow the start gesture profile in time order frame by frame against a preset end gesture profile, and determining the first matching gesture profile to be the end gesture profile;
A gesture motion unit, for determining the gesture motion that begins with the start gesture profile and ends with the end gesture profile to be one recognized group of gesture motion.
8. The intelligent sound box according to claim 6, characterized in that the adjustment module comprises:
An instruction determination unit, for determining the control instruction corresponding to the gesture motion;
An adjustment unit, for adjusting the playback state of the intelligent sound box according to the control instruction.
9. The intelligent sound box according to claim 8, characterized in that the instruction determination unit comprises:
A feature obtaining subunit, for performing feature extraction on the gesture motion to obtain gesture motion features;
A coding subunit, for encoding the gesture motion features to obtain a coding result;
An instruction determination subunit, for determining the control instruction corresponding to the coding result.
10. The intelligent sound box according to any one of claims 6 to 9, characterized by further comprising:
A distance calculation module, for calculating the physical distance between the intelligent sound box and the human body;
A volume adjustment module, for adjusting the volume of the intelligent sound box according to the physical distance.
CN201810142948.9A 2018-02-11 2018-02-11 Intelligent sound box and control method for playing back Pending CN108064006A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810142948.9A CN108064006A (en) 2018-02-11 2018-02-11 Intelligent sound box and control method for playing back
PCT/CN2018/077458 WO2019153382A1 (en) 2018-02-11 2018-02-27 Intelligent speaker and playing control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810142948.9A CN108064006A (en) 2018-02-11 2018-02-11 Intelligent sound box and control method for playing back

Publications (1)

Publication Number Publication Date
CN108064006A true CN108064006A (en) 2018-05-22

Family

ID=62134459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810142948.9A Pending CN108064006A (en) 2018-02-11 2018-02-11 Intelligent sound box and control method for playing back

Country Status (2)

Country Link
CN (1) CN108064006A (en)
WO (1) WO2019153382A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111182381A (en) * 2019-10-10 2020-05-19 广东小天才科技有限公司 Camera control method of intelligent sound box, intelligent sound box and storage medium
CN111242149A (en) * 2018-11-28 2020-06-05 珠海格力电器股份有限公司 Smart home control method and device, storage medium, processor and smart home
CN112992796A (en) * 2021-02-09 2021-06-18 深圳市众芯诺科技有限公司 Intelligent visual sound box chip
CN113311939A (en) * 2021-04-01 2021-08-27 江苏理工学院 Intelligent sound box control system based on gesture recognition

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113659950B (en) * 2021-08-13 2024-03-22 深圳市百匠科技有限公司 High-fidelity multipurpose sound control method, system, device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763515A (en) * 2009-09-23 2010-06-30 中国科学院自动化研究所 Real-time gesture interaction method based on computer vision
CN103092332A (en) * 2011-11-08 2013-05-08 苏州中茵泰格科技有限公司 Digital image interactive method and system of television
CN103458288A (en) * 2013-09-02 2013-12-18 湖南华凯创意展览服务有限公司 Gesture sensing method, gesture sensing device and audio/video playing system
CN103679154A (en) * 2013-12-26 2014-03-26 中国科学院自动化研究所 Three-dimensional gesture action recognition method based on depth images
CN105744434A (en) * 2016-02-25 2016-07-06 深圳市广懋创新科技有限公司 Intelligent loudspeaker box control method and system based on gesture recognition
CN106358120A (en) * 2016-09-23 2017-01-25 成都创慧科达科技有限公司 Audio play device with various regulation methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Kai: "An Improved DTW Dynamic Gesture Recognition Method", Journal of Chinese Computer Systems (《小型微型计算机系统》) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242149A (en) * 2018-11-28 2020-06-05 珠海格力电器股份有限公司 Smart home control method and device, storage medium, processor and smart home
CN111182381A (en) * 2019-10-10 2020-05-19 广东小天才科技有限公司 Camera control method of intelligent sound box, intelligent sound box and storage medium
CN111182381B (en) * 2019-10-10 2021-08-20 广东小天才科技有限公司 Camera control method of intelligent sound box, intelligent sound box and storage medium
CN112992796A (en) * 2021-02-09 2021-06-18 深圳市众芯诺科技有限公司 Intelligent visual sound box chip
CN113311939A (en) * 2021-04-01 2021-08-27 江苏理工学院 Intelligent sound box control system based on gesture recognition

Also Published As

Publication number Publication date
WO2019153382A1 (en) 2019-08-15

Similar Documents

Publication Publication Date Title
CN108064006A (en) Intelligent sound box and control method for playing back
US11450146B2 (en) Gesture recognition method, apparatus, and device
US20230077355A1 (en) Tracker assisted image capture
JP5366824B2 (en) Method and system for converting 2D video to 3D video
CN108197618B (en) Method and device for generating human face detection model
US11042991B2 (en) Determining multiple camera positions from multiple videos
CN109858563B (en) Self-supervision characterization learning method and device based on transformation recognition
KR20150110697A (en) Systems and methods for tracking and detecting a target object
JP2007328746A (en) Apparatus and method for tracking object
JP2011134114A (en) Pattern recognition method and pattern recognition apparatus
WO2021031954A1 (en) Object quantity determination method and apparatus, and storage medium and electronic device
CN107169417B (en) RGBD image collaborative saliency detection method based on multi-core enhancement and saliency fusion
CN107169503B (en) Indoor scene classification method and device
CN111914878A (en) Feature point tracking training and tracking method and device, electronic equipment and storage medium
JP2017228224A (en) Information processing device, information processing method, and program
CN103105924A (en) Man-machine interaction method and device
JP2020071875A (en) Deep learning model used for image recognition, and apparatus and method for training the model
CN109697385A (en) A kind of method for tracking target and device
KR102102164B1 (en) Method, apparatus and computer program for pre-processing video
CN112149557A (en) Person identity tracking method and system based on face recognition
CN109711267A (en) A kind of pedestrian identifies again, pedestrian movement's orbit generation method and device
KR20140068746A (en) Method, system and computer-readable recording media for motion recognition
Cordea et al. Real-time 2 (1/2)-D head pose recovery for model-based video-coding
CN116611491A (en) Training method and device of target detection model, electronic equipment and storage medium
CN110197123A (en) A kind of human posture recognition method based on Mask R-CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180522