WO2021248535A1 - Method and device for determining gesture actions, electronic device, and storage medium - Google Patents


Info

Publication number
WO2021248535A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
speaker
gesture action
original
collected
Prior art date
Application number
PCT/CN2020/096743
Other languages
English (en)
French (fr)
Inventor
黄远芳
吴锐兴
叶利剑
Original Assignee
瑞声声学科技(深圳)有限公司
瑞声科技(新加坡)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 瑞声声学科技(深圳)有限公司, 瑞声科技(新加坡)有限公司
Publication of WO2021248535A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/162 Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures

Definitions

  • the present invention relates to the technical field of loudspeakers, and in particular to a method, device, electronic equipment and storage medium for determining gesture actions.
  • the terminal equipment is designed with ranging functions such as infrared ranging, and further gesture recognition functions to improve the interactive experience.
  • infrared sensors have many limitations, such as being difficult to apply in dark environments. Therefore, more and more terminal devices use their own speaker (such as the receiver of a mobile phone) to transmit ultrasound for distance measurement.
  • however, the transmitted ultrasonic signal and the voice signal undergo intermodulation distortion in the speaker, which affects the judgment accuracy of ultrasonic ranging and related applications.
  • a method for judging gesture actions including:
  • the original audio signal including an original ultrasound signal and an original voice signal
  • the predistortion signal is transmitted to the speaker, and the speaker plays to generate a target output signal
  • the target output signal is collected by the microphone after being propagated through the space medium
  • the determining whether there is a gesture action according to the frequency spectrum characteristics of the original ultrasound signal and the collected signal includes:
  • the method further includes:
  • the type of the gesture action corresponding to the energy spectrum feature is determined.
  • the method further includes:
  • non-linear parameters of the loudspeaker include:
  • the nonlinear parameter of the speaker obtained by offline testing, or the nonlinear parameter of the speaker updated online.
  • the method further includes:
  • the condition parameter of the speaker includes one or more of ambient temperature, working time, and dynamic range of input signal power.
  • a device for judging gesture actions including:
  • the first acquisition module is configured to acquire an original audio signal, where the original audio signal includes an original ultrasound signal and an original voice signal;
  • a non-linear parameter module, used to obtain the non-linear parameters of the loudspeaker
  • a non-linear compensation module configured to perform predistortion processing on the original audio signal according to the non-linear parameters of the speaker to obtain a non-linear compensation signal
  • a loudspeaker module which outputs a target output signal under the excitation of the nonlinear compensation signal
  • a microphone module which collects the signal after the target output signal propagates through space
  • the second acquisition module acquires the acquisition signal collected by the microphone
  • the processing module is configured to determine whether there is a gesture action based on the original ultrasonic signal and the frequency spectrum characteristics of the collected signal.
  • an electronic device including a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the steps of the first aspect and any possible implementation thereof.
  • a computer storage medium that stores one or more instructions, where the one or more instructions are suitable for being loaded by a processor to perform the steps of the first aspect and any possible implementation thereof.
  • the beneficial effects of the present invention are: by identifying the nonlinear parameters of the loudspeaker system and pre-distorting the signal, that is, pre-compensating at the input end for the distortion caused by the nonlinear system, the intermodulation distortion between the ultrasonic signal and the voice signal can be reduced, and the judgment accuracy of ultrasonic ranging and related applications that use the speaker in the device can be significantly improved.
  • FIG. 1 is a schematic flowchart of a method for judging gesture actions provided by the present invention
  • FIG. 2 is a schematic flowchart of another method for judging gesture actions provided by the present invention.
  • Fig. 3 is a schematic diagram of the spectral energy distribution of a collected signal provided by the present invention.
  • FIG. 4 is a schematic flow diagram of a system including a nonlinear compensation module provided by the present invention.
  • FIG. 5 is a schematic diagram of intermodulation distortion in a method for judging gesture actions provided by the present invention.
  • Fig. 6 is a schematic structural diagram of a gesture action judging device provided by the present invention.
  • Fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
  • FIG. 1 is a schematic flowchart of a method for determining gesture actions according to an embodiment of the present invention.
  • the method may include:
  • the execution subject of the embodiment of the present invention may be a gesture action judging device.
  • the gesture action judging device includes a speaker, and can use ultrasonic signals for distance measurement and gesture recognition.
  • the above-mentioned gesture action judging device may be an electronic device, and the electronic device may be a terminal device, including but not limited to mobile terminals, earphones, audio playback devices, and other portable devices such as laptop computers and tablet computers, or desktop computers.
  • the aforementioned original audio signal may be an audio signal that needs to be finally output by a speaker.
  • the above-mentioned ultrasonic signal can also be output by the above-mentioned speaker.
  • the receiver/speaker of a mobile phone emits ultrasound for distance measurement and further gesture recognition.
  • the 20 kHz ultrasonic transmit signal is modulated by the voice signal (300 Hz-1.5 kHz), producing intermodulation products in the 18 kHz-22 kHz band, which affects the judgment accuracy of the overall algorithm.
  • Intermodulation distortion refers to the generation of new frequency components after two or more signals of different frequencies pass through an amplifier or speaker. This distortion is usually produced by active devices in the circuit (such as transistors and vacuum tubes), and its magnitude is related to the output power. Because the newly generated frequency components bear no similarity to the original signal, even a small amount of intermodulation distortion is easily noticed by the human ear.
  • the speakers will more or less show non-linear characteristics, and there will be signal components that do not exist in the input signal.
  • the distortion object is the amplitude and/or phase of the output signal
  • the nonlinear distortion implies that the output signal contains frequency components that do not exist in the input signal.
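  • To make the appearance of new frequency components concrete, the following minimal sketch passes a 20 kHz tone plus a 1 kHz tone through a hypothetical memoryless speaker model y = x + a2·x². The model, the coefficient `a2`, the sample rate and the helper names are illustrative assumptions, not values from the source. The quadratic term creates components at 19 kHz and 21 kHz that are absent from the input.

```python
import math

def tone_amplitude(signal, freq_hz, fs):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * k / fs) for k, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * k / fs) for k, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

def weakly_nonlinear_speaker(x, a2=0.1):
    """Hypothetical memoryless speaker model (assumption): y = x + a2 * x^2."""
    return [v + a2 * v * v for v in x]

FS = 96_000  # sample rate in Hz (assumed)
N = 9_600    # 0.1 s, so every tone of interest completes a whole number of cycles

# Input: 20 kHz "ultrasonic" tone plus 1 kHz "voice" tone.
x = [0.5 * math.sin(2 * math.pi * 20_000 * k / FS)
     + 0.5 * math.sin(2 * math.pi * 1_000 * k / FS) for k in range(N)]
y = weakly_nonlinear_speaker(x)

# Compare input and output amplitudes at the carrier and at the sidebands.
for f in (19_000, 20_000, 21_000):
    print(f, round(tone_amplitude(x, f, FS), 4), round(tone_amplitude(y, f, FS), 4))
```

  • In this toy model the sideband amplitude equals a2 times the product of the two tone amplitudes, so it grows with signal level, which matches the statement that the magnitude of the distortion is related to the output power.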
  • an indirect test method can generally be used to analyze the nonlinearity of the speaker vibration: a circuit model of the first speaker is built in advance, the speaker is then tested with a speaker analyzer, and an adaptive fitting calculation is used to obtain the nonlinear parameters of the first speaker.
  • the nonlinear parameter of the speaker includes: the nonlinear parameter of the speaker obtained by offline testing, or the nonlinear parameter of the speaker updated online.
  • the offline test may include testing the first speaker directly with a speaker test system, a rangefinder and other equipment to analyze the nonlinearity of its vibration, so as to obtain the nonlinear parameters of the first speaker directly; these parameters are preset in the device for use.
  • a DC bias voltage signal can be applied to the first speaker (which can be the same as another speaker or an analog speaker) to offset the voice coil of the first speaker in the magnetic gap. A rangefinder, such as a laser rangefinder, then measures the offset displacement of the voice coil under the DC bias voltage signal. While the DC bias voltage across the first speaker is held constant, the speaker test system outputs an AC analysis signal to the first speaker to obtain the impedance curve and the displacement-voltage transfer function curve of the voice coil at the offset position, and the values of the nonlinear parameters of the first speaker in that offset state are calculated from these curves.
  • the magnitude of the DC bias voltage signal can be changed multiple times and the above steps repeated, so that for each DC bias voltage the offset displacement of the voice coil in the magnetic gap is measured and the values of the nonlinear parameters of the first speaker at that offset displacement are calculated.
  • the above-mentioned method further includes:
  • the nonlinear characteristic curve of the loudspeaker can be obtained through simulation or measurement, and may include the mapping relationship between the aforementioned preset loudspeaker condition parameters and the nonlinear parameters.
  • the condition parameters of the loudspeaker are the factors that affect its nonlinear distortion, and may include one or more of ambient temperature, working time, and dynamic range of input signal power; one example is the mapping relationship between the ambient temperature of the speaker and the nonlinear parameters.
  • the non-linear parameters of the loudspeaker can be updated periodically.
  • specifically, the current condition parameters of the loudspeaker are obtained, and the current nonlinear parameters are determined from the mapping relationship between the preset condition parameters and the nonlinear parameters, so that the speaker nonlinear parameters are acquired in real time.
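  • One possible way to realize such a mapping is a small calibration table with linear interpolation. The sketch below assumes a single condition parameter (ambient temperature) and two polynomial coefficients per entry; the table values, the coefficient names and the function name are hypothetical and only illustrate the lookup.

```python
from bisect import bisect_right

# Hypothetical calibration table: ambient temperature (deg C) -> (a2, a3).
# The numbers are made up for illustration; real values would come from
# offline testing or simulation of the specific speaker.
TEMPERATURE_TABLE = [
    (0.0, (0.080, -0.040)),
    (25.0, (0.100, -0.050)),
    (50.0, (0.130, -0.065)),
]

def nonlinear_params_for(temperature_c, table=TEMPERATURE_TABLE):
    """Return nonlinear parameters for the given temperature.

    Values between table entries are linearly interpolated; values outside
    the table are clamped to the nearest entry.
    """
    temps = [t for t, _ in table]
    if temperature_c <= temps[0]:
        return table[0][1]
    if temperature_c >= temps[-1]:
        return table[-1][1]
    hi = bisect_right(temps, temperature_c)
    (t0, p0), (t1, p1) = table[hi - 1], table[hi]
    w = (temperature_c - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))

print(nonlinear_params_for(37.5))
```

  • A periodic update then amounts to reading the current condition parameter and calling the lookup again; further condition parameters such as working time would need a multi-dimensional table.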
  • predistortion processing may be performed before the original audio signal is transmitted to the speaker to obtain the foregoing predistortion signal, and then the predistortion signal is transmitted to the foregoing speaker for playback to generate the foregoing target output signal.
  • a microphone is set to collect audio signals in the space, and the above-mentioned target output signal may be collected by the microphone after being propagated through the space medium. For the related processing of the signals collected by the microphone, see the subsequent steps 104 and 105.
  • the compensation processing can be realized by a non-linear filter, which is a non-linear compensator, which can eliminate the non-linear behavior of the speaker by controlling the excitation signal without changing the structure of the speaker.
  • the non-linear filter can form an all-pass filter with the actual first speaker.
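  • As a hedged illustration of compensation by controlling the excitation signal, the sketch below assumes a memoryless polynomial speaker model y = u + a2·u² + a3·u³ and computes the predistorted drive u numerically so that the speaker output equals the desired signal. Real loudspeaker nonlinearities have memory and depend on voice coil displacement, so this is a simplification; the coefficients and function names are assumptions, not the compensator of the source.

```python
import math

A2, A3 = 0.1, -0.05  # hypothetical nonlinear parameters of the speaker model

def speaker(u, a2=A2, a3=A3):
    """Hypothetical memoryless speaker model: y = u + a2*u^2 + a3*u^3."""
    return u + a2 * u * u + a3 * u * u * u

def predistort(x, a2=A2, a3=A3, iterations=8):
    """Find u with speaker(u) == x by Newton's method, starting from u = x."""
    u = x
    for _ in range(iterations):
        residual = speaker(u, a2, a3) - x
        slope = 1.0 + 2.0 * a2 * u + 3.0 * a3 * u * u  # derivative of the model
        u -= residual / slope
    return u

FS = 96_000  # sample rate in Hz (assumed)
# Original audio signal: 20 kHz ultrasonic tone plus 1 kHz voice-band tone.
original = [0.4 * math.sin(2 * math.pi * 20_000 * k / FS)
            + 0.4 * math.sin(2 * math.pi * 1_000 * k / FS) for k in range(960)]

# Largest deviation from the desired output, without and with predistortion.
uncompensated_error = max(abs(speaker(x) - x) for x in original)
compensated_error = max(abs(speaker(predistort(x)) - x) for x in original)
print(uncompensated_error, compensated_error)
```

  • In this model the cascade of predistorter and speaker reproduces the original signal sample by sample, which is the memoryless analogue of the filter and speaker together forming an all-pass system.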
  • gesture recognition uses the Doppler effect of the ultrasonic wave: when the target output signal is reflected by a gesture action, the frequency of the collected signal changes. The frequency rises when the moving object approaches the sound source and falls when the moving object moves away from it.
  • the speaker of the terminal device can be used as the transmitting device to transmit ultrasonic signals, and the microphone as the receiving device to collect the above-mentioned collected signals.
  • the reflected ultrasonic waves can be received, and the human hand or human head/face can be used as the sound wave reflection medium.
  • the Doppler effect of ultrasound can be used for spectrum analysis.
  • the specific formulas involved include:
  • f_r is the received reflected frequency
  • f_e is the emission frequency
  • v_i is the speed of sound in air
  • v_0 is the velocity of the object relative to the apparatus.
  • a mobile phone can emit 18-22kHz ultrasonic signals
  • a microphone can be used as a receiving device to receive reflected ultrasonic waves
  • a human hand or head can be used as a sound wave reflecting medium.
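  • The formula that the symbol definitions refer to is not reproduced in this text. As a hedged illustration only, the sketch below uses the textbook Doppler relation for a reflector moving toward a co-located emitter and receiver, f_r = f_e · (v_i + v_0) / (v_i - v_0); whether the source uses exactly this form is an assumption, as are the function name and the 343 m/s speed of sound.

```python
def reflected_frequency(f_e, v_0, v_i=343.0):
    """Textbook Doppler relation for a moving reflector (assumed form).

    f_e: emitted frequency in Hz
    v_0: velocity of the reflector toward the device in m/s
         (positive when approaching, negative when receding)
    v_i: speed of sound in air in m/s (343 m/s assumed, roughly 20 deg C)
    """
    return f_e * (v_i + v_0) / (v_i - v_0)

# A hand moving at 0.5 m/s in front of a 20 kHz emitter.
approaching = reflected_frequency(20_000.0, 0.5)
receding = reflected_frequency(20_000.0, -0.5)
print(round(approaching - 20_000.0, 1), round(receding - 20_000.0, 1))
```

  • Under these assumptions a hand speed of 0.5 m/s shifts a 20 kHz carrier by roughly 58 Hz, so the frequency resolution of the spectrum analysis has to be fine enough to resolve shifts of that order.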
  • step 105 specifically includes:
  • the spectral energy distribution of the collected signal collected by the microphone will change compared to the spectral energy distribution of the original ultrasonic signal. By comparing the energy distribution of the transmit frequency and the collected frequency, it can be judged whether there is a gesture action.
  • when it is determined that there is a gesture action, a corresponding instruction can be triggered to perform a corresponding operation; for example, this can be applied to let a user control a terminal device through gestures to realize various functions.
  • the above-mentioned gesture action can be a change from no gesture to a gesture, or from one gesture to another, including a change in the position of the gesture, a change in the shape of a specific gesture, and so on; in one embodiment, it also includes determining changes in other body movements, not limited to gestures, and the present invention does not limit the above aspects.
  • the present invention obtains the original audio signal, which includes the original ultrasonic signal and the original voice signal, obtains the non-linear parameters of the speaker, and then performs predistortion processing on the original audio signal according to those parameters to obtain the predistortion signal.
  • the predistortion signal is transmitted to the speaker, which plays it to generate a target output signal; the target output signal is collected by a microphone after propagating through the spatial medium, the signal collected by the microphone is obtained, and whether there is a gesture action is determined based on the frequency spectrum features of the original ultrasonic signal and the collected signal.
  • by pre-distorting the signal before the speaker, the gesture action judgment method of the present invention can reduce intermodulation distortion in gesture judgment that uses the ultrasonic signal, and significantly improve the judgment accuracy of ultrasonic ranging and related applications that use the speaker in the device.
  • FIG. 2 is a schematic flowchart of another method for determining gesture actions according to an embodiment of the present invention. As shown in Figure 2, the method may include:
  • for steps 201 to 203, reference may be made to the specific descriptions of steps 101 to 103 in the embodiment shown in FIG. 1, which are not repeated here.
  • frequency domain processing may be performed, which may include performing a fast Fourier transform (FFT) on the digitally acquired signal after windowing, to obtain the energy spectrum of the recovered sound pressure. Since the amplitude characteristics of the frequency spectrum are mainly used, the phase information can be discarded in the calculation process and the amount of data processing can be reduced.
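  • The windowing, FFT and phase-discarding steps can be sketched as follows. The Hann window, the frame length of 1024 samples, the 48 kHz sample rate and the pure-Python radix-2 FFT are illustrative choices, not details given in the source.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def energy_spectrum(frame):
    """Hann-window the frame, transform it, and keep |X[k]|^2 only.

    Taking the squared magnitude discards the phase, which halves the data
    kept per bin. Only the non-negative frequency bins are returned.
    """
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(frame, window)])
    return [abs(c) ** 2 for c in spectrum[: n // 2 + 1]]

FS, N, CARRIER = 48_000, 1024, 20_000  # assumed sample rate, frame size, tone
frame = [math.sin(2 * math.pi * CARRIER * k / FS) for k in range(N)]
energy = energy_spectrum(frame)
peak_bin = max(range(len(energy)), key=energy.__getitem__)
print(peak_bin * FS / N)  # frequency of the strongest bin, in Hz
```

  • With these settings one bin spans FS / N, about 47 Hz, so a longer frame or a lower effective sample rate would be needed to resolve smaller Doppler shifts.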
  • Whether there is a difference in the above-mentioned energy distribution refers to whether the preset change difference caused by the gesture action is reached, and the general signal interference can be ignored. Specifically, it can be judged whether there is a difference between the energy distribution in the energy spectrum information of the acquired signal and the energy distribution in the energy spectrum information of the original ultrasound signal.
  • using mainly the amplitude characteristics of the spectrum, the maximum frequency shift can be obtained according to the aforementioned Doppler effect formula, and it is determined whether that shift reaches the preset frequency shift threshold. If it does, the energy distributions differ, the gesture action is determined to exist, and step 206 is performed; otherwise, the subsequent steps can be skipped and the periodic detection continues.
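  • A minimal sketch of this threshold test is given below. It assumes the energy spectrum is already available as a list indexed by frequency bin; the relative energy floor, the search range, the 40 Hz threshold and the function names are hypothetical tuning choices rather than values from the source.

```python
def max_frequency_shift(energy, bin_hz, carrier_hz, rel_floor=0.01, search_hz=2000.0):
    """Largest |f - carrier| among bins holding significant energy.

    A bin counts as significant when its energy is at least rel_floor times
    the energy in the carrier bin (rel_floor is an assumed tuning value).
    """
    carrier_bin = round(carrier_hz / bin_hz)
    floor = rel_floor * energy[carrier_bin]
    span = int(search_hz / bin_hz)
    lo = max(0, carrier_bin - span)
    hi = min(len(energy), carrier_bin + span + 1)
    shift = 0.0
    for k in range(lo, hi):
        if energy[k] >= floor:
            shift = max(shift, abs(k - carrier_bin) * bin_hz)
    return shift

def gesture_present(energy, bin_hz, carrier_hz, shift_threshold_hz=40.0):
    """True when the maximum frequency shift reaches the preset threshold."""
    return max_frequency_shift(energy, bin_hz, carrier_hz) >= shift_threshold_hz

# Synthetic spectra with 10 Hz bins: a still scene and one with a 60 Hz shift.
still = [0.0] * 2401
still[1999], still[2000], still[2001] = 0.25, 1.0, 0.25
moving = list(still)
moving[2006] = 0.05
print(gesture_present(still, 10.0, 20_000.0), gesture_present(moving, 10.0, 20_000.0))
```

  • The threshold has to exceed the width of the window's main lobe around the carrier, otherwise ordinary spectral leakage would be mistaken for a gesture.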
  • for example, refer to the schematic diagram of the spectral energy distribution of the collected signal shown in FIG. 3.
  • the frequency of the original ultrasonic signal is 20 kHz.
  • when a gesture action occurs, the spectral energy of the signal collected by the microphone is not distributed only at 20 kHz.
  • when there is no movement, the spectral energy of the collected signal is concentrated almost entirely at 20 kHz, as shown in the right frame of FIG. 3.
  • the gesture action can be further classified.
  • Image Binarization is the process of setting the gray value of the pixels on the image to 0 or 255, that is, the entire image presents an obvious black and white effect.
  • the binarization of the image greatly reduces the amount of data in the image, which can highlight the contour of the target, that is, the trend of energy spectrum distribution.
  • Edge detection is a basic problem in image processing and computer vision.
  • the purpose of edge detection is to identify points with obvious brightness changes in digital images.
  • Significant changes in image attributes usually reflect important events and changes in attributes. These include discontinuities in depth, discontinuities in surface orientation, changes in material properties, and changes in scene lighting.
  • Edge detection is a research field in image processing and computer vision, especially in feature extraction. Through image edge detection, the amount of data to be analyzed in the energy spectrum is greatly reduced, and information that can be considered irrelevant is eliminated, and important structural attributes are retained, so that the change area can be more accurately focused.
  • binarization and edge detection can be performed to extract the energy spectrum characteristics of the above gestures.
  • the amplitude vector of the frequency shift interval of the energy spectrum can be extracted.
  • the amplitude vector in the frequency shift interval can reflect the energy distribution characteristics of the gesture implementation process.
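  • Treating the energy spectrum over time as a grayscale image, binarization and edge detection can be sketched as below. The source does not name a specific edge detector, so a simple 4-neighbour boundary test is used here as a stand-in; the threshold value and the function names are assumptions.

```python
def binarize(image, threshold):
    """Set every pixel to 255 if it reaches the threshold, otherwise to 0."""
    return [[255 if v >= threshold else 0 for v in row] for row in image]

def detect_edges(binary):
    """Mark foreground pixels that touch a background pixel (4-neighbourhood).

    This is a simple boundary test used for illustration, not a specific
    detector named by the source.
    """
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 255:
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(0 <= i < rows and 0 <= j < cols and binary[i][j] == 0
                   for i, j in neighbours):
                out[r][c] = 255
    return out

# Toy 5x5 "energy spectrum image": rows are time frames, columns are frequency bins.
spectrogram = [[0.9 if 1 <= r <= 3 and 1 <= c <= 3 else 0.1 for c in range(5)]
               for r in range(5)]
edge_map = detect_edges(binarize(spectrogram, threshold=0.5))
for row in edge_map:
    print(row)
```

  • Only the outline of the high-energy region survives, which is the reduction in data that lets later steps focus on where the energy distribution changes.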
  • the mapping relationship between energy spectrum features and gesture actions can be preset. For example, registering gesture action templates in advance includes collecting and storing the template feature vector corresponding to each preset gesture action. The extracted amplitude vector of the frequency shift interval is compared with the template feature vectors of the preset gesture actions; when the similarity is higher than the preset similarity threshold, it is determined that a gesture action exists, and its type can be determined.
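  • The template comparison can be sketched with cosine similarity as the similarity measure. The source does not specify the measure, so this choice, the template vectors, the gesture names and the 0.9 threshold are all illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify_gesture(feature, templates, threshold=0.9):
    """Return the best-matching gesture type, or None if nothing reaches the threshold."""
    best_name, best_score = None, threshold
    for name, template in templates.items():
        score = cosine_similarity(feature, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical templates: amplitude over five bins of the frequency shift
# interval, ordered from below the carrier to above it.
TEMPLATES = {
    "approach": [0.0, 0.1, 0.2, 0.9, 0.6],  # energy shifted upward
    "retreat": [0.6, 0.9, 0.2, 0.1, 0.0],   # energy shifted downward
}

print(classify_gesture([0.05, 0.1, 0.25, 0.8, 0.7], TEMPLATES))
print(classify_gesture([1.0, 1.0, 1.0, 1.0, 1.0], TEMPLATES))
```

  • Cosine similarity ignores overall loudness and compares only the shape of the amplitude vector, which is one reason it is a common choice for this kind of template matching.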
  • the method further includes:
  • the correspondence between the type of gesture action and the target control instruction can also be preset. After the type of gesture action is determined through spectrum analysis, the corresponding target control instruction can be determined and triggered to perform the corresponding operation, so that the operating functions of the terminal device are controlled through gesture actions, which is convenient to operate and improves the interactive experience.
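  • A preset correspondence of this kind is naturally expressed as a lookup table from gesture type to handler. The gesture names and the two handlers below are hypothetical placeholders, not instructions defined by the source.

```python
def wake_screen():
    """Placeholder control instruction (hypothetical)."""
    return "screen woken"

def dismiss_call():
    """Placeholder control instruction (hypothetical)."""
    return "call dismissed"

# Hypothetical correspondence between gesture types and target control instructions.
GESTURE_TO_INSTRUCTION = {
    "approach": wake_screen,
    "retreat": dismiss_call,
}

def trigger(gesture_type):
    """Run the instruction mapped to the gesture type; return None if unmapped."""
    handler = GESTURE_TO_INSTRUCTION.get(gesture_type)
    return handler() if handler is not None else None

print(trigger("approach"))
```

  • Unmapped gesture types are ignored rather than raising an error, so an unrecognized gesture leaves the terminal device in its current state.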
  • FIG. 4 is a schematic diagram of a system flow including a nonlinear compensation module, in which:
  • 1 is the ultrasonic signal, generally around 20 kHz
  • 2 is the voice signal, or the mid-low frequency part of a music signal (around 300 Hz-500 Hz)
  • 3 is the original audio signal, that is, the superposition of the ultrasonic signal and the voice signal
  • 4 is the nonlinear compensation module, which is used to pre-distort the signal
  • 5 is the speaker nonlinear parameter test system, which is updated offline or online
  • 6 is the speaker nonlinear parameter
  • 7 is the predistortion signal after nonlinear compensation processing
  • 9 is a gesture action
  • 10 is the sound signal reflected by the gesture action 9
  • 11 is the sound pressure signal (collected signal) collected by the microphone
  • 12 is frequency domain processing, which can use FFT and other methods to obtain the energy spectrum of the recovered sound pressure
  • 13 is the processed sound pressure energy spectrum information
  • 14 is the judgment operation, which judges from the presence of a frequency offset whether the energy distribution of the sound pressure energy spectrum has changed, and thereby whether there is a gesture change
  • the present invention judges whether a gesture change occurs mainly by detecting the change of the energy concentration frequency on the frequency spectrum. This method needs to maintain a good linear relationship between the collected signal and the transmitted signal. For terminal devices such as mobile phones, due to the non-linearity of the mobile phone speaker/receiver system, the transmitted signal and the voice signal will produce intermodulation distortion, which affects the accuracy of the overall algorithm.
  • FIG. 5 is a schematic diagram of intermodulation distortion in a gesture action judgment method.
  • FIG. 5 shows intuitively and clearly the original intermodulation distortion in the general method and the intermodulation distortion in the improved gesture action judgment method of the present invention.
  • for the same original audio signal, after the nonlinear compensation module is added, the intermodulation distortion between the ultrasonic signal and the voice signal is reduced by about 30 dB. It can be seen that the method in the embodiment of the present invention reduces the intermodulation distortion and, combined with an accurate feature judgment method, significantly improves the accuracy of gesture judgment.
  • the original audio signal, which includes the original ultrasonic signal and the original voice signal, is predistorted according to the nonlinear parameters of the speaker to obtain the predistorted signal.
  • the predistorted signal is transmitted to the speaker, which plays it to generate a target output signal, and the target output signal is collected by a microphone after propagating through the spatial medium. It is then determined whether the energy distribution in the energy spectrum information of the collected signal differs from that in the energy spectrum information of the original ultrasonic signal.
  • if a difference exists, a gesture action is determined to exist; binarization and edge detection can then be performed on the energy spectrum information to extract the energy spectrum features of the gesture action, and the type of gesture action corresponding to those features is determined according to the preset mapping relationship between energy spectrum features and gesture action types. If no difference exists, it is determined that there is no gesture action, and frequency domain processing continues on the collected signal at the next moment.
  • the present invention pre-compensates the distortion caused by the nonlinearity of the system at the input end.
  • the signal collected after passing through the loudspeaker system is then a linear response to the transmitted signal (the original ultrasonic signal), which can significantly reduce the system misjudgment caused by intermodulation distortion. Furthermore, the system linearized by nonlinear compensation, combined with an accurate feature judgment method, can significantly improve the accuracy of gesture action judgment.
  • the embodiment of the present invention also discloses a gesture action judging device.
  • the gesture action judging device 600 includes:
  • the first acquisition module 610 is configured to acquire an original audio signal, where the original audio signal includes an original ultrasound signal and an original voice signal;
  • the non-linear parameter module 620 is used to obtain the non-linear parameter of the loudspeaker
  • the nonlinear compensation module 630 is configured to perform predistortion processing on the original audio signal according to the nonlinear parameters of the speaker to obtain a nonlinear compensation signal;
  • the speaker module 640 outputs a target output signal under the excitation of the aforementioned nonlinear compensation signal
  • the microphone module 650 collects the signal after the above-mentioned target output signal propagates through space;
  • the second obtaining module 660 obtains the collected signal collected by the microphone
  • the processing module 670 is configured to determine whether there is a gesture action based on the frequency spectrum characteristics of the original ultrasonic signal and the collected signal.
  • processing module 670 is specifically configured to:
  • the above-mentioned processing module 670 is further configured to: in the case where the above-mentioned gesture action is determined to exist, perform binarization and edge detection processing on the above-mentioned energy spectrum information, and extract the energy spectrum characteristics of the above-mentioned gesture action;
  • the type of the gesture action corresponding to the energy spectrum feature is determined.
  • processing module 670 is further used for:
  • the target control instruction corresponding to the type of the gesture action is determined according to the preset correspondence between the type of the gesture action and the target control instruction;
  • non-linear parameters of the aforementioned loudspeaker include:
  • the aforementioned nonlinear parameter module 620 is specifically configured to:
  • the steps involved in the methods shown in FIG. 1 and FIG. 2 may all be executed by each module in the gesture action judging apparatus 600 shown in FIG. 6, which will not be repeated here.
  • the non-linear compensation module 4 shown in FIG. 4 may correspond to the non-linear compensation module 630 described above.
  • the gesture action judging device 600 can obtain the original audio signal, which includes the original ultrasonic signal and the original voice signal, and obtain the nonlinear parameters of the speaker.
  • predistortion processing is performed on the original audio signal according to these parameters to obtain a predistorted signal.
  • the predistorted signal is transmitted to the speaker, which plays it to generate a target output signal; the target output signal propagates through the spatial medium and is collected by the microphone, and the signal collected by the microphone is obtained.
  • because the signal is pre-distorted before the speaker, intermodulation distortion in gesture judgment that uses the ultrasonic signal can be reduced, significantly improving the judgment accuracy of ultrasonic ranging and related applications that use the speaker in the device.
  • an embodiment of the present invention also provides an electronic device.
  • the electronic device includes at least a processor 710, a non-volatile storage medium 720, an internal memory 730, and a network interface 740, where the processor 710, the non-volatile storage medium 720, the internal memory 730, and the network interface 740 can be connected via the system bus 750 or by other means, and the device can communicate with other devices via the network interface 740.
  • the non-volatile storage medium 720 may be stored in the memory.
  • the above-mentioned computer storage medium is used to store a computer program and an operating system.
  • the internal memory 730 also stores a computer program.
  • the above-mentioned computer program includes program instructions, and the processor 710 can be used to execute those program instructions.
  • the processor, or CPU (Central Processing Unit), is the computing core and control core of the terminal. It is suitable for implementing one or more instructions, and specifically for loading and executing one or more instructions to implement the corresponding method flow or function; in one embodiment, the processor 710 in the embodiment of the present invention may be used to perform a series of processing, including the methods in the embodiments shown in FIG. 1 and FIG. 2.
  • the embodiment of the present invention also provides a computer storage medium (Memory).
  • the above-mentioned computer storage medium is a memory device in a terminal for storing programs and data. It can be understood that the computer storage medium herein may include a built-in storage medium in the terminal, and of course, may also include an extended storage medium supported by the terminal.
  • the computer storage medium provides storage space, and the storage space stores the operating system of the terminal. Moreover, one or more instructions suitable for being loaded and executed by the processor are stored in the storage space, and these instructions may be one or more computer programs (including program codes).
  • the computer storage medium here can be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory; optionally, it can also be at least one computer storage medium located far away from the aforementioned processor.
  • one or more instructions stored in the computer storage medium can be loaded and executed by the processor to implement the corresponding steps in the above-mentioned embodiments; in specific implementation, one or more instructions in the computer storage medium can be loaded by the processor to execute any steps of the methods in FIG. 1 and/or FIG. 2, which will not be repeated here.
  • the disclosed system, device, and method may be implemented in other ways.
  • the division of the modules is only a logical function division, and there can be other divisions in actual implementation.
  • multiple modules or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling, or direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices or modules, and may be in electrical, mechanical, or other forms.
  • the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed to multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented in software, they can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions can be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium can be a read-only memory (ROM), a random access memory (RAM), a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape, or a magnetic disk), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

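A gesture action determination method and apparatus, an electronic device, and a storage medium. The method includes: acquiring an original audio signal, the original audio signal including an original ultrasonic signal and an original speech signal (101); acquiring nonlinear parameters of a speaker (102); performing predistortion processing on the original audio signal according to the nonlinear parameters of the speaker to obtain a predistorted signal, the predistorted signal being transmitted to the speaker, the speaker playing it to produce a target output signal, and the target output signal being collected by a microphone after propagating through a spatial medium (103); acquiring the collected signal captured by the microphone (104); and determining whether a gesture action exists according to spectral features of the original ultrasonic signal and the collected signal (105). The method can reduce intermodulation distortion when ultrasonic signals are used for gesture determination, and significantly improve the accuracy of ultrasonic ranging and related applications that use the speaker in the apparatus.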

Description

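Gesture action determination method and apparatus, electronic device, and storage medium
Technical Field
The present invention relates to the technical field of speakers, and in particular to a gesture action determination method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of the terminal device industry and the adoption of full-screen displays, terminal devices have been designed with ranging functions, such as infrared ranging, and with gesture recognition built on them, to improve the interactive experience. However, infrared sensors have many limitations; for example, they are difficult to use in dark environments. Therefore, more and more terminal devices instead use their own speaker (such as a mobile phone receiver) to emit ultrasound for ranging. However, because the speaker system of a terminal device (such as a mobile phone receiver) is nonlinear, the emitted ultrasonic signal undergoes intermodulation distortion with the speech signal, which degrades the accuracy of ultrasonic ranging and related applications.
Summary
In view of this, it is necessary to provide a gesture action determination method and apparatus, an electronic device, and a storage medium that address the above problem, namely that intermodulation distortion between the ultrasonic signal and the speech signal degrades the accuracy of ultrasonic ranging and related applications.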
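The technical solution of the present invention is as follows:
In one aspect, a gesture action determination method is provided, including:
acquiring an original audio signal, the original audio signal including an original ultrasonic signal and an original speech signal;
acquiring nonlinear parameters of a speaker;
performing predistortion processing on the original audio signal according to the nonlinear parameters of the speaker to obtain a predistorted signal;
transmitting the predistorted signal to the speaker, the speaker playing it to produce a target output signal;
the target output signal being collected by a microphone after propagating through a spatial medium;
acquiring the collected signal captured by the microphone;
determining whether a gesture action exists according to spectral features of the original ultrasonic signal and the collected signal.
Optionally, determining whether a gesture action exists according to the spectral features of the original ultrasonic signal and the collected signal includes:
performing frequency-domain processing on the original ultrasonic signal and the collected signal to obtain energy spectrum information of the original ultrasonic signal and of the collected signal;
determining whether the energy distribution in the energy spectrum information of the collected signal differs from the energy distribution in the energy spectrum information of the original ultrasonic signal; if so, determining that the gesture action exists; if not, determining that the gesture action does not exist.
Optionally, when it is determined that the gesture action exists, the method further includes:
performing binarization and edge detection on the energy spectrum information to extract energy spectrum features of the gesture action;
determining the type of the gesture action corresponding to the energy spectrum features according to a mapping between preset energy spectrum features and preset gesture action types.
Optionally, after determining the type of the gesture action corresponding to the energy spectrum features, the method further includes:
determining the target control instruction corresponding to the type of the gesture action according to a correspondence between preset gesture action types and target control instructions;
triggering the operation corresponding to the target control instruction.
Optionally, the nonlinear parameters of the speaker include:
nonlinear parameters of the speaker obtained by offline testing, or nonlinear parameters of the speaker updated online.
Optionally, when the nonlinear parameters of the speaker are updated online, the method further includes:
acquiring condition parameters of the speaker;
updating the nonlinear parameters of the speaker according to a preset mapping between speaker condition parameters and nonlinear parameters and the acquired condition parameters of the speaker, where the condition parameters of the speaker include one or more of ambient temperature, operating time, and dynamic range of the input signal power.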
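In another aspect, a gesture action determination apparatus is provided, including:
a first acquisition module, configured to acquire an original audio signal, the original audio signal including an original ultrasonic signal and an original speech signal;
a nonlinear parameter module, configured to acquire nonlinear parameters of a speaker;
a nonlinear compensation module, configured to perform predistortion processing on the original audio signal according to the nonlinear parameters of the speaker to obtain a nonlinear compensation signal;
a speaker module, which outputs a target output signal when driven by the nonlinear compensation signal;
a microphone module, which collects the target output signal after it has propagated through space;
a second acquisition module, which acquires the collected signal captured by the microphone;
a processing module, configured to determine whether a gesture action exists according to spectral features of the original ultrasonic signal and the collected signal.
In another aspect, an electronic device is provided, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the first aspect and any of its possible implementations.
In another aspect, a computer storage medium is provided, storing one or more instructions adapted to be loaded by a processor to perform the steps of the first aspect and any of its possible implementations.
The beneficial effects of the present invention are as follows: by identifying the nonlinear parameters of the speaker system and predistorting the signal, that is, compensating in advance at the input for the distortion caused by the nonlinear system, intermodulation distortion between the ultrasonic signal and the speech signal can be reduced, significantly improving the accuracy of ultrasonic ranging and related applications that use the speaker in the apparatus.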
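Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a gesture action determination method provided by the present invention;
FIG. 2 is a schematic flowchart of another gesture action determination method provided by the present invention;
FIG. 3 is a schematic diagram of the spectral energy distribution of a collected signal provided by the present invention;
FIG. 4 is a schematic flow diagram of a system including a nonlinear compensation module provided by the present invention;
FIG. 5 is a schematic diagram of intermodulation distortion in a gesture action determination method provided by the present invention;
FIG. 6 is a schematic structural diagram of a gesture action determination apparatus provided by the present invention;
FIG. 7 is a schematic structural diagram of an electronic device provided by the present invention.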
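Detailed Description
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The terms "first", "second", and so on in the specification, claims, and drawings of the present invention are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The embodiments of the present invention are described below with reference to the accompanying drawings.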
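Referring to FIG. 1, FIG. 1 is a schematic flowchart of a gesture action determination method provided by an embodiment of the present invention. The method may include:
101. Acquire an original audio signal, the original audio signal including an original ultrasonic signal and an original speech signal.
The method may be executed by a gesture action determination apparatus that includes a speaker and can use ultrasonic signals for ranging and gesture recognition. In one implementation, the gesture action determination apparatus may be an electronic device; the electronic device may be a terminal device, including but not limited to mobile terminals, earphones, and audio playback devices, as well as other portable devices such as laptop computers and tablet computers, or desktop computers.
The original audio signal may be the audio signal that is ultimately to be output by the speaker. On different electronic devices, music playback can be selected through different forms of operation and the corresponding audio signal obtained for output; this is not limited here. The ultrasonic signal may also be output by the same speaker; for example, the receiver/speaker of a mobile phone emits ultrasound for ranging and, further, gesture recognition.
However, speaker systems generally exhibit nonlinear distortion. For example, a small speaker driven at a high voltage produces considerable nonlinear distortion (total harmonic distortion, THD), and low-frequency and high-frequency signals additionally produce intermodulation distortion (IMD). As a result, the signal actually played is distorted by THD and IMD and deviates from the desired output signal. In particular, because of the nonlinearity of the speaker/receiver system of a terminal device, a 20 kHz ultrasonic transmit signal is modulated by the speech signal (300 Hz to 1.5 kHz) into components in the 18 kHz to 22 kHz range; this intermodulation distortion degrades the accuracy of the overall algorithm.
Intermodulation distortion refers to the new frequency components produced when two or more signals of different frequencies pass through an amplifier or a speaker; this kind of distortion is usually produced by active components in the circuit (such as transistors or vacuum tubes). The amount of distortion depends on the output power. Because the newly generated frequency components bear no resemblance to the original signal, even a small amount of intermodulation distortion is easily perceived by the human ear.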
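The way a nonlinearity creates sidebands around the ultrasonic carrier can be illustrated with a toy model. The sketch below is not the speaker model of the patent: it assumes a memoryless quadratic nonlinearity with a hypothetical coefficient `A2`, a 48 kHz sample rate, and a single 1 kHz tone standing in for speech. It measures the power at 19 kHz (carrier minus speech tone) before and after the toy speaker.

```python
import math

SAMPLE_RATE = 48_000   # Hz, assumed
N = 4_800              # 0.1 s, so every tone below spans a whole number of cycles
F_ULTRA = 20_000.0     # Hz, ultrasonic carrier frequency taken from the text
F_SPEECH = 1_000.0     # Hz, one illustrative speech component (assumption)
A2 = 0.2               # hypothetical second-order coefficient of the toy speaker


def toy_speaker(x, a2=A2):
    """Memoryless toy model: linear term plus a quadratic distortion term."""
    return x + a2 * x * x


def tone_power(samples, freq):
    """Power of the component at `freq`, by correlation with a complex exponential."""
    re = sum(s * math.cos(2 * math.pi * freq * i / SAMPLE_RATE) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i, s in enumerate(samples))
    return (re * re + im * im) / (len(samples) ** 2)


# Original audio: ultrasonic carrier plus one speech tone, each with amplitude 0.5.
original = [
    0.5 * math.sin(2 * math.pi * F_ULTRA * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * F_SPEECH * i / SAMPLE_RATE)
    for i in range(N)
]
played = [toy_speaker(x) for x in original]

carrier = tone_power(played, F_ULTRA)
sideband_in = tone_power(original, F_ULTRA - F_SPEECH)   # absent from the input
sideband_out = tone_power(played, F_ULTRA - F_SPEECH)    # created by the nonlinearity
imd_db = 10 * math.log10(sideband_out / carrier)         # sideband level relative to carrier
```

In this toy setup the 19 kHz component is absent from the input and appears only after the nonlinearity, which is the kind of spurious energy near the carrier that a Doppler-based detector could mistake for motion.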
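102. Acquire the nonlinear parameters of the speaker.
At large amplitudes, every speaker exhibits nonlinear behavior to some degree and produces signal components that are not present in the input signal. Linear distortion of a speaker affects the amplitude and/or phase of the output signal, whereas nonlinear distortion means that the output signal contains frequency components that are absent from the input signal. To solve the problems caused by nonlinear distortion, the nonlinear parameters of the speaker can first be determined and then compensated for in a targeted manner.
Specifically, an indirect test method can generally be used to analyze the nonlinearity of speaker vibration: a circuit model of a first speaker is built in advance, the speaker is then measured with a speaker analyzer, and the relevant nonlinear parameters of the first speaker are obtained by adaptive fitting.
Optionally, the nonlinear parameters of the speaker include: nonlinear parameters of the speaker obtained by offline testing, or nonlinear parameters of the speaker updated online.
Optionally, offline testing may include directly testing the first speaker with equipment such as a speaker test system and a rangefinder to analyze the nonlinearity of its vibration, so as to obtain the nonlinear parameters of the first speaker directly; these are preset in the apparatus and supplied when it is in use. Specifically, a DC bias voltage signal can be applied to the first speaker (which may be another identical speaker or a simulated speaker) so that its voice coil is offset within the magnetic gap; a rangefinder, such as a laser rangefinder, then measures the offset displacement of the voice coil under that DC bias voltage. With the DC bias voltage across the first speaker held constant, the speaker test system outputs an AC analysis signal to the first speaker to obtain the impedance curve and the displacement-voltage transfer function curve of the voice coil at the offset position, and the values of the nonlinear parameters of the first speaker with the voice coil offset are calculated from these curves.
The magnitude of the DC bias voltage signal can then be changed several times and the above steps repeated, measuring the offset displacement of the voice coil in the magnetic gap under each DC bias voltage and calculating the values of the nonlinear parameters of the first speaker at each corresponding voice coil offset.
In an optional implementation, when the nonlinear parameters of the speaker are updated online, the method further includes:
acquiring condition parameters of the speaker;
updating the nonlinear parameters of the speaker according to a preset mapping between speaker condition parameters and nonlinear parameters and the acquired condition parameters of the speaker, where the condition parameters of the speaker include one or more of ambient temperature, operating time, and dynamic range of the input signal power.
A nonlinear characteristic curve of the speaker can also be obtained by simulation or measurement; it may include the preset mapping between speaker condition parameters and nonlinear parameters. The condition parameters of the speaker are all factors that affect its nonlinear distortion and may include one or more of ambient temperature, operating time, and dynamic range of the input signal power; one example is the mapping between the ambient temperature of the speaker and its nonlinear parameters. The nonlinear parameters of the speaker can then be updated periodically: the current condition parameters of the speaker are acquired, and the current nonlinear parameters are determined from the preset mapping, so that the nonlinear parameters are obtained in real time.
In summary, the nonlinear parameters of the speaker can be obtained in different ways: from offline testing, or updated online while the speaker is operating by using the obtained nonlinear characteristic curve. The embodiments of the present invention do not limit this.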
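One simple way to realize the online update is a calibration table indexed by a condition parameter. The sketch below assumes a single condition parameter (ambient temperature) and a single nonlinear coefficient; the table values and the name `lookup_nonlinear_coeff` are hypothetical illustrations, not data or identifiers from the patent.

```python
from bisect import bisect_right

# Hypothetical calibration table: ambient temperature (deg C) -> nonlinear coefficient.
# The numbers are illustrative placeholders, not measured speaker data.
TEMPERATURE_TO_COEFF = [
    (-10.0, 0.26),
    (0.0, 0.23),
    (25.0, 0.20),
    (45.0, 0.18),
    (60.0, 0.17),
]


def lookup_nonlinear_coeff(temperature_c, table=TEMPERATURE_TO_COEFF):
    """Linearly interpolate the coefficient; clamp outside the calibrated range."""
    temps = [t for t, _ in table]
    if temperature_c <= temps[0]:
        return table[0][1]
    if temperature_c >= temps[-1]:
        return table[-1][1]
    upper = bisect_right(temps, temperature_c)      # first entry strictly above the query
    (t0, c0), (t1, c1) = table[upper - 1], table[upper]
    frac = (temperature_c - t0) / (t1 - t0)
    return c0 + frac * (c1 - c0)


# Periodic update: read the current condition, then refresh the parameter.
current_coeff = lookup_nonlinear_coeff(35.0)
```

A real mapping would likely cover several parameters at once (temperature, operating time, input power range); a multi-dimensional table or a fitted model would then replace the one-dimensional interpolation shown here.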
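103. Perform predistortion processing on the original audio signal according to the nonlinear parameters of the speaker to obtain a predistorted signal; the predistorted signal is transmitted to the speaker, the speaker plays it to produce a target output signal, and the target output signal is collected by a microphone after propagating through a spatial medium.
In the embodiments of this application, predistortion can be applied to the original audio signal before it is transmitted to the speaker, yielding the predistorted signal, which is then transmitted to the speaker and played to produce the target output signal. A microphone is provided to collect audio signals in the space, so the target output signal can be collected by the microphone after propagating through the spatial medium. The processing of the signal collected by the microphone is described in steps 104 and 105 below.
Specifically, the compensation can be implemented with a nonlinear filter. This filter acts as a nonlinear compensator that eliminates the nonlinear behavior of the speaker by controlling the excitation signal, without changing the speaker structure. Ideally, the nonlinear filter and the actual first speaker together form an all-pass filter.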
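The idea of a compensator that, cascaded with the speaker, behaves like a linear system can be shown on a deliberately simplified model. The sketch below assumes a memoryless quadratic speaker model with a hypothetical coefficient `A2`; a real loudspeaker's nonlinearities have memory and depend on voice coil displacement, so this is only an illustration of the inverse-model principle, not the filter used in the patent.

```python
A2 = 0.2  # hypothetical second-order coefficient standing in for measured nonlinear parameters


def toy_speaker(u, a2=A2):
    """Memoryless toy speaker: output = input + a2 * input^2."""
    return u + a2 * u * u


def predistort(x, a2=A2, iterations=6):
    """Find the drive value u with toy_speaker(u) == x, using Newton's method from u = x."""
    u = x
    for _ in range(iterations):
        residual = toy_speaker(u, a2) - x   # how far the speaker output is from the target
        slope = 1.0 + 2.0 * a2 * u          # derivative of the toy speaker model
        u -= residual / slope
    return u


# Compare the speaker output with and without predistortion for inputs in [-1, 1].
test_inputs = [i / 10.0 for i in range(-10, 11)]
uncompensated_error = max(abs(toy_speaker(x) - x) for x in test_inputs)
compensated_error = max(abs(toy_speaker(predistort(x)) - x) for x in test_inputs)
```

With the predistorter in front, the cascade reproduces the input to within numerical precision over this range, whereas the uncompensated toy speaker deviates by up to `A2` at full scale.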
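104. Acquire the collected signal captured by the microphone.
In the embodiments of the present invention, because an ultrasonic signal is added to the original audio signal, gesture recognition exploits the Doppler effect of ultrasound: when a gesture action reflects the target output signal, the frequency of the collected signal changes, rising when the moving object approaches the sound source and falling when it moves away.
Generally, the speaker of the terminal device serves as the transmitter and emits the ultrasonic signal, while the microphone serves as the receiver and captures the collected signal; when reflection occurs, the microphone receives the reflected ultrasound, with a human hand or head/face acting as the reflecting medium.
105. Determine whether a gesture action exists according to the spectral features of the original ultrasonic signal and the collected signal.
After the collected signal is acquired, spectrum analysis can be performed using the Doppler effect of ultrasound. The specific formula involved is:
[Formula image: PCTCN2020096743-appb-000001]
where f_r is the received reflected frequency, f_e is the emitted frequency, v_i is the propagation speed of sound in air, and v_0 is the speed of the object relative to the device.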
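The formula image itself did not survive extraction. As a rough guide to the magnitudes involved, the sketch below uses the standard textbook relation for sound reflected off a moving object, f_r = f_e (v_i + v_0) / (v_i - v_0), with v_0 positive when the object approaches. This relation is an assumption chosen to match the symbol definitions above; it may differ in form from the formula in the original figure. The speed of sound value is also assumed.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 deg C


def reflected_frequency(f_emit_hz, v_object_m_s, v_sound_m_s=SPEED_OF_SOUND_M_S):
    """Received frequency after reflection off a moving object.

    v_object_m_s is positive when the object approaches the device and
    negative when it moves away (standard two-way Doppler relation, assumed).
    """
    return f_emit_hz * (v_sound_m_s + v_object_m_s) / (v_sound_m_s - v_object_m_s)


F_EMIT = 20_000.0  # Hz, carrier frequency used in the text
shift_approach = reflected_frequency(F_EMIT, 0.5) - F_EMIT    # hand approaching at 0.5 m/s
shift_recede = reflected_frequency(F_EMIT, -0.5) - F_EMIT     # hand receding at 0.5 m/s
```

Under these assumptions a hand moving at 0.5 m/s shifts a 20 kHz carrier by several tens of hertz, which indicates the frequency resolution the later spectral analysis needs.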
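For example, a mobile phone can emit an ultrasonic signal of 18 kHz to 22 kHz; the microphone serves as the receiver and picks up the reflected ultrasound, with a human hand or head acting as the reflecting medium.
In one implementation, step 105 specifically includes:
performing frequency-domain processing on the original ultrasonic signal and the collected signal to obtain energy spectrum information of the original ultrasonic signal and of the collected signal;
determining whether the energy distribution in the energy spectrum information of the collected signal differs from the energy distribution in the energy spectrum information of the original ultrasonic signal; if so, determining that the gesture action exists; if not, determining that the gesture action does not exist.
When an object approaches or moves away from the terminal device, the spectral energy distribution of the signal collected by the microphone changes relative to that of the original ultrasonic signal. By comparing the energy distributions at the emitted and collected frequencies, it can be determined whether a gesture action exists.
When a gesture action is determined to exist, a corresponding instruction can be triggered and the corresponding operation performed; for example, this can be used to let a user control the terminal device by gestures to implement various functions. The existence of a gesture action may mean a transition from no gesture to a gesture, or a change from one gesture to another, including a change in the position of the gesture or in the gesture itself. In one implementation it also covers changes in other body movements and is not limited to hand gestures. The present invention does not limit these aspects.
In the present invention, an original audio signal including an original ultrasonic signal and an original speech signal is acquired; nonlinear parameters of the speaker are acquired; predistortion processing is performed on the original audio signal according to those parameters to obtain a predistorted signal; the predistorted signal is transmitted to the speaker, which plays it to produce a target output signal; the target output signal is collected by a microphone after propagating through a spatial medium; the collected signal is acquired; and whether a gesture action exists is determined according to the spectral features of the original ultrasonic signal and the collected signal. By predistorting the signal ahead of the speaker, the gesture action determination method of the present invention can reduce intermodulation distortion in gesture determination based on ultrasonic signals and significantly improve the accuracy of ultrasonic ranging and related applications that use the speaker in the apparatus.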
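Referring to FIG. 2, FIG. 2 is a schematic flowchart of another gesture action determination method provided by an embodiment of the present invention. As shown in FIG. 2, the method may include:
201. Acquire an original audio signal, the original audio signal including an original ultrasonic signal and an original speech signal.
202. Perform predistortion processing on the original audio signal according to the nonlinear parameters of the speaker to obtain a predistorted signal; the predistorted signal is transmitted to the speaker, the speaker plays it to produce a target output signal, and the target output signal is collected by a microphone after propagating through a spatial medium.
203. Acquire the collected signal captured by the microphone.
For steps 201 to 203, refer to the detailed description of steps 101 to 104 in the embodiment shown in FIG. 1; details are not repeated here.
204. Perform frequency-domain processing on the original ultrasonic signal and the collected signal to obtain energy spectrum information of the original ultrasonic signal and of the collected signal.
Specifically, the frequency-domain processing may include windowing the digitized collected signal and then applying a fast Fourier transform (FFT) to obtain the energy spectrum of the captured sound pressure. Because mainly the amplitude characteristics of the spectrum are used, the phase information can be discarded during the calculation to reduce the amount of data to be processed.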
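The windowing and energy spectrum step can be sketched as follows. To stay self-contained, the example evaluates a direct DFT in pure Python instead of calling an FFT routine; in practice a library FFT would be used. The Hann window, the 48 kHz sample rate, and the frame length of 480 samples are assumptions for illustration.

```python
import cmath
import math

SAMPLE_RATE = 48_000  # Hz, assumed
FRAME_LEN = 480       # samples per frame, giving 100 Hz per bin (assumed)


def energy_spectrum(frame):
    """Hann-window a frame and return |X[k]|^2 for k = 0..N/2, discarding phase."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT sum; an FFT computes the same values faster.
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spectrum.append(abs(acc) ** 2)   # keep only the magnitude information
    return spectrum


# A clean 20 kHz tone, as expected when nothing is moving near the device.
frame = [math.sin(2 * math.pi * 20_000.0 * i / SAMPLE_RATE) for i in range(FRAME_LEN)]
spectrum = energy_spectrum(frame)
peak_bin = spectrum.index(max(spectrum))
peak_freq_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

With 100 Hz bins, the Doppler shifts of a slowly moving hand fall within one or two bins of the carrier, so a longer frame (finer resolution) may be preferable in a real system; that trade-off is not specified in the source.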
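205. Determine whether the energy distribution in the energy spectrum information of the collected signal differs from the energy distribution in the energy spectrum information of the original ultrasonic signal; if so, determine that the gesture action exists; if not, determine that the gesture action does not exist.
Whether a gesture action exists can be determined by checking whether the energy distribution of the energy spectrum has changed.
Here, a difference in energy distribution means a difference that reaches the preset change caused by a gesture action; ordinary signal interference can be ignored. Specifically, to determine whether the energy distribution in the energy spectrum information of the collected signal differs from that of the original ultrasonic signal, the amplitude characteristics of the spectrum can be used: the maximum frequency shift is obtained according to the Doppler formula above and compared with a preset frequency-shift threshold. If the threshold is reached, the two energy distributions differ, the gesture action is determined to exist, and step 206 is performed; otherwise no gesture action exists, the subsequent steps may be skipped, and periodic detection continues.
For example, refer to the schematic diagram of the spectral energy distribution of a collected signal shown in FIG. 3. The original ultrasonic signal has a frequency of 20 kHz. When a hand approaches or moves away from the apparatus, the spectral energy of the signal collected by the microphone is no longer confined to 20 kHz, as in the spectrum with motion shown in the left box of FIG. 3; when there is no gesture action, the spectral energy of the collected signal is concentrated almost entirely at 20 kHz, as in the spectrum without motion shown in the right box of FIG. 3. By comparing the energy distributions of the emitted frequency (the frequency of the original ultrasonic signal) and the collected frequency (the frequency of the collected signal), it can be determined whether a gesture action has occurred.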
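One possible decision rule, sketched below, compares the energy that has spread into bins around the carrier with the energy at the carrier itself. The source describes a frequency-shift threshold without giving its form, so the ratio test, the guard width, and the threshold value here are all assumptions; `max_shift_bins` would typically be derived from the largest Doppler shift expected for a gesture.

```python
def has_motion(spectrum, carrier_bin, guard_bins=1, max_shift_bins=6, ratio_threshold=0.05):
    """Return True when enough energy has moved off the carrier into nearby bins.

    spectrum        : list of per-bin energies (for example from an FFT)
    carrier_bin     : index of the bin holding the emitted ultrasonic frequency
    guard_bins      : bins on each side counted as part of the carrier (window leakage)
    max_shift_bins  : widest Doppler shift, in bins, considered plausible for a gesture
    ratio_threshold : hypothetical sideband-to-carrier energy ratio that signals motion
    """
    lo = max(0, carrier_bin - max_shift_bins)
    hi = min(len(spectrum) - 1, carrier_bin + max_shift_bins)
    carrier_energy = 0.0
    sideband_energy = 0.0
    for k in range(lo, hi + 1):
        if abs(k - carrier_bin) <= guard_bins:
            carrier_energy += spectrum[k]
        else:
            sideband_energy += spectrum[k]
    if carrier_energy <= 0.0:
        return False
    return sideband_energy / carrier_energy > ratio_threshold


# Synthetic spectra: energy concentrated at the carrier versus spread by motion.
still_spectrum = [0.0] * 20
still_spectrum[9], still_spectrum[10], still_spectrum[11] = 25.0, 100.0, 25.0
moving_spectrum = list(still_spectrum)
moving_spectrum[13], moving_spectrum[14] = 30.0, 10.0

still_detected = has_motion(still_spectrum, carrier_bin=10)
moving_detected = has_motion(moving_spectrum, carrier_bin=10)
```

This also shows why intermodulation products matter: spurious components landing in the sideband region raise `sideband_energy` even when nothing moves, which is the false detection that predistortion is intended to suppress.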
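206. When the gesture action is determined to exist, perform binarization and edge detection on the energy spectrum information to extract the energy spectrum features of the gesture action.
Specifically, once a gesture change has been preliminarily detected, the gesture action can be further classified.
Image binarization sets the gray value of each pixel of an image to 0 or 255, so that the whole image takes on a distinct black-and-white appearance. In digital image processing, binarization greatly reduces the amount of data in the image and thereby highlights the contour of the target, here the trend of change in the energy spectrum distribution.
Edge detection is a fundamental problem in image processing and computer vision; its purpose is to identify points in a digital image where the brightness changes sharply. Significant changes in image properties usually reflect important events and changes in the properties of the scene, including discontinuities in depth, discontinuities in surface orientation, changes in material properties, and changes in scene illumination. Edge detection is a research area within image processing and computer vision, particularly in feature extraction. Applying edge detection greatly reduces the amount of data in the energy spectrum that needs to be analyzed, discards information that can be considered irrelevant, and preserves the important structural properties, so that the changing region can be targeted more accurately.
For the visualized energy spectrum information, binarization and edge detection can be performed to extract the energy spectrum features of the gesture action. This processing yields the amplitude vector of the frequency-shift interval of the energy spectrum, which reflects the energy distribution characteristics of the gesture as it is performed.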
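Treating a sequence of energy spectra as an image (rows are time frames, columns are frequency bins), the binarization and edge detection steps can be sketched as below. The relative threshold, the simple neighbor-difference edge detector, and the `shift_extent` feature are assumptions chosen for brevity; the source does not specify which binarization rule or edge operator is used.

```python
def binarize(spectrogram, rel_threshold=0.1):
    """Set each cell to 1 if its energy reaches rel_threshold times the global peak, else 0."""
    peak = max(max(row) for row in spectrogram)
    return [[1 if e >= rel_threshold * peak else 0 for e in row] for row in spectrogram]


def edges_along_frequency(binary):
    """Mark bins where the binary value changes between adjacent frequency bins."""
    return [
        [1 if k > 0 and row[k] != row[k - 1] else 0 for k in range(len(row))]
        for row in binary
    ]


def shift_extent(binary, carrier_bin):
    """Per frame, the (lowest, highest) active bin offset relative to the carrier bin."""
    features = []
    for row in binary:
        offsets = [k - carrier_bin for k, v in enumerate(row) if v]
        features.append((min(offsets), max(offsets)) if offsets else (0, 0))
    return features


# Three frames of a tiny synthetic spectrogram; the carrier sits in bin 3.
spectrogram = [
    [0.0, 0.0, 2.0, 10.0, 2.0, 0.0, 0.0],   # no motion: energy near the carrier only
    [0.0, 0.0, 2.0, 10.0, 5.0, 4.0, 0.0],   # energy spreads upward in frequency
    [0.0, 3.0, 4.0, 10.0, 2.0, 0.0, 0.0],   # energy spreads downward in frequency
]
binary = binarize(spectrogram)
edges = edges_along_frequency(binary)
extent = shift_extent(binary, carrier_bin=3)
```

The per-frame extents trace how far the energy spreads above and below the carrier over time, which is one way to obtain a compact feature describing the frequency-shift interval.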
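207. Determine the type of the gesture action corresponding to the energy spectrum features according to the mapping between preset energy spectrum features and preset gesture action types.
Different gesture actions produce different energy changes, so the type of a gesture action can be analyzed from its energy spectrum features. In one implementation, a mapping between energy spectrum features and gesture actions can be preset, for example by enrolling gesture action templates in advance, which includes collecting and storing the template feature vector corresponding to each preset gesture action. The extracted amplitude vector of the frequency-shift interval is compared with the template feature vectors of the preset gesture actions; when the similarity is higher than a preset similarity threshold, a gesture action is determined to exist and its type can be identified.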
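Template matching with a similarity threshold can be sketched as follows. The source does not name a similarity measure, so cosine similarity is assumed here; the template vectors, gesture names, and threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_gesture(feature, templates, min_similarity=0.9):
    """Return the best-matching template name, or None if nothing exceeds the threshold."""
    best_name, best_score = None, min_similarity
    for name, template in templates.items():
        score = cosine_similarity(feature, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled templates: amplitude vectors over the frequency-shift interval.
TEMPLATES = {
    "approach": [0.0, 0.0, 1.0, 3.0, 6.0],   # energy shifted above the carrier
    "recede": [6.0, 3.0, 1.0, 0.0, 0.0],     # energy shifted below the carrier
}

matched = classify_gesture([0.0, 0.1, 1.2, 2.8, 6.1], TEMPLATES)
unmatched = classify_gesture([1.0, 1.0, 1.0, 1.0, 1.0], TEMPLATES)
```

Returning `None` when no template is similar enough corresponds to rejecting the observation rather than forcing it into the nearest gesture class.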
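Optionally, after step 207, the method further includes:
determining the target control instruction corresponding to the type of the gesture action according to a correspondence between preset gesture action types and target control instructions;
triggering the operation corresponding to the target control instruction.
Specifically, a correspondence between gesture action types and target control instructions can also be preset. After the type of the gesture action is determined by spectrum analysis, the corresponding target control instruction can be determined and then triggered to perform the corresponding operation, so that the terminal device can be controlled by gesture actions. This is convenient to operate and improves the interactive experience.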
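The correspondence between gesture types and control instructions amounts to a lookup table followed by a call. The sketch below uses hypothetical gesture names and handler functions (`screen_off`, `screen_on`); the source does not list concrete gestures or commands.

```python
def make_dispatcher(command_table):
    """Build a function that triggers the command registered for a gesture type."""
    def dispatch(gesture_type):
        handler = command_table.get(gesture_type)
        if handler is None:
            return None          # unknown or unregistered gesture: do nothing
        return handler()
    return dispatch


# Hypothetical control instructions for illustration only.
def screen_off():
    return "screen_off"


def screen_on():
    return "screen_on"


dispatch = make_dispatcher({"approach": screen_off, "recede": screen_on})
result = dispatch("approach")
ignored = dispatch("wave")
```

Keeping the table separate from the detection code lets the mapping be changed, for example per application, without touching the signal processing.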
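Refer to FIG. 4, a schematic flow diagram of a system that includes a nonlinear compensation module. Taking gesture determination with the receiver/speaker of a mobile phone as an example, as shown in FIG. 4: 1 is the ultrasonic signal, generally around 20 kHz; 2 is the speech signal, or the mid-to-low-frequency part of a music signal (around 300 Hz to 500 Hz); 3 is the original audio signal, that is, the superposition of the ultrasonic signal and the speech signal;
4 is the nonlinear compensation module, used to predistort the signal;
5 is the speaker nonlinear parameter test system, offline or updated online; 6 is the speaker nonlinear parameters; 7 is the predistorted signal after nonlinear compensation;
8 is the acoustic signal produced when the predistorted signal is played by the speaker (the target output signal). This signal is a linear response to the original audio signal and does not exhibit severe intermodulation or harmonic distortion; by contrast, the signal produced when the original signal is played by the speaker without predistortion contains a large amount of intermodulation distortion, which would seriously impair the accuracy of the determination in 14;
9 is the gesture action; 10 is the acoustic signal after reflection by gesture action 9; 11 is the sound pressure signal captured by the microphone (the collected signal);
12 is frequency-domain processing, which may use an FFT or similar method to obtain the energy spectrum of the captured sound pressure; 13 is the resulting sound pressure energy spectrum information; 14 is the determination operation, which checks for a frequency shift to decide whether the energy distribution of the sound pressure energy spectrum has changed and hence whether a gesture change has occurred;
if yes (Y), gesture classification 15 is performed to determine the specific type of the gesture action; if no (N), frequency-domain processing continues on the collected signal at the next moment.
The present invention determines whether a gesture change has occurred mainly by detecting changes in the frequency at which energy is concentrated in the spectrum. This requires a good linear relationship between the collected signal and the emitted signal. For a terminal device such as a mobile phone, the nonlinearity of the phone's speaker/receiver system causes the emitted signal to intermodulate with the speech signal, which degrades the accuracy of the overall algorithm.
Refer to FIG. 5, a schematic diagram of intermodulation distortion in a gesture action determination method. FIG. 5 shows the original intermodulation distortion of the conventional method alongside the intermodulation distortion after the improvement provided by the gesture action determination method of the present invention: for the same original audio signal, adding the nonlinear compensation module reduces the intermodulation distortion between the ultrasonic signal and the speech signal by about 30 dB. The method in the embodiments of the present invention can therefore reduce intermodulation distortion and, combined with an accurate feature determination approach, significantly improve the accuracy of gesture determination.
In the embodiments of the present invention, an original audio signal including an original ultrasonic signal and an original speech signal is acquired; predistortion processing is performed on it according to the nonlinear parameters of the speaker to obtain a predistorted signal; the predistorted signal is transmitted to the speaker, which plays it to produce a target output signal; and the target output signal is collected by a microphone after propagating through a spatial medium. After the collected signal is acquired, frequency-domain processing is performed on the original ultrasonic signal and the collected signal to obtain their energy spectrum information, and it is determined whether the energy distribution in the energy spectrum information of the collected signal differs from that of the original ultrasonic signal. If it does, the gesture action is determined to exist; binarization and edge detection can then be performed on the energy spectrum information to extract the energy spectrum features of the gesture action, and the type of the gesture action is determined according to the mapping between preset energy spectrum features and preset gesture action types. If it does not, the gesture action is determined not to exist, and frequency-domain processing can continue on the collected signal at the next moment. The present invention identifies the linear and nonlinear parameters of the speaker system and predistorts the signal, compensating in advance at the input for the distortion caused by the system's nonlinearity. The signal collected after the speaker system is therefore a linear response to the emitted signal (the original ultrasonic signal), which markedly reduces misjudgments caused by intermodulation distortion. The system linearized by nonlinear compensation, combined with an accurate feature determination approach, can thus significantly improve the accuracy of gesture action determination.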
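Based on the description of the above method embodiments, an embodiment of the present invention further discloses a gesture action determination apparatus. Referring to FIG. 6, the gesture action determination apparatus 600 includes:
a first acquisition module 610, configured to acquire an original audio signal, the original audio signal including an original ultrasonic signal and an original speech signal;
a nonlinear parameter module 620, configured to acquire nonlinear parameters of a speaker;
a nonlinear compensation module 630, configured to perform predistortion processing on the original audio signal according to the nonlinear parameters of the speaker to obtain a nonlinear compensation signal;
a speaker module 640, which outputs a target output signal when driven by the nonlinear compensation signal;
a microphone module 650, which collects the target output signal after it has propagated through space;
a second acquisition module 660, which acquires the collected signal captured by the microphone;
a processing module 670, configured to determine whether a gesture action exists according to spectral features of the original ultrasonic signal and the collected signal.
Optionally, the processing module 670 is specifically configured to:
perform frequency-domain processing on the original ultrasonic signal and the collected signal to obtain energy spectrum information of the original ultrasonic signal and of the collected signal;
determine whether the energy distribution in the energy spectrum information of the collected signal differs from the energy distribution in the energy spectrum information of the original ultrasonic signal; if so, determine that the gesture action exists; if not, determine that the gesture action does not exist.
Optionally, the processing module 670 is further configured to: when the gesture action is determined to exist, perform binarization and edge detection on the energy spectrum information to extract energy spectrum features of the gesture action;
determine the type of the gesture action corresponding to the energy spectrum features according to a mapping between preset energy spectrum features and preset gesture action types.
Optionally, the processing module 670 is further configured to:
after the type of the gesture action corresponding to the energy spectrum features is determined, determine the target control instruction corresponding to the type of the gesture action according to a correspondence between preset gesture action types and target control instructions;
trigger the operation corresponding to the target control instruction.
Optionally, the nonlinear parameters of the speaker include:
nonlinear parameters of the speaker obtained by offline testing, or nonlinear parameters of the speaker updated online.
Optionally, the nonlinear parameter module 620 is specifically configured to:
when the nonlinear parameters of the speaker are updated online, acquire condition parameters of the speaker;
update the nonlinear parameters of the speaker according to a preset mapping between speaker condition parameters and nonlinear parameters and the acquired condition parameters of the speaker, where the condition parameters of the speaker include one or more of ambient temperature, operating time, and dynamic range of the input signal power.
According to an embodiment of the present invention, each step of the methods shown in FIG. 1 and FIG. 2 may be performed by the corresponding module of the gesture action determination apparatus 600 shown in FIG. 6; details are not repeated here.
For example, the nonlinear compensation module 4 shown in FIG. 4 may correspond to the nonlinear compensation module 630.
The gesture action determination apparatus 600 in the embodiments of the present invention can acquire an original audio signal including an original ultrasonic signal and an original speech signal; acquire nonlinear parameters of the speaker; perform predistortion processing on the original audio signal according to those parameters to obtain a predistorted signal, which is transmitted to the speaker and played to produce a target output signal that is collected by a microphone after propagating through a spatial medium; acquire the collected signal captured by the microphone; and determine whether a gesture action exists according to the spectral features of the original ultrasonic signal and the collected signal. By predistorting the signal ahead of the speaker, it can reduce intermodulation distortion in gesture determination based on ultrasonic signals and significantly improve the accuracy of ultrasonic ranging and related applications that use the speaker in the apparatus.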
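Based on the description of the above method and apparatus embodiments, an embodiment of the present invention further provides an electronic device. Referring to FIG. 7, the electronic device includes at least a processor 710, a non-volatile storage medium 720, an internal memory 730, and a network interface 740, which may be connected through a system bus 750 or in other ways; the device can communicate with other devices through the network interface 740.
The non-volatile storage medium 720, that is, the computer storage medium, may reside in a memory; the computer storage medium is used to store a computer program and an operating system. The internal memory 730 also stores a computer program; the computer program includes program instructions, and the processor 710 can be used to execute them. The processor (or CPU, Central Processing Unit) is the computing core and control core of the terminal; it is adapted to implement one or more instructions, and specifically to load and execute one or more instructions so as to implement the corresponding method flow or function. In one embodiment, the processor 710 may be used to perform a series of processing, including the methods of the embodiments shown in FIG. 1 and FIG. 2.
An embodiment of the present invention further provides a computer storage medium (memory), which is a memory device in the terminal used to store programs and data. It can be understood that the computer storage medium here may include a built-in storage medium of the terminal and may also include an extended storage medium supported by the terminal. The computer storage medium provides storage space, which stores the operating system of the terminal. The storage space also holds one or more instructions adapted to be loaded and executed by the processor; these instructions may be one or more computer programs (including program code). It should be noted that the computer storage medium here may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions stored in the computer storage medium can be loaded and executed by the processor to implement the corresponding steps of the above embodiments. In a specific implementation, one or more instructions in the computer storage medium can be loaded by the processor to perform any step of the methods in FIG. 1 and/or FIG. 2; details are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the division into modules is only a division by logical function; in actual implementation there may be other divisions: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented. The mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape, or a magnetic disk), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).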

Claims (9)

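  1. A gesture action determination method, comprising:
    acquiring an original audio signal, the original audio signal including an original ultrasonic signal and an original speech signal;
    acquiring nonlinear parameters of a speaker;
    performing predistortion processing on the original audio signal according to the nonlinear parameters of the speaker to obtain a predistorted signal;
    transmitting the predistorted signal to the speaker, the speaker playing it to produce a target output signal;
    the target output signal being collected by a microphone after propagating through a spatial medium;
    acquiring the collected signal captured by the microphone;
    determining whether a gesture action exists according to spectral features of the original ultrasonic signal and the collected signal.
  2. The gesture action determination method according to claim 1, wherein determining whether a gesture action exists according to the spectral features of the original ultrasonic signal and the collected signal comprises:
    performing frequency-domain processing on the original ultrasonic signal and the collected signal to obtain energy spectrum information of the original ultrasonic signal and of the collected signal;
    determining whether the energy distribution in the energy spectrum information of the collected signal differs from the energy distribution in the energy spectrum information of the original ultrasonic signal; if so, determining that the gesture action exists; if not, determining that the gesture action does not exist.
  3. The gesture action determination method according to claim 2, wherein, when it is determined that the gesture action exists, the method further comprises:
    performing binarization and edge detection on the energy spectrum information to extract energy spectrum features of the gesture action;
    determining the type of the gesture action corresponding to the energy spectrum features according to a mapping between preset energy spectrum features and preset gesture action types.
  4. The gesture action determination method according to claim 3, wherein, after determining the type of the gesture action corresponding to the energy spectrum features, the method further comprises:
    determining the target control instruction corresponding to the type of the gesture action according to a correspondence between preset gesture action types and target control instructions;
    triggering the operation corresponding to the target control instruction.
  5. The gesture action determination method according to any one of claims 1 to 4, wherein the nonlinear parameters of the speaker comprise:
    nonlinear parameters of the speaker obtained by offline testing, or nonlinear parameters of the speaker updated online.
  6. The gesture action determination method according to claim 5, wherein, when the nonlinear parameters of the speaker are updated online, the method further comprises:
    acquiring condition parameters of the speaker;
    updating the nonlinear parameters of the speaker according to a preset mapping between speaker condition parameters and nonlinear parameters and the acquired condition parameters of the speaker, where the condition parameters of the speaker include one or more of ambient temperature, operating time, and dynamic range of the input signal power.
  7. A gesture action determination apparatus, comprising:
    a first acquisition module, configured to acquire an original audio signal, the original audio signal including an original ultrasonic signal and an original speech signal;
    a nonlinear parameter module, configured to acquire nonlinear parameters of a speaker;
    a nonlinear compensation module, configured to perform predistortion processing on the original audio signal according to the nonlinear parameters of the speaker to obtain a nonlinear compensation signal;
    a speaker module, which outputs a target output signal when driven by the nonlinear compensation signal;
    a microphone module, which collects the target output signal after it has propagated through space;
    a second acquisition module, which acquires the collected signal captured by the microphone;
    a processing module, configured to determine whether a gesture action exists according to spectral features of the original ultrasonic signal and the collected signal.
  8. A storage medium storing a computer instruction program, wherein the computer instruction program, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 6.
  9. A computer device, comprising at least one memory and at least one processor, the memory storing a computer instruction program, wherein the computer instruction program, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 6.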
PCT/CN2020/096743 2020-06-12 2020-06-18 Gesture action determination method and apparatus, electronic device, and storage medium WO2021248535A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010537961.1 2020-06-12
CN202010537961.1A CN111796792B (zh) 2020-06-12 2020-06-12 Gesture action determination method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021248535A1 true WO2021248535A1 (zh) 2021-12-16

Family

ID=72804403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/096743 WO2021248535A1 (zh) 2020-06-12 2020-06-18 Gesture action determination method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111796792B (zh)
WO (1) WO2021248535A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
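CN112530420A (zh) * 2020-10-30 2021-03-19 联想(北京)有限公司 Control method, electronic device, and storage medium
CN112860070A (zh) * 2021-03-03 2021-05-28 北京小米移动软件有限公司 Device interaction method, device interaction apparatus, storage medium, and terminal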

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
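CN105718064A (zh) * 2016-01-22 2016-06-29 南京大学 Ultrasound-based gesture recognition system and method
CN105916079A (zh) * 2016-06-07 2016-08-31 瑞声科技(新加坡)有限公司 Loudspeaker nonlinear compensation method and apparatus
CN106560722A (zh) * 2015-10-02 2017-04-12 奥音科技(北京)有限公司 Sonar based on ultrasonic noise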

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10448161B2 (en) * 2012-04-02 2019-10-15 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for gestural manipulation of a sound field

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
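CN106560722A (zh) * 2015-10-02 2017-04-12 奥音科技(北京)有限公司 Sonar based on ultrasonic noise
CN105718064A (zh) * 2016-01-22 2016-06-29 南京大学 Ultrasound-based gesture recognition system and method
CN105916079A (zh) * 2016-06-07 2016-08-31 瑞声科技(新加坡)有限公司 Loudspeaker nonlinear compensation method and apparatus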
US20170353795A1 (en) * 2016-06-07 2017-12-07 AAC Technologies Pte. Ltd. Loudspeaker nonlinear compensation method and apparatus

Also Published As

Publication number Publication date
CN111796792A (zh) 2020-10-20
CN111796792B (zh) 2024-04-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20940120

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20940120

Country of ref document: EP

Kind code of ref document: A1