CN109754817A - signal processing method and terminal device - Google Patents


Info

Publication number
CN109754817A
Authority
CN
China
Prior art keywords
signal
input
voice
characteristic information
improper
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810401796.XA
Other languages
Chinese (zh)
Inventor
王宪亮
王立众
尹成万
朱恒
刘长滔
闵超
杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Samsung Telecom R&D Center
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd and Samsung Electronics Co Ltd
Publication of CN109754817A

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to the technical field of voice recognition and provides a signal processing method and a terminal device. The signal processing method includes: extracting characteristic information from an input signal; and determining, according to the extracted characteristic information, whether the input signal is an improper voice signal. By determining whether the input signal is an improper voice signal from the characteristic information extracted from it, the invention identifies improper voice signals effectively, improves identification accuracy, safeguards the user's voice interaction, and improves the user experience.

Description

Signal processing method and terminal device
Technical field
The present invention relates to the technical field of voice recognition, and more particularly to a signal processing method and a terminal device.
Background art
The ultrasound attack (also known as the "dolphin attack") was presented at the 2017 ACM CCS (ACM Conference on Computer and Communications Security). It is reported that an ultrasound attack can break into the voice assistant application of a terminal, which has raised doubts about the security of voice interaction.
An ultrasound attack exploits the basic properties of ultrasound. It is realized as follows. In the attack-signal modulation stage, an attack voice signal in a first frequency range (the normal audible range), referred to here as the baseband signal, is modulated into the ultrasonic range, which the user cannot hear. In the attack-signal transmission stage, an ultrasonic transmitter emits the modulated attack voice signal toward the device under test (a terminal device such as a mobile phone or a smart home device). In the voice signal acquisition stage, the attack voice signal is picked up by the voice control system of the device under test. The acquisition stage further contains a demodulation stage: a hardware vulnerability in the voice acquisition system of the device under test demodulates the signal automatically and thereby recovers the baseband signal. The demodulated baseband signal then passes through the amplifier, low-pass filter, and analog-to-digital converter (ADC) of the acquisition stage, after which it is recognized and responded to by the voice control system of the terminal device, which gives the attacker control of the device. The detailed process is shown in Fig. 1.
Regarding the security of speaker recognition systems, current research addresses only how to distinguish replayed sound from genuine sound. The usual solution extracts features of the sound, for example silence features and mel-frequency cepstral coefficients (MFCC), and trains a model on them, such as a GMM-UBM (Gaussian Mixture Model-Universal Background Model), a support vector machine (SVM), or a k-nearest neighbors (KNN) classifier. The trained model is then used to classify a sound as replayed or genuine. The GMM-UBM-based speaker recognition system is shown in Fig. 2, and the SVM-based system for distinguishing replayed sound from genuine sound is shown in Fig. 3.
The prior-art voice security solutions described above, namely the speaker recognition system (GMM-UBM) and the support vector machine (SVM) or k-nearest neighbors (KNN) classifier based on mel-frequency cepstral coefficient (MFCC) features, provide a certain degree of security but have the following problems:
1) The existing speaker recognition system based on Gaussian mixture models (GMM-UBM) cannot accurately distinguish genuine voice from recorded and replayed voice, and therefore cannot resist an ultrasound attack.
2) The existing speaker recognition system based on Gaussian mixture models (GMM-UBM) works only in a quiet environment and does not work properly in other, more complex environments.
3) The existing SVM or KNN classifier based on MFCC features can handle only low-quality attack voice recorded and replayed with devices such as mobile phones or notebook computers. When the attack voice is a high-fidelity recording, it cannot be identified effectively.
In summary, existing voice security solutions cannot solve the ultrasound attack problem described above, which leaves the user's voice interaction exposed to a serious security risk.
Summary of the invention
The present invention provides a signal processing method and a terminal device that identify improper voice signals effectively, improve identification accuracy, and safeguard the user's voice interaction, thereby improving the user experience.
The signal processing method includes:
extracting characteristic information from an input signal;
determining, according to the extracted characteristic information, whether the input signal is an improper voice signal.
Preferably, the characteristic information includes an energy feature of the signal and/or a periodic feature of the signal.
Preferably, the energy feature includes a short-time energy feature, and/or the periodic feature includes a short-time zero-crossing rate feature.
Preferably, determining, according to the extracted characteristic information, whether the input signal is an improper voice signal includes:
determining, according to the extracted characteristic information and by means of machine learning, whether the input signal is an improper voice signal.
Preferably, extracting characteristic information from the input signal includes:
extracting characteristic information from the input signal within a set frequency range.
Preferably, determining, according to the extracted characteristic information, whether the input signal is an improper voice signal includes:
determining, according to the variation of the extracted characteristic information, whether the input signal is an improper voice signal.
Preferably, the characteristic information includes an energy feature of the signal and/or a direction feature of the signal.
Preferably, determining, according to the extracted characteristic information, whether the input signal is an improper voice signal includes:
determining whether the characteristic information matches the wake-up characteristic information corresponding to the current dynamic wake-up instruction; if it does not match, determining that the voice signal is an improper voice signal.
Preferably, the characteristic information includes a mel-frequency cepstral coefficient (MFCC) feature and/or a dynamic verification code.
Preferably, the characteristic information includes a voice command,
and determining, according to the extracted characteristic information, whether the input signal is an improper voice signal includes:
determining whether the voice command matches the wake-up word content corresponding to the current dynamic wake-up image; if it does not match, determining that the voice signal is an improper voice signal.
Preferably, the characteristic information includes background noise information;
and determining, according to the extracted characteristic information, whether the input signal is an improper voice signal includes:
determining, according to the background noise information of the signal, whether the input signal is an improper voice signal.
Preferably, determining, according to the background noise information of the signal, whether the input signal is an improper voice signal includes:
determining current environment information according to the background noise of the signal, and determining, according to the current environment information, whether the input signal is an improper voice signal.
Preferably, the characteristic information includes sound source position information;
and determining, according to the extracted characteristic information, whether the input signal is an improper voice signal includes:
determining whether the sound source position information extracted from the input voice signal matches the detected position information of the user's mouth; if it does not match, determining that the voice signal is an improper voice signal.
Preferably, the method further includes: sending an improper-voice interference signal within a preset frequency range.
Preferably, the improper-voice interference signal includes an ultrasonic random interference signal.
Preferably, before characteristic information is extracted from the input signal, the method further includes:
sampling the input signal with a sampling device whose sampling frequency is higher than a set frequency threshold;
applying low-pass filtering to the sampled signal.
Preferably, the method further includes: sending corresponding prompt information when the input signal is determined to be an improper voice signal.
The present invention also provides a terminal device, including:
a processor; and
a memory configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the above method.
The present invention also provides a signal processing method, including:
determining the control operation corresponding to an input signal;
determining, according to the control operation corresponding to the input signal, whether the input signal is an improper voice signal.
Preferably, determining, according to the control operation corresponding to the input signal, whether the input signal is an improper voice signal includes:
determining the control operations that are currently feasible;
if the control operation corresponding to the input signal does not match the currently feasible control operations, determining that the input signal is an improper voice signal.
The present invention also provides a terminal device, including:
a processor; and
a memory configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the above method.
The present invention also provides a signal processing method, including:
receiving an input signal;
sending an improper-voice interference signal within a preset frequency range.
Preferably, the improper-voice interference signal includes an ultrasonic random interference signal.
The present invention also provides a terminal device, including:
a processor; and
a memory configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the above method.
The present invention also provides a signal processing method, including:
receiving an input signal;
sampling the input signal with a sampling device whose sampling frequency is higher than a set frequency threshold;
applying low-pass filtering to the sampled signal.
The present invention also provides a terminal device, including:
a processor; and
a memory configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the above method.
In the present invention, determining whether the input signal is an improper voice signal from the characteristic information extracted from it identifies improper voice signals effectively, improves identification accuracy, safeguards the user's voice interaction, and improves the user experience.
Brief description of the drawings
Fig. 1 is a flow diagram of the principle of an ultrasound attack in the prior art;
Fig. 2 is a flow diagram of processing by a GMM-UBM-based speaker recognition system in the prior art;
Fig. 3 is a flow diagram of processing by an SVM-based system for distinguishing replayed sound from genuine sound in the prior art;
Fig. 4 is a flow diagram of distinguishing improper voice signals based on machine learning, as provided by the present invention;
Fig. 5 is a schematic comparison of the spectrogram information of a normal voice signal and an improper voice signal within a fixed frequency range, as provided by the present invention;
Fig. 6 is a schematic comparison of the characteristic information of a normal voice signal and an improper voice signal, as provided by the present invention;
Fig. 7 is a schematic diagram of voice signal identification with a changing wake-up instruction, as provided by the present invention;
Fig. 8 is a schematic diagram of a terminal device acquiring sound, as provided by the present invention;
Fig. 9 is a schematic diagram of distinguishing improper voice signals according to feature variation, as provided by the present invention;
Fig. 10 is a schematic diagram of the spectra at different sampling frequencies, as provided by the present invention;
Fig. 11 is a flow diagram of analog-to-digital conversion at a high sampling frequency, as provided by the present invention;
Fig. 12 is a flow diagram of the first signal processing method provided by the present invention;
Fig. 13 is a block diagram of distinguishing improper voice signals by machine learning, as provided by the present invention;
Fig. 14 is a schematic diagram of the display of a dynamic wake-up instruction, as provided by Embodiment 2 of the present invention;
Fig. 15 is a flow diagram of wake-up, as provided by Embodiment 2 of the present invention;
Fig. 16 is a flow diagram of comprehensively judging improper voice signals according to user behavior, as provided by Embodiment 7 of the present invention;
Fig. 17 is a flow diagram of judging improper voice signals according to ambient environment information, as provided by Embodiment 8 of the present invention;
Fig. 18 is a structural diagram of the terminal device provided by the present invention;
Fig. 19 is a flow diagram of the second signal processing method provided by the present invention;
Fig. 20 is a flow diagram of the third signal processing method provided by the present invention;
Fig. 21 is a flow diagram of the fourth signal processing method provided by the present invention;
Fig. 22 is a schematic block diagram of the computing system provided by the present invention;
Fig. 23 is a schematic diagram of voice signal identification in which a dynamic verification code is used to verify the wake-up instruction, as provided by the present invention;
Fig. 24 is a schematic diagram of voice signal identification in which a dynamic verification code is used to verify a voice command, as provided by the present invention;
Fig. 25 is a schematic diagram of a dynamic wallpaper with an embedded voice command, as provided by the present invention;
Fig. 26 is a schematic diagram of judging the mouth position from the position of the hand-held terminal device, as provided by the present invention;
Fig. 27 is a flow diagram of the method of verifying a voice command with a dynamic verification code, as provided by the present invention.
Detailed description of the embodiments
The present invention proposes a signal processing method and a terminal device. Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and shall not be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural. It should be further understood that the word "comprising" used in this specification indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the field to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries are to be interpreted with a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, are not to be interpreted in an idealized or overly formal sense.
To realize the signal processing method of the invention, the invention provides the following processing modes, each of which is explained in turn:
1) Distinguishing whether a signal is a normal voice signal according to the features of the signal
In the voice interaction state, the signal received by the terminal device may be a voice signal, or it may be an improper voice signal such as an ultrasonic signal. After receiving the signal, the terminal device performs the processing of the voice signal acquisition stage in Fig. 1 and then extracts the set characteristic information from the processed voice signal. The set characteristic information may include an energy feature that characterizes the signal energy and/or a periodic feature that characterizes the signal period. From the extracted characteristic information the device can distinguish whether the signal is a normal voice signal, that is, whether it is genuine speech.
For example, as shown in Fig. 4, the terminal device acquires the signal in the voice signal acquisition stage. From the voice signal obtained after analog-to-digital conversion (ADC), a bandpass filter extracts the voice signal in a set frequency range (for example 2000 Hz to 3000 Hz). The characteristic information of this voice signal is extracted, and machine learning is used to decide, from the extracted characteristic information, whether the voice signal is a genuine normal voice signal or a demodulated improper voice signal used for an attack.
The energy feature of the voice signal may be the short-time energy, and the periodic feature may be the short-time zero-crossing rate. The energy feature is not limited to the short-time energy: any characteristic information that can represent the energy of the signal falls within the scope of protection of the present invention. Likewise, the periodic feature is not limited to the short-time zero-crossing rate: any characteristic information that can represent the periodicity of the signal falls within the scope of protection of the present invention.
In this processing mode, Fig. 5 compares the spectrogram information of a normal voice signal and an improper voice signal used for an attack between 2000 Hz and 3000 Hz. In the figure, positions A and B on the abscissa are two arbitrarily chosen corresponding positions in the spectrograms of the normal voice signal and of the improper voice signal, and the ordinate is the brightness value; the higher the brightness value of the color, the stronger the energy. Comparing the brightness values at the same position shows that the energy of the normal voice is clearly higher than that of the attack voice. Therefore, for the input voice information after analog-to-digital conversion, the bandpass filter in Fig. 4 is used to extract the voice signal in the 2000 Hz to 3000 Hz range.
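The band extraction described above can be sketched in a few lines. This is a minimal illustration, not the filter design used in the patent (which is not specified): it assumes a windowed-sinc FIR band-pass with a Hamming window, a 16 kHz sample rate, and 101 taps, all of which are hypothetical choices, as are the function names.

```python
import math

def bandpass_taps(low_hz, high_hz, fs, num_taps=101):
    """Windowed-sinc FIR band-pass coefficients (Hamming window).
    The tap count and window are illustrative assumptions."""
    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    f1, f2 = low_hz / fs, high_hz / fs  # cutoffs in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        # Band-pass = low-pass at f2 minus low-pass at f1
        ideal = 2 * f2 * sinc(2 * f2 * m) - 2 * f1 * sinc(2 * f1 * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def fir_filter(signal, taps):
    """Steady-state ("valid") convolution of the signal with the taps."""
    n = len(taps)
    return [sum(taps[k] * signal[i + n - 1 - k] for k in range(n))
            for i in range(len(signal) - n + 1)]

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))
```

With these assumptions, a 2500 Hz tone passes almost unchanged while a 500 Hz tone is strongly attenuated, so features computed afterwards reflect only the 2000 Hz to 3000 Hz band.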
For the normal voice signal and the improper voice signal used for an attack that are extracted with the bandpass filter in Fig. 4 (frequency range 2000 Hz to 3000 Hz), the characteristic information includes the short-time zero-crossing rate and the short-time energy. As shown in Fig. 6, the two signals differ clearly in both features: the short-time energy of the improper voice signal (0 to 5×10⁻⁶) is significantly lower than that of the normal voice signal (0 to 1×10⁻⁴), while the short-time zero-crossing rate of the improper voice signal, about 0.1, is clearly higher than that of the normal voice signal, about 0.06.
The short-time zero-crossing rate is the number of times the voice signal waveform crosses the horizontal axis (zero level) within one frame of speech, and it reflects the periodicity of the voice. The short-time energy is the energy of one frame of the voice signal.
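The two features defined above can be computed per frame as follows. This is a minimal sketch: the normalization (mean squared amplitude for energy, crossings divided by the number of adjacent sample pairs for the zero-crossing rate) and the helper names are assumptions, since the source does not state how the values in Fig. 6 were scaled.

```python
def frames(signal, frame_len, hop):
    """Split a sample sequence into overlapping frames (assumed framing)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def short_time_zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

A frame that alternates in sign on every sample has a zero-crossing rate of 1.0, and a constant frame has a rate of 0.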
Machine learning is applied to the extracted characteristic information to identify whether the voice signal is a genuine normal voice signal or a demodulated improper voice signal used for an attack. The machine learning methods include, but are not limited to, support vector machines (SVM), deep neural networks (DNN), and the k-nearest neighbors (KNN) classification algorithm. The overall block diagram of using machine learning to distinguish improper voice signals used for an attack is shown in Fig. 4. The machine learning classifier decides, from the extracted signal features (the periodic feature and/or the energy feature), whether the voice signal is an improper voice signal used for an attack. If the voice signal is judged to be a genuine normal voice signal, the corresponding voice instruction is executed; if it is judged to be an improper voice signal used for an attack, the corresponding voice instruction is ignored, or the user is given a corresponding prompt.
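As a rough illustration of the classification step, the sketch below uses a k-nearest-neighbors vote, one of the methods the text names. The training points are invented for illustration and only loosely follow the value ranges quoted for Fig. 6; the feature scaling and the names `to_features`, `knn_predict`, `TRAIN`, and `handle` are assumptions, not part of the patent.

```python
import math
from collections import Counter

def to_features(energy, zcr):
    # Assumed scaling: log-scale the energy and stretch the zero-crossing
    # rate so that both features have a comparable spread.
    return (math.log10(energy), zcr * 10.0)

def knn_predict(train, sample, k=3):
    """train: list of ((energy, zcr), label). Majority label of k nearest."""
    target = to_features(*sample)
    ranked = sorted(
        train, key=lambda item: math.dist(to_features(*item[0]), target))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data, not measurements from the patent.
TRAIN = [
    ((8e-5, 0.05), "normal"), ((5e-5, 0.06), "normal"), ((9e-5, 0.07), "normal"),
    ((4e-6, 0.10), "attack"), ((2e-6, 0.11), "attack"), ((3e-6, 0.09), "attack"),
]

def handle(sample):
    """Execute the instruction for normal speech, otherwise ignore or prompt."""
    return "execute" if knn_predict(TRAIN, sample) == "normal" else "ignore_or_prompt"
```

A real system would train on labeled recordings; the point here is only that a frame with low energy and a high zero-crossing rate lands among the attack examples.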
2) processing mode based on dynamic wake-up instruction
The voice wake up instruction of terminal device is fixed at present, such as user inputs terminal device and wakes up word ' Hey XXX ', or ' Hi XXX ', terminal device detect user input wake-up word be preset wake-up word after, can be waken up, into Enter interactive voice state.It is that product is initially defined that these, which wake up word, can not be changed.
And attacker can very easily obtain these voice documents for waking up word, say by recording or voice Synthesis obtains.Therefore, though terminal device user at one's side, attacker can also be called out by the ultrasound that people can not be allowed to hear The terminal device of awake user.
In view of the above-mentioned problems, present invention proposition may provide the user with dynamic wake-up instruction, example as shown in Figure 7.
Dynamic wake-up instruction is that is voice wake up instruction is variation, and this variation only has user oneself to know, Attacker can not just obtain the wake up instruction of user equipment in this way, and the terminal device of user also can not be just waken up by ultrasound ?.
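One way to picture a dynamic wake-up instruction is a device that holds a private, user-chosen list of phrases and accepts only the phrase that is currently active. The sketch below is hypothetical throughout: the class name, the rotation policy (advance after each successful wake-up), and the text comparison are illustrative assumptions, not the mechanism of Fig. 7.

```python
class DynamicWakeup:
    """Accepts only the currently active wake-up phrase (illustrative)."""

    def __init__(self, phrases):
        self._phrases = list(phrases)  # known only to the user
        self._index = 0

    @property
    def current(self):
        return self._phrases[self._index]

    def check(self, recognized_text):
        """Return True and rotate the phrase if the input matches."""
        ok = recognized_text.strip().lower() == self.current.lower()
        if ok:
            # Assumed policy: move to the next phrase after each wake-up.
            self._index = (self._index + 1) % len(self._phrases)
        return ok
```

Under this assumption, a replayed recording of an earlier wake-up phrase no longer matches once the active phrase has changed.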
3) Distinguishing the attack signal by how the features of the signal change
When a user uses a terminal device, the user is usually very close to it. If the user, holding the terminal device, slightly changes the relative position of user and device, the features of the user's voice signal (the direction feature and/or the energy feature) change considerably. The sound source used for an attack, by contrast, is somewhat farther from the terminal device, and the ultrasonic playback equipment is generally placed in a fixed position. Even if its position changes, the features of its signal can be regarded as constant because of the greater distance, as shown in Figs. 8 and 9. In Fig. 8, the terminal device is assumed to have two sound acquisition devices, sound collection point 1 and sound collection point 2. The user's sound source is close to the terminal device, so a change in relative position strongly affects the features of the signal it propagates; the attack sound source is far from the terminal device, so a change in relative position has little effect on the features of its signal. In Fig. 9, the features of the user's normal voice signal vary, whereas the features of the voice signal used for an attack are fixed. The terminal device can therefore distinguish the attack signal from the user's normal voice signal by detecting how the features change.
To change the direction feature of the signal while inputting a voice instruction, the user can shake the terminal device or change the position of the sound source. The user can also change the energy feature by shaking the terminal device or by making a gesture (such as waving). An attacker usually does not know the gesture and therefore cannot change the features of the signal.
From the voice signal after analog-to-digital conversion, the device can extract the energy feature and/or the direction feature and then, by means of machine learning, use the variation of the energy feature and/or direction feature to distinguish an improper voice signal used for an attack from the user's normal voice signal.
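The idea that a nearby user's features vary while a distant attack source's features stay fixed can be illustrated with a simple stand-in for the machine learning step. The sketch assumes per-frame energies from the two collection points of Fig. 8, uses the level difference between the microphones as a crude direction proxy, and applies a fixed threshold; the function names and the 1 dB threshold are hypothetical.

```python
import math
import statistics

def level_difference_db(energy_mic1, energy_mic2):
    """Per-frame level difference between the two microphones, in dB."""
    return [10 * math.log10(a / b) for a, b in zip(energy_mic1, energy_mic2)]

def is_improper(energy_mic1, energy_mic2, min_spread_db=1.0):
    """Flag the signal when the inter-microphone difference barely changes.
    The threshold is an illustrative assumption, not a value from the patent."""
    diffs = level_difference_db(energy_mic1, energy_mic2)
    return statistics.pstdev(diffs) < min_spread_db
```

A learned classifier would replace the threshold in practice, but the input it sees is the same: how much the feature moves from frame to frame.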
4) Using an ADC (analog-to-digital converter) with a higher sampling frequency
Experiments show that a voice recorder with a sampling frequency (Fs) of 44.1 kHz receives the ultrasonic attack signal and resolves it normally, whereas at a sampling frequency of 96 kHz the ultrasonic attack signal cannot be resolved.
Most terminal devices support a sampling frequency of 44.1 kHz. Simulation shows that spectral leakage and aliasing occur at a 44.1 kHz sampling frequency, so that the ultrasonic attack signal is mixed into the voice instruction, as shown by the spectra at different sampling frequencies in Fig. 10.
According to the Nyquist sampling criterion, the sampling frequency must satisfy Fs ≥ 2·Fh, where Fh is the highest frequency of the input. An ADC sampling at 96 kHz can therefore sample ultrasonic attack signals up to 48 kHz. Because of cost and instrument size, transmitting ultrasonic signals above 25 kHz is relatively difficult in a practical attack. Using a high ADC sampling frequency of 96 kHz in the voice signal acquisition stage therefore suppresses spectral aliasing and makes it possible to filter out the ultrasonic attack signal afterwards. The specific implementation is shown in Fig. 11: the analog-to-digital sampling frequency is raised to 96 kHz, and a digital low-pass filter is then applied. The software must of course also support the 96 kHz sampling frequency.
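The effect of the sampling frequency can be checked with the standard frequency-folding arithmetic. The sketch assumes ideal sampling with no analog anti-aliasing filter in front of the ADC; the function name is hypothetical.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone at f_hz after ideal sampling at fs_hz."""
    f = f_hz % fs_hz
    # Components above the Nyquist frequency fs/2 fold back below it.
    return fs_hz - f if f > fs_hz / 2 else f

print(alias_frequency(25_000, 44_100))  # 19100: folds into the audible band
print(alias_frequency(25_000, 96_000))  # 25000: stays ultrasonic
```

At 44.1 kHz a 25 kHz component folds to 19.1 kHz, inside the band the recognizer listens to, whereas at 96 kHz it stays at 25 kHz, where the digital low-pass filter of Fig. 11 can remove it.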
5) Using an interference signal
When the terminal device detects a signal input, the loudspeaker of the device can send an ultrasonic random noise signal, by frequency hopping or across the full band, within a set frequency range (for example 20 kHz to 48 kHz), and the input voice instruction is then detected. If the input is a normal voice instruction, the terminal device can resolve it correctly: the normal voice instruction and the loudspeaker's ultrasonic random noise signal are at different frequencies, so the instruction is not disturbed by the noise. If the input is an ultrasonic attack signal, the ultrasonic voice instruction used for the attack cannot be resolved normally: the ultrasonic random noise signal sent by the loudspeaker interferes with the attack signal, the signal finally received by the terminal device is the superposition of the two, and the device cannot resolve the attack signal correctly, which achieves the goal of defending against ultrasonic attacks. To save power, this scheme can be used in the working state after wake-up.
In addition, because the ultrasonic random noise signal may affect other terminal devices, its output power should be reduced appropriately to avoid ultrasonic pollution of other terminal devices. To save as much power as possible, the loudspeaker of the terminal device should output by frequency hopping or across the full band within the device frequency range (for example 20 kHz to 48 kHz).
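A frequency-hopping interference signal needs a hop sequence within the stated range. The sketch below only generates such a sequence; the 500 Hz channel spacing, the seed parameter, and the function name are assumptions, and driving an actual loudspeaker at these frequencies would additionally need an output sample rate above twice the highest hop frequency.

```python
import random

def hop_schedule(num_hops, low_hz=20_000, high_hz=48_000,
                 step_hz=500, seed=None):
    """Random sequence of carrier frequencies inside [low_hz, high_hz].
    Channel spacing and seeding are illustrative assumptions."""
    rng = random.Random(seed)
    channels = list(range(low_hz, high_hz + 1, step_hz))
    return [rng.choice(channels) for _ in range(num_hops)]
```

Because the hops are random, an attacker cannot predict which ultrasonic frequencies will be clear at a given moment.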
6) Comprehensive judgment according to ambient environment information and learned user behavior
Based on the user's everyday usage behavior learned by the voice control system, together with information about the surrounding environment, the system judges whether the content of the currently input voice instruction is normal user behavior. If the content of the current input signal is judged not to be normal user behavior, new verification information is required, for example by asking the user to speak a special instruction or to operate the terminal device as requested. The corresponding instruction is executed only after this additional verification has been passed.
7) Wake-up and recognition combined with a dynamic verification code
Current terminal devices use fixed voice instructions. For example, the user inputs "Hello XXX" or "hello, XXX" to the terminal device, and the device is woken up once it detects that the wake-up word input by the user is the preset wake-up word. After the terminal device is woken up, when the user inputs "open the XXX application" or "call XXX", the terminal device recognizes the voice content and executes the corresponding voice interaction instruction.
An attacker can wake up the user's terminal device, or send execution instructions to it, by means of ultrasound that humans cannot hear.
In view of the above problem, the present invention proposes a wake-up method combined with a dynamic verification code, as shown in Figure 23. When the user inputs the voice instruction "Hi XX" and the terminal device detects that the current voice instruction matches the preset wake-up command, the screen of the terminal device displays a dynamic verification code (for example, a random digit string, i.e. an instant verification code) such as "015". The user then inputs "015" as a voice or text command, and the terminal device checks whether the verification code input by the user matches the generated dynamic verification code. If it matches, the voice input by the user is considered a normal voice signal and the device is woken up successfully; if not, the voice input by the user is considered an improper voice signal. Alternatively, the terminal device can display the dynamic verification code to the user before wake-up, and the user can input the wake-up instruction and the dynamic verification code simultaneously or in succession when waking up the device. When the terminal device detects that the input wake-up instruction matches the preset wake-up command and that the input dynamic verification code matches the generated one, the voice input by the user is considered a normal voice signal and the terminal device is woken up successfully.
The present invention also proposes a method of executing voice commands combined with a dynamic verification code, as shown in Figure 24. After the terminal device is woken up, a random string such as "756" can be displayed in the voice command input interface. The user can input the execution instruction and the dynamic verification code simultaneously or in succession, and the terminal device executes the user's instruction only when it detects that the input dynamic verification code matches the generated one. For example, if the user inputs "open XX application 756", the XX application is opened successfully.
In the above schemes, the dynamic verification code can be regarded as characteristic information of the voice signal. The terminal device can extract the dynamic verification code input by the user from the received voice signal, and then determine whether the extracted dynamic verification code matches the generated dynamic verification code (i.e. the wake-up characteristic information corresponding to the current dynamic wake-up instruction). If it does not match, the input voice signal is determined to be an improper voice signal.
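The verification-code check can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it assumes a speech recognizer has already turned the input into a text transcript, that the code is spoken as the last word, and the names `generate_code`, `extract_code` and `is_improper` are hypothetical.

```python
import secrets
from typing import Optional


def generate_code(n_digits: int = 3) -> str:
    """Generate a random digit string such as "015" to show on the screen."""
    return "".join(secrets.choice("0123456789") for _ in range(n_digits))


def extract_code(transcript: str, n_digits: int) -> Optional[str]:
    """Return the trailing n-digit token of the transcript, if there is one.

    Assumption: the recognizer outputs the code as one token of digits at
    the end of the utterance, e.g. "open XX application 756".
    """
    tokens = transcript.split()
    if tokens:
        last = tokens[-1]
        if len(last) == n_digits and last.isascii() and last.isdigit():
            return last
    return None


def is_improper(transcript: str, expected_code: str) -> bool:
    """Treat the input as improper when the code is missing or wrong."""
    return extract_code(transcript, len(expected_code)) != expected_code


# The command is executed only when the spoken code matches the shown one.
assert not is_improper("open XX application 756", "756")
assert is_improper("open XX application", "756")
assert is_improper("open XX application 123", "756")
```

An attacker who cannot see the screen would have to guess the code, so a longer code lowers the chance of a successful blind attack at the cost of more effort for the user.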
8) Embedding voice commands in dynamic wallpaper
To solve the problem that the wake-up command of a terminal device is fixed and therefore easily triggered by an attacker, the present invention proposes a wake-up method in which voice commands are embedded in dynamic wallpaper. The user can select several images as the dynamic wallpaper of the mobile phone and register a separate voice command for each wallpaper, as shown in Figure 25. While the terminal device is in use, the wallpaper changes automatically. When the user intends to wake up the terminal device, the user inputs the voice command corresponding to the current wallpaper. When the terminal device detects that the voice command input by the user matches the voice command (i.e. the wake-up word content) corresponding to the current wallpaper (also called the dynamic wake-up image), the input voice signal is determined to be a normal voice signal and the terminal device is woken up; if it does not match, the input voice signal is determined to be an improper voice signal. For example, when picture 2 is displayed as the current phone wallpaper and the voice instruction received from the user is "Hi, little dog", the terminal device is woken up.
9) Judging mouth position and sound source localization
The sound source of a voice command should be the user's mouth. The position of the user's mouth can be detected with a camera, or inferred from the way the user is holding the terminal device. As shown in Figure 26, from the position in which the terminal device is held, it can be inferred that the voice command should originate from the rear, the front, or the left side of the terminal device, respectively. A sound source localization method can determine the point from which the voice command was emitted, and by judging whether the mouth position and the sound source position are consistent, it can be determined whether the voice command comes from the real user or from an attacker.
Based on the several processing modes provided above, the present invention provides an audio signal processing method, as shown in Figure 12, comprising:
Step 1201: extract characteristic information from the input signal.
Extracting characteristic information from the input signal comprises:
Extracting characteristic information from the input signal within a set frequency range.
The input signal here can denote the voice signal obtained after processing in the signal acquisition stage.
Before this step, the method further comprises:
Sampling the input signal using a sampling device whose sampling frequency is higher than a set frequency threshold;
Performing low-pass filtering on the sampled signal.
The sampling device may be, but is not limited to, an ADC.
Step 1202: according to the extracted characteristic information, determine whether the input signal is an improper voice signal.
Determining, according to the extracted characteristic information, whether the signal is an improper voice signal includes the following processing modes:
(1) The processing of determining an improper voice signal comprises:
Determining, according to the extracted characteristic information and by means of machine learning, whether the input signal is an improper voice signal.
The above characteristic information includes an energy feature of the signal and/or a periodic feature of the signal.
The energy feature specifically includes a short-time energy feature; the periodic feature specifically includes a short-time zero-crossing rate feature.
Further, when determining by means of machine learning whether the input signal is an improper voice signal, the determination can be based on the energy feature alone, on the periodic feature alone, or on a combination of the two.
The energy feature and/or the periodic feature of the signal can be extracted within a set frequency range (for example, 2000 Hz to 3000 Hz).
(2) The processing of determining an improper voice signal comprises:
Determining, according to the variation of the extracted characteristic information, whether the input signal is an improper voice signal.
This characteristic information includes an energy feature of the signal and/or a direction feature of the signal.
(3) The processing of determining an improper voice signal comprises:
Determining whether the characteristic information matches the wake-up characteristic information corresponding to the current dynamic wake-up instruction; if it does not match, the voice signal is determined to be an improper voice signal.
Preferably, the wake-up characteristic information includes a Mel-frequency cepstral coefficient (MFCC) feature and/or a dynamic verification code.
(4) The extracted characteristic information includes background noise information;
The processing of determining an improper voice signal comprises:
Determining, according to the background noise information of the signal, whether the input signal is an improper voice signal.
Determining, according to the background noise information of the signal, whether the input signal is an improper voice signal comprises:
Determining current environment information according to the background noise of the signal, and determining, according to the current environment information, whether the signal is an improper voice signal.
If the current environment information does not match the environment information associated with the control operation that corresponds to the input signal, the input signal can be considered an improper voice signal.
(5) The extracted characteristic information includes a voice command;
The processing of determining an improper voice signal comprises:
Determining, for example but not exclusively by using keyword detection technology, whether the voice command matches the wake-up word content corresponding to the current dynamic wake-up image; if it does not match, the voice signal is determined to be an improper voice signal.
(6) The extracted characteristic information includes sound source position information;
The processing of determining an improper voice signal comprises:
Determining whether the sound source position information extracted from the input voice signal matches the detected position information of the user's mouth; if it does not match, the voice signal is determined to be an improper voice signal.
Of course, when an improper voice signal is determined, interference with the improper voice signal can also be achieved through the following processing, so as to filter out the improper voice signal directly: an improper-voice interference signal is transmitted within a preset frequency range, so that it interferes with the improper voice signal and destroys its voice content, making it unrecognizable to the terminal device.
The improper-voice interference signal includes an ultrasonic random interference signal. It is not limited to the ultrasonic random interference signal, however: any signal that can interfere with the voice content of the improper voice signal and thereby destroy it falls within the protection scope of the present invention.
Further, when the input signal is determined to be an improper voice signal, corresponding prompt information is sent.
For the signal processing method provided by the present invention and each of its processing modes, each of the foregoing processing modes is described in detail below by way of specific embodiments.
Embodiment one
This embodiment of the invention discloses a method that extracts voice signal features and uses machine learning to distinguish between them, thereby identifying improper voice signals used for attacks. The detailed system block diagram of this embodiment is shown in Figure 13.
Steps:
1. Prepare a large number of normal voice signals and improper voice signals used for attacks as data for training the machine learning network model and classifier. The larger the amount of data, the more the stability and the discrimination rate of the network model and classifier can be improved.
2. Pass the normal voice signals and the demodulated improper voice signals used for attacks through a band-pass filter to extract the voice information in the 2000 Hz to 3000 Hz frequency range.
3. Extract the short-time zero-crossing rate and short-time energy features of the normal voice signals and improper voice signals after band-pass filtering.
4. Input the extracted voice feature information (short-time zero-crossing rate and short-time energy features) into the machine learning network for network model training and classifier training, until the machine learning classifier can identify with high precision improper voice information used for attacks and normal voice information similar to the training data.
5. Using the trained machine learning classifier, take the voice information collected by the microphone of the terminal device as the test speech in Figure 13. Limit the frequency range of the test speech to 2000 Hz to 3000 Hz with a band-pass filter, and then extract its characteristic information, including the short-time zero-crossing rate and short-time energy. Input this characteristic information directly into the trained machine learning classifier to determine the category of the voice information collected by the microphone, i.e. whether the test speech is normal voice or improper voice used for an attack.
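The feature extraction in steps 3 to 5 can be sketched in plain Python. This is an illustrative sketch only: it assumes the samples have already been band-pass filtered to 2000 Hz to 3000 Hz, the frame length and hop size are arbitrary choices, and the nearest-centroid classifier is a deliberately simple stand-in for the machine learning network, which the text does not specify. All function names are hypothetical.

```python
import math


def split_frames(samples, frame_len=256, hop=128):
    """Split a sample sequence into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def extract_features(samples, frame_len=256, hop=128):
    """Average short-time energy and zero-crossing rate over all frames."""
    frames = split_frames(samples, frame_len, hop)
    energy = sum(short_time_energy(f) for f in frames) / len(frames)
    zcr = sum(zero_crossing_rate(f) for f in frames) / len(frames)
    return (energy, zcr)


def train_centroids(labelled_features):
    """Stand-in training step: average the feature vectors of each label."""
    sums = {}
    for feat, label in labelled_features:
        total, count = sums.get(label, ([0.0] * len(feat), 0))
        sums[label] = ([t + v for t, v in zip(total, feat)], count + 1)
    return {label: tuple(t / count for t in total)
            for label, (total, count) in sums.items()}


def classify(feat, centroids):
    """Assign the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(feat, centroids[label]))
```

In use, `extract_features` would be applied to every labelled training recording, `train_centroids` to the resulting feature vectors, and `classify` to the features of each newly captured test utterance. A real system would likely normalize the two features to comparable scales and use a stronger classifier.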
Embodiment two
A dynamic wake-up instruction is generated and the user is notified.
The dynamic instruction can be customized by the user, for example by setting the word or sentence of the wake-up instruction personally. A group of wake-up words or sentences can also be set by system default. Alternatively, the system can detect the user's interests and points of focus during the current period and extract keywords to serve as the word or sentence of the wake-up instruction.
Many current terminal devices can display floating information on the screen in standby mode, as shown in Figure 14. The wake-up word can therefore be displayed on the screen in standby mode, or on the screen of a wearable device connected to the terminal device. The user sees the current wake-up instruction and wakes up the terminal device by reading it aloud.
For example, the terminal device randomly selects one of the previously defined wake-up words and displays it on the screen; the user sees the wake-up word displayed on the screen and then speaks the current wake-up word.
For the voice signal processed in the voice signal acquisition stage, the terminal device extracts MFCC features and matches the extracted features against the wake-up features (MFCC features) corresponding to the current dynamic wake-up instruction (for example, the wake-up word displayed on the screen). If they do not match, the voice signal is determined to be an improper voice signal, and the device remains in standby mode or gives corresponding prompt information. If they match, the signal is a normal voice signal and the screen is woken up. As shown in Figure 15, the terminal device first performs wake-up word detection in the manner described above; if a normal voice signal is detected, speaker recognition is performed, and if recognition passes, the screen is woken up.
Embodiment three
Attack signals are distinguished by the way the features of the signal change.
In this mode, when the user issues a voice instruction to the terminal device, the user can slightly shake the terminal device left and right or up and down, or make a gesture between the terminal device and the user, so as to change the features of the user's voice signal. The terminal device distinguishes whether the signal is a normal voice signal according to how the signal features change, and responds to the voice instruction only if it is a normal voice signal.
In this mode, the features of the input voice signal change in a regular manner. If the terminal device detects that the change pattern of the features of the input voice signal is consistent with the set change pattern, the voice signal is considered a normal voice signal; otherwise it is considered an improper voice signal.
A machine learning network model for the above change pattern can also be trained in advance, and whether the signal is a normal voice signal is distinguished according to the features of the input voice signal and the trained machine learning network model.
Embodiment four
This embodiment of the invention discloses a method of preventing ultrasonic attacks, including preventive measures in both hardware and software.
The purpose is to guarantee the information security of the user and to guard against potential malicious attacks.
Steps:
1. For ultrasonic signals passing through the voice sampling process of a MEMS (micro-electromechanical) microphone, a higher-frequency ADC is used for sampling in the voice signal acquisition stage, which effectively suppresses spectral aliasing. If an attacker attacks the device with a 25 kHz ultrasonic signal and the terminal device samples with a 96 kHz ADC, spectral aliasing does not occur.
2. Since spectral aliasing does not occur, the ultrasonic attack signal remains entirely within its own frequency band, and the mixed-in ultrasonic attack signal can be filtered out with a digital LPF (Low-Pass Filter), thereby ensuring the security of the device. For the specific implementation process, see the high-sampling-frequency analog-to-digital conversion flow chart shown in Figure 11. The method of countering ultrasonic attacks disclosed in this embodiment of the invention thus guarantees the information security of the user and guards against potential malicious attacks.
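The digital low-pass filtering step can be illustrated with a small windowed-sinc FIR filter written in plain Python. This is a sketch under assumptions that the text does not state: the 20 kHz cutoff, the 101 taps and the Hamming window are illustrative choices, and the function names are hypothetical. The text only specifies a 96 kHz sampling rate followed by a digital low-pass filter.

```python
import math


def lowpass_taps(cutoff_hz, fs_hz, num_taps=101):
    """Design a windowed-sinc low-pass FIR filter (Hamming window)."""
    fc = cutoff_hz / fs_hz            # normalized cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2 * fc            # limit of the sinc term at its centre
        else:
            ideal = math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at 0 Hz


def fir_filter(samples, taps):
    """Convolve samples with taps, keeping only fully overlapped outputs."""
    n = len(taps)
    return [sum(taps[j] * samples[i + n - 1 - j] for j in range(n))
            for i in range(len(samples) - n + 1)]


def rms(values):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(v * v for v in values) / len(values))


FS = 96_000  # sampling rate taken from the text


def tone(freq_hz, n_samples=1000):
    """Unit-amplitude test tone sampled at FS."""
    return [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(n_samples)]


taps = lowpass_taps(20_000, FS)
voice_band = rms(fir_filter(tone(1_000), taps))    # passes almost unchanged
attack_band = rms(fir_filter(tone(25_000), taps))  # strongly attenuated
assert attack_band < 0.05 * voice_band
```

The filter only works as intended because the 96 kHz sampling rate keeps the 25 kHz component at its true frequency; at a lower sampling rate it would already have folded into the voice band before filtering.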
Embodiment five
In this embodiment, attack defense is achieved by transmitting an ultrasonic random noise signal with frequency hopping.
Steps:
1. Wait for a voice input prompt or a prompt that the microphone (Mic) is working.
2. If there is a voice input prompt, trigger the loudspeaker to transmit random noise with frequency hopping. The ultrasonic random noise signal is transmitted within the hopping range (20 kHz to 48 kHz) to interfere with ultrasonic signals in that band.
3. Since the human voice signal and the ultrasonic random noise signal emitted by the loudspeaker are not in the same frequency range, the human voice signal is not disturbed by the ultrasonic random noise signal, and normal human voice instructions can be parsed normally by the device.
4. If the input is an ultrasonic attack signal, it is interfered with by the ultrasonic random noise signal hopping within 20 kHz to 48 kHz. This disrupts the demodulation of the attack signal and destroys its voice content so that it cannot be recognized by the device, thereby achieving the purpose of attack defense.
In the above embodiment, an example of how the ultrasonic random noise signal can be transmitted is as follows: within the 20 kHz to 48 kHz frequency range, the 20 kHz to 24 kHz band is first selected for transmitting the ultrasonic random noise signal, then the 44 kHz to 48 kHz band, then the 24 kHz to 28 kHz band, and so on in turn.
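A hopping schedule of this kind can be sketched as follows. The sketch is illustrative only: it assumes the 20 kHz to 48 kHz range is divided into equal 4 kHz sub-bands, as the example above suggests, and that the next band is chosen pseudo-randomly without immediate repetition, which is an assumption since the text does not define the selection rule. The names `sub_bands` and `hop_sequence` are hypothetical.

```python
import random


def sub_bands(low_khz=20, high_khz=48, width_khz=4):
    """Divide the hopping range into equal sub-bands, as (low, high) in kHz."""
    return [(f, f + width_khz) for f in range(low_khz, high_khz, width_khz)]


def hop_sequence(bands, n_hops, seed=None):
    """Pick a sub-band for each hop, never repeating the previous one.

    A seed makes the sketch reproducible; a real device would presumably
    use an unpredictable source so an attacker cannot anticipate the hops.
    """
    rng = random.Random(seed)
    sequence = []
    previous = None
    for _ in range(n_hops):
        band = rng.choice([b for b in bands if b != previous])
        sequence.append(band)
        previous = band
    return sequence


bands = sub_bands()
# Seven 4 kHz sub-bands, from 20-24 kHz up to 44-48 kHz.
assert bands[0] == (20, 24) and bands[-1] == (44, 48) and len(bands) == 7

schedule = hop_sequence(bands, n_hops=10, seed=1)
# Each entry says in which band the noise is emitted during that hop.
assert all(a != b for a, b in zip(schedule, schedule[1:]))
```

Transmitting in only one sub-band at a time is what allows the lower output power mentioned earlier, at the cost of leaving the other sub-bands unprotected during each hop.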
Embodiment six
Similar to embodiment five above, but with a view to reducing software and hardware complexity, in this embodiment the ultrasonic random noise signal is transmitted across the entire ultrasonic band (20 kHz to 48 kHz).
Steps:
1. Wait for a voice input prompt or a prompt that the microphone (Mic) is working.
2. If there is a voice input prompt, trigger the loudspeaker to transmit noise. The ultrasonic random noise signal is transmitted across the noise frequency range (20 kHz to 48 kHz) to interfere with ultrasonic signals in that band.
3. Since the human voice signal and the ultrasonic random noise signal emitted by the loudspeaker are not in the same frequency range, the human voice signal is not disturbed by the ultrasonic random noise signal, and normal human voice instructions can be parsed normally by the device.
4. If the input is an ultrasonic attack signal, it is interfered with by the ultrasonic random noise signal in the 20 kHz to 48 kHz band. This disrupts the demodulation of the attack signal and destroys its voice content so that it cannot be recognized by the device, thereby achieving the purpose of attack defense.
In the above embodiment, an example of how the ultrasonic random noise signal can be transmitted is as follows: within the 20 kHz to 48 kHz frequency range, the whole 20 kHz to 48 kHz band is selected for transmitting the ultrasonic random noise signal; after 5 ms, the 20 kHz to 48 kHz band is selected again for transmission, and this cycle repeats.
Embodiment seven
This embodiment discloses a method in which the voice assistant learns user behavior and comprehensively judges whether the current voice instruction is improper voice used for an attack.
The control operation corresponding to the input signal is determined, and whether the input signal is an improper voice signal is determined according to that control operation. The currently feasible control operations can be determined; if the control operation corresponding to the input signal does not match a currently feasible control operation, the input signal is determined to be an improper voice signal.
The currently feasible control operations can be determined, for example but not exclusively, according to user behavior (such as historical user operation behavior).
As shown in Figure 16, the specific steps include:
While the user uses the voice control system normally, the voice control system comprehensively learns the user's usage behavior.
When the voice assistant receives a new instruction, it comprehensively assesses the current instruction according to the user usage behavior it has learned and judges whether it is normal usage behavior.
If the current user behavior is identified as normal usage behavior, the current voice signal is considered a genuine normal voice signal and the instruction operation is executed.
If the current user behavior is identified as abnormal usage behavior, the user can be required to complete additional verification, including but not limited to speaking a special instruction or operating the terminal device as requested. If the additional verification is passed, the current voice signal is considered a genuine voice signal and the corresponding instruction operation is executed; otherwise it is regarded as an improper voice signal used for an attack and the voice instruction is ignored.
For example, when a user transfers money to another person through the voice assistant, the user will usually make a phone call first to confirm, and then issue the instruction "transfer money to so-and-so" through the voice assistant. When someone else uses an improper voice signal for an attack, the instruction "transfer money to so-and-so" may be issued directly through the voice assistant without any preceding phone call. This behavior is then regarded as abnormal user behavior, and additional verification is required, for example speaking to the terminal device a phrase displayed on the screen at that moment, or shaking the terminal device as requested. If the verification cannot be completed, the voice instruction is regarded as an improper voice signal used for an attack and the instruction "transfer money to so-and-so" is ignored directly. If the verification is completed successfully, the instruction is regarded as a genuine normal voice signal and the instruction "transfer money to so-and-so" is executed.
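One very simple way to model "currently feasible control operations" is to learn which operation usually precedes each operation in the user's history. The sketch below is an assumption made purely for illustration: the text does not say how the voice assistant learns user behavior, and the operation names and function names are hypothetical.

```python
START = "<start>"  # marker for "no previous operation in this session"


def learn_predecessors(history):
    """Record, for every operation, the operations seen directly before it.

    `history` is a list of sessions; each session is a list of operation
    names in the order the user performed them.
    """
    predecessors = {}
    for session in history:
        ops = [START] + list(session)
        for previous, current in zip(ops, ops[1:]):
            predecessors.setdefault(current, set()).add(previous)
    return predecessors


def needs_extra_verification(operation, recent_ops, predecessors):
    """Flag the operation when it does not follow a previously seen step."""
    previous = recent_ops[-1] if recent_ops else START
    return previous not in predecessors.get(operation, set())


history = [
    ["call", "transfer"],   # the user calls first, then transfers money
    ["call", "transfer"],
    ["open_app"],
]
learned = learn_predecessors(history)

# A transfer right after a phone call matches the learned behavior.
assert not needs_extra_verification("transfer", ["call"], learned)
# A transfer with no preceding call triggers the additional verification.
assert needs_extra_verification("transfer", [], learned)
```

An operation flagged in this way is not rejected outright; following the embodiment, the device would ask for the additional verification and ignore the instruction only if that verification fails.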
Embodiment eight
This embodiment discloses a method in which the voice assistant learns to comprehensively judge, according to ambient environment information, whether the current voice instruction is attack voice, as shown in Figure 17.
Steps:
While the user uses the voice assistant normally, the voice assistant determines the ambient environment information (which can be determined from background noise), combines the user's usage behavior with the ambient environment information, and performs comprehensive learning.
When the voice assistant receives a new instruction, it comprehensively assesses the current instruction according to the ambient environment information and judges whether it is normal usage behavior.
If the current user behavior is identified as normal usage behavior, the current voice signal is considered a genuine normal voice signal and the instruction operation is executed.
If the current user behavior is identified as abnormal usage behavior, the user is required to complete additional verification, including but not limited to speaking a special instruction or operating the terminal device as requested. If the additional verification is passed, the current voice signal is considered a genuine normal voice signal and the instruction operation is executed; otherwise it is regarded as an improper voice signal used for an attack and the voice instruction is ignored.
For example, a user usually operates in a relatively quiet environment when transferring money to another person through the voice assistant. When someone else uses an improper voice signal for an attack, the setting may be a noisier environment such as a subway or a station, where the background noise is also louder. The voice instruction to transfer money is then regarded as abnormal user behavior, and additional verification is required, for example speaking to the terminal device a phrase displayed on the screen at that moment, or shaking the terminal device as requested. If the verification cannot be completed, the voice instruction is regarded as an improper voice signal used for an attack and the transfer instruction is ignored directly. If the verification is completed successfully, the voice instruction is regarded as a genuine normal voice signal and the subsequent transfer operation is executed.
Embodiment nine
This embodiment discloses a method of preventing attack voice by using a dynamic verification code, as shown in Figure 27.
Steps:
After the terminal device receives a voice command, the voice wake-up module or the voice assistant performs keyword detection (which may also be called command word detection) to judge whether the voice command is the preset wake-up word or a voice command that the terminal device can execute;
If the detected voice command is neither the preset wake-up word nor a voice command that the voice assistant can execute, it is judged to be improper voice and may be an attack signal;
If the detected voice command is the preset wake-up word or a voice command that the voice assistant can execute, the screen of the terminal device displays a dynamic verification code (which may be a random verification code);
The user reads aloud the dynamic verification code displayed by the terminal device;
The terminal device checks the dynamic verification code read by the user. If it is consistent with the dynamic verification code displayed on the screen, the signal is judged to be a genuine normal voice signal and the instruction operation (such as wake-up) is executed; otherwise it is judged to be improper voice and may be an attack signal.
Embodiment ten
This embodiment discloses a method of preventing attack voice by embedding voice commands in dynamic wallpaper.
Steps:
The user can select several images and register a separate voice command for each image;
The terminal device sets the selected images as its dynamic wallpaper;
When the user intends to wake up the terminal device, the user can speak the voice command registered for the current dynamic wallpaper;
Keyword detection is performed on the user's voice command. If the detected content is consistent with the registered command, the voice is judged to be genuine and the terminal device is woken up; otherwise it is judged to be attack voice.
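The matching step can be sketched as a lookup from the current wallpaper to its registered command. This is a minimal illustration that assumes keyword detection has already produced a text transcript; the wallpaper identifiers, the command registered for `picture_1`, and the function names are hypothetical, and only "Hi, little dog" for picture 2 comes from the description.

```python
import string

# Translation table that removes ASCII punctuation before comparison.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lower-case, drop punctuation and collapse whitespace."""
    return " ".join(text.lower().translate(_STRIP_PUNCTUATION).split())


def should_wake(spoken, current_wallpaper, registry):
    """Wake only when the spoken command matches the current wallpaper."""
    expected = registry.get(current_wallpaper)
    return expected is not None and normalize(spoken) == normalize(expected)


# Commands the user registered for each wallpaper image.
registry = {
    "picture_1": "Hi, kitty",        # hypothetical example command
    "picture_2": "Hi, little dog",   # example from the description
}

assert should_wake("hi little dog", "picture_2", registry)
# The right command for the wrong wallpaper is treated as attack voice.
assert not should_wake("Hi, little dog", "picture_1", registry)
```

Because the expected command changes with the wallpaper, a recording of an earlier wake-up only succeeds while the same image happens to be displayed.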
Embodiment eleven
This embodiment discloses a method of judging whether a voice is attack voice by tracking the mouth position and localizing the sound source.
The sound source of a voice command should be the user's mouth. The point from which the voice was emitted can be determined by sound source localization. If the mouth position is consistent with the sound source position, the voice is judged to be normal voice; otherwise it is improper voice.
Steps:
After an input voice signal is detected, a sound source localization algorithm can be used to obtain the position from which the received voice signal was emitted, i.e. sound source position information is extracted from the input voice signal;
The terminal device can also determine the position of the user's mouth. The mouth position can be detected directly by the camera, or the face position can be inferred from the way the user holds the terminal device or from gravity sensing, so as to obtain the direction from which the user speaks;
Judge whether the mouth position is consistent with the sound source position. If the positions are consistent, the voice command is judged to come from the real user and the input voice signal is a normal voice signal; otherwise it is attack voice and the input voice signal is an improper voice signal.
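The consistency check can be sketched by comparing two directions. The sketch assumes, for illustration only, that both the sound source localization and the mouth detection report an azimuth angle in degrees relative to the device, and that a fixed tolerance decides whether they are consistent; the 20 degree tolerance and the function names are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute angle between two directions, in degrees (0..180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def is_improper_source(sound_azimuth_deg, mouth_azimuth_deg, tolerance_deg=20.0):
    """Flag the voice when it does not come from the mouth direction."""
    return angular_difference(sound_azimuth_deg, mouth_azimuth_deg) > tolerance_deg


# Directions on either side of 0 degrees are still close to each other.
assert angular_difference(350, 10) == 20
assert not is_improper_source(350, 10)

# Sound arriving from behind the device while the user is in front of it.
assert is_improper_source(180, 0)
```

Taking the difference modulo 360 degrees avoids a false mismatch when the two estimates lie on either side of the 0 degree direction.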
Based on the audio signal processing method provided by the present invention as described above, the present invention also provides a terminal device, as shown in Figure 18, comprising:
A processor 1801; and
A memory 1802 configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the above method.
The present invention also provides a signal processing method, as shown in Figure 19, comprising:
1901: determine the control operation corresponding to the input signal.
1902: according to the control operation corresponding to the input signal, determine whether the input signal is an improper voice signal.
Preferably, determining whether the input signal is an improper voice signal according to the control operation corresponding to the input signal comprises:
Determining the currently feasible control operations;
If the control operation corresponding to the input signal does not match a currently feasible control operation, determining that the input signal is an improper voice signal.
The currently feasible control operations can be determined, for example but not exclusively, according to user behavior (such as historical user operation behavior).
Based on the above signal processing method, the present invention correspondingly provides a terminal device, as shown in Figure 18, comprising:
A processor 1801; and
A memory 1802 configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the above method.
The present invention also provides a signal processing method, as shown in Figure 20, comprising:
2001: receive the input signal.
2002: transmit an improper-voice interference signal within a preset frequency range.
Preferably, the improper-voice interference signal includes an ultrasonic random interference signal.
Based on the above signal processing method, the present invention correspondingly provides a terminal device which, as shown in Figure 18, comprises:
a processor 1801; and
a memory 1802 configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the above method.
The present invention also provides a signal processing method which, as shown in Figure 21, comprises:
2101: receive an input signal.
2102: sample the input signal using a sampling device whose sampling frequency is higher than a set frequency threshold.
2103: perform low-pass filtering on the sampled signal.
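Steps 2102 and 2103 can be sketched in Python as follows, assuming the signal has already been captured at a high sampling rate (96 kHz in the usage note). The Hamming-windowed sinc design, the tap count, and the function names are illustrative assumptions; the patent does not name a particular filter. Sampling well above twice the highest frequency of interest lets components above the voice band be represented without aliasing, so that the filter can then remove them.

```python
import math


def lowpass_fir_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Design a Hamming-windowed sinc low-pass filter with unity DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    fc = cutoff_hz / sample_rate_hz  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response, with the k == 0 limit handled.
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Apply the FIR filter by direct convolution; output length equals input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out
```

For example, `fir_filter(samples, lowpass_fir_taps(8000, 96000))` keeps content in the voice band and strongly attenuates components near 30 kHz. The 8 kHz cutoff is a hypothetical choice.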
Based on the above signal processing method, the present invention correspondingly provides a terminal device which, as shown in Figure 18, comprises:
a processor 1801; and
a memory 1802 configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the above method.
Figure 22 schematically shows a block diagram of a computing system that can be used to implement the base station or user equipment of the present disclosure, according to an embodiment of the present disclosure.
As shown in Figure 22, the computing system 600 includes a processor 610, a computer-readable storage medium 620, an output interface 630, and an input interface 640. The computing system 600 can perform the methods described above with reference to Figure 12 and Figures 19-21 in order to determine whether an input signal is an improper voice signal.
Specifically, the processor 610 may include, for example, a general-purpose microprocessor, an instruction set processor and/or an associated chipset, and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)). The processor 610 may also include on-board memory for caching purposes. The processor 610 may be a single processing unit or multiple processing units for performing the different actions of the method flows described with reference to Figure 12 and Figures 19-21.
The computer-readable storage medium 620 may be, for example, any medium that can contain, store, communicate, propagate, or transport instructions. For example, the readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of readable storage media include: a magnetic storage device, such as a magnetic tape or a hard disk drive (HDD); an optical storage device, such as a compact disc (CD-ROM); a memory, such as random access memory (RAM) or flash memory; and/or a wired/wireless communication link.
The computer-readable storage medium 620 may include a computer program 621, which may include code/computer-executable instructions that, when executed by the processor 610, cause the processor 610 to perform, for example, the method flows described above in conjunction with Figure 12 and Figures 19-21, and any variations thereof.
The computer program 621 may be configured to have computer program code including, for example, computer program modules. For example, in an exemplary embodiment, the code in the computer program 621 may include one or more program modules, for example module 621A, module 621B, and so on. It should be noted that the division and number of modules are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation. When these combinations of program modules are executed by the processor 610, they cause the processor 610 to perform, for example, the method flows described above in conjunction with Figure 12 and Figures 19-21, and any variations thereof.
According to an embodiment of the present disclosure, the processor 610 may use the output interface 630 and the input interface 640 to perform the method flows described above in conjunction with Figure 12 and Figures 19-21, and any variations thereof.
In the present invention, whether an input signal is an improper voice signal is determined according to characteristic information extracted from the input signal. This processing enables effective identification of improper voice signals, improves identification accuracy, provides a security guarantee for the user's voice interaction, and improves the user experience.
Those skilled in the art will appreciate that each block in these structural diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks therein, can be implemented with computer program instructions. Those skilled in the art will also appreciate that these computer program instructions can be supplied to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the schemes specified in one or more blocks of the structural diagrams and/or block diagrams and/or flow diagrams disclosed by the present invention are executed by that processor.
The modules of the apparatus of the present invention may be integrated into one unit or deployed separately. The above modules may be merged into one module or further split into multiple submodules.
Those skilled in the art will understand that the drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required to implement the present invention.
Those skilled in the art will understand that the modules of the apparatus in an embodiment may be distributed in the apparatus of the embodiment as described, or may be changed accordingly and located in one or more apparatuses different from that of the present embodiment. The modules of the above embodiments may be merged into one module or further split into multiple submodules.
The serial numbers of the above embodiments are for description only and do not represent the relative merits of the embodiments.
The above discloses only several specific embodiments of the present invention. However, the present invention is not limited thereto, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present invention.

Claims (26)

1. A signal processing method, characterized by comprising:
extracting characteristic information from an input signal; and
determining, according to the extracted characteristic information, whether the input signal is an improper voice signal.
2. The method according to claim 1, characterized in that the characteristic information includes an energy feature of the signal and/or a periodic feature of the signal.
3. The method according to claim 2, characterized in that the energy feature includes a short-time energy feature; and/or the periodic feature includes a short-time zero-crossing rate feature.
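The short-time energy and short-time zero-crossing rate named in claim 3 are standard frame-level speech features. The following Python sketch shows one common way to compute them; the frame splitting, the function names, and the convention of treating zero as non-negative are assumptions, since the claims do not fix a formula.

```python
def split_frames(signal, frame_len, hop):
    """Split a sample list into overlapping frames of `frame_len` samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def short_time_zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        raise ValueError("frame needs at least two samples")
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

Computing both values per frame yields a feature sequence that a classifier, for instance the machine-learning step of claim 4, could consume.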
4. The method according to claim 2 or 3, characterized in that determining, according to the extracted characteristic information, whether the input signal is an improper voice signal comprises:
determining, according to the extracted characteristic information and by means of machine learning, whether the input signal is an improper voice signal.
5. The method according to any one of claims 1 to 4, characterized in that extracting characteristic information from the input signal comprises:
extracting the characteristic information from within a set frequency range of the input signal.
6. The method according to claim 1, characterized in that determining, according to the extracted characteristic information, whether the input signal is an improper voice signal comprises:
determining, according to a change in the extracted characteristic information, whether the input signal is an improper voice signal.
7. The method according to claim 6, characterized in that the characteristic information includes an energy feature of the signal and/or a direction feature of the signal.
8. The method according to claim 1, characterized in that determining, according to the extracted characteristic information, whether the input signal is an improper voice signal comprises:
determining whether the characteristic information matches the wake-up characteristic information corresponding to the current dynamic wake-up instruction; and if not, determining that the voice signal is an improper voice signal.
9. The method according to claim 8, characterized in that the characteristic information includes Mel-frequency cepstral coefficient (MFCC) features and/or a dynamic verification code.
10. The method according to claim 1, characterized in that the characteristic information includes a voice command, and
determining, according to the extracted characteristic information, whether the input signal is an improper voice signal comprises:
determining whether the voice command matches the wake-up word content corresponding to the current dynamic wake-up image; and if not, determining that the voice signal is an improper voice signal.
11. The method according to claim 1, characterized in that the characteristic information includes background noise information, and
determining, according to the extracted characteristic information, whether the input signal is an improper voice signal comprises:
determining, according to the background noise information of the signal, whether the input signal is an improper voice signal.
12. The method according to claim 11, characterized in that determining, according to the background noise information of the signal, whether the input signal is an improper voice signal comprises:
determining current environment information according to the background noise of the signal, and determining, according to the current environment information, whether the input signal is an improper voice signal.
13. The method according to claim 1, characterized in that the characteristic information includes sound source position information, and
determining, according to the extracted characteristic information, whether the input signal is an improper voice signal comprises:
determining whether the sound source position information extracted from the input voice signal matches the detected position information of the user's mouth; and if not, determining that the voice signal is an improper voice signal.
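The matching step of claim 13 can be sketched in Python as a distance test, assuming that both the estimated sound source and the detected mouth position are available as 3-D coordinates in the same frame. The Euclidean criterion, the 0.15 m tolerance, and the function name are hypothetical; the claim does not say how a match is decided.

```python
import math


def is_improper_by_position(source_pos, mouth_pos, tolerance_m=0.15):
    """Flag the signal when the sound source is too far from the detected mouth.

    Both positions are (x, y, z) tuples in meters in a shared coordinate
    frame. The tolerance value is an illustrative assumption.
    """
    return math.dist(source_pos, mouth_pos) > tolerance_m
```

For example, a source estimated 5 cm from the mouth is accepted, while a source located about a meter away, such as a hidden loudspeaker, is flagged.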
14. The method according to any one of claims 1 to 13, characterized by further comprising: transmitting an improper-voice interference signal within a preset frequency range.
15. The method according to claim 14, characterized in that the improper-voice interference signal includes an ultrasonic random interference signal.
16. The method according to any one of claims 1 to 15, characterized in that, before extracting characteristic information from the input signal, the method further comprises:
sampling the input signal using a sampling device whose sampling frequency is higher than a set frequency threshold; and
performing low-pass filtering on the sampled signal.
17. The method according to any one of claims 1 to 16, characterized by further comprising: when it is determined that the input signal is an improper voice signal, sending corresponding prompt information.
18. A terminal device, comprising:
a processor; and
a memory configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 1 to 17.
19. A signal processing method, characterized by comprising:
determining the control operation corresponding to an input signal; and
determining, according to the control operation corresponding to the input signal, whether the input signal is an improper voice signal.
20. The method according to claim 19, characterized in that determining, according to the control operation corresponding to the input signal, whether the input signal is an improper voice signal comprises:
determining the currently feasible control operations; and
if the control operation corresponding to the input signal does not match the currently feasible control operations, determining that the input signal is an improper voice signal.
21. A terminal device, comprising:
a processor; and
a memory configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 19 to 20.
22. A signal processing method, characterized by comprising:
receiving an input signal; and
transmitting an improper-voice interference signal within a preset frequency range.
23. The method according to claim 22, characterized in that the improper-voice interference signal includes an ultrasonic random interference signal.
24. A terminal device, comprising:
a processor; and
a memory configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 22 to 23.
25. A signal processing method, characterized by comprising:
receiving an input signal;
sampling the input signal using a sampling device whose sampling frequency is higher than a set frequency threshold; and
performing low-pass filtering on the sampled signal.
26. A terminal device, comprising:
a processor; and
a memory configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the method according to claim 25.
CN201810401796.XA 2017-11-02 2018-04-28 signal processing method and terminal device Pending CN109754817A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711066073 2017-11-02
CN2017110660730 2017-11-02

Publications (1)

Publication Number Publication Date
CN109754817A true CN109754817A (en) 2019-05-14

Family

ID=66402376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810401796.XA Pending CN109754817A (en) 2017-11-02 2018-04-28 signal processing method and terminal device

Country Status (1)

Country Link
CN (1) CN109754817A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101936014A (en) * 2010-09-12 2011-01-05 上海市第二市政工程有限公司 On-line real-time monitoring method for safety stability of foundation pit during stratum vibration
CN102436810A (en) * 2011-10-26 2012-05-02 华南理工大学 Record replay attack detection method and system based on channel mode noise
CN102737480A (en) * 2012-07-09 2012-10-17 广州市浩云安防科技股份有限公司 Abnormal voice monitoring system and method based on intelligent video
CN102945675A (en) * 2012-11-26 2013-02-27 江苏物联网研究发展中心 Intelligent sensing network system for detecting outdoor sound of calling for help
CN103035238A (en) * 2012-11-27 2013-04-10 中国科学院自动化研究所 Encoding method and decoding method of voice frequency data
CN103632356A (en) * 2012-08-29 2014-03-12 华为技术有限公司 Method and device for enhancing image spatial resolution
CN103903633A (en) * 2012-12-27 2014-07-02 华为技术有限公司 Method and apparatus for detecting voice signal
CN104121985A (en) * 2013-04-29 2014-10-29 艾默生电气(美国)控股公司(智利)有限公司 Selective decimation and analysis of oversampled data
CN104795068A (en) * 2015-04-28 2015-07-22 深圳市锐曼智能装备有限公司 Robot awakening control method and robot awakening control system
CN104966053A (en) * 2015-06-11 2015-10-07 腾讯科技(深圳)有限公司 Face recognition method and recognition system
CN105304091A (en) * 2015-06-26 2016-02-03 信阳师范学院 Voice tamper recovery method based on DCT
CN106128475A (en) * 2016-07-12 2016-11-16 华南理工大学 Wearable intelligent safety equipment based on abnormal emotion speech recognition and control method
CN106725612A (en) * 2016-12-23 2017-05-31 深圳开立生物医疗科技股份有限公司 Four-dimensional ultrasound image optimization method and system
WO2017114307A1 (en) * 2015-12-30 2017-07-06 中国银联股份有限公司 Voiceprint authentication method capable of preventing recording attack, server, terminal, and system
CN107046517A (en) * 2016-02-05 2017-08-15 阿里巴巴集团控股有限公司 A kind of method of speech processing, device and intelligent terminal

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110310660A (en) * 2019-06-06 2019-10-08 上海工程技术大学 A kind of voice re-sampling detection method based on sound spectrograph
CN110310660B (en) * 2019-06-06 2021-10-08 上海工程技术大学 Speech resampling detection method based on spectrogram
CN111145739A (en) * 2019-12-12 2020-05-12 珠海格力电器股份有限公司 Vision-based awakening-free voice recognition method, computer-readable storage medium and air conditioner
CN111326143A (en) * 2020-02-28 2020-06-23 科大讯飞股份有限公司 Voice processing method, device, equipment and storage medium
CN113300783A (en) * 2021-04-27 2021-08-24 厦门亿联网络技术股份有限公司 Ultrasonic data transmission method, device and storage medium

Similar Documents

Publication Publication Date Title
US11450324B2 (en) Method of defending against inaudible attacks on voice assistant based on machine learning
CN109754817A (en) signal processing method and terminal device
CN105323648B (en) Caption concealment method and electronic device
US10455342B2 (en) Sound event detecting apparatus and operation method thereof
WO2019210796A1 (en) Speech recognition method and apparatus, storage medium, and electronic device
Wang et al. Ghosttalk: Interactive attack on smartphone voice system through power line
CN106572411A (en) Noise cancelling control method and relevant device
WO2022033556A1 (en) Electronic device and speech recognition method therefor, and medium
CN110047481A (en) Method for voice recognition and device
WO2020088483A1 (en) Audio control method and electronic device
CN110097875A (en) Interactive voice based on microphone signal wakes up electronic equipment, method and medium
CN102981615B (en) Gesture identifying device and recognition methods
CN110223711A (en) Interactive voice based on microphone signal wakes up electronic equipment, method and medium
CN110428806A (en) Interactive voice based on microphone signal wakes up electronic equipment, method and medium
CN110111776A (en) Interactive voice based on microphone signal wakes up electronic equipment, method and medium
CN108650402A (en) A kind of method of Anti-addiction, equipment and computer readable storage medium
CN103428328A (en) Method and system for automatically setting sound volume of mobile terminal
Basak et al. mmspy: Spying phone calls using mmwave radars
CN112347450A (en) Identity verification method based on blink sound signal
WO2022199405A1 (en) Voice control method and apparatus
CN104599667B (en) Information processing method and electronic equipment
Chen et al. Sok: A modularized approach to study the security of automatic speech recognition systems
CN106782498A (en) Voice messaging player method, device and terminal
US20230239800A1 (en) Voice Wake-Up Method, Electronic Device, Wearable Device, and System
CN209606794U (en) A kind of wearable device, sound-box device and intelligent home control system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination