CN108806702B - Detection method and device for ultrasonic voice hidden attack - Google Patents



Publication number
CN108806702B
Authority
CN
China
Prior art keywords
voice
ultrasonic
frequency
attack
time window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810804883.XA
Other languages
Chinese (zh)
Other versions
CN108806702A (en)
Inventor
毛剑
祝施施
刘建伟
关振宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201810804883.XA
Publication of CN108806702A
Application granted
Publication of CN108806702B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise

Abstract

The invention discloses a detection method and a device for ultrasonic voice hiding attack, wherein the method comprises the following steps: receiving current environmental noise, and obtaining a current noise threshold according to the current environmental noise; receiving all ultrasonic signals around the equipment which is possibly attacked; carrying out spectrum analysis on the ultrasonic signal and acquiring a central frequency; judging whether the central frequency is in the attack frequency range or not, and sending out a reminding alarm when the central frequency is in the attack frequency range; demodulating the ultrasonic signal through the central frequency and carrying out band-pass filtering to filter out frequency components higher than the upper limit of the voice frequency and lower than the lower limit of the voice frequency and obtain a baseband signal; and detecting whether the baseband signal contains voice according to the current noise threshold so as to send out a high-level alarm when the baseband signal contains voice. The method effectively improves the reliability and economy of detection, effectively ensures the privacy safety of the user, improves the use experience of the user, and is simple and easy to implement.

Description

Detection method and device for ultrasonic voice hidden attack
Technical Field
The invention relates to the technical field of intelligent voice equipment, communication and information security, in particular to a detection method and a detection device for ultrasonic voice hidden attack.
Background
Voice control is widely used in smart devices because it is convenient and contact-free. Particularly now that smart homes are increasingly popular, voice has become a bridge between people and objects, and people can directly control household appliances by voice. However, if a voice control system is maliciously attacked, the attacker can freely manipulate the device to perform malicious actions and obtain the user's personal data; the resulting privacy leakage and safety hazards deserve attention.
At present, attacks against speech recognition systems are numerous. One type targets the speech recognition algorithm, causing the classification model to produce a wrong result. Another exploits the difference between how machines and humans recognize speech, generating instructions that humans cannot understand but machines can, in order to control the machine. A third is the ultrasonic voice hiding attack: the attacker amplitude-modulates the voice command onto an ultrasonic band so that the user is not alerted, and, because of the inherent nonlinearity of the microphone, the original voice command is demodulated inside the microphone from the ultrasonic signal carrying the attack command, thereby controlling the device. The last type, the ultrasonic voice hiding attack, is the primary concern here.
Existing detection and defense approaches include: ① Speaker recognition, in which the device recognizes only the voice of its owner. However, first, speaker recognition algorithms are currently imperfect and can be broken by brute force; second, attack speech can be produced through simple speech synthesis once a voice fragment of the victim is obtained; and finally, in the smart home scenario there are many family members, and the cost of having every member train the home voice assistant is too high. ② Training a classifier to recognize attack signals by using the spectral difference between the demodulated voice signal and the original voice signal.
An inaudible ultrasonic voice hiding attack can cause loss while remaining difficult for the user to perceive, and because a detection method targeting this attack is currently lacking, the user's privacy and safety are difficult to protect reliably.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one purpose of the present invention is to provide a detection method for ultrasonic voice hiding attack, which effectively improves the reliability and economy of detection, effectively ensures the privacy and safety of users, improves the user experience, and is simple and easy to implement.
Another object of the present invention is to provide a detection apparatus for ultrasonic voice hiding attack.
In order to achieve the above object, an embodiment of the present invention provides a method for detecting an ultrasonic voice hiding attack, including the following steps: receiving current environmental noise, and obtaining a current noise threshold according to the current environmental noise; receiving all ultrasonic signals around the equipment which is possibly attacked; carrying out spectrum analysis on the ultrasonic signal and acquiring a central frequency; judging whether the central frequency is in an attack frequency range or not, and sending out a reminding alarm when the central frequency is in the attack frequency range; demodulating the ultrasonic signal through the central frequency and carrying out band-pass filtering to filter out frequency components higher than the upper limit of the voice frequency and lower than the lower limit of the voice frequency and obtain a baseband signal; detecting whether the baseband signal contains speech according to the current noise threshold to issue a high-level alarm when the speech is contained.
According to the detection method for the ultrasonic voice hiding attack, the voice hiding attack is detected according to the current environmental noise and all ultrasonic signals around the equipment, so that huge loss caused by the voice hiding attack under the condition that a user is difficult to perceive is effectively avoided, and the voice hiding attack is effectively detected under the condition that the internal structure of the attacked equipment is not changed, so that the reliability and the economical efficiency of detection are effectively improved, the privacy safety of the user is effectively ensured, the use experience of the user is improved, and the method is simple and easy to implement.
In addition, the detection method for the ultrasonic voice hiding attack according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, obtaining the current noise threshold according to the current environmental noise further includes: collecting noise of k time windows, calculating the energy of each time window and storing it in a queue, and obtaining an initial noise threshold with the average energy of the k time windows as a reference; and, each time a new time window is received, adding the energy of the new time window to the queue, discarding the oldest time window, and updating the current noise threshold according to the average energy of the k time windows currently in the queue.
Further, in an embodiment of the present invention, the energy of each time window is calculated as:
$$E_j = \sum_{i=1}^{s} x_j(i)^2$$
where $x_j(i)$ is the i-th sample of the j-th time window, and $s$ is the number of samples in each time window;
the average energy is calculated as:
$$\bar{E} = \frac{1}{k} \sum_{j=1}^{k} E_j$$
where $k$ is the number of time windows.
Further, in an embodiment of the present invention, the method further includes: and when the duration of the ultrasonic wave is greater than the preset maximum time limit, intercepting the ultrasonic wave signal according to the maximum time limit.
In order to achieve the above object, another embodiment of the present invention provides a detection apparatus for ultrasonic voice hiding attack, including: the first receiving module is used for receiving the current environmental noise and obtaining a current noise threshold value according to the current environmental noise; the second receiving module is used for receiving all ultrasonic signals around the equipment which is possibly attacked; the analysis module is used for carrying out spectrum analysis on the ultrasonic signal and acquiring a central frequency; the judging module is used for judging whether the central frequency is in an attack frequency range or not and sending out a reminding alarm when the central frequency is in the attack frequency range; the demodulation and filtering module is used for demodulating the ultrasonic signal through the central frequency and carrying out band-pass filtering so as to filter out frequency components higher than the upper limit of the voice frequency and lower than the lower limit of the voice frequency and obtain a baseband signal; and the detection module is used for detecting whether the baseband signal contains voice according to the current noise threshold so as to send out a high-level alarm when the baseband signal contains the voice.
The detection device for the ultrasonic voice hiding attack, provided by the embodiment of the invention, is used for detecting the voice hiding attack according to the current environmental noise and all ultrasonic signals around the equipment, so that huge loss caused by the voice hiding attack under the condition that a user is difficult to perceive is effectively avoided, and the voice hiding attack is effectively detected under the condition that the internal structure of the attacked equipment is not changed, thereby effectively improving the reliability and economical efficiency of detection, effectively ensuring the privacy safety of the user, improving the use experience of the user, and being simple and easy to implement.
In addition, the detection device for ultrasonic voice hiding attack according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the first receiving module is further configured to collect noise of k time windows, calculate the energy of each time window and store it in a queue, obtain an initial noise threshold with the average energy of the k time windows as a reference, and, each time a new time window is received, add the energy of the new time window to the queue, discard the oldest time window, and update the current noise threshold according to the average energy of the k time windows currently in the queue.
Further, in an embodiment of the present invention, the energy of each time window is calculated as:
$$E_j = \sum_{i=1}^{s} x_j(i)^2$$
where $x_j(i)$ is the i-th sample of the j-th time window, and $s$ is the number of samples in each time window;
the average energy is calculated as:
$$\bar{E} = \frac{1}{k} \sum_{j=1}^{k} E_j$$
where $k$ is the number of time windows.
Further, in an embodiment of the present invention, the method further includes: and the intercepting module is used for intercepting the ultrasonic signal according to the maximum time limit when the duration of the ultrasonic is greater than the preset maximum time limit.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for detection of ultrasonic voice hiding attacks according to one embodiment of the present invention;
FIG. 2 is a flow chart of a method for detecting an ultrasonic voice hiding attack according to an embodiment of the present invention;
FIG. 3 is a flow diagram of short-term energy-based voice activity detection according to one embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a detection apparatus for ultrasonic voice hiding attack according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a detection apparatus for ultrasonic voice hiding attack according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting the invention.
The following describes a method and an apparatus for detecting an ultrasonic voice hiding attack according to an embodiment of the present invention with reference to the drawings, and first, a method for detecting an ultrasonic voice hiding attack according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 1 is a flowchart of a detection method for ultrasonic voice hiding attack according to an embodiment of the present invention.
As shown in fig. 1, the detection method for ultrasonic voice hiding attack includes the following steps:
In step S101, the current environmental noise is received, and a current noise threshold is obtained according to the current environmental noise.
It can be understood that, as shown in FIG. 2, the embodiment of the present invention first collects ambient noise. For example, a microphone periodically receives ambient noise, which is used to calculate and update the noise threshold T in real time.
Further, in an embodiment of the present invention, obtaining the current noise threshold according to the current environmental noise further includes: collecting noise of k time windows, calculating the energy of each time window and storing it in a queue, and obtaining an initial noise threshold with the average energy of the k time windows as a reference; and, each time a new time window is received, adding the energy of the new time window to the queue, discarding the oldest time window, and updating the current noise threshold according to the average energy of the k time windows currently in the queue.
Specifically, as shown in FIG. 2, the microphone receives ambient noise at regular intervals (in a continuous loop), and the noise is used to calculate and update the noise threshold T in real time. The calculation of T starts from an initial value $T_0$. ① Calculating the initial threshold $T_0$: initially, noise of k time windows is collected, the energy of each time window is calculated and stored in a queue, and the average energy $\bar{E}$ of the k time windows is used as the reference for determining the initial threshold $T_0$.
Further, in one embodiment of the present invention, the energy of each time window is calculated as:
$$E_j = \sum_{i=1}^{s} x_j(i)^2$$
where $x_j(i)$ is the i-th sample of the j-th time window, and $s$ is the number of samples in each time window;
the average energy, that is, the average energy of the current background noise, is calculated as:
$$\bar{E} = \frac{1}{k} \sum_{j=1}^{k} E_j$$
where $k$ is the number of time windows.
Finally, the initial threshold $T_0$ is determined from $\bar{E}$ according to:
Figure GDA0002502202860000053
Further, the update policy for T is as follows: each time a new time window is received, its energy is calculated and added to the queue while the oldest time window is discarded, and T is updated from the average energy of the k time windows currently in the queue, using the same calculation as for $T_0$. A maximum threshold $T_{max}$ is set: if $T > T_{max}$, then $T = T_{max}$.
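The rolling update of T can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the window energy is taken as the sum of squared samples, and the threshold is taken as a hypothetical `scale` factor times the average energy, because the exact mapping from average energy to threshold is not reproduced in this text. The names `NoiseThreshold`, `window_energy`, `scale`, and `t_max` are illustrative.

```python
from collections import deque


def window_energy(samples):
    """Energy of one time window, assumed here to be the sum of squared samples."""
    return sum(x * x for x in samples)


class NoiseThreshold:
    """Rolling noise threshold over the k most recent time windows."""

    def __init__(self, k, scale=1.0, t_max=float("inf")):
        self.energies = deque(maxlen=k)  # the oldest energy is dropped automatically
        self.scale = scale               # hypothetical factor, not taken from the source
        self.t_max = t_max               # upper bound T_max on the threshold

    def update(self, samples):
        """Add one new window and return the current threshold T.

        Until k windows have been seen, the mean covers the windows received so far.
        """
        self.energies.append(window_energy(samples))
        mean_energy = sum(self.energies) / len(self.energies)
        return min(self.scale * mean_energy, self.t_max)


tracker = NoiseThreshold(k=3, t_max=5.0)
for window in ([1.0, 1.0], [2.0, 0.0], [0.0, 0.0]):
    threshold = tracker.update(window)
print(threshold)  # mean of the window energies 2, 4 and 0
```

A bounded `deque` keeps the "add newest, discard oldest" behavior in one call, and the `min` implements the clamp to the maximum threshold.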
In step S102, all ultrasonic signals around the device that may be attacked are received.
In an embodiment of the present invention, the method further comprises: when the duration of the ultrasonic signal is greater than a preset maximum time limit, intercepting the ultrasonic signal according to the maximum time limit.
It can be understood that, as shown in FIG. 2, the embodiment of the present invention may use the ultrasonic receiving device to receive all ultrasonic waves around the device that may be attacked, with a maximum reception time limit τ. If the actual duration $t_0$ of the ultrasonic signal satisfies $t_0 \le \tau$, the signal passed to subsequent processing has the actual duration $t_0$ of the detected ultrasonic wave; if $t_0 > \tau$, the first τ of the signal is intercepted and subsequent processing starts immediately.
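For a sampled signal, the time limit amounts to keeping at most the first τ seconds of samples. A minimal sketch, assuming the signal is a Python list of samples; the function name `limit_duration` and its parameters are illustrative:

```python
def limit_duration(samples, sample_rate_hz, max_seconds):
    """Keep at most the first max_seconds of the signal; shorter signals pass unchanged."""
    max_samples = int(sample_rate_hz * max_seconds)
    return samples[:max_samples]


# Toy values: 2 samples per second and a limit of 3 seconds keep 6 samples.
clipped = limit_duration(list(range(10)), sample_rate_hz=2, max_seconds=3)
print(len(clipped))
```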
In step S103, the ultrasonic signal is subjected to spectrum analysis, and the center frequency is acquired.
It can be understood that, as shown in FIG. 2, when an ultrasonic signal c(t) is received, spectrum analysis is performed on the signal to obtain its center frequency $f_c$.
In step S104, it is determined whether the center frequency is within the attack frequency range, and if so, a warning is issued.
It can be understood that whether $f_c$ lies within the attack frequency range (the ultrasonic band) is then judged. If it does, a low-level alarm can be issued to remind the user that the device may be under attack and that preventive measures should be taken. If not, the process returns to step S102 and a new round of reception and detection starts.
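Steps S103 and S104 can be illustrated with a short Python sketch. It assumes the center frequency is the frequency of the strongest non-DC spectral bin, and it uses a naive discrete Fourier transform in place of a fast Fourier transform to stay dependency-free. The function names and the short test length are illustrative; the 20 kHz to 50 kHz range, the 192 kHz sampling rate, and the 25 kHz carrier are the values used in the worked example of this document.

```python
import cmath
import math


def center_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC DFT bin."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for b in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * b * i / n)
                  for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = b, abs(acc)
    return best_bin * sample_rate_hz / n


def in_attack_range(f_c, low_hz=20_000.0, high_hz=50_000.0):
    """True when the center frequency lies inside the attack frequency range."""
    return low_hz <= f_c <= high_hz


FS = 192_000  # sampling rate in Hz
N = 384       # short length so the naive DFT stays fast (500 Hz per bin)
tone = [math.cos(2 * math.pi * 25_000 * i / FS) for i in range(N)]
f_c = center_frequency(tone, FS)
if in_attack_range(f_c):
    print("low-level alarm: carrier near", f_c, "Hz")
```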
In step S105, the ultrasonic signal is demodulated by the center frequency and band-pass filtered to filter out frequency components higher than the upper limit of the voice frequency and lower than the lower limit of the voice frequency, and a baseband signal is obtained.
In one embodiment of the present invention, the demodulation formula is:
$$c'(t) = c(t) \cdot \cos(2\pi f_c t),$$
where $f_c$ is the center frequency, $c(t)$ is the ultrasonic signal, and $t$ is time.
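Why this multiplication recovers the command can be seen from a short derivation. It assumes, as an illustration, that the attack signal is an amplitude-modulated carrier $c(t) = a(t)\cos(2\pi f_c t)$, where the envelope $a(t)$ (a symbol introduced here) carries the voice command:

```latex
\begin{aligned}
c'(t) &= a(t)\cos^2(2\pi f_c t) \\
      &= \tfrac{1}{2}\,a(t) + \tfrac{1}{2}\,a(t)\cos(4\pi f_c t)
\end{aligned}
```

The first term is the envelope at baseband; the second is centered on $2 f_c$. The subsequent band-pass filter removes the $2 f_c$ term and any constant offset in $a(t)$, leaving a scaled copy of the voice-band content.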
It can be understood that, as shown in FIG. 2, the embodiment of the present invention uses the center frequency $f_c$ to demodulate the received ultrasonic signal, obtaining $c'(t)$, and then applies band-pass filtering to remove frequency components above the upper limit and below the lower limit of the voice band, obtaining the baseband signal $m(t)$, where
$$c'(t) = c(t) \cdot \cos(2\pi f_c t).$$
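The demodulation and band-pass step can be sketched in Python. This is an illustrative sketch, not the patented implementation: the band-pass filter is a brick-wall filter built from a naive DFT (a real device would use a designed filter), the 200 Hz to 4000 Hz passband is the one used in the worked example of this document, and the test signal (a 1 kHz tone amplitude-modulated onto a 25 kHz carrier) is an assumption chosen so every component falls on an exact DFT bin.

```python
import cmath
import math


def demodulate(samples, f_c, sample_rate_hz):
    """Multiply by the carrier: c'(n) = c(n) * cos(2*pi*f_c*n/Fs)."""
    return [x * math.cos(2 * math.pi * f_c * i / sample_rate_hz)
            for i, x in enumerate(samples)]


def band_pass(samples, sample_rate_hz, low_hz=200.0, high_hz=4000.0):
    """Brick-wall band-pass: rebuild the signal from in-band DFT bins only."""
    n = len(samples)
    out = [0.0] * n
    for b in range(1, n // 2):
        if not low_hz <= b * sample_rate_hz / n <= high_hz:
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * b * i / n)
                    for i, x in enumerate(samples))
        # Bin b and its mirror bin n-b together contribute 2*Re(...)/n.
        for i in range(n):
            out[i] += 2.0 / n * (coeff * cmath.exp(2j * math.pi * b * i / n)).real
    return out


FS = 192_000  # sampling rate in Hz
F_C = 25_000  # carrier frequency in Hz
N = 384       # 500 Hz per bin, so all tones below sit on exact bins
# Assumed test signal: a 1 kHz tone amplitude-modulated onto the carrier.
c = [(1 + 0.5 * math.cos(2 * math.pi * 1_000 * i / FS))
     * math.cos(2 * math.pi * F_C * i / FS) for i in range(N)]
m = band_pass(demodulate(c, F_C, FS), FS)
```

For this test signal, `m` is the 1 kHz tone scaled to amplitude 0.25: the factor 1/2 from demodulation times the modulation depth 0.5.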
In step S106, whether the baseband signal contains speech is detected according to the current noise threshold, so that a high-level alarm is issued when speech is contained.
It can be understood that, as shown in FIG. 2, the embodiment of the present invention detects whether m(t) contains speech. The detection is based on the latest noise threshold T calculated in step S101. If m(t) is detected to be a voice signal, the ultrasonic signal is considered to be modulated with an attack instruction and to be an attack signal; a high-level alarm is then issued, and the process returns to step S102 to repeat the reception and detection process.
The method for detecting the ultrasonic voice hiding attack will be further described with reference to specific embodiments.
In this embodiment, an ultrasonic signal modulated with an attack instruction is constructed as the attack signal. The modulation scheme is amplitude modulation, and the carrier frequency is 25 kHz. The detection steps for this attack are as follows:
Step 1-1: T is calculated from the background noise of the current environment. The average energy $\bar{E}$ of the noise data of the 12 time windows currently in the queue is calculated.
Here T is determined to be 0.05.
Step 1-2: The ultrasonic receiving device receives the ultrasonic signal c(t). Let τ = 5 s; that is, when the ultrasonic signal lasts longer than 5 seconds, the device intercepts the first 5 s and starts the subsequent processing. The cut-off frequency of the low-pass filter is set to 50 kHz, the gain of the power amplifier is 10, and the sampling rate of the analog-to-digital converter is set to 192 kHz.
Step 2: A $2^{19}$-point fast Fourier transform is performed on the digital signal c(n) obtained by analog-to-digital conversion, giving a center frequency $f_c$ of 24999 Hz.
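As a quick check of what these settings imply, the frequency spacing between adjacent FFT bins and the time span covered by the transform follow directly from the sampling rate and the FFT length ($\Delta f$ and $T_{\mathrm{FFT}}$ are symbols introduced here for illustration):

```latex
\Delta f = \frac{F_s}{N} = \frac{192000}{2^{19}} = \frac{192000}{524288} \approx 0.366\ \text{Hz},
\qquad
T_{\mathrm{FFT}} = \frac{N}{F_s} = \frac{524288}{192000} \approx 2.73\ \text{s}
```

The transform therefore resolves the carrier to well under 1 Hz and fits within the 5 s interception limit.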
Step 3: $f_c$ lies within the optimal attack range of 20 kHz to 50 kHz, so a low-level alarm is issued.
Step 4: c(n) is demodulated using $f_c$ to obtain the demodulated signal c'(n). c'(n) is then band-pass filtered to remove non-speech components, keeping the 200 Hz to 4000 Hz components and yielding m(n).
Step 5: Voice activity detection is performed on m(n). This embodiment uses voice activity detection based on short-time energy, which proceeds as follows: ① m(n) is resampled so that the sampling rate is reduced to 8 kHz. ② The data is divided into frames using a 20 ms time window, so that each time window contains 160 samples. ③ Because the filter is not an ideal filter, the mean of all samples in a time window is subtracted from each sample value before the short-time energy of that window is calculated, namely
$$q'_j(i) = q_j(i) - \bar{q}_j,$$
where $q_j(i)$ is the i-th sample value of the j-th time window, $q'_j(i)$ is the sample value after the mean is subtracted, and
$$\bar{q}_j = \frac{1}{s} \sum_{i=1}^{s} q_j(i)$$
is the mean of all samples within the time window (here $s = 160$). The short-time energy $Q_j$ is then calculated:
$$Q_j = \sum_{i=1}^{s} q'_j(i)^2$$
④ $Q_j$ is compared with the noise threshold T; if $Q_j > T$, the j-th time window is judged to be speech. The general flow of short-time energy-based voice activity detection is shown in FIG. 3.
Step 6: Whether m(n) contains an attack instruction is determined from the number p of time windows judged to be speech. When p > 10, m(n) can be considered to contain an attack instruction. Here, 36 time windows of m(n) are detected as speech, so m(n) is judged to be a voice signal; c(t) is therefore considered to be modulated with an attack instruction and to be an attack signal, and a high-level alarm is issued. The process then returns to step 1-2 and the reception and detection process is repeated.
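Steps 5 and 6 can be sketched together in Python. The sketch assumes the input has already been resampled to 8 kHz (resampling is not shown), takes the short-time energy as the sum of squared mean-removed samples, and uses the values from this example (160-sample windows, T = 0.05, more than 10 speech windows). The function names and the synthetic test signal are illustrative.

```python
def short_time_energy(frame):
    """Energy of one frame after subtracting its mean (removes residual offset)."""
    mean = sum(frame) / len(frame)
    return sum((q - mean) ** 2 for q in frame)


def count_speech_windows(samples, threshold, frame_len=160):
    """Count non-overlapping frames whose short-time energy exceeds the threshold T."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return sum(1 for f in frames if short_time_energy(f) > threshold)


def contains_command(samples, threshold, min_windows=10):
    """Decide that a command is present when more than min_windows frames are speech."""
    return count_speech_windows(samples, threshold) > min_windows


# Synthetic data: 12 "active" frames followed by 5 frames of pure constant offset.
active = [0.1, -0.1] * 80   # 160 samples, energy about 1.6
flat = [0.3] * 160          # constant offset, energy about 0 after mean removal
signal = active * 12 + flat * 5
p = count_speech_windows(signal, threshold=0.05)
print(p, contains_command(signal, threshold=0.05))
```

The constant-offset frames show why the mean is removed first: without that step their energy would exceed the threshold even though they carry no signal variation.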
According to the detection method for the ultrasonic voice hiding attack, provided by the embodiment of the invention, the voice hiding attack is detected according to the current environmental noise and all ultrasonic signals around the equipment, so that huge loss caused by the voice hiding attack under the condition that a user is difficult to perceive is effectively avoided, and the voice hiding attack is effectively detected under the condition that the internal structure of the attacked equipment is not changed, thereby effectively improving the reliability and economical efficiency of detection, effectively ensuring the privacy safety of the user, improving the use experience of the user, and being simple and easy to implement.
Next, a detection apparatus for ultrasonic voice hiding attack according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 4 is a schematic structural diagram of a detection apparatus for ultrasonic voice hiding attack according to an embodiment of the present invention.
As shown in fig. 4, the detection apparatus 10 for ultrasonic voice hiding attack includes: the device comprises a first receiving module 100, a second receiving module 200, an analyzing module 300, a judging module 400, a demodulating and filtering module 500 and a detecting module 600.
The first receiving module 100 is configured to receive current environmental noise, and obtain a current noise threshold according to the current environmental noise. The second receiving module 200 is used to receive all the ultrasonic signals around the device that may be attacked. The analysis module 300 is used for performing spectrum analysis on the ultrasonic signal and acquiring a center frequency. The determining module 400 is configured to determine whether the center frequency is within the attack frequency range, and send a warning when the center frequency is within the attack frequency range. The demodulation and filtering module 500 is configured to demodulate the ultrasonic signal through the center frequency and perform band-pass filtering to filter out frequency components higher than the upper limit of the voice frequency and lower than the lower limit of the voice frequency, and obtain a baseband signal. The detection module 600 is used to detect whether the baseband signal contains voice according to the current noise threshold, so as to issue a high-level alarm when the baseband signal contains voice. The device 10 of the embodiment of the invention performs voice hiding attack detection according to the current environmental noise and all ultrasonic signals around the equipment, thereby effectively improving the reliability and economy of detection, effectively ensuring the privacy safety of users, improving the use experience of the users, and being simple and easy to implement.
Further, in an embodiment of the present invention, the first receiving module 100 is further configured to collect noise of k time windows, calculate the energy of each time window and store it in a queue, obtain an initial noise threshold with the average energy of the k time windows as a reference, and, each time a new time window is received, add the energy of the new time window to the queue, discard the oldest time window, and update the current noise threshold according to the average energy of the k time windows currently in the queue.
Further, in one embodiment of the present invention, the energy of each time window is calculated as:
$$E_j = \sum_{i=1}^{s} x_j(i)^2$$
where $x_j(i)$ is the i-th sample of the j-th time window, and $s$ is the number of samples in each time window;
the average energy is calculated as:
$$\bar{E} = \frac{1}{k} \sum_{j=1}^{k} E_j$$
where $k$ is the number of time windows.
Further, in one embodiment of the present invention, the demodulation formula is:
$$c'(t) = c(t) \cdot \cos(2\pi f_c t),$$
where $f_c$ is the center frequency, $c(t)$ is the ultrasonic signal, and $t$ is time.
Further, in one embodiment of the present invention, the apparatus 10 further comprises an intercepting module. The intercepting module is used for intercepting the ultrasonic signal according to the maximum time limit when the duration of the ultrasonic signal is greater than the preset maximum time limit.
The detection apparatus for ultrasonic voice hiding attack will be further described with reference to fig. 5.
As shown in fig. 5, in an embodiment of the present invention, the main modules include an ultrasonic receiving device, a low-pass filter, a power amplifier, an analog-to-digital converter, a frequency analysis module, a demodulator, a band-pass filter, a microphone, a noise threshold extraction module, a voice detection module, and an alarm. The ultrasonic receiving device receives external ultrasonic waves in real time; the low-pass filter is used for anti-aliasing, and its cut-off frequency f_d and the sampling rate F_s of the analog-to-digital converter satisfy:
F_s ≥ 2·f_d;
the power amplifier amplifies the signal; the analog-to-digital converter converts the analog signal into a digital signal for subsequent digital signal processing; the frequency analysis module performs a fast Fourier transform on the received ultrasonic signal, obtains its center frequency f_c, and determines whether f_c lies in the attack frequency range: if so, a low-level alarm is issued; if not, no subsequent processing is performed and a new detection cycle is started; the demodulator demodulates the signal with carrier frequency f_c; the band-pass filter filters out non-voice frequency components; the microphone collects environmental noise; the noise threshold extraction module calculates and updates a noise threshold T using the environmental noise collected by the microphone; the voice detection module performs voice activity detection according to the noise threshold T and determines whether m(n) is a voice signal; the alarm issues alarms of different levels under different conditions.
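For illustration, the frequency analysis and alarm decision described above may be sketched as follows. To remain dependency-free, the sketch substitutes a plain O(n²) discrete Fourier transform for the fast Fourier transform and takes the strongest bin below F_s/2 as the center frequency, which is an assumption. The 20 kHz to 40 kHz attack range, the sampling rate, and the function names are likewise hypothetical.

```python
import cmath
import math


def center_frequency(samples, fs):
    """Return the frequency (Hz) of the strongest DFT bin below fs / 2."""
    n = len(samples)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * fs / n


def alarm_level(fc, has_voice, low_hz=20_000.0, high_hz=40_000.0):
    """Highest alarm reached: 'none' outside the assumed attack range,
    'low' inside it, 'high' if the demodulated signal also contains voice."""
    if not (low_hz <= fc <= high_hz):
        return "none"
    return "high" if has_voice else "low"


# Hypothetical input: a 24 kHz tone sampled at 96 kHz (F_s >= 2 * f_d holds
# for any anti-aliasing cut-off f_d up to 48 kHz).
fs = 96_000
tone = [math.cos(2 * math.pi * 24_000 * i / fs) for i in range(96)]
fc = center_frequency(tone, fs)
```

Here has_voice stands for the result of the voice detection module; how that module compares the baseband signal m(n) with the noise threshold T is not specified in this passage and is therefore left outside the sketch.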
In summary, the embodiment of the present invention provides an independent device for detecting ultrasonic voice hiding attacks, which a user can conveniently place in an environment that needs protection, such as a smart home. Compared with prior detection methods that rely on the difference between the frequency domain of the attack signal and that of normal voice, it does not require changing the internal structure or the algorithm of the attacked device, and does not increase the amount of computation or the delay of the recognition process. Meanwhile, the embodiment of the present invention does not misjudge a normal voice command as an attack command, because the ultrasonic receiving device cannot receive normal voice signals, and fairly high detection accuracy can be achieved as voice detection improves.
According to the detection device for the ultrasonic voice hiding attack, provided by the embodiment of the invention, the voice hiding attack is detected according to the current environmental noise and all ultrasonic signals around the equipment, so that huge loss caused by the voice hiding attack under the condition that a user is difficult to perceive is effectively avoided, and the voice hiding attack is effectively detected under the condition that the internal structure of the attacked equipment is not changed, thereby effectively improving the reliability and economical efficiency of detection, effectively ensuring the privacy safety of the user, improving the use experience of the user, and being simple and easy to realize.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. A detection method aiming at ultrasonic voice hidden attack is characterized by comprising the following steps:
receiving current environmental noise, and obtaining a current noise threshold according to the current environmental noise;
receiving all ultrasonic signals around the equipment which is possibly attacked;
carrying out spectrum analysis on the ultrasonic signal and acquiring a central frequency;
judging whether the central frequency is in an attack frequency range or not, and sending out a reminding alarm when the central frequency is in the attack frequency range;
demodulating the ultrasonic signal through the central frequency and carrying out band-pass filtering to filter out frequency components higher than the upper limit of the voice frequency and lower than the lower limit of the voice frequency and obtain a baseband signal; and
detecting whether the baseband signal contains speech according to the current noise threshold to issue a high-level alarm when the speech is contained.
2. The method according to claim 1, wherein the obtaining a current noise threshold according to the current environmental noise further comprises:
collecting noise of k time windows, calculating the energy of each time window and storing it in a queue, and acquiring an initial noise threshold value by taking the energy of the k time windows as a reference;
and after receiving a new time window, adding the energy of the new time window into the queue, discarding the earliest time window, and updating the current noise threshold according to the energy of the k time windows currently in the queue.
3. The method according to claim 2, wherein the energy of each time window is calculated by the following formula:
Figure FDA0002502202850000011
wherein x_j(i) is the i-th sample of the j-th time window, and s is the number of samples per time window.
4. The method for detecting the ultrasonic voice hiding attack according to any one of claims 1 to 3, further comprising:
and when the duration of the ultrasonic wave is greater than the preset maximum time limit, intercepting the ultrasonic wave signal according to the maximum time limit.
5. A detection apparatus for ultrasonic voice hiding attack, comprising:
the first receiving module is used for receiving the current environmental noise and obtaining a current noise threshold value according to the current environmental noise;
the second receiving module is used for receiving all ultrasonic signals around the equipment which is possibly attacked;
the analysis module is used for carrying out spectrum analysis on the ultrasonic signal and acquiring a central frequency;
the judging module is used for judging whether the central frequency is in an attack frequency range or not and sending out a reminding alarm when the central frequency is in the attack frequency range;
the demodulation and filtering module is used for demodulating the ultrasonic signal through the central frequency and carrying out band-pass filtering so as to filter out frequency components higher than the upper limit of the voice frequency and lower than the lower limit of the voice frequency and obtain a baseband signal; and
and the detection module is used for detecting whether the baseband signal contains voice according to the current noise threshold so as to send out a high-level alarm when the baseband signal contains the voice.
6. The apparatus according to claim 5, wherein the first receiving module is further configured to collect noise of k time windows, calculate the energy of each time window and store it in a queue, obtain an initial noise threshold based on the energy of the k time windows, add the energy of a new time window to the queue and discard the oldest time window when the new time window is received, and update the current noise threshold according to the energy of the k time windows currently in the queue.
7. The apparatus according to claim 6, wherein the energy of each time window is calculated by the following formula:
Figure FDA0002502202850000021
wherein x_j(i) is the i-th sample of the j-th time window, and s is the number of samples per time window.
8. The apparatus for detecting an ultrasonic voice hiding attack according to any one of claims 5 to 7, further comprising:
and the intercepting module is used for intercepting the ultrasonic signal according to the maximum time limit when the duration of the ultrasonic is greater than the preset maximum time limit.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810804883.XA CN108806702B (en) 2018-07-20 2018-07-20 Detection method and device for ultrasonic voice hidden attack

Publications (2)

Publication Number Publication Date
CN108806702A CN108806702A (en) 2018-11-13
CN108806702B true CN108806702B (en) 2020-07-03

Family

ID=64077224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810804883.XA Active CN108806702B (en) 2018-07-20 2018-07-20 Detection method and device for ultrasonic voice hidden attack

Country Status (1)

Country Link
CN (1) CN108806702B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739560A (en) * 2020-05-21 2020-10-02 浙江大学 Ultrasonic time domain processing-based replay attack living body detection method
CN112581975A (en) * 2020-12-11 2021-03-30 中国科学技术大学 Ultrasonic voice instruction defense method based on signal aliasing and two-channel correlation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4109863A (en) * 1977-08-17 1978-08-29 The United States Of America As Represented By The United States Department Of Energy Apparatus for ultrasonic nebulization
US9529071B2 (en) * 2012-12-28 2016-12-27 Rakuten, Inc. Ultrasonic-wave communication system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101193353A (en) * 2006-11-22 2008-06-04 乐金电子(昆山)电脑有限公司 Illegal frequency detection device of portal terminal and its method
CN101816135A (en) * 2007-08-20 2010-08-25 索尼特技术公司 ultrasound detectors
JP5256291B2 (en) * 2007-08-20 2013-08-07 ソニター テクノロジーズ アクティーゼルスカブ Ultrasonic detector
CN101952860A (en) * 2008-02-22 2011-01-19 艾迪泰克有限公司 Intrusion detection system with signal recognition
CN108172224A (en) * 2017-12-19 2018-06-15 浙江大学 The method without vocal command control voice assistant based on the defence of machine learning

Also Published As

Publication number Publication date
CN108806702A (en) 2018-11-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant