CN111091831B - Silent lip language recognition method and system - Google Patents

Silent lip language recognition method and system

Info

Publication number
CN111091831B
CN111091831B (application CN202010016710.9A)
Authority
CN
China
Prior art keywords
signals
wave
silent
signal
carrier
Prior art date
Legal status
Active
Application number
CN202010016710.9A
Other languages
Chinese (zh)
Other versions
CN111091831A (en)
Inventor
顾昌展
温力
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010016710.9A priority Critical patent/CN111091831B/en
Publication of CN111091831A publication Critical patent/CN111091831A/en
Application granted granted Critical
Publication of CN111091831B publication Critical patent/CN111091831B/en
Status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

A silent lip language recognition method and system are disclosed. A millimeter-wave signal serving as the carrier is continuously transmitted and focused on the user's mouth region, where the user's speaking behavior phase-modulates and partially reflects it; the reflected signal is down-converted to baseband and corrected, and linear reconstruction of the ambiguous speech phase based on trigonometric transformation then yields the user's speaking-behavior information. The invention is contactless, strongly penetrating and highly accurate; it can track the fine displacement of the lips and reliably detect precise voice commands.

Description

Silent lip language recognition method and system
Technical Field
The invention relates to a technology in the field of information security, in particular to a silent lip language recognition method and system based on a miniaturized 120GHz interferometric radar system.
Background
As interaction with computing devices becomes more prevalent, the trend in interaction is toward greater naturalness and intelligence. A variety of natural user interfaces have therefore been developed, such as touch screens, gaze tracking, gesture recognition and speech recognition, among which speech recognition is of particular interest because it resembles the way people issue commands in daily life. However, speech recognition is inconvenient in some situations, for example where silence must be maintained, or where privacy is desired in public. In addition, some people lose the ability to speak because of illness, and their need for language communication should also be met. Hence the concept of silent lip language perception emerged, and several methods for it are currently being investigated.
Disclosure of Invention
Aiming at the phase-ambiguity problem of millimeter-wave nonlinear phase modulation in the prior art, the invention provides a silent lip language recognition method and system based on the millimeter-wave radar interferometric phase. The invention is contactless, strongly penetrating and highly accurate; it can track the fine displacement of the lips and reliably detect precise voice commands.
The invention is realized by the following technical scheme:
the invention relates to a silent lip language identification method, which is characterized in that millimeter wave signals serving as carrier waves are continuously sent out and focused on an oral cavity area of a user, the millimeter wave signals are modulated and partially reflected on phases through the speaking behavior of the user, and the reflected signals are converted to a baseband and corrected, and then voice phase fuzzy linear reconstruction based on triangular transformation is adopted to obtain the speaking behavior information of the user.
The invention relates to a silent lip language recognition system comprising a power supply unit, a radar transceiver, a carrier generation unit and an intermediate frequency (IF) amplification unit, wherein: the power supply unit is connected to the other units and provides the working voltages; the input of the radar transceiver is selectable by a switch between the carrier generation unit and a fixed reference voltage; the output of the radar transceiver is connected to the IF amplification unit and delivers the I/Q signals; and the IF amplification unit is connected to the signal output and delivers the amplified I/Q signals.
The carrier is a frequency-modulated continuous wave, preferably with sawtooth modulation.
Technical effects
The invention solves, as a whole, the ambiguity of the lip Doppler phase obtained by millimeter-wave radar interferometric phase measurement.
Compared with the prior art, the invention measures the Doppler phase shift caused by lip movement with a millimeter-wave radar interferometric phase method, senses lip language with 120GHz millimeter waves, customizes a fully integrated, miniaturized 120GHz millimeter-wave radar system comprising the RF front end, intermediate frequency, power management and signal transmission, and realizes signal reconstruction of fine lip movement with a coherent-radar-based linear phase reconstruction algorithm.
Drawings
FIG. 1 is a schematic diagram of a silent lip language identification method based on short-range millimeter wave radar sensing according to the present invention;
FIG. 2 is a schematic view of a radar sensor system of the present invention;
FIG. 3 is a schematic diagram of a sawtooth signal with two different pulse repetition times and amplitudes according to an embodiment;
FIG. 4 shows the I/Q signals output by the intermediate frequency amplifier and the normalized spectrograms of the I/Q signals for the command phrases "Cancel" and "Up" in the embodiment;
FIG. 5 is a schematic diagram of the I/Q signals and displacement waveforms for eight command phrases in an embodiment;
in the figure: (a) "Delete"; (b) "Left"; (c) "Off"; (d) "Yes"; (e) "Go"; (f) "Next"; (g) "Stop"; (h) "Play".
FIG. 6 is a schematic diagram of the I/Q signals and displacement waveforms of three command sentences in an embodiment;
in the figure: (a) "Buy a 7:30 ticket for Frozen tonight"; (b) "How's the weather today"; (c) "Text Lucy and tell her that the house for dinner is booked".
Detailed Description
As shown in fig. 2, the 120GHz millimeter-wave radar sensing system implementing the above method in this embodiment uses a 3.24 cm × 4.27 cm double-sided printed circuit board with surface-mounted components, and has two modes, frequency-modulated continuous wave (FMCW) and continuous wave (CW), both of which support radar interferometry; the FMCW mode additionally provides ranging capability and greatly expands the sensing dimensions. The sensing system uses a TRX_120_001 radar RF transceiver from Silicon Radar, with a frequency range of 119.1 GHz to 125.9 GHz and a Tx power of -7 dBm to 1 dBm. The system comprises: a power module with a power-management circuit providing a 5 V supply, a chip-scale radar transceiver, a carrier generation unit and an intermediate frequency amplification unit, each connected to the power module.
The power module comprises a USB Type-C connector and a low-dropout regulator (LDO), and outputs a stable 3.3 V supply.
The radar transceiver includes: a power amplifier (PA), a low-noise amplifier (LNA), a quadrature mixer, a polyphase filter, a voltage-controlled oscillator (VCO), packaged transmit/receive antennas (TX/RX) and a local oscillator, wherein: the power amplifier is connected to the local oscillator and the transmitting antenna and delivers the transmitted signal; the input of the low-noise amplifier is connected to the receiving antenna and carries the received signal; the quadrature mixer is connected to the low-noise amplifier and delivers the received signal down-converted to baseband; the polyphase filter is connected to the voltage-controlled oscillator; and the voltage-controlled oscillator is connected to the tuning voltage and the local oscillator respectively.
The carrier generation unit is a self-oscillating circuit built around a triangle-wave generator, in which the unidirectional conduction of a diode selects different integration paths. The circuit comprises a hysteresis comparator with non-inverting input and an integrating operational circuit, wherein: when the forward integration time constant is much larger than the backward one, the rising-edge slope differs greatly from the falling-edge slope, so the triangle wave is converted into a sawtooth wave.
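The asymmetric-integration behavior described above can be sketched with a toy discrete-time model: a hysteresis comparator that flips at fixed thresholds drives an integrator whose time constant depends on the integration direction, as if a diode selected the path. All component values below are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Toy model of the self-oscillating sawtooth generator: the comparator
# state flips at +/-VTH, and the slope magnitude depends on direction
# (slow forward path, fast backward path, as a diode-selected RC would do).
VTH = 1.0                          # comparator thresholds, volts
tau_up, tau_down = 1e-3, 2e-5      # forward time constant >> backward
dt = 1e-6                          # simulation step, seconds
v, state = -VTH, +1                # start at the lower threshold, rising
samples = []
for _ in range(20000):
    slope = (2 * VTH) / (tau_up if state > 0 else tau_down)
    v += state * slope * dt
    if v >= VTH:
        state = -1                 # comparator flips: fast discharge
    if v <= -VTH:
        state = +1                 # comparator flips back: slow charge
    samples.append(v)

wave = np.array(samples)
# A sawtooth spends almost all of each period on the rising edge.
rising = np.count_nonzero(np.diff(wave) > 0)
falling = np.count_nonzero(np.diff(wave) < 0)
print(f"rising/falling sample ratio ~ {rising / falling:.0f}")
```

With the two time constants far apart, the rising portion dominates each period, which is exactly the triangle-to-sawtooth conversion the circuit exploits.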
The self-oscillation circuit is further provided with a trimming potentiometer for controlling the amplitude and the period of the sawtooth wave so as to realize adjustable scanning near a reference voltage.
The sensing system is further provided with an intermediate-frequency (IF) amplifier, connected to the radar transceiver, to improve the signal-to-noise ratio (SNR) of the RF mixer output.
As shown in FIG. 3, the two waveforms are example sawtooth signals with different amplitudes and pulse repetition times. The four analog tuning inputs of the 120GHz local oscillator (LO) are shorted together and connected by a switch either to a fixed voltage output (CW mode) or to the sawtooth output (FMCW mode), so that the local oscillator works at a fixed frequency point or sweeps over a certain bandwidth.
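The ranging capability of the FMCW mode follows directly from the swept bandwidth. As an illustrative check, the only figures below taken from the text are the transceiver's 119.1 to 125.9 GHz sweep limits and the 120 GHz nominal carrier; the rest is the standard FMCW range-resolution relation, not a patent claim.

```python
# Illustrative sketch: range resolution achievable in the FMCW mode from
# the TRX_120_001's stated sweep limits. The resolution formula c/(2B)
# is the classic FMCW limit; only the frequencies come from the text.
C = 299_792_458.0                        # speed of light, m/s

f_low, f_high = 119.1e9, 125.9e9
bandwidth = f_high - f_low               # 6.8 GHz maximum sweep
range_resolution = C / (2 * bandwidth)   # classic FMCW limit c/(2B)

wavelength = C / 120e9                   # nominal 120 GHz carrier
print(f"max sweep bandwidth : {bandwidth / 1e9:.1f} GHz")
print(f"range resolution    : {range_resolution * 100:.2f} cm")
print(f"carrier wavelength  : {wavelength * 1000:.2f} mm")
```

The roughly 2.2 cm resolution explains why FMCW "greatly expands the sensing dimensions", while the 2.5 mm wavelength is what makes the interferometric phase so sensitive to sub-millimeter lip motion.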
As shown in fig. 1, this embodiment relates to the silent lip language recognition method of the above system: a millimeter-wave signal serving as the carrier is continuously transmitted and focused on the user's mouth region, where the user's speaking behavior phase-modulates and partially reflects it; the reflected signal is down-converted to baseband and corrected, and linear reconstruction of the ambiguous speech phase based on trigonometric transformation then yields the user's speaking-behavior information.
The millimeter-wave signal, i.e. the carrier, is:

x_c(t) = A·cos[2π·f_c·t + φ(t)]

wherein: A is the amplitude, f_c is the carrier frequency, and φ(t) is the phase noise of the transmitter.
Down-converting the reflected signal to baseband gives:

I(t) = A_I·cos[θ + 4π·x(t)/λ + Δφ(t)] + DC_I
Q(t) = A_Q·sin[θ + 4π·x(t)/λ + Δφ(t)] + DC_Q

wherein: A_I and A_Q are the amplitudes of the I and Q signals, x(t) is the lip displacement, θ is a constant phase shift, Δφ(t) is the residual phase noise, λ is the carrier wavelength, and DC_I and DC_Q are the DC offsets in the I and Q signals.
After DC-offset removal and amplitude normalization, the corrected signals are:

I_c(t) = cos[θ + 4π·x(t)/λ + Δφ(t)], Q_c(t) = sin[θ + 4π·x(t)/λ + Δφ(t)]
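The correction step (estimating and removing the DC offsets and the amplitude scaling of the raw I/Q channels) can be sketched numerically. The patent only states that the baseband signals are corrected; the least-squares circle fit below is one common illustrative way to do it, and every numeric value is a synthetic assumption.

```python
import numpy as np

# Sketch of I/Q correction by algebraic circle fitting (Kasa fit): the
# ideal I/Q trajectory is a circle, so the fitted centre gives the DC
# offsets and the fitted radius gives the common amplitude. With unequal
# channel amplitudes an ellipse fit would be needed instead.
rng = np.random.default_rng(0)
phase = 0.3 + np.linspace(0, 5 * np.pi, 2000)         # > 1 turn of arc
I_raw = 1.2 * np.cos(phase) + 0.4 + 0.01 * rng.standard_normal(2000)
Q_raw = 1.2 * np.sin(phase) - 0.7 + 0.01 * rng.standard_normal(2000)

# Solve x^2 + y^2 = 2*a*x + 2*b*y + c in least squares for centre (a, b).
A = np.column_stack([2 * I_raw, 2 * Q_raw, np.ones_like(I_raw)])
rhs = I_raw**2 + Q_raw**2
(a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
radius = np.sqrt(c + a**2 + b**2)

I_c = (I_raw - a) / radius                            # corrected channels
Q_c = (Q_raw - b) / radius                            # now on a unit circle
print(f"estimated DC offsets ({a:.3f}, {b:.3f}), amplitude {radius:.3f}")
```

After this correction the I/Q pair lies on the unit circle, which is the precondition for the phase reconstruction that follows.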
for 120GHz millimeter waves, the wavelength is only 2.5mm, which easily results in phase ambiguity since facial muscle movements are likely to exceed half a wavelength. In this case, it is necessary to perform complicated phase unwrapping.
The linear reconstruction of the ambiguous speech phase based on trigonometric transformation means: the corrected signals are differentiated, cross-multiplied and then integrated, yielding the displacement information. The time-domain expression and its discrete form are:

x(t) = (λ/4π) ∫₀ᵗ [I_c(τ)·Q_c′(τ) − I_c′(τ)·Q_c(τ)] / [I_c²(τ) + Q_c²(τ)] dτ

x[n] = (λ/4π) Σ_{k=2..n} [I_c[k]·(Q_c[k] − Q_c[k−1]) − Q_c[k]·(I_c[k] − I_c[k−1])] / (I_c²[k] + Q_c²[k])
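The differentiate, cross-multiply and integrate reconstruction just described can be sketched end to end on synthetic data: a lip displacement several times larger than half a wavelength is encoded into ideal corrected I/Q signals and then recovered without any phase unwrapping. The motion waveform, sampling rate and constant phase offset are stand-in assumptions, not values from the patent.

```python
import numpy as np

# Sketch of the ambiguity-free linear phase reconstruction (a DACM-style
# differentiate-and-cross-multiply scheme): the cross-multiplied derivative
# of arctan(Q/I) never wraps, so integrating it recovers displacements
# far beyond half a wavelength.
WAVELENGTH = 2.5e-3                           # ~120 GHz carrier, metres
fs = 1000.0                                   # samples per second
t = np.arange(0, 1.0, 1 / fs)
x_true = 3e-3 * np.sin(2 * np.pi * 2 * t)     # 3 mm excursion >> lambda/2

phase = 4 * np.pi * x_true / WAVELENGTH + 0.3  # interferometric phase
I, Q = np.cos(phase), np.sin(phase)           # ideal corrected signals

dI, dQ = np.gradient(I), np.gradient(Q)       # per-sample derivatives
omega = (I * dQ - Q * dI) / (I**2 + Q**2)     # d/dn arctan(Q/I), wrap-free
x_hat = WAVELENGTH / (4 * np.pi) * np.cumsum(omega)
x_hat -= x_hat[0] - x_true[0]                 # fix the integration constant

err = np.max(np.abs(x_hat - x_true))
print(f"peak reconstruction error: {err * 1e6:.1f} um over a 6 mm swing")
```

A direct arctangent of Q/I would wrap many times over this motion; the cross-multiplied form sidesteps that entirely, which is the point of the trigonometric-transformation step.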
after obtaining the displacement information of silent lip movements, various signal processing methods are further utilized, such as: and obtaining a multi-dimensional feature vector by using a feature extraction method in the traditional machine learning, or realizing optimized fitting by using a Convolutional Neural Network (CNN) in deep learning so as to identify the features of different lip languages.
The present embodiment was evaluated in an office environment: the radar sensor system is connected to a data acquisition device (DAQ) to acquire the real-time I and Q signals. To achieve a better signal-to-noise ratio, the radar sensor is placed about 5 cm from the subject's mouth. The first set of experiments tested silent lip commands for 20 different words: "Yes", "Cancel", "No", "Play", "Pause", "Search", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go", "Save", "Delete", "Send", "Next", "Enter" and "Return".
Fig. 4 (a) and (b) show the I/Q signals of the command phrases "Cancel" and "Up" output by the IF amplifier, together with the facial displacement during the silent lip command as recovered by the new algorithm.
Fig. 4 (c) and (d) show the normalized spectrograms of the I/Q signals for the two command phrases; the shaded area marks the part of the frequency range that the human ear cannot perceive, since humans only hear frequencies between 20 Hz and 20 kHz. As shown, on the one hand the radar sensor system detects every small lip movement with high accuracy in displacement sensing; on the other hand, different commands correspond to different relative displacements and spectral information. It can therefore be concluded that the lip movement producing each command phrase has its own unique signature. The same signal-processing procedure can be applied to other command phrases; experimental results for 8 command words are shown in fig. 5. The time-domain patterns of different command words clearly differ, and further processing can extract richer features, on the basis of which machine learning or pattern recognition can identify different silent lip commands.
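The spectral observation above, that silent lip motion sits almost entirely below the audible band, can be checked on a synthetic trace. Only the 20 Hz audibility threshold and the 2.5 mm wavelength come from the text; the displacement waveform itself is an invented stand-in for a short command phrase.

```python
import numpy as np

# Illustrative check: the baseband spectrum produced by slow lip motion
# concentrates its energy well below 20 Hz (the shaded, inaudible region
# of fig. 4). The displacement trace is synthetic.
WAVELENGTH = 2.5e-3
fs = 1000.0
t = np.arange(0, 2.0, 1 / fs)
# Small lip displacement mimicking a few slow articulation strokes.
x = 1e-4 * np.sin(2 * np.pi * 3 * t) + 5e-5 * np.sin(2 * np.pi * 7 * t)
phase = 4 * np.pi * x / WAVELENGTH
Q = np.sin(phase)                               # one baseband channel

spec = np.abs(np.fft.rfft(Q - Q.mean()))
spec /= spec.max()                              # normalized spectrum
freqs = np.fft.rfftfreq(len(t), 1 / fs)
sub_audible = spec[freqs < 20.0].sum() / spec.sum()
print(f"spectral weight below 20 Hz: {100 * sub_audible:.1f}%")
```

This is why a microphone hears nothing while the radar channel still carries a clean, command-specific signature.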
The second set of experiments tested 12 silent lip command sentences; results for 3 of them are shown in fig. 6, with the corresponding words marked approximately next to the waveforms. The 3 command sentences are "Buy a 7:30 ticket for Frozen tonight", "How's the weather today" and "Text Lucy and tell her that the house for dinner is booked". The results indicate that different command sentences also exhibit unique patterns, and that these are not simple concatenations of the individual word patterns but are shaped by speaking habits such as liaison and weak forms.
The foregoing embodiments may be modified in many ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims; all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.

Claims (6)

1. A silent lip language recognition method, characterized in that a millimeter-wave signal serving as the carrier is continuously transmitted and focused on the mouth region of a user; the speaking behavior of the user phase-modulates and partially reflects the millimeter-wave signal; the reflected signal is down-converted to baseband and corrected, and linear reconstruction of the ambiguous speech phase based on trigonometric transformation then yields the speaking-behavior information of the user;
the carrier is a frequency-modulated continuous wave;
the carrier is sawtooth-modulated, the transmitted signal being x_c(t) = A·cos[2π·f_c·t + φ(t)], wherein: A is the amplitude, f_c is the carrier frequency, and φ(t) is the phase noise of the transmitter;
the reflected signal is converted to a baseband to obtain:
Figure FDA0004078000070000011
Figure FDA0004078000070000012
wherein: a. The I And A Q The amplitudes of the I and Q signals are, theta is a constant phase shift and->
Figure FDA0004078000070000017
Is residual phase noise, λ is carrier wavelength, DC I And DC Q Is the DC offset in the I and Q signals;
the corrected signals are:
Figure FDA0004078000070000013
the speech phase fuzzy linear reconstruction based on the triangular transformation refers to: the correction signal and the signal are differentiated and then integrated in sequence, so that displacement information is obtained, and a specific time domain expression and a discrete form thereof are as follows:
Figure FDA0004078000070000014
Figure FDA0004078000070000015
after the displacement information of the silent lip movement is obtained, a multi-dimensional feature vector is further obtained by a feature extraction method of machine learning, or the features of different lip languages are recognized by a convolutional neural network in deep learning.
2. A silent lip language recognition system according to the method of claim 1, characterized by comprising a power supply unit, a radar transceiver, a carrier generation unit and an intermediate frequency amplification unit, wherein: the power supply unit is connected to the other units and provides the working voltages; the input of the radar transceiver is selectable by a switch between the carrier generation unit and a fixed reference voltage; the output of the radar transceiver is connected to the intermediate frequency amplification unit and delivers the I/Q signals; and the intermediate frequency amplification unit is connected to the signal output and delivers the amplified I/Q signals.
3. The system of claim 2, wherein said radar transceiver comprises: power amplifier, low noise amplifier, quadrature mixer, polyphase filter, voltage controlled oscillator, packaged transmit receive antenna and local oscillator, wherein: the power amplifier is connected with the local oscillator and the transmitting antenna respectively and transmits a transmitting signal, the input end of the low-noise amplifier is connected with the receiving antenna and transmits a receiving signal, the quadrature mixer is connected with the low-noise amplifier and transmits the receiving signal converted to a baseband, the polyphase filter is connected with the voltage-controlled oscillator, and the voltage-controlled oscillator is connected with the input voltage and the local oscillator respectively.
4. The system of claim 2, wherein the carrier generation unit is a self-oscillating circuit built around a triangle-wave generator, in which the unidirectional conduction of a diode selects different integration paths, the circuit comprising a hysteresis comparator with non-inverting input and an integrating operational circuit, wherein: when the forward integration time constant is much larger than the backward one, the rising-edge slope differs greatly from the falling-edge slope, so the triangle wave is converted into a sawtooth wave.
5. The system as claimed in claim 4, wherein the self-oscillation circuit further comprises a trimming potentiometer for controlling the amplitude and period of the sawtooth wave to achieve an adjustable sweep around the reference voltage.
6. The system of claim 2, further comprising an intermediate frequency amplifier coupled to the radar transceiver for increasing the signal-to-noise level of the output of the rf mixer.
CN202010016710.9A 2020-01-08 2020-01-08 Silent lip language recognition method and system Active CN111091831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010016710.9A CN111091831B (en) 2020-01-08 2020-01-08 Silent lip language recognition method and system


Publications (2)

Publication Number Publication Date
CN111091831A CN111091831A (en) 2020-05-01
CN111091831B true CN111091831B (en) 2023-04-07

Family

ID=70398876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010016710.9A Active CN111091831B (en) 2020-01-08 2020-01-08 Silent lip language recognition method and system

Country Status (1)

Country Link
CN (1) CN111091831B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113628617A (en) * 2020-05-09 2021-11-09 西安电子科技大学青岛计算技术研究院 Intelligent voice equipment control method based on millimeter wave radar
CN111505601A (en) * 2020-05-21 2020-08-07 上海交通大学 Linear motion demodulation implementation method based on improved differential cross multiplication
CN111856422A (en) * 2020-07-03 2020-10-30 西安电子科技大学 Lip language identification method based on broadband multichannel millimeter wave radar
CN111986674B (en) * 2020-08-13 2021-04-09 广州仿真机器人有限公司 Intelligent voice recognition method based on three-level feature acquisition
CN113314121B (en) * 2021-05-25 2024-06-04 北京小米移动软件有限公司 Soundless voice recognition method, soundless voice recognition device, soundless voice recognition medium, soundless voice recognition earphone and electronic equipment

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
JP2015210419A (en) * 2014-04-28 2015-11-24 カシオ計算機株式会社 Converter, method, and program
CN106778179A (en) * 2017-01-05 2017-05-31 南京大学 A kind of identity identifying method based on the identification of ultrasonic wave lip reading
CN107171994A (en) * 2017-06-06 2017-09-15 南京理工大学 Radio Fuze Signal is recognized and reconfiguration system and method
CN110584631A (en) * 2019-10-10 2019-12-20 重庆邮电大学 Static human heartbeat and respiration signal extraction method based on FMCW radar

Non-Patent Citations (2)

Title
Jiayao Tan et al., "SilentKey: A New Authentication Framework through Ultrasonic-based Lip Reading", Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2018, vol. 2, no. 1, pp. 1-18. *
W. Xu et al., "Robust Doppler radar demodulation via compressed sensing", Electronics Letters, 2012, vol. 48, no. 22, pp. 1-2. *

Also Published As

Publication number Publication date
CN111091831A (en) 2020-05-01

Similar Documents

Publication Publication Date Title
CN111091831B (en) Silent lip language recognition method and system
US9846226B2 (en) Motion detection device
US9645234B2 (en) RFID device, methods and applications
Costanzo et al. Energy autonomous UWB localization
Li et al. Recent advances in Doppler radar sensors for pervasive healthcare monitoring
Zhou et al. Ultra low-power UWB-RFID system for precise location-aware applications
US8924214B2 (en) Radar microphone speech recognition
Wu et al. Dynamic hand gesture recognition using FMCW radar sensor for driving assistance
Bao et al. A wirelessly powered UWB RFID sensor tag with time-domain analog-to-information interface
Wen et al. Silent speech recognition based on short-range millimeter-wave sensing
CN102247146A (en) Wireless sensing device and method
Heidrich et al. Local positioning with passive UHF RFID transponders
Yavari et al. System-on-Chip based Doppler radar occupancy sensor
US20230358856A1 (en) Harmonic radar based on field programmable gate array (fpga) and deep learning
Hui et al. Near-field coherent sensing of vibration with harmonic analysis and balance signal injection
CN110763923A (en) Specific absorption rate value control method and mobile terminal
CN109738885A (en) A kind of life detection radar system and method based on random code modulated sinusoid signal
Eggimann et al. Low power embedded gesture recognition using novel short-range radar sensors
Liu et al. Ultrasound-based 3-D gesture recognition: Signal optimization, trajectory, and feature classification
Zhao et al. Gesture Recognition System Resilient to Interdevice Interference Based on Direct Sequence Spread Spectrum
Chinnam et al. Implementation of a low cost synthetic aperture radar using software defined radio
Lai et al. Finger gesture sensing and recognition using a Wi-Fi-based passive radar
Fabbri et al. Micropower design of an energy autonomous rf tag for uwb localization applications
Fang et al. A silicon-based radio platform for integrated edge sensing and communication toward sustainable healthcare
Yuan et al. A high-sensitivity low-power vital sign radar sensor based on super-regenerative oscillator architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant