CN112634871B - Lie detection method and system based on voice and radar dual sensors - Google Patents

Lie detection method and system based on voice and radar dual sensors

Info

Publication number
CN112634871B
CN112634871B · CN202011492568.1A
Authority
CN
China
Prior art keywords
feature set
respiratory
voice
heartbeat
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011492568.1A
Other languages
Chinese (zh)
Other versions
CN112634871A
Inventor
洪弘
李新
李彧晟
孙理
顾陈
朱晓华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202011492568.1A
Publication of CN112634871A
Application granted
Publication of CN112634871B
Legal status: Active

Classifications

    • G10L15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L15/063: Training of speech recognition systems (G10L2015/0631: creating reference templates; clustering)
    • G10L15/08: Speech classification or search
    • G10L21/0208: Speech enhancement; noise filtering
    • G10L25/24: Speech or voice analysis in which the extracted parameters are the cepstrum
    • A61B5/0205: Simultaneously evaluating both cardiovascular conditions and other body conditions, e.g. heart and respiratory condition
    • A61B5/024: Detecting, measuring or recording pulse rate or heart rate
    • A61B5/08: Detecting, measuring or recording devices for evaluating the respiratory organs
    • A61B5/164: Lie detection
    • A61B5/4803: Speech analysis specially adapted for diagnostic purposes
    • A61B5/7203: Signal processing for noise prevention, reduction or removal
    • A61B5/7235: Details of waveform analysis
    • A61B5/7264, A61B5/7267: Classification of physiological signals or data, e.g. using neural networks or statistical classifiers, involving training the classification device
    • G06F18/253: Fusion techniques of extracted features
    • Y02A90/10: Information and communication technologies supporting adaptation to climate change


Abstract

The invention discloses a lie detection method and system based on voice and radar dual sensors. First, a microphone and a radar synchronously acquire a voice signal and a radar signal; second, the voice signal and the radar signal are each preprocessed; then a voice feature set, a respiratory feature set and a heartbeat feature set are extracted according to the psychological and physiological characteristics of lying; next, the three feature sets are fused using a feature fusion technique; finally, a machine learning classifier performs lie detection and identification. The invention effectively combines the complementary advantages of the two non-contact sensors, performs reliably, and detects lies accurately. It causes no discomfort to the subject, overcomes the limitations of voice alone, effectively improves lie detection accuracy, and offers high reliability and wide applicability.

Description

Lie detection method and system based on voice and radar dual sensors
Technical Field
The invention belongs to the field of lie detection, and particularly relates to a lie detection method and system based on voice and radar dual sensors.
Background
Lying is a special behavior in which, in certain situations, a person conceals the true state of affairs and wins the other party's trust through false descriptions. Lie detection technology is an interdisciplinary technology that fuses psychology, physiology, linguistics, cognitive science, statistics, sensor technology, pattern recognition, criminal investigation and other subjects; it is of great significance for understanding human behavior and, in particular, for assisting criminal investigation and judicial cases. Unaided human judgment of lies amounts to subjective guessing and has low accuracy, so a scientific and reliable system and method are needed to guide lie detection.
In the past, lie detection mainly used multi-channel physiological instruments to detect physiological parameters and their abnormal changes during lying. This method has a certain reliability, because humans involuntarily produce physiological reactions while lying, such as an accelerated heartbeat and suppressed breathing. However, it requires the subject to wear multiple sensors, which can cause discomfort and even a strong sense of oppression that may affect the measurement result. At present, lie detection is also commonly performed by means of voice analysis, which examines the acoustic or semantic features of lies from the perspective of speech or semantics. Although this achieves non-contact measurement, it is greatly limited because speaking habits and expression content differ greatly from person to person.
Disclosure of Invention
The invention aims to provide a lie detection method and system based on voice and radar dual sensors that offer high reliability and wide applicability, acquire voice, respiration and heartbeat signals without contact, and fuse the three signals to detect lies.
The technical solution realizing the purpose of the invention is as follows: a lie detection method based on voice and radar dual sensors, the method comprising the following steps:
Step 1, synchronously acquiring a voice signal and a radar signal by using a microphone and a continuous wave radar;
step 2, noise reduction, sound event detection and pre-emphasis preprocessing are carried out on the voice signals acquired in the step 1;
step 3, extracting fundamental frequency, voicing probability, short-time zero-crossing rate, frame root-mean-square energy and mel cepstral coefficient features from the preprocessed voice signal obtained in step 2, and applying 6 statistical parameters (maximum, minimum, mean, standard deviation, skewness and kurtosis) to these 5 features to obtain a voice feature set X;
step 4, demodulating and filtering the radar signals acquired in the step 1 to obtain respiratory signals and heartbeat signals;
step 5, carrying out time domain, frequency domain and nonlinear feature extraction on the respiratory signal and the heartbeat signal obtained in the step 4 respectively to obtain a respiratory feature set R and a heartbeat feature set H;
step 6, fusing the respiratory feature set R obtained in the step 5 with the heartbeat feature set H to obtain a physiological feature set Y, and then performing feature fusion on the physiological feature set Y and the voice feature set X obtained in the step 3 to obtain a fusion feature set Z;
and 7, training a classifier by using the fusion feature set Z obtained in the step 6, and performing lie detection classification on the voice sample.
Further, in step 2, the noise reduction of the voice signal acquired in step 1 specifically comprises:
step 2-1, recording a noise sample of the noise whose decibel level exceeds a preset threshold, i.e. the main noise;
step 2-2, generating a noise-profile file from the noise sample obtained in step 2-1 using the SoX audio processing program, and performing primary noise reduction on the voice signal according to the noise profile to remove the main noise;
step 2-3, performing secondary noise reduction on the once-denoised voice signal obtained in step 2-2 using improved spectral subtraction, removing noise types other than the recorded noise sample to obtain a clean voice signal.
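As an illustration of this two-stage scheme, a minimal sketch follows; the SoX `noiseprof`/`noisered` invocation is that program's standard noise-reduction workflow, while the second stage is a generic over-subtraction (Berouti-style) spectral subtraction, since the patent does not spell out its 'improved spectral subtraction'. File names, frame sizes and subtraction parameters are illustrative assumptions.

```python
import subprocess
import numpy as np
from scipy.signal import stft, istft

def sox_denoise(noisy_wav, noise_wav, out_wav, amount=0.21):
    """Stage 1: build a noise profile from the recorded noise sample
    and subtract it with SoX to remove the main noise."""
    subprocess.run(["sox", noise_wav, "-n", "noiseprof", "noise.prof"], check=True)
    subprocess.run(["sox", noisy_wav, out_wav, "noisered", "noise.prof",
                    str(amount)], check=True)

def spectral_subtract(x, fs, noise_frames=10, alpha=4.0, beta=0.01):
    """Stage 2: over-subtraction spectral subtraction for residual noise;
    the first `noise_frames` STFT frames are assumed to be speech-free."""
    f, t, Z = stft(x, fs=fs, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean = np.maximum(mag - alpha * noise_mag, beta * mag)  # spectral floor
    _, x_hat = istft(clean * np.exp(1j * phase), fs=fs, nperseg=512)
    return x_hat
```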
Further, in the step 5, time domain, frequency domain and nonlinear feature extraction are performed on the respiratory signal and the heartbeat signal obtained in the step 4 respectively to obtain a respiratory feature set R and a heartbeat feature set H, which specifically include:
step 5-1, carrying out time domain, frequency domain and nonlinear feature extraction on the respiratory signal to obtain a respiratory feature set R;
A. time-domain features: extracting the respiratory amplitude mean, respiratory amplitude standard deviation, respiratory average amplitude difference and respiratory normalized average amplitude difference as the time-domain features of the respiratory signal; wherein,
(1) The respiratory amplitude mean $\mu_x$ reflects the average respiratory amplitude during lie detection:
$$\mu_x = \frac{1}{N}\sum_{n=1}^{N} X(n)$$
where $X(n)$ is the $n$th sample of the respiration sequence, $N$ is the total number of samples, and $1 \le n \le N$;
(2) The respiratory amplitude standard deviation $\sigma_x$ reflects the overall variation of respiration during lie detection:
$$\sigma_x = \sqrt{\frac{1}{N}\sum_{n=1}^{N} \left(X(n)-\mu_x\right)^2}$$
(3) The respiratory average amplitude difference $\Delta_x$ reflects the short-time variation of the respiratory amplitude during lie detection:
$$\Delta_x = \frac{1}{N-1}\sum_{n=2}^{N} \left|X(n)-X(n-1)\right|$$
(4) The respiratory normalized average amplitude difference $\Delta_{rx}$ reflects the influence of the short-time variation of the respiratory amplitude on the overall variation during lie detection:
$$\Delta_{rx} = \frac{\Delta_x}{\sigma_x}$$
B. frequency-domain features: extracting the mean power-spectrum amplitude of the respiratory low frequency band $F_L$, respiratory middle frequency band $F_M$ and respiratory high frequency band $F_H$ as the frequency-domain features of the respiratory signal; wherein $F_L < p_1$ Hz, $p_1$ Hz $\le F_M < p_2$ Hz, $F_H > p_2$ Hz;
C. nonlinear features: extracting the respiratory detrended fluctuation scaling exponent and the respiratory sample entropy as the nonlinear features of the respiratory signal;
(1) Respiratory detrended fluctuation scaling exponent
The respiratory detrended fluctuation scaling exponent reflects the non-stationary characteristics of the respiratory signal during lie detection, and is computed as follows:
1) Given the respiration sequence $X(n)$, calculate its mean $\mu_x$;
2) Calculate the cumulative deviation series $y(n)$ of the respiration sequence:
$$y(n) = \sum_{k=1}^{n}\left(X(k)-\mu_x\right)$$
3) Divide $y(n)$ into $a$ non-overlapping windows of length $b$;
4) Fit a local trend $y_b(n)$ to each window by least squares, remove the local trend of each window to obtain a detrended sequence, and calculate its root mean square $F(b)$:
$$F(b) = \sqrt{\frac{1}{ab}\sum_{n=1}^{ab}\left(y(n)-y_b(n)\right)^2}$$
5) Change the window length $b$ and repeat the above steps until the required amount of data is obtained;
6) From the quantities calculated above, plot the curve with $\log b$ as abscissa and $\log F(b)$ as ordinate; the slope of this curve is the detrended fluctuation scaling exponent of the respiration sequence;
(2) Respiratory sample entropy
The respiratory sample entropy evaluates the complexity of the respiratory signal during lie detection, and is computed as follows:
1) Denote the respiration time series as $X(n)$, $1 \le n \le N$; with window length $m$, divide it into $s = N-m+1$ respiration subsequences:
$$X_m(t) = \left(X(t), X(t+1), \ldots, X(t+m-1)\right),\quad 1 \le t \le N-m+1$$
where $X_m(t)$ is the $t$th respiration subsequence;
2) Define the distance between subsequences $X_m(i)$ and $X_m(j)$ as the maximum absolute difference of their corresponding elements, and calculate the distance $d_{ij}$ between each respiration subsequence and all other respiration subsequences:
$$d_{ij} = \max_{k=0,\ldots,m-1}\left(\left|X_m(i+k)-X_m(j+k)\right|\right)$$
where $1 \le i \le N-m$, $1 \le j \le N-m$, and $i \ne j$;
3) Calculate the respiratory amplitude standard deviation $\sigma_x$ and define the threshold $F = r\sigma_x$, where $r$ is a constant taken between 0.1 and 0.25; record the ratio of the number of distances $d_{ij} \le F$ calculated in 2) to $s$ as $B_m(i)$, and calculate the mean $\phi_m$ of $B_m(i)$ over all subsequences:
$$\phi_m = \frac{1}{s}\sum_{i=1}^{s} B_m(i)$$
4) Change the window length to $m+1$ and repeat steps 1) to 3) to obtain $\phi_{m+1}$;
5) Calculate the respiratory sample entropy:
$$\mathrm{SampEn} = \ln\phi_m - \ln\phi_{m+1}$$
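By way of illustration, a compact sketch of these two nonlinear features is given below; it assumes the sequence is longer than the largest analysis window, the window scales are illustrative, and the sample-entropy routine is a simplified vectorized variant of the steps above (the same routines apply to the heartbeat signal).

```python
import numpy as np

def dfa_exponent(x, scales=(4, 8, 16, 32, 64)):
    """Detrended fluctuation scaling exponent: slope of log F(b) vs log b."""
    x = np.asarray(x, float)
    y = np.cumsum(x - x.mean())                    # cumulative deviation series
    F = []
    for b in scales:
        n_win = len(y) // b                        # non-overlapping windows
        segs = y[:n_win * b].reshape(n_win, b)
        t = np.arange(b)
        res = [seg - np.polyval(np.polyfit(t, seg, 1), t) for seg in segs]
        F.append(np.sqrt(np.mean(np.square(res)))) # RMS of detrended series
    slope, _ = np.polyfit(np.log(scales), np.log(F), 1)
    return slope

def sample_entropy(x, m=2, r=0.2):
    """SampEn = ln(phi_m) - ln(phi_{m+1}) with tolerance F = r * std(x)."""
    x = np.asarray(x, float)
    tol = r * x.std()
    def phi(m):
        emb = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
        d = np.max(np.abs(emb[:, None] - emb[None, :]), axis=2)  # Chebyshev distance
        np.fill_diagonal(d, np.inf)                # exclude self-matches (i != j)
        return np.mean(np.sum(d <= tol, axis=1) / (len(emb) - 1))
    return np.log(phi(m)) - np.log(phi(m + 1))
```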
step 5-2, performing time-domain, frequency-domain and nonlinear feature extraction on the heartbeat signal to obtain a heartbeat feature set H;
A. time-domain features: extracting the heartbeat amplitude mean, heartbeat amplitude standard deviation, heartbeat average amplitude difference and heartbeat normalized average amplitude difference as the time-domain features of the heartbeat signal, calculated in the same way as in the respiratory feature extraction part;
B. frequency-domain features: extracting the mean power-spectrum amplitude of the heartbeat low frequency band $F_L'$, heartbeat middle frequency band $F_M'$ and heartbeat high frequency band $F_H'$ as the frequency-domain features of the heartbeat signal; wherein $F_L' < p_3$ Hz, $p_3$ Hz $\le F_M' < p_4$ Hz, $F_H' > p_4$ Hz, with $p_3 > p_1$ and $p_4 > p_2$;
C. nonlinear features: extracting the heartbeat detrended fluctuation scaling exponent and the heartbeat sample entropy as the nonlinear features of the heartbeat signal, calculated in the same way as in the respiratory feature extraction part.
Further, the specific process of step 6 includes:
step 6-1, serially fusing the respiratory feature set R and the heartbeat feature set H to obtain the physiological feature set Y:
$$Y = [R\;H]$$
Step 6-2, calculating the vector mean of each class and of all feature data for the voice feature set X and the physiological feature set Y; the process is described below for the voice feature set X, and Y is treated in the same way:
$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$$
where $\bar{x}_i$ is the vector mean of the $i$th class of samples of the voice feature set X, $x_{ij}$ is the $j$th sample of the $i$th class, and $n_i$ is the number of samples of the $i$th class;
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{x}_i$$
where $\bar{x}$ is the vector mean of all sample data of the voice feature set X, $k$ is the number of classes, and $n$ is the total number of samples;
step 6-3, calculating projection matrices for the voice feature set X and the physiological feature set Y so as to minimize the between-class correlation within each feature set; the calculation is described below for X, and Y is treated in the same way:
First, calculate the between-class scatter matrix $S_{bx}$ of the voice feature set X:
$$S_{bx} = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})(\bar{x}_i-\bar{x})^T = \varphi_{bx}\varphi_{bx}^T$$
where
$$\varphi_{bx} = \left[\sqrt{n_1}(\bar{x}_1-\bar{x}),\;\sqrt{n_2}(\bar{x}_2-\bar{x}),\;\ldots,\;\sqrt{n_k}(\bar{x}_k-\bar{x})\right]$$
When the between-class correlation is minimal, $\varphi_{bx}^T\varphi_{bx}$ is a diagonal matrix; moreover, since $\varphi_{bx}^T\varphi_{bx}$ is symmetric positive semidefinite, the following transformation exists:
$$P^T\left(\varphi_{bx}^T\varphi_{bx}\right)P = \hat{\Lambda}$$
where $P$ is the orthogonal eigenvector matrix and $\hat{\Lambda}$ is the diagonal matrix of non-negative real eigenvalues sorted in descending order;
Let $Q$ consist of the $r$ eigenvectors of $P$ corresponding to the $r$ largest non-zero eigenvalues, so that
$$Q^T\left(\varphi_{bx}^T\varphi_{bx}\right)Q = \Lambda$$
Then the projection matrix $W_{bx}$ of X is obtained as
$$W_{bx} = \varphi_{bx}\,Q\,\Lambda^{-1/2}$$
which satisfies $W_{bx}^T S_{bx} W_{bx} = I$; the projection matrix $W_{by}$ of Y is obtained in the same way;
Step 6-4, projecting X and Y with the projection matrices calculated in step 6-3 to obtain the projected voice feature set $X_p$ and physiological feature set $Y_p$:
$$X_p = W_{bx}^T X$$
$$Y_p = W_{by}^T Y$$
Step 6-5, diagonalizing the between-set covariance matrix of the projected feature sets using singular value decomposition (SVD) to obtain the voice feature set transformation matrix $W_x$ and the physiological feature set transformation matrix $W_y$, specifically:
$$S_{xy} = X_p Y_p^T = U B V^T$$
where $B$ is a diagonal matrix with non-zero diagonal elements and $U$, $V$ are obtained through the SVD; letting $W_{cx} = U B^{-1/2}$ and $W_{cy} = V B^{-1/2}$, it follows that
$$W_{cx}^T S_{xy} W_{cy} = \left(U B^{-1/2}\right)^T \left(U B V^T\right) \left(V B^{-1/2}\right) = I$$
where $I$ is the identity matrix;
Thus the voice feature set transformation matrix $W_x$ and the physiological feature set transformation matrix $W_y$ are obtained:
$$W_x = W_{cx}^T W_{bx}^T$$
$$W_y = W_{cy}^T W_{by}^T$$
Step 6-6, calculating the transformed voice feature set $X_{dca}$ and physiological feature set $Y_{dca}$ from the transformation matrices calculated in step 6-5:
$$X_{dca} = W_x X$$
$$Y_{dca} = W_y Y$$
Step 6-7, serially fusing the voice feature set $X_{dca}$ and the physiological feature set $Y_{dca}$ calculated in step 6-6 to obtain the fused feature set Z:
$$Z = [X_{dca}\;Y_{dca}]$$
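A minimal sketch of steps 6-1 to 6-7 follows, written against feature-by-sample matrices; it follows the standard discriminant correlation analysis construction that the formulas above describe, with the function names and the choice of r as illustrative assumptions.

```python
import numpy as np

def _between_class_projection(X, labels, r):
    """Projection W_b with W_b^T S_b W_b = I, keeping r directions
    (steps 6-2/6-3); r must not exceed the number of non-zero
    eigenvalues of phi^T phi (at most k - 1 for k classes)."""
    labels = np.asarray(labels)
    xbar = X.mean(axis=1, keepdims=True)                  # overall vector mean
    phi = np.hstack([np.sqrt((labels == c).sum()) *
                     (X[:, labels == c].mean(axis=1, keepdims=True) - xbar)
                     for c in np.unique(labels)])         # S_b = phi @ phi.T
    lam, Q = np.linalg.eigh(phi.T @ phi)                  # small k x k problem
    keep = np.argsort(lam)[::-1][:r]
    return (phi @ Q[:, keep]) / np.sqrt(lam[keep])        # W_b = phi Q Lambda^(-1/2)

def dca_fuse(X, Y, labels, r):
    """DCA fusion of two feature sets (features x samples) into Z."""
    Xp = _between_class_projection(X, labels, r).T @ X    # step 6-4
    Yp = _between_class_projection(Y, labels, r).T @ Y
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)                   # step 6-5: S_xy = U B V^T
    Xd = (U / np.sqrt(s)).T @ Xp                          # W_cx^T X_p
    Yd = (Vt.T / np.sqrt(s)).T @ Yp                       # W_cy^T Y_p
    return np.vstack([Xd, Yd])                            # steps 6-6/6-7: Z
```

Here `Y` would itself be the serial fusion of the respiratory and heartbeat feature sets from step 6-1; for the two-class lie/truth problem, r = 1.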
a lie detection system based on a voice and radar dual sensor comprises a voice acquisition module, a radar acquisition module, a voice preprocessing module, a radar preprocessing module, a voice feature extraction module, a physiological feature extraction module, a feature fusion module and a classification module;
the voice acquisition module is used for acquiring voice signals by utilizing a microphone;
The radar acquisition module is used for acquiring radar signals by using a continuous wave radar;
the voice preprocessing module is used for carrying out noise reduction, sound event detection and pre-emphasis preprocessing on the collected voice signals;
the radar preprocessing module is used for demodulating and filtering the acquired radar signals to obtain respiratory signals and heartbeat signals;
the voice feature extraction module is used for extracting fundamental frequency, voicing probability, short-time zero-crossing rate, frame root-mean-square energy and mel cepstral coefficient features from the preprocessed voice signal, and applying 6 statistical parameters (maximum, minimum, mean, standard deviation, skewness and kurtosis) to these 5 features to obtain the voice feature set X;
the physiological characteristic extraction module is used for extracting time domain, frequency domain and nonlinear characteristics of the respiratory signal and the heartbeat signal respectively to obtain a respiratory characteristic set R and a heartbeat characteristic set H;
the feature fusion module is used for fusing the respiratory feature set R and the heartbeat feature set H to obtain a physiological feature set Y, and then carrying out feature fusion on the physiological feature set Y and the voice feature set X to obtain a fusion feature set Z;
the classifying module is used for training the classifier by utilizing the fusion feature set Z and performing lie detection classification on the voice sample.
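The patent states only that a machine learning classifier is trained on the fused feature set Z; the sketch below therefore uses a support vector machine with 5-fold cross-validation purely as an assumed instantiation of the classification module.

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_lie_detector(Z, labels):
    """Z: samples x fused DCA features; labels: 1 = lie, 0 = truth."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    acc = cross_val_score(clf, Z, labels, cv=5).mean()   # 5-fold CV accuracy
    clf.fit(Z, labels)                                   # final model on all data
    return clf, acc
```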
Compared with the prior art, the invention has the following remarkable advantages: 1) the continuous-wave radar measures respiration and heartbeat without contact, effectively reducing the subject's discomfort and physiological and psychological sense of oppression, and reducing the influence on the measurement result; 2) combining the voice signal with the respiration and heartbeat signals overcomes the strong individual variability, easy disguise and strong limitations of the voice signal alone; 3) combining the SoX noise reduction program with improved spectral subtraction achieves a better noise reduction effect; 4) fusing the voice features and physiological features with the DCA algorithm minimizes the between-class correlation, maximizes the correlation between the feature sets, and improves lie detection accuracy.
The invention is described in further detail below with reference to the accompanying drawings.
Drawings
Figure 1 is a block diagram of a lie detection system based on dual voice and radar sensors of the present invention.
Fig. 2 is a schematic diagram of an original noisy speech waveform in one embodiment.
FIG. 3 is a schematic diagram of a speech waveform of an original noisy speech after SOX processing in one embodiment.
FIG. 4 is a schematic diagram of speech waveforms of an embodiment after an improved spectral subtraction of original noisy speech.
FIG. 5 is a schematic diagram of speech waveforms of an original noisy speech subjected to SOX+ modified spectral subtraction processing in an embodiment.
Fig. 6 is a comparison of lie detection results of different feature sets in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In addition, where this disclosure uses descriptions such as "first" and "second", these are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated; thus a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. The technical solutions of the embodiments may be combined with one another, provided that the combination can be realized by those skilled in the art; when technical solutions contradict each other or cannot be realized, the combination should be considered absent and outside the scope of protection claimed in the present invention.
The invention provides a lie detection method based on a voice and radar dual sensor, which comprises the following steps:
step 1, synchronously acquiring a voice signal and a radar signal by using a microphone and a continuous wave radar;
step 2, noise reduction, sound event detection and pre-emphasis preprocessing are carried out on the voice signals acquired in the step 1; the specific process comprises the following steps:
step 2-1, recording a 2 s noise sample of the noise whose decibel level exceeds a preset threshold, i.e. the main noise;
step 2-2, generating a noise-profile file from the noise sample obtained in step 2-1 using the SoX audio processing program, and performing primary noise reduction on the voice signal according to the noise profile to remove the main noise;
step 2-3, performing secondary noise reduction on the once-denoised voice signal obtained in step 2-2 using improved spectral subtraction, removing noise types other than the recorded noise sample to obtain a clean voice signal.
Step 3, for the preprocessed voice signal obtained in step 2, extracting fundamental frequency, voicing probability, short-time zero-crossing rate, frame root-mean-square energy and mel cepstral coefficient (orders 0-12) features, targeting the changes in intonation, pauses, speaking rate and similar characteristics that may occur when a person lies, and applying 6 statistical parameters (maximum, minimum, mean, standard deviation, skewness and kurtosis) to these 5 features to obtain a voice feature set X;
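A sketch of this extraction using the librosa library is given below; the sampling rate, pitch-search range and frame parameters are illustrative assumptions rather than values from the patent.

```python
import numpy as np
import librosa
from scipy.stats import skew, kurtosis

def voice_feature_set(wav_path):
    """F0, voicing probability, short-time ZCR, frame RMS energy and
    MFCCs 0-12, each summarized by max, min, mean, std, skewness
    and kurtosis (17 frame tracks x 6 statistics = 102 features)."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0, _, vprob = librosa.pyin(y, fmin=50, fmax=400, sr=sr)   # fundamental frequency
    zcr = librosa.feature.zero_crossing_rate(y)[0]
    rms = librosa.feature.rms(y=y)[0]
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # coefficients 0-12
    tracks = [np.nan_to_num(f0), np.nan_to_num(vprob), zcr, rms, *mfcc]
    stats = lambda v: [v.max(), v.min(), v.mean(), v.std(), skew(v), kurtosis(v)]
    return np.concatenate([stats(t) for t in tracks])
```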
Step 4, demodulating and filtering the radar signals acquired in the step 1 to obtain respiratory signals and heartbeat signals;
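The patent does not specify the demodulation method; the sketch below assumes the common arctangent demodulation of a continuous-wave radar's I/Q channels, followed by Butterworth band-pass filtering with edges chosen to bracket the respiratory and heartbeat bands used later in this embodiment.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def radar_vital_signs(i_ch, q_ch, fs):
    """Arctangent demodulation of CW-radar I/Q channels followed by
    band-pass filtering into respiration and heartbeat components."""
    phase = np.unwrap(np.arctan2(q_ch, i_ch))       # phase ~ chest displacement
    def bandpass(lo, hi):
        b, a = butter(4, [lo, hi], btype="band", fs=fs)
        return filtfilt(b, a, phase)
    respiration = bandpass(0.1, 0.5)                # covers the < 0.3 Hz resp. bands
    heartbeat = bandpass(0.8, 2.0)                  # covers the 0.8-1.2 Hz region
    return respiration, heartbeat
```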
Step 5, according to the characteristics that a person's breathing may be 'suppressed' and the heartbeat may accelerate when lying, performing time-domain, frequency-domain and nonlinear feature extraction on the respiration signal and heartbeat signal obtained in step 4, respectively, to obtain a respiratory feature set R and a heartbeat feature set H, specifically:
step 5-1, performing time-domain, frequency-domain and nonlinear feature extraction on the respiration signal to obtain the respiratory feature set R, and step 5-2, performing the same extraction on the heartbeat signal to obtain the heartbeat feature set H; the features and their calculation are the same as described in steps 5-1 and 5-2 above. In this embodiment the respiratory frequency bands are taken as $F_L < 0.2$ Hz, $0.2$ Hz $\le F_M \le 0.3$ Hz and $F_H > 0.3$ Hz, and the heartbeat frequency bands as $F_L' < 0.8$ Hz, $0.8$ Hz $\le F_M' \le 1.2$ Hz and $F_H' > 1.2$ Hz, with the mean power-spectrum amplitude of each band serving as the frequency-domain features.
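The mean power-spectrum amplitudes of these bands can be computed from a Welch power spectral density estimate, as sketched below; the function name and FFT parameters are illustrative.

```python
import numpy as np
from scipy.signal import welch

def band_power_features(x, fs, bands):
    """Mean power-spectrum amplitude in each (lo, hi) band."""
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), 1024))
    return [pxx[(f >= lo) & (f < hi)].mean() for lo, hi in bands]

# respiratory bands of this embodiment:
# band_power_features(respiration, fs, [(0.0, 0.2), (0.2, 0.3), (0.3, fs / 2)])
# heartbeat bands:
# band_power_features(heartbeat, fs, [(0.0, 0.8), (0.8, 1.2), (1.2, fs / 2)])
```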
Step 6, carrying out serial fusion on the respiratory feature set R obtained in the step 5 and the heartbeat feature set H to obtain a physiological feature set Y, and then carrying out feature fusion on the physiological feature set Y and the voice feature set X obtained in the step 3 by using a DCA algorithm to obtain a fusion feature set Z; the specific process comprises the following steps:
steps 6-1 to 6-7 are carried out exactly as described in the disclosure above: the respiratory and heartbeat feature sets are serially fused into $Y = [R\;H]$; the class-wise and overall vector means of X and Y are computed; the projection matrices $W_{bx}$ and $W_{by}$ minimizing the between-class correlation are obtained; the projected sets $X_p = W_{bx}^T X$ and $Y_p = W_{by}^T Y$ are decorrelated between sets by SVD to obtain the transformation matrices $W_x$ and $W_y$; and the transformed sets $X_{dca} = W_x X$ and $Y_{dca} = W_y Y$ are serially fused into $Z = [X_{dca}\;Y_{dca}]$.
and 7, training a classifier by using the fusion feature set Z obtained in the step 6, and performing lie detection classification on the voice sample.
Referring to fig. 1, the invention provides a lie detection system based on a voice and radar dual sensor, which comprises a voice acquisition module, a radar acquisition module, a voice preprocessing module, a radar preprocessing module, a voice feature extraction module, a physiological feature extraction module, a feature fusion module and a classification module;
the voice acquisition module is used for acquiring voice signals by utilizing a microphone;
the radar acquisition module is used for acquiring radar signals by using a continuous wave radar;
the voice preprocessing module is used for carrying out noise reduction, sound event detection and pre-emphasis preprocessing on the collected voice signals; the module includes the following components:
The main noise acquisition unit is used for recording a noise sample of the noise whose decibel level exceeds a preset threshold, i.e. the main noise;
the primary noise reduction unit is used for generating a noise-profile file from the noise sample using the SoX audio processing program, and performing primary noise reduction on the voice signal according to the noise profile to remove the main noise;
the secondary noise reduction unit is used for performing secondary noise reduction on the once-denoised voice signal using improved spectral subtraction, removing noise types other than the recorded noise sample to obtain a clean voice signal.
The radar preprocessing module is used for demodulating and filtering the acquired radar signals to obtain respiratory signals and heartbeat signals;
the voice feature extraction module is used for extracting fundamental frequency, voicing probability, short-time zero-crossing rate, frame root-mean-square energy and mel cepstral coefficient features from the preprocessed voice signal, and applying 6 statistical parameters (maximum, minimum, mean, standard deviation, skewness and kurtosis) to these 5 features to obtain the voice feature set X;
the physiological characteristic extraction module is used for extracting time domain, frequency domain and nonlinear characteristics of the respiratory signal and the heartbeat signal respectively to obtain a respiratory characteristic set R and a heartbeat characteristic set H; the module includes the following components:
The respiratory feature set acquisition unit is used for performing time-domain, frequency-domain and nonlinear feature extraction on the respiration signal to obtain the respiratory feature set R;
the heartbeat feature set acquisition unit is used for performing time-domain, frequency-domain and nonlinear feature extraction on the heartbeat signal to obtain the heartbeat feature set H;
the extracted features and their calculation are the same as in steps 5-1 and 5-2 of the method described above.
The feature fusion module is used for fusing the respiratory feature set R and the heartbeat feature set H to obtain the physiological feature set Y, and then performing feature fusion on the physiological feature set Y and the voice feature set X to obtain the fused feature set Z; the module comprises a first feature fusion unit (serial fusion of R and H into $Y = [R\;H]$), a first calculation unit (class-wise and overall vector means of X and Y), a second calculation unit (projection matrices $W_{bx}$ and $W_{by}$ minimizing the between-class correlation), a projection unit ($X_p = W_{bx}^T X$, $Y_p = W_{by}^T Y$), a singular value decomposition unit (diagonalizing the between-set covariance matrix to obtain $W_x$ and $W_y$), a third calculation unit ($X_{dca} = W_x X$, $Y_{dca} = W_y Y$) and a second feature fusion unit (serial fusion into $Z = [X_{dca}\;Y_{dca}]$); the calculations are the same as in steps 6-1 to 6-7 of the method described above.
the classifying module is used for training the classifier by utilizing the fusion feature set Z and performing lie detection classification on the voice sample.
For specific limitations of the lie detection system based on voice and radar dual sensors, reference may be made to the limitations of the lie detection method based on voice and radar dual sensors above, which are not repeated here. The various modules in the above lie detection system may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware, be independent of the processor in the computer device, or be stored as software in the memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
The present invention will be described in further detail with reference to examples.
Examples
The invention discloses a lie detection method based on voice and radar dual sensors, which comprises the following steps:
step 1, synchronously acquiring a voice signal and a radar signal by using a microphone and a continuous wave radar;
step 2, noise reduction, sound event detection and pre-emphasis preprocessing are carried out on the voice signals acquired in the step 1;
Step 3, for the preprocessed voice signal obtained in step 2, extracting fundamental frequency, voicing probability, short-time zero-crossing rate, frame root-mean-square energy and mel cepstral coefficient features, targeting the changes in intonation, pauses, speaking rate and similar characteristics that may occur when a person lies, and applying 6 statistical parameters (maximum, minimum, mean, standard deviation, skewness and kurtosis) to these 5 features to obtain a voice feature set X;
step 4, demodulating and filtering the radar signals acquired in the step 1 to obtain respiratory signals and heartbeat signals;
Step 5, according to the characteristics that a person's breathing may be 'suppressed' and the heartbeat may accelerate when lying, performing time-domain, frequency-domain and nonlinear feature extraction on the respiration signal and heartbeat signal obtained in step 4, respectively, to obtain a respiratory feature set R and a heartbeat feature set H;
step 6, carrying out serial fusion on the respiratory feature set R obtained in the step 5 and the heartbeat feature set H to obtain a physiological feature set Y, and then carrying out feature fusion on the physiological feature set Y and the voice feature set X obtained in the step 3 by using a DCA algorithm to obtain a fusion feature set Z;
and 7, training a classifier by using the fusion feature set Z obtained in the step 6, and performing lie detection classification on the voice sample.
With reference to figs. 2, 3, 4 and 5, the SoX audio processing program has a good noise reduction effect on the main noise component but cannot remove the noise completely, while the improved spectral subtraction has a strong noise reduction capability but may misjudge effective speech segments. Performing primary noise reduction with the SoX audio processing program followed by secondary noise reduction with improved spectral subtraction achieves a good noise reduction effect without losing effective speech segments.
With reference to fig. 6, owing to the limitations of voice, the highest lie detection classification accuracy on the voice feature set alone is 59.1%; the highest accuracy on the physiological feature set is 67.6%, which illustrates the validity of the physiological signals and the extracted physiological feature set for lie detection; the fused feature set obtained by fusing the voice feature set and the physiological feature set with serial fusion and the DCA algorithm achieves a higher lie detection classification accuracy than either feature set alone, with a highest accuracy of 70.2%.
In conclusion, the lie detection method and system based on voice and radar dual sensors provided by the invention cause no discomfort to the subject, overcome the limitations of voice, effectively improve lie detection accuracy, and offer high reliability and wide applicability.
The foregoing has outlined and described the basic principles, main features and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the above embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. A lie detection method based on a voice and radar dual sensor, the method comprising the steps of:
step 1, synchronously acquiring a voice signal and a radar signal by using a microphone and a continuous wave radar;
step 2, noise reduction, sound event detection and pre-emphasis preprocessing are carried out on the voice signals acquired in the step 1;
step 3, extracting the fundamental frequency, voicing probability, short-time zero-crossing rate, frame root-mean-square energy and mel cepstrum coefficient features from the preprocessed voice signal obtained in step 2, and applying 6 statistical parameters (maximum, minimum, mean, standard deviation, skewness and kurtosis) to the 5 features to obtain a voice feature set X;
Step 4, demodulating and filtering the radar signals acquired in the step 1 to obtain respiratory signals and heartbeat signals;
step 5, carrying out time domain, frequency domain and nonlinear feature extraction on the respiratory signal and the heartbeat signal obtained in the step 4 respectively to obtain a respiratory feature set R and a heartbeat feature set H;
wherein step 5 specifically comprises the following steps:
step 5-1, carrying out time domain, frequency domain and nonlinear feature extraction on the respiratory signal to obtain a respiratory feature set R;
A. time domain features: extracting a breath amplitude mean value, a breath amplitude standard deviation, a breath average amplitude difference and a breath normalized average amplitude difference as time domain features of a breath signal; wherein,
(1) Respiratory amplitude mean $\mu_x$, reflecting the average respiration amplitude during lie detection:
$\mu_x = \frac{1}{N}\sum_{n=1}^{N} X(n)$
wherein $X(n)$ is the nth point of the respiration sequence, N is the total length of the respiration sequence, and $1 \le n \le N$;
(2) Respiratory amplitude standard deviation $\sigma_x$, reflecting the overall variation of respiration during lie detection:
$\sigma_x = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(X(n) - \mu_x\right)^2}$
(3) Respiratory average amplitude difference $\Delta_x$, reflecting the short-time variation of the respiratory amplitude during lie detection:
$\Delta_x = \frac{1}{N-1}\sum_{n=1}^{N-1}\left|X(n+1) - X(n)\right|$
(4) Respiratory normalized average amplitude difference $\Delta_{rx}$, reflecting the influence of the short-time variation of the respiratory amplitude on the overall variation during lie detection:
$\Delta_{rx} = \Delta_x / \sigma_x$
B. Frequency-domain features: extracting the mean power-spectrum amplitude of the respiratory low band $F_L$, respiratory middle band $F_M$ and respiratory high band $F_H$ as the frequency-domain features of the respiratory signal; wherein $F_L < p_1$ Hz, $p_1\,\mathrm{Hz} \le F_M < p_2\,\mathrm{Hz}$, and $F_H > p_2$ Hz;
C. Nonlinear features: extracting the respiratory detrended fluctuation scale index and the respiratory sample entropy as the nonlinear features of the respiratory signal;
(1) Respiratory detrended fluctuation scale index
The respiratory detrended fluctuation scale index reflects the non-stationary character of the respiratory signal during lie detection, and is calculated as follows:
1) Let the respiration sequence be $X(n)$ and calculate its mean $\mu_x$;
2) Calculate the cumulative difference $y(n)$ of the respiration sequence:
$y(n) = \sum_{k=1}^{n}\left(X(k) - \mu_x\right)$
3) Divide $y(n)$ into $a$ non-overlapping windows of length $b$;
4) Fit a local trend $y_b(n)$ to each window interval by the least-squares method, remove the local trend of each interval to obtain a new detrended sequence, and calculate its root mean square $F(b)$:
$F(b) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left[y(n) - y_b(n)\right]^2}$
5) Change the window length $b$ and repeat the above steps until the required amount of data is obtained;
6) From the quantities calculated above, plot the curve with $\log b$ as abscissa and $\log F(b)$ as ordinate; the slope of this curve is the respiratory detrended fluctuation scale index of the respiration sequence;
(2) Respiratory sample entropy
The respiratory sample entropy evaluates the complexity of the respiratory signal during lie detection, and is calculated as follows (a numerical sketch of the frequency-domain and nonlinear features is given after step 5-1):
1) Denote the respiration time series as $X(n)$, $1 \le n \le N$; with m as the window length, divide it into $s = N - m + 1$ respiration subsequences:
$X_m(t) = \left(X(t), X(t+1), \ldots, X(t+m-1)\right),\quad 1 \le t \le N - m + 1$
wherein $X_m(t)$ is the t-th respiration subsequence;
2) Define the distance between sequences $X_m(i)$ and $X_m(j)$ as the maximum absolute difference of their corresponding elements, and calculate the distance $d_{ij}$ between each respiration subsequence and all the other respiration subsequences:
$d_{ij} = \max_{k=0,\ldots,m-1}\left(\left|X_m(i+k) - X_m(j+k)\right|\right)$
wherein $1 \le i \le N - m$, $1 \le j \le N - m$, and $i \ne j$;
3) Calculate the respiratory amplitude standard deviation $\sigma_x$ and define the threshold $F = r \times \sigma_x$, where the constant r is taken between 0.1 and 0.25; record the ratio of the number of distances $d_{ij} \le F$ calculated in 2) to s as $B_i^m(F)$, and calculate the mean $\phi_m(t)$ of $B_i^m(F)$:
$\phi_m(t) = \frac{1}{N-m}\sum_{i=1}^{N-m} B_i^m(F)$
4) Change the window length to m+1 and repeat steps 1) to 3) to obtain $\phi_{m+1}(t)$;
5) Calculate the respiratory sample entropy $\mathrm{SampEn}(t)$:
$\mathrm{SampEn}(t) = \ln\left[\phi_m(t)\right] - \ln\left[\phi_{m+1}(t)\right]$
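The sketch below gives a minimal numerical version of the step 5-1 features, assuming the respiration sequence is a 1-D numpy array; the scale list, the tolerance r=0.2, the template length m=2 and the band edges standing in for the unspecified p1/p2 thresholds are illustrative choices.

```python
import numpy as np
from scipy.signal import welch

def dfa_scale_index(x, scales=(4, 8, 16, 32, 64)):
    """Item C(1): slope of log F(b) versus log b (detrended fluctuation scale index)."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                  # cumulative difference y(n)
    fluct = []
    for b in scales:
        f2 = []
        for w in range(len(y) // b):             # non-overlapping windows of length b
            seg = y[w * b:(w + 1) * b]
            n = np.arange(b)
            trend = np.polyval(np.polyfit(n, seg, 1), n)  # least-squares local trend
            f2.append(np.mean((seg - trend) ** 2))
        fluct.append(np.sqrt(np.mean(f2)))       # root mean square F(b)
    slope, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return slope

def sample_entropy(x, m=2, r=0.2):
    """Item C(2): SampEn = ln(phi_m) - ln(phi_{m+1}) with threshold F = r * sigma_x."""
    x = np.asarray(x, dtype=float)
    F = r * np.std(x)
    def phi(m):
        templates = np.array([x[t:t + m] for t in range(len(x) - m)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates - templates[i]), axis=1)  # max elementwise distance
            count += np.sum(d <= F) - 1          # exclude the self-match (i == j)
        return count / (len(templates) * (len(templates) - 1))
    return np.log(phi(m)) - np.log(phi(m + 1))

def band_power_means(x, fs, bands=((0.0, 0.3), (0.3, 0.6), (0.6, 2.0))):
    """Item B: mean power-spectrum amplitude in the low/middle/high bands."""
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), 256))
    return [pxx[(f >= lo) & (f < hi)].mean() for lo, hi in bands]
```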
step 5-2, performing time-domain, frequency-domain and nonlinear feature extraction on the heartbeat signal to obtain a heartbeat feature set H;
A. time domain features: extracting a heartbeat amplitude mean value, a heartbeat amplitude standard deviation, a heartbeat average amplitude difference and a heartbeat normalized average amplitude difference as time domain features of a heartbeat signal, wherein the specific calculation mode is the same as that of the breathing feature extraction part;
B. Frequency-domain features: extracting the mean power-spectrum amplitude of the heartbeat low band $F_L'$, heartbeat middle band $F_M'$ and heartbeat high band $F_H'$ as the frequency-domain features of the heartbeat signal; wherein $F_L' < p_3$ Hz, $p_3\,\mathrm{Hz} \le F_M' < p_4\,\mathrm{Hz}$, $F_H' > p_4$ Hz, $p_3 > p_1$, and $p_4 > p_2$;
C. Nonlinear features: extracting the heartbeat detrended fluctuation scale index and the heartbeat sample entropy as the nonlinear features of the heartbeat signal, calculated in the same way as in the respiratory feature extraction part;
step 6, fusing the respiratory feature set R obtained in the step 5 with the heartbeat feature set H to obtain a physiological feature set Y, and then performing feature fusion on the physiological feature set Y and the voice feature set X obtained in the step 3 to obtain a fusion feature set Z;
step 7, training a classifier by using the fusion feature set Z obtained in step 6, and performing lie detection classification on the voice sample.
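A sketch of the classifier training in step 7 follows; the patent does not fix a classifier type, so the support-vector machine, the feature standardization and the 70/30 split below are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_lie_detector(Z, labels):
    """Z: (n_samples, n_fused_features) fusion feature set; labels: 0 = truth, 1 = lie."""
    Z_tr, Z_te, y_tr, y_te = train_test_split(
        Z, labels, test_size=0.3, stratify=labels, random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(Z_tr, y_tr)
    print("held-out lie-detection accuracy:", clf.score(Z_te, y_te))
    return clf
```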
2. The lie detection method based on the voice and radar dual sensor according to claim 1, wherein the noise reduction performed in step 2 on the voice signal acquired in step 1 comprises the following steps:
step 2-1, recording a noise sample of the noise whose decibel level exceeds a preset threshold, i.e., the main noise;
step 2-2, generating a noise sample configuration file by using the SOX audio processing program for the noise sample obtained in the step 2-1, and performing primary noise reduction on the voice signal according to the noise sample configuration file to remove main noise;
And 2-3, performing secondary noise reduction on the voice signal obtained in the step 2-2 after primary noise reduction by using improved spectral subtraction, and removing other types of noise except noise samples to obtain a pure voice signal.
3. The lie detection method based on the voice and radar dual sensor according to claim 1, wherein in step 6 the feature fusion of the physiological feature set Y with the voice feature set X obtained in step 3 to obtain the fusion feature set Z is specifically implemented using the DCA algorithm.
4. A voice and radar dual sensor based lie detection method according to claim 3, characterized in that step 6 comprises the following steps:
step 6-1, adopting a characteristic fusion mode of serial fusion for the respiratory characteristic set R and the heartbeat characteristic set H to obtain a physiological characteristic set Y:
Y=[RH]
step 6-2, calculating the vector mean of each class and of all the feature data for the voice feature set X and the physiological feature set Y; the process is described below for X, and Y is treated in the same way:
$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$
wherein $\bar{x}_i$ is the vector mean of the ith class of samples of the voice feature set X, $x_{ij}$ is the jth sample of the ith class of the voice feature set X, and $n_i$ is the number of samples in the ith class of the voice feature set X;
$\bar{x} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$
wherein $\bar{x}$ is the vector mean of all the sample data of the voice feature set X, k is the number of classes, and n is the total number of samples;
step 6-3, calculating the projection matrices of the voice feature set X and the physiological feature set Y respectively so as to minimize the inter-class correlation of the feature sets X and Y; the projection-matrix calculation is described below for X, and Y is treated in the same way:
First, the inter-class scatter matrix $S_{bx}$ of the voice feature set X is calculated:
$S_{bx} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^T = \phi_{bx}\phi_{bx}^T$
wherein $\phi_{bx} = \left[\sqrt{n_1}(\bar{x}_1 - \bar{x}), \ldots, \sqrt{n_k}(\bar{x}_k - \bar{x})\right]$;
When the inter-class correlation is minimal, $\phi_{bx}^T\phi_{bx}$ is a diagonal matrix; moreover, since $\phi_{bx}^T\phi_{bx}$ is symmetric positive semi-definite, the following transformation exists:
$P^T(\phi_{bx}^T\phi_{bx})P = \hat{\Lambda}$
where P is the orthogonal eigenvector matrix and $\hat{\Lambda}$ is the diagonal matrix of non-negative real eigenvalues sorted in descending order;
Let Q consist of the r eigenvectors in the matrix P corresponding to the r largest non-zero eigenvalues, so that:
$Q^T(\phi_{bx}^T\phi_{bx})Q = \Lambda$
Then the projection matrix $W_{bx}$ of X can be obtained:
$W_{bx} = \phi_{bx} Q \Lambda^{-1/2}$
The projection matrix $W_{by}$ of Y is obtained in the same way.
step 6-4, projecting X and Y according to the projection matrices calculated in step 6-3 to obtain the projected voice feature set $X_p$ and physiological feature set $Y_p$:
$X_p = W_{bx}^T X$
$Y_p = W_{by}^T Y$
step 6-5, diagonalizing the between-set covariance matrix of the projected feature sets by singular value decomposition (SVD) to obtain the voice feature set transformation matrix $W_x$ and the physiological feature set transformation matrix $W_y$, specifically as follows:
$S_{xy} = X_p Y_p^T = U B V^T$
wherein B is a diagonal matrix with non-zero diagonal elements, and U, V are obtained through the SVD;
Letting $W_{cx} = U B^{-1/2}$ and $W_{cy} = V B^{-1/2}$, it follows that:
$W_{cx}^T S_{xy} W_{cy} = I$
wherein I is the identity matrix;
Thus the voice feature set transformation matrix $W_x$ and the physiological feature set transformation matrix $W_y$ are obtained:
$W_x = W_{cx}^T W_{bx}^T$
$W_y = W_{cy}^T W_{by}^T$
step 6-6, calculating the transformed voice feature set $X_{dca}$ and physiological feature set $Y_{dca}$ according to the transformation matrices calculated in step 6-5:
$X_{dca} = W_x X$
$Y_{dca} = W_y Y$
step 6-7, serially fusing the voice feature set $X_{dca}$ and the physiological feature set $Y_{dca}$ calculated in step 6-6 to obtain the fusion feature set Z:
$Z = [X_{dca}\ Y_{dca}]$.
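The following is a sketch of the DCA-based fusion of steps 6-1 to 6-7, assuming feature matrices are stored columns-as-samples (d features x n samples) and that r is the number of retained eigen-directions; function and variable names are illustrative.

```python
import numpy as np

def dca_projection(X, labels, r):
    """Between-class projection W_b (steps 6-2 and 6-3); X has shape (d, n)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    xbar = X.mean(axis=1)
    # phi_b columns: sqrt(n_i) * (class mean - overall mean)
    phi = np.column_stack([np.sqrt(np.sum(labels == c)) *
                           (X[:, labels == c].mean(axis=1) - xbar)
                           for c in classes])
    lam, Q = np.linalg.eigh(phi.T @ phi)         # eigen-decompose phi^T phi (ascending order)
    idx = np.argsort(lam)[::-1][:r]              # keep the r largest non-zero eigenvalues
    return phi @ Q[:, idx] @ np.diag(lam[idx] ** -0.5)   # W_b = phi Q Lambda^{-1/2}

def dca_fuse(X, Y, labels, r=1):
    Xp = dca_projection(X, labels, r).T @ X      # step 6-4 projections
    Yp = dca_projection(Y, labels, r).T @ Y
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)          # step 6-5: S_xy = U B V^T
    Xdca = (U @ np.diag(s ** -0.5)).T @ Xp       # X_dca = W_cx^T X_p
    Ydca = (Vt.T @ np.diag(s ** -0.5)).T @ Yp    # Y_dca = W_cy^T Y_p
    return np.vstack([Xdca, Ydca])               # step 6-7 serial fusion -> Z
```

For the two-class truth/lie task at most one between-class direction has a non-zero eigenvalue, hence r=1 by default; the fused Z can then be fed to the classifier sketch shown after claim 1.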
5. A lie detection system based on voice and radar dual sensors, characterized by comprising a voice acquisition module, a radar acquisition module, a voice preprocessing module, a radar preprocessing module, a voice feature extraction module, a physiological feature extraction module, a feature fusion module and a classification module;
the voice acquisition module is used for acquiring voice signals by utilizing a microphone;
the radar acquisition module is used for acquiring radar signals by using a continuous wave radar;
the voice preprocessing module is used for carrying out noise reduction, sound event detection and pre-emphasis preprocessing on the collected voice signals;
the radar preprocessing module is used for demodulating and filtering the acquired radar signals to obtain respiratory signals and heartbeat signals (a demodulation-and-filtering sketch is given after the module list of this claim);
the voice feature extraction module is used for extracting the fundamental frequency, voicing probability, short-time zero-crossing rate, frame root-mean-square energy and mel cepstrum coefficient features from the preprocessed voice signal, and applying 6 statistical parameters (maximum, minimum, mean, standard deviation, skewness and kurtosis) to the 5 features to obtain a voice feature set X;
The physiological characteristic extraction module is used for extracting time domain, frequency domain and nonlinear characteristics of the respiratory signal and the heartbeat signal respectively to obtain a respiratory characteristic set R and a heartbeat characteristic set H;
wherein the physiological feature extraction module comprises the following units:
the respiratory feature set acquisition unit is used for carrying out time domain, frequency domain and nonlinear feature extraction on the respiratory signals to obtain a respiratory feature set R;
A. time domain features: extracting a breath amplitude mean value, a breath amplitude standard deviation, a breath average amplitude difference and a breath normalized average amplitude difference as time domain features of a breath signal; wherein,
(1) Respiratory amplitude mean $\mu_x$, reflecting the average respiration amplitude during lie detection:
$\mu_x = \frac{1}{N}\sum_{n=1}^{N} X(n)$
wherein $X(n)$ is the nth point of the respiration sequence, N is the total length of the respiration sequence, and $1 \le n \le N$;
(2) Respiratory amplitude standard deviation $\sigma_x$, reflecting the overall variation of respiration during lie detection:
$\sigma_x = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(X(n) - \mu_x\right)^2}$
(3) Respiratory average amplitude difference $\Delta_x$, reflecting the short-time variation of the respiratory amplitude during lie detection:
$\Delta_x = \frac{1}{N-1}\sum_{n=1}^{N-1}\left|X(n+1) - X(n)\right|$
(4) Respiratory normalized average amplitude difference $\Delta_{rx}$, reflecting the influence of the short-time variation of the respiratory amplitude on the overall variation during lie detection:
$\Delta_{rx} = \Delta_x / \sigma_x$
B. Frequency-domain features: extracting the mean power-spectrum amplitude of the respiratory low band $F_L$, respiratory middle band $F_M$ and respiratory high band $F_H$ as the frequency-domain features of the respiratory signal; wherein $F_L < p_1$ Hz, $p_1\,\mathrm{Hz} \le F_M < p_2\,\mathrm{Hz}$, and $F_H > p_2$ Hz;
C. Nonlinear features: extracting the respiratory detrended fluctuation scale index and the respiratory sample entropy as the nonlinear features of the respiratory signal;
(1) Respiratory detrended fluctuation scale index
The respiratory detrended fluctuation scale index reflects the non-stationary character of the respiratory signal during lie detection, and is calculated as follows:
1) Let the respiration sequence be $X(n)$ and calculate its mean $\mu_x$;
2) Calculate the cumulative difference $y(n)$ of the respiration sequence:
$y(n) = \sum_{k=1}^{n}\left(X(k) - \mu_x\right)$
3) Divide $y(n)$ into $a$ non-overlapping windows of length $b$;
4) Fit a local trend $y_b(n)$ to each window interval by the least-squares method, remove the local trend of each interval to obtain a new detrended sequence, and calculate its root mean square $F(b)$:
$F(b) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left[y(n) - y_b(n)\right]^2}$
5) Change the window length $b$ and repeat the above steps until the required amount of data is obtained;
6) From the quantities calculated above, plot the curve with $\log b$ as abscissa and $\log F(b)$ as ordinate; the slope of this curve is the respiratory detrended fluctuation scale index of the respiration sequence;
(2) Respiratory sample entropy
The respiratory sample entropy evaluates the complexity of the respiratory signal during lie detection, and is calculated as follows:
1) Denote the respiration time series as $X(n)$, $1 \le n \le N$; with m as the window length, divide it into $s = N - m + 1$ respiration subsequences:
$X_m(t) = \left(X(t), X(t+1), \ldots, X(t+m-1)\right),\quad 1 \le t \le N - m + 1$
wherein $X_m(t)$ is the t-th respiration subsequence;
2) Define the distance between sequences $X_m(i)$ and $X_m(j)$ as the maximum absolute difference of their corresponding elements, and calculate the distance $d_{ij}$ between each respiration subsequence and all the other respiration subsequences:
$d_{ij} = \max_{k=0,\ldots,m-1}\left(\left|X_m(i+k) - X_m(j+k)\right|\right)$
wherein $1 \le i \le N - m$, $1 \le j \le N - m$, and $i \ne j$;
3) Calculate the respiratory amplitude standard deviation $\sigma_x$ and define the threshold $F = r \times \sigma_x$, where the constant r is taken between 0.1 and 0.25; record the ratio of the number of distances $d_{ij} \le F$ calculated in 2) to s as $B_i^m(F)$, and calculate the mean $\phi_m(t)$ of $B_i^m(F)$:
$\phi_m(t) = \frac{1}{N-m}\sum_{i=1}^{N-m} B_i^m(F)$
4) Change the window length to m+1 and repeat steps 1) to 3) to obtain $\phi_{m+1}(t)$;
5) Calculate the respiratory sample entropy $\mathrm{SampEn}(t)$:
$\mathrm{SampEn}(t) = \ln\left[\phi_m(t)\right] - \ln\left[\phi_{m+1}(t)\right]$
the heartbeat feature set acquisition unit is used for extracting time-domain, frequency-domain and nonlinear features of the heartbeat signal to obtain a heartbeat feature set H;
A. time domain features: extracting a heartbeat amplitude mean value, a heartbeat amplitude standard deviation, a heartbeat average amplitude difference and a heartbeat normalized average amplitude difference as time domain features of a heartbeat signal, wherein the specific calculation mode is the same as that of the breathing feature extraction part;
B. Frequency-domain features: extracting the mean power-spectrum amplitude of the heartbeat low band $F_L'$, heartbeat middle band $F_M'$ and heartbeat high band $F_H'$ as the frequency-domain features of the heartbeat signal; wherein $F_L' < p_3$ Hz, $p_3\,\mathrm{Hz} \le F_M' < p_4\,\mathrm{Hz}$, $F_H' > p_4$ Hz, $p_3 > p_1$, and $p_4 > p_2$;
C. Nonlinear features: extracting the heartbeat detrended fluctuation scale index and the heartbeat sample entropy as the nonlinear features of the heartbeat signal, calculated in the same way as in the respiratory feature extraction part;
the feature fusion module is used for fusing the respiratory feature set R and the heartbeat feature set H to obtain a physiological feature set Y, and then carrying out feature fusion on the physiological feature set Y and the voice feature set X to obtain a fusion feature set Z;
the classifying module is used for training the classifier by utilizing the fusion feature set Z and performing lie detection classification on the voice sample.
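As referenced in the radar preprocessing module above, the sketch below illustrates one way to obtain the respiratory and heartbeat signals from a continuous-wave radar, assuming I/Q channels are available; the arctangent demodulation and the 0.1-0.5 Hz / 0.8-2.0 Hz band edges are common choices for vital-sign radar, not values fixed by the patent.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def demodulate_iq(i_ch, q_ch):
    """Arctangent demodulation of the radar I/Q channels to chest displacement."""
    return np.unwrap(np.arctan2(q_ch - q_ch.mean(), i_ch - i_ch.mean()))

def bandpass(x, fs, lo, hi, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def split_vital_signs(i_ch, q_ch, fs):
    disp = demodulate_iq(i_ch, q_ch)
    respiration = bandpass(disp, fs, 0.1, 0.5)   # breathing band
    heartbeat = bandpass(disp, fs, 0.8, 2.0)     # heartbeat band
    return respiration, heartbeat
```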
6. The lie detection system based on dual voice and radar sensors of claim 5, wherein the voice preprocessing module comprises, in order:
the main noise acquisition unit is used for recording a noise sample of the noise whose decibel level exceeds a preset threshold, i.e., the main noise;
the primary noise reduction unit is used for generating a noise sample configuration file from the noise sample using the SoX audio processing program, and performing primary noise reduction on the voice signal according to the noise sample configuration file to remove the main noise;
and the secondary noise reduction unit is used for performing secondary noise reduction on the voice signal after primary noise reduction by using the improved spectral subtraction, removing the types of noise other than the noise sample to obtain a clean voice signal.
7. The lie detection system based on the voice and radar dual sensors according to claim 6, wherein the feature fusion module comprises the following units:
the first feature fusion unit is used for adopting a feature fusion mode of serial fusion for the respiratory feature set R and the heartbeat feature set H to obtain a physiological feature set Y:
Y=[RH]
the first calculation unit is used for calculating the vector mean of each class and of all the feature data for the voice feature set X and the physiological feature set Y; the process is described below for X, and Y is treated in the same way:
$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$
wherein $\bar{x}_i$ is the vector mean of the ith class of samples of the voice feature set X, $x_{ij}$ is the jth sample of the ith class of the voice feature set X, and $n_i$ is the number of samples in the ith class of the voice feature set X;
$\bar{x} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$
wherein $\bar{x}$ is the vector mean of all the sample data of the voice feature set X, k is the number of classes, and n is the total number of samples;
the second calculation unit is configured to calculate projection matrices of the speech signal feature set X and the physiological feature set Y respectively so as to minimize an inter-class correlation between the feature sets X and Y, and the following description is made with a projection matrix calculation procedure of X, and Y is available in the same way, specifically as follows:
First, an inter-class dispersion matrix S of a speech signal feature set X is calculated bx
Wherein,
phi when the correlation between classes is minimum bx T φ bx Is a diagonal matrix, also because of phi bx T φ bx Symmetrical semi-normal, there is the following transformation:
where P is the orthogonal eigenvector matrix,diagonal matrices ordered in descending order for non-negative real eigenvalues;
let Q be composed of r eigenvectors corresponding to r maximum non-zero eigenvalues in matrix P, corresponding to:
Q Tbx T φ bx )Q=A
then a projection matrix W of X can be obtained bx
Projection matrix W of Y is obtained in the same way by
the projection unit is used for projecting X and Y according to the projection matrices to obtain the projected voice feature set $X_p$ and physiological feature set $Y_p$:
$X_p = W_{bx}^T X$
$Y_p = W_{by}^T Y$
the singular value decomposition (SVD) unit is used for diagonalizing the between-set covariance matrix of the projected feature sets by SVD to obtain the voice feature set transformation matrix $W_x$ and the physiological feature set transformation matrix $W_y$, specifically as follows:
$S_{xy} = X_p Y_p^T = U B V^T$
wherein B is a diagonal matrix with non-zero diagonal elements, and U, V are obtained through the SVD;
Letting $W_{cx} = U B^{-1/2}$ and $W_{cy} = V B^{-1/2}$, it follows that:
$W_{cx}^T S_{xy} W_{cy} = I$
wherein I is the identity matrix;
Thus the voice feature set transformation matrix $W_x$ and the physiological feature set transformation matrix $W_y$ are obtained:
$W_x = W_{cx}^T W_{bx}^T$
$W_y = W_{cy}^T W_{by}^T$
the third calculation unit is used for calculating the transformed voice feature set $X_{dca}$ and physiological feature set $Y_{dca}$ according to the transformation matrices:
$X_{dca} = W_x X$
$Y_{dca} = W_y Y$
and the second feature fusion unit is used for serially fusing the voice feature set $X_{dca}$ and the physiological feature set $Y_{dca}$ to obtain the fusion feature set Z:
$Z = [X_{dca}\ Y_{dca}]$.
CN202011492568.1A 2020-12-17 2020-12-17 Lie detection method and system based on voice and radar dual sensors Active CN112634871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011492568.1A CN112634871B (en) 2020-12-17 2020-12-17 Lie detection method and system based on voice and radar dual sensors

Publications (2)

Publication Number Publication Date
CN112634871A CN112634871A (en) 2021-04-09
CN112634871B (en) 2024-02-20

Family

ID=75316662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011492568.1A Active CN112634871B (en) 2020-12-17 2020-12-17 Lie detection method and system based on voice and radar dual sensors

Country Status (1)

Country Link
CN (1) CN112634871B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114052694B (en) * 2021-10-26 2023-09-05 珠海脉动时代健康科技有限公司 Radar-based heart rate analysis method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100152600A1 (en) * 2008-04-03 2010-06-17 Kai Sensors, Inc. Non-contact physiologic motion sensors and methods for use

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107293302A (en) * 2017-06-27 2017-10-24 苏州大学 A kind of sparse spectrum signature extracting method being used in voice lie detection system
CN110811647A (en) * 2019-11-14 2020-02-21 清华大学 Multi-channel hidden lie detection method based on ballistocardiogram signal
CN111195132A (en) * 2020-01-10 2020-05-26 高兴华 Non-contact lie detection and emotion recognition method, device and system
CN112017671A (en) * 2020-10-14 2020-12-01 杭州艺兴科技有限公司 Multi-feature-based interview content credibility evaluation method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant