CN108175426A - Lie detection method based on a deep recursive conditional restricted Boltzmann machine - Google Patents
Lie detection method based on a deep recursive conditional restricted Boltzmann machine
- Publication number
- CN108175426A CN108175426A CN201711315604.5A CN201711315604A CN108175426A CN 108175426 A CN108175426 A CN 108175426A CN 201711315604 A CN201711315604 A CN 201711315604A CN 108175426 A CN108175426 A CN 108175426A
- Authority
- CN
- China
- Prior art keywords
- layer
- neural network
- recurrent neural
- boltzmann machine
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/16—Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
- A61B5/164—Lie detection
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4803—Speech analysis specially adapted for diagnostic purposes
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
Abstract
The invention discloses a lie detection method based on a deep recursive conditional restricted Boltzmann machine (CRBM). First, over continuous speech passages, the CRBM, which offers good modeling of time series and a simple inference procedure, is used to model the training samples and obtain higher-order statistics of whether the speaker is lying. These higher-order statistics, together with the labels of the training samples, are then used for supervised parameter training of a recurrent neural network. Once the initialization parameters of the CRBM and the recurrent neural network have been obtained, the two basic network units are stacked from bottom to top, and on a validation data set the recurrent network's parameters are fine-tuned by least-squares regression. The established network is then used to test the speech signal features of a speaker. The invention derives the lie-detection result automatically with a comparatively high recognition rate; the method places few demands on the evaluator's professional knowledge and skill, and testing is efficient.
Description
Technical field
The present invention relates to speech-based lie detection technology, and in particular to a lie detection method that uses the contextual speech information of a speaker.
Background technology
The basic principle of "lie detection" is that the psychological changes a person undergoes when lying necessarily cause changes in certain physiological parameters (such as skin conductance, heartbeat, blood pressure, respiration, brain waves and voice), which are usually governed only by the autonomic nervous system and are difficult to control consciously. Polygraphy is therefore the intersection of several disciplines, including psychology and physiology: through tests of physiological and electrical parameters, a system detects the concealed intentions and state of the tested individual. At present, a large body of work in psychology tests facial expressions, physiological activity, gestures and the like as indicators of lying. Lie research takes three basic forms: theoretical work (studying the types, forms and motives of deception), empirical research (experimental studies to find features significant for detecting lies), and the development of lie-detection technology. Most current research based on posterior information lacks automation and adaptivity and suffers from a degree of subjective bias.
Acoustic and prosodic features are the common features of speech analysis and also have important applications in speech emotion analysis and recognition. In 2009, Enos's doctoral thesis summarized about 200 acoustic and prosodic features, including duration, pause, pitch and loudness features, extracted on multiple time scales ranging from a few seconds up to an entire sentence. 1) Pitch features are obtained from the voiced regions of each speech segment. In addition, a large set of second-order features includes: pitch maximum, pitch mean, pitch minimum; the pitch counts in rising frames, falling frames, all frames, fields, and voiced frames; the length of the first/last slope; the number of rise-to-fall changes; and the first/last/mean slope values. Five normalization methods are applied to these features: the raw value, division by the mean, subtraction of the mean, the feature's cumulative-distribution-function value, and subtraction of the mean followed by division by the standard error. 2) Two classes of basic energy features are computed: the raw energy of each segment and the energy of the voiced sounds. This class also includes many secondary energy features, such as the minimum, maximum and mean. 3) (Phoneme) duration features: the maximum and mean of the sound duration, each appearing in one of three forms: the raw value, normalization by the speaker's durations, or normalization by the durations of the entire speech corpus. 4) Other prosodic features, including the pitch slope of the final syllable of an utterance and the duration of its first syllable.
In speech-based lie detection, the features currently used are all subsets of the features above; the differences lie in how the statistics are computed and in their number. Ekman et al. collected a lying corpus through interviews about impressions of short films, performed statistical analysis on the corpus's fundamental-frequency features, and found that the fundamental frequency of lying speech segments is markedly raised compared with truthful segments. Hansen et al. constructed a feature set from Mel-frequency cepstral coefficients (MFCC) and the first-order difference, second-order difference, autocorrelation and cross-correlation of the MFCCs, and used a neural network as the classifier to study eleven speaker-dependent vocal stress ratings; the results showed that, compared with a calm state, the features above respond under stress to slight perturbations of the vocal organs. In 2003, DePaulo et al. carried out a meta-analysis of 158 features proposed in existing lie-detection research and found that 23 of them are comparatively salient, including 16 speech- and language-related features: compared with telling the truth, people who are lying exhibit shorter speaking durations, fewer presented details, more repetitions, a higher fundamental frequency, and similar phenomena. A research group at Purdue University studied lie detection with amplitude-modulation and frequency-modulation models, and the results showed that Teager-energy-related features have the potential to distinguish truth from lies. In the prior art, however, general lie-detection methods often rely on psychological research and on people's subjective assessment; these methods require evaluators with strong professional knowledge and skill, are comparatively inefficient, and carry a large subjective error. Lie detection using posterior information is too costly, and its results can sometimes be biased. Moreover, previous research on lying was mainly conducted by psychologists in experimental deception situations, with the specifics of the experiments recorded on video, and most of these studies address only speech intensity and pitch, making no use of modern speech processing technology. Given these obstacles of the current technology, speech-based lie detection must employ more sophisticated speech processing algorithms.
Summary of the invention
Objective of the invention: to overcome the deficiencies of the prior art, in which lie detection using posterior information is too costly and prone to bias, the present invention proposes a lie detection method based on a deep recursive conditional restricted Boltzmann machine.
Technical solution: to solve the above technical problems, the present invention adopts the following technical scheme.
A lie detection method based on a deep recursive conditional restricted Boltzmann machine comprises the following steps:
Step 1: obtain multiple speech recordings as training samples, each with its own emotion label. Use a conditional restricted Boltzmann machine (CRBM) to extract the higher-order feature information of the training samples' speech feature vectors. Unsupervised training on the training samples yields the CRBM parameters W_xh, W_x'h and W_x'x, where W_xh is the connection weight matrix between the CRBM's visible-layer nodes and hidden-layer nodes at time t, W_x'h is the connection weight matrix between the CRBM's visible-layer nodes at time t-1 and the hidden-layer nodes, and W_x'x is the connection weight matrix between the CRBM's visible-layer nodes at time t and its visible-layer nodes at time t-1.
Step 2: carry out supervised training of a recurrent neural network with the labels of the training samples and the extracted higher-order feature information, obtaining the initialized recurrent-network parameters W_yn, W_nx and W_nn; W_yn is the connection weight matrix between the recurrent network's visible-layer nodes and hidden-layer nodes at time t, W_nx is the connection weight matrix between the visible-layer nodes at time t-1 and the hidden-layer nodes, and W_nn is the connection weight matrix between the recurrent network's visible-layer nodes at time t and its visible-layer nodes at time t-1.
Step 3: after obtaining the CRBM parameters W_xh, W_x'h and W_x'x and the recurrent-network parameters W_yn, W_nx and W_nn, stack the conditional restricted Boltzmann machine and the recurrent neural network from bottom to top to realize the mapping from speech features to emotion labels: the visible layer of the restricted Boltzmann machine serves as the bottom of the whole network, the hidden-layer nodes of the restricted Boltzmann machine are connected to the visible layer of the recurrent neural network, and the hidden layer of the recurrent neural network finally serves as the top layer of the whole network. Then, on the recurrent network's validation data set, adjust the recurrent-network parameters W_yn, W_nx and W_nn according to the least-squares criterion.
Step 4: test the speech feature vectors of the speaker using the newly established network.
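At test time, the four steps above amount to a single bottom-up forward pass through the stacked network. The following minimal numpy sketch illustrates that inference path under assumed shapes (988 input features, 500 CRBM hidden units, 300 recurrent units, 3 labels); all function and variable names are hypothetical, and the random weights stand in for the trained parameters.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lie_detect(X, params):
    """Sketch of inference through the stacked 988-500-300-3 network:
    each utterance vector passes through the CRBM hidden layer
    (conditioned on the previous utterance) and the recurrent layer;
    the arg-max over the 3 emotion labels is returned per utterance."""
    W_xh, W_xph, c_h, W_nx, W_nn, b_n, W_yn, b_y = params
    x_prev = np.zeros(X.shape[1])       # no context before the first utterance
    n = np.zeros(W_nn.shape[0])         # initial recurrent state
    labels = []
    for x_t in X:
        h = sigmoid(x_t @ W_xh + x_prev @ W_xph + c_h)   # CRBM features
        n = sigmoid(W_nx @ h + W_nn @ n + b_n)           # recurrent layer
        y = sigmoid(W_yn @ n + b_y)                      # lie / suspect / truthful
        labels.append(int(np.argmax(y)))
        x_prev = x_t
    return labels

rng = np.random.default_rng(5)
m, dh, dn, dy = 988, 500, 300, 3
params = (0.01 * rng.standard_normal((m, dh)), 0.01 * rng.standard_normal((m, dh)),
          np.zeros(dh), 0.01 * rng.standard_normal((dn, dh)),
          0.01 * rng.standard_normal((dn, dn)), np.zeros(dn),
          0.01 * rng.standard_normal((dy, dn)), np.zeros(dy))
labels = lie_detect(rng.standard_normal((5, m)), params)
print(len(labels))   # one label index per utterance
```

The per-utterance loop mirrors the bottom-to-top stacking of step 3: CRBM visible layer at the bottom, recurrent hidden layer at the top.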
Further, the specific method of step 1 includes:
(1) Let the speech feature vector of a training sample be x = [x_1, x_2, ..., x_m]^T, where m is its dimension; the speech feature vector serves as the input of the conditional restricted Boltzmann machine.
(2) Based on the maximum-likelihood estimation principle, unsupervised training of the training samples with the contrastive-divergence method yields the CRBM parameters W_xh, W_x'h and W_x'x.
Given the speech feature vector x^(t) at time t and x^(t-1) at time t-1, the mean-field approximation then yields the higher-order feature information of x^(t) and x^(t-1):
h^(t) = s(W_xh^T x^(t) + W_x'h^T x^(t-1) + c_x),
where s is the sigmoid function and c_x is the bias vector of the hidden-layer neurons.
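For one time step, the mean-field step of (2) reduces to a single sigmoid layer over the current and previous feature vectors. A small numpy sketch under assumed shapes (the names crbm_hidden_mean, W_xph and so on are illustrative, not from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def crbm_hidden_mean(x_t, x_prev, W_xh, W_xph, c_h):
    """Mean-field estimate of the CRBM hidden units: given the current
    feature vector x(t) and the previous one x(t-1), the hidden
    activation probabilities s(W_xh^T x + W_x'h^T x' + c) serve as the
    higher-order feature h(t). Shapes assumed: each W_* is (m, n_hidden)."""
    return sigmoid(x_t @ W_xh + x_prev @ W_xph + c_h)

rng = np.random.default_rng(0)
m, n_h = 988, 500     # feature dimension and hidden-layer size from the text
h = crbm_hidden_mean(rng.standard_normal(m), rng.standard_normal(m),
                     0.01 * rng.standard_normal((m, n_h)),
                     0.01 * rng.standard_normal((m, n_h)),
                     np.zeros(n_h))
print(h.shape)   # (500,)
```

Because the output is a sigmoid, every component of h lies strictly in (0, 1), i.e. it is a probability vector of hidden-unit activations.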
Further, in step 2 the method for obtaining the initialization parameters of the recurrent neural network is: let the labels of the training samples be y^(t); taking the higher-order feature information h^(t) and the training-sample labels y^(t) respectively as the input and the output of the recurrent neural network, carry out supervised training to obtain the recurrent network's initialization parameters.
Further, the emotion labels are divided into three classes: lying, suspect, and truthful.
Further, the speech feature vectors in step 1 are obtained as follows: 26 frame-level features are extracted with the openEAR toolbox, and 19 statistical functions are applied to these 26 frame-level features and to their differences. The duration of each training sample is fixed, and a 26 × 2 × 19 = 988-dimensional speech feature vector is extracted from each training sample.
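The 26 × 2 × 19 = 988 bookkeeping can be checked with a short sketch. The 19 functionals below are simple stand-in statistics, not the actual openEAR functional set (which the text does not enumerate at this point); only the shape arithmetic is the point.

```python
import numpy as np

def extract_utterance_features(frames):
    """Sketch: turn a (n_frames, 26) matrix of frame-level features into
    one 988-dim utterance vector: 26 LLDs plus their frame differences,
    each summarized by 19 statistical functionals (placeholders here)."""
    deltas = np.diff(frames, axis=0, prepend=frames[:1])   # frame-level differences
    feats = []
    for mat in (frames, deltas):                           # 26 x 2 streams
        for col in mat.T:                                  # per feature contour
            stats = [col.mean(), col.std(), col.min(), col.max(),
                     col.argmax(), col.argmin(), np.ptp(col),
                     ((col[1:] - col[:-1]) ** 2).mean(),   # mean squared delta
                     (col > col.mean()).mean()]            # fraction above mean
            feats.extend(stats)                                          # 9
            feats.extend(np.percentile(col, [1, 25, 50, 75, 99]))        # +5
            feats.extend([np.polyfit(np.arange(len(col)), col, 1)[0],    # slope
                          col.sum(),
                          np.median(np.abs(col - np.median(col))),       # MAD
                          ((col ** 2).mean()) ** 0.5,                    # RMS
                          (np.sign(col[:-1]) != np.sign(col[1:])).mean()])  # +5 = 19
    return np.asarray(feats)

x = extract_utterance_features(np.random.default_rng(1).standard_normal((100, 26)))
print(x.shape)   # (988,)
```

Per contour there are 19 functionals, over 26 × 2 = 52 contours, giving the 988-dimensional vector the text describes.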
Further, the energy function of the conditional restricted Boltzmann machine in step 1 is defined as:
E(x^(t), h^(t)) = Σ_i (x_i^(t) − â_i)² / (2σ_i²) − Σ_{i,j} W_xh,ij (x_i^(t) / σ_i) h_j − Σ_j ĉ_j h_j,
with dynamic biases â_i = a_i + Σ_p (W_x'x^(t−p) x^(t−p))_i and ĉ_j = c_j + Σ_p (W_x'h^(t−p) x^(t−p))_j.
Here x_i^(t) is the cell value of visible-layer node i at time t and x_k^(t−p) is the cell value of visible-layer node k at time t−p; h_j is the variable of the j-th node of the hidden layer h; σ_i is the variance of visible-layer node i; a_i and c_j are the biases of visible-layer node i and hidden-layer node j; W_x'x^(t−p) is the directed connection weight matrix from the visible-layer nodes at time t−p to those at time t; W_x'h^(t−p) is the directed connection weight matrix from the visible-layer nodes at time t−p to the hidden-layer nodes; and W_xh is the symmetric connection weight matrix between the visible-layer nodes and hidden-layer nodes at time t. Given multiple past visible-layer observations x^(t−1), ..., x^(t−p), the hidden-layer nodes h^(t) and the observation x^(t) of the current visible-layer nodes, the CRBM parameters W_x'x, W_x'h and W_xh can, like the visible- and hidden-layer biases of an ordinary restricted Boltzmann machine, also be obtained by the maximum-likelihood criterion; the training objective function is:
max Σ_t log P(x^(t) | x^(t−1), ..., x^(t−p)).
So that training yields nonzero, nonnegative variances, σ_i² is represented through the parameter z_i (σ_i² = e^(z_i)); the update gradient for z_i, and likewise for the bias parameters b and c of the visible- and hidden-layer neurons, takes the contrastive form
Δθ ∝ ⟨−∂E/∂θ⟩_data − ⟨−∂E/∂θ⟩_model,
where ⟨·⟩_data denotes the expectation over the training data set and ⟨·⟩_model the expectation under the model distribution. Given a random hidden-layer vector h, the conditional probability that visible unit v_i takes the value v, and, given a random training sample v, the conditional probability that hidden unit h_j takes the state 1, are respectively:
P(v_i = v | h) = N(v; â_i + σ_i Σ_j W_xh,ij h_j, σ_i²),
P(h_j = 1 | v) = s(Σ_i W_xh,ij v_i / σ_i + ĉ_j),
where N(μ, σ²) denotes a Gaussian distribution with mean μ and variance σ², and s is the sigmoid function.
Further, the training in step 2 determines the weights W_yn, W_nx and W_nn and the bias b_n of the recurrent neural network by minimizing the estimation error between y^(t) and its estimate ŷ^(t). Assuming the recurrent network has a single hidden layer, ŷ^(t) is expressed as:
ŷ^(t) = s(W_yn n^(t) + b_y), with n^(t) = s(W_nx h^(t) + W_nn n^(t−1) + b_n),
where s is the sigmoid function, n^(t−1) is the output of the recurrent network at time t−1, and b_n and b_y are the biases corresponding to the weights W_nx and W_yn respectively.
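The recurrence just given can be written directly as a forward pass. A sketch under assumed dimensions; the zero initial hidden state and the random weights stand in for the supervised pre-training described above:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def rnn_forward(H, W_nx, W_nn, W_yn, b_n, b_y):
    """Forward pass matching the recurrence in the text:
    n(t) = s(W_nx h(t) + W_nn n(t-1) + b_n), y(t) = s(W_yn n(t) + b_y).
    H is the (T, d_h) sequence of CRBM higher-order features."""
    n = np.zeros(W_nn.shape[0])      # initial hidden state (assumption)
    ys = []
    for h_t in H:
        n = sigmoid(W_nx @ h_t + W_nn @ n + b_n)
        ys.append(sigmoid(W_yn @ n + b_y))
    return np.asarray(ys)

rng = np.random.default_rng(2)
d_h, d_n, d_y = 500, 300, 3          # 3 classes: lying / suspect / truthful
Y = rnn_forward(rng.standard_normal((4, d_h)),
                0.01 * rng.standard_normal((d_n, d_h)),
                0.01 * rng.standard_normal((d_n, d_n)),
                0.01 * rng.standard_normal((d_y, d_n)),
                np.zeros(d_n), np.zeros(d_y))
print(Y.shape)   # (4, 3)
```

Each row of Y is the per-time-step estimate ŷ^(t); training would minimize the error between these rows and the one-hot labels y^(t).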
Advantageous effects: the invention discloses a lie detection method based on a deep recursive conditional restricted Boltzmann machine. First, over continuous speech passages, the conditional restricted Boltzmann machine, with its good modeling of time series and simple inference procedure, is used to model the training samples and obtain higher-order statistics of whether the speaker is lying. These higher-order statistics and the labels of the training samples are then used for supervised parameter training of a recurrent neural network. After obtaining the initialization parameters of the conditional restricted Boltzmann machine and the recurrent neural network, the two basic network units are stacked from bottom to top, and on a validation data set the recurrent network's parameters are fine-tuned by least-squares regression. During testing, the established network is used to test the speech signal features of the speaker. The invention derives the lie-detection result automatically with a comparatively high recognition rate; the method places few demands on the evaluator's professional knowledge and skill, and testing is efficient. Experiments show that the method of the invention can effectively identify whether a speaker is lying.
Description of the drawings
Fig. 1 is the network structure of the conditional restricted Boltzmann machine based on acoustic features;
Fig. 2 is the structure diagram of the conditional restricted Boltzmann machine.
Specific embodiments
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
Fig. 2 shows the structure of the conditional restricted Boltzmann machine. x_i^(t) is the cell value of visible-layer node i at time t, and x_k^(t−p) is the cell value of visible-layer node k at time t−p. h_j is the variable of the j-th node of the hidden layer h. σ_i is the variance of visible-layer node i. b_i and c_j are the biases of visible-layer node i and hidden-layer node j. W_x'x^(t−p) is the directed connection weight matrix from the visible-layer nodes at time t−p to those at time t. W_x'h^(t−p) is the directed connection weight matrix from the visible-layer nodes at time t−p to the hidden-layer nodes. W_vh is the symmetric connection weight matrix between the visible-layer nodes and hidden-layer nodes at time t. Multiple past visible-layer observations x^(t−1), ..., x^(t−p) are given, together with the hidden-layer nodes h^(t) and the observation x^(t) of the current visible-layer nodes.
A lie detection method based on a deep recursive conditional restricted Boltzmann machine comprises the following steps:
Step 1: obtain multiple speech recordings as training samples, each with its own emotion label; in this embodiment the emotion labels are divided into three classes: lying, suspect, and truthful. A conditional restricted Boltzmann machine is used to extract the higher-order feature information of the training samples' speech feature vectors. Unsupervised training on the training samples yields the CRBM parameters W_xh, W_x'h and W_x'x, where W_xh is the connection weight matrix between the CRBM's visible-layer nodes and hidden-layer nodes at the current time t, W_x'h is the connection weight matrix between the CRBM's visible-layer nodes at time t-1 and the hidden-layer nodes, and W_x'x is the connection weight matrix between the CRBM's visible-layer nodes at time t and its visible-layer nodes at time t-1.
This specifically includes:
(1) Let the speech feature vector of a training sample be x = [x_1, x_2, ..., x_m]^T, where m is its dimension; the speech feature vector serves as the input of the conditional restricted Boltzmann machine.
The energy function of the conditional restricted Boltzmann machine is defined as:
E(x^(t), h^(t)) = Σ_i (x_i^(t) − â_i)² / (2σ_i²) − Σ_{i,j} W_xh,ij (x_i^(t) / σ_i) h_j − Σ_j ĉ_j h_j,
with dynamic biases â_i = a_i + Σ_p (W_x'x^(t−p) x^(t−p))_i and ĉ_j = c_j + Σ_p (W_x'h^(t−p) x^(t−p))_j.
Here x_i^(t) is the cell value of visible-layer node i at time t and x_k^(t−p) is the cell value of visible-layer node k at time t−p; h_j is the variable of the j-th node of the hidden layer h; σ_i is the variance of visible-layer node i; a_i and c_j are the biases of visible-layer node i and hidden-layer node j; W_x'x^(t−p) is the directed connection weight matrix from the visible-layer nodes at time t−p to those at time t; W_x'h^(t−p) is the directed connection weight matrix from the visible-layer nodes at time t−p to the hidden-layer nodes; and W_xh is the symmetric connection weight matrix between the visible-layer nodes and hidden-layer nodes at time t. Given multiple past visible-layer observations x^(t−1), ..., x^(t−p), the hidden-layer nodes h^(t) and the observation x^(t) of the current visible-layer nodes, the CRBM parameters W_x'x, W_x'h and W_xh can, like the visible- and hidden-layer biases of an ordinary restricted Boltzmann machine, also be obtained by the maximum-likelihood criterion; the training objective function is:
max Σ_t log P(x^(t) | x^(t−1), ..., x^(t−p)).
So that training yields nonzero, nonnegative variances, σ_i² is represented through the parameter z_i (σ_i² = e^(z_i)); the update gradient for z_i, and likewise for the bias parameters b and c of the visible- and hidden-layer neurons, takes the contrastive form
Δθ ∝ ⟨−∂E/∂θ⟩_data − ⟨−∂E/∂θ⟩_model,
where ⟨·⟩_data denotes the expectation over the training data set and ⟨·⟩_model the expectation under the model distribution. Given a random hidden-layer vector h, the conditional probability that visible unit v_i takes the value v, and, given a random training sample v, the conditional probability that hidden unit h_j takes the state 1, are respectively:
P(v_i = v | h) = N(v; â_i + σ_i Σ_j W_xh,ij h_j, σ_i²),
P(h_j = 1 | v) = s(Σ_i W_xh,ij v_i / σ_i + ĉ_j),
where N(μ, σ²) denotes a Gaussian distribution with mean μ and variance σ², and s is the sigmoid function.
(2) Based on the maximum-likelihood estimation principle, unsupervised training of the training samples with the contrastive-divergence method yields the CRBM parameters W_xh, W_x'h and W_x'x.
Given the speech feature vector x^(t) at time t and x^(t−1) at time t−1, the mean-field approximation then yields the higher-order feature information of x^(t) and x^(t−1):
h^(t) = s(W_xh^T x^(t) + W_x'h^T x^(t−1) + c_x),
where s is the sigmoid function and c_x is the bias vector of the hidden-layer neurons.
As shown in Fig. 1, to estimate the acoustic features, the acoustic feature vector x = [x_1, x_2, ..., x_m]^T (m being the dimension) is taken as the input of the Gaussian-Bernoulli restricted Boltzmann machine (GBRBM); based on the maximum-likelihood estimation principle, unsupervised training with the contrastive-divergence (CD) method yields the GBRBM parameters W_xh, W_x'h and W_x'x. Knowing x^(t) and x^(t−1) in this way, the mean-field approximation gives the higher-order emotional statistics h^(t) of the acoustic features x^(t) and x^(t−1).
The speech feature vectors are obtained as follows: 26 frame-level features are extracted with the openEAR toolbox, and 19 statistical functions are applied to these 26 frame-level features and to their differences. The duration of each training sample is fixed, and a 26 × 2 × 19 = 988-dimensional speech feature vector is extracted from each training sample.
Step 2: supervised training of a recurrent neural network with the labels of the training samples and the extracted higher-order feature information yields the initialized recurrent-network parameters W_yn, W_nx and W_nn, where W_yn is the connection weight matrix between the recurrent network's visible-layer nodes and hidden-layer nodes at the current time t, W_nx is the connection weight matrix between the visible-layer nodes at time t-1 and the hidden-layer nodes, and W_nn is the connection weight matrix between the recurrent network's visible-layer nodes at time t and its visible-layer nodes at time t-1.
The specific method is: let the labels of the training samples be y^(t); taking the higher-order feature information h^(t) and the training-sample labels y^(t) respectively as the input and the output of the recurrent neural network, supervised training yields the recurrent network's initialization parameters.
The training determines the weights W_yn, W_nx and W_nn and the bias b_n of the recurrent neural network by minimizing the estimation error between y^(t) and its estimate ŷ^(t). Assuming the recurrent network has a single hidden layer, ŷ^(t) is expressed as:
ŷ^(t) = s(W_yn n^(t) + b_y), with n^(t) = s(W_nx h^(t) + W_nn n^(t−1) + b_n),
where s is the sigmoid function, n^(t−1) is the output of the recurrent network at time t−1, and b_n and b_y are the biases corresponding to the weights W_nx and W_yn respectively.
Step 3: after obtaining the CRBM parameters W_xh, W_x'h and W_x'x and the recurrent-network parameters W_yn, W_nx and W_nn, the conditional restricted Boltzmann machine and the recurrent neural network are stacked from bottom to top, realizing the mapping from speech features to emotion labels: the visible layer of the restricted Boltzmann machine serves as the bottom of the whole network, its hidden-layer nodes are connected to the visible layer of the recurrent neural network, and the hidden layer of the recurrent neural network finally serves as the top layer of the whole network. Then, on the recurrent network's validation data set, the recurrent-network parameters W_yn, W_nx and W_nn are adjusted according to the least-squares criterion.
Step 4: the speech feature vectors of the speaker are tested using the established neural network.
In the specific implementation, the present invention designed and recorded a CSC database. Although, in the laboratory environment, considerations of ethics and practicality excluded the use of some paradigms, the scenario was designed around the subjects' "self-presentation", so subjects were typically lured into deception by material rewards. To extract the speech features related to lying, voice activity detection (VAD) based on short-time energy is first used to detect the effective speaking content in the speech, and frame-level features are then extracted from that content with a frame length of 128 ms. Since the object of emotion recognition is a long, unsegmented utterance, the frame-level features are statistically aggregated over the speaking content at a time interval of 640 ms and a duration of 1280 ms, yielding the emotion-related features.
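The short-time-energy VAD mentioned here can be sketched in a few lines. The thresholding rule (a fixed fraction of the peak frame energy) and the sample-based frame length are assumptions, since the text names only the method:

```python
import numpy as np

def vad_short_time_energy(signal, frame_len, threshold_ratio=0.1):
    """Short-time-energy VAD sketch: frame the signal, compute per-frame
    energy, and keep frames whose energy exceeds a fraction of the peak.
    frame_len is in samples here; the text specifies frames of 128 ms."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).sum(axis=1)
    return energy > threshold_ratio * energy.max()   # boolean speech mask

rng = np.random.default_rng(4)
sig = np.concatenate([0.01 * rng.standard_normal(1024),   # silence
                      rng.standard_normal(1024),          # speech-like burst
                      0.01 * rng.standard_normal(1024)])  # silence
mask = vad_short_time_energy(sig, 128)
print(mask.sum(), "of", mask.size, "frames kept")
```

Only the frames flagged True would be passed on to frame-level feature extraction.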
To balance the algorithm's execution time against the speech-emotion recognition performance, 26 frame-level features are extracted with the openEAR toolbox, and 19 statistical functions are applied to these 26 frame-level features and their corresponding differences. Each sentence of fixed duration (1280 ms) therefore yields 26 × 2 × 19 = 988 features. The 26 frame-level features and the 19 statistical functions are listed in Tables 1 and 2, respectively.
Table 1: the 26 frame-level features
Table 2: the 19 statistical functions
These 988 speech features are then normalized and concatenated into a feature vector.
(2) Estimating the parameters of the conditional restricted Boltzmann machine
When estimating the parameters of the conditional restricted Boltzmann machine (CRBM), the contrastive divergence algorithm is used during gradient descent, completing each weight update with a single step of Gibbs sampling. The parameter update step size is set to 0.0001, the learning rate to 0.001, the number of training epochs to 200, and the weight decay factor to 0.0002. For the acoustic-feature CRBM in the RNN-DRBM, the number of hidden-layer nodes is set to 500.
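A single contrastive-divergence update of the kind described, with one Gibbs step and the learning rate and weight decay quoted above, can be sketched as follows. This is the standard Gaussian-visible CRBM CD-1 rule, shown here only for the weight matrix Wxh; it is an illustrative reconstruction, not the patent's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
s = lambda z: 1.0 / (1.0 + np.exp(-z))          # sigmoid

# Dimensions from the text: 988 visible units, 500 hidden units.
nv, nh = 988, 500
Wxh  = 0.01 * rng.standard_normal((nv, nh))     # visible(t)  <-> hidden
Wx_h = 0.01 * rng.standard_normal((nv, nh))     # visible(t-1) -> hidden
Wx_x = 0.01 * rng.standard_normal((nv, nv))     # visible(t-1) -> visible(t)
a, c = np.zeros(nv), np.zeros(nh)               # static biases

def cd1_step(x_t, x_prev, lr=0.001, decay=0.0002):
    """One contrastive-divergence update of Wxh using a single Gibbs
    sampling step (assumed reconstruction of the standard CRBM rule)."""
    global Wxh
    dyn_v = x_prev @ Wx_x + a                   # dynamic visible bias
    dyn_h = x_prev @ Wx_h + c                   # dynamic hidden bias
    h0 = s(x_t @ Wxh + dyn_h)                   # positive phase
    h_samp = (rng.random(nh) < h0).astype(float)
    v1 = h_samp @ Wxh.T + dyn_v                 # Gaussian visible mean
    h1 = s(v1 @ Wxh + dyn_h)                    # negative phase
    Wxh += lr * (np.outer(x_t, h0) - np.outer(v1, h1) - decay * Wxh)
```

In practice the same positive-minus-negative statistics also update Wx′h, Wx′x and the biases; they are omitted here for brevity.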
(3) Estimating the parameters of the recurrent neural network
For the hidden layer of the recurrent neural network (RNN), the number of hidden nodes is set to 300. The recursive conditional restricted Boltzmann machine therefore has the network structure 988-500-300-x, where x denotes the dimensionality of the deception labels at the top of the network.
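The 988-500-300-x stack can be sketched as a forward pass with random placeholder weights, taking x = 3 for the three affective classes (lying, suspected, not lying). A conventional recurrent hidden layer is used here for illustration; the patent's actual recurrent connections (Wyn, Wnx, Wnn) differ in detail:

```python
import numpy as np

s = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(1)

# Hypothetical random weights for the 988-500-300-3 stack.
W_crbm = 0.01 * rng.standard_normal((988, 500))   # CRBM visible -> hidden
W_in   = 0.01 * rng.standard_normal((500, 300))   # RNN input  -> hidden
W_rec  = 0.01 * rng.standard_normal((300, 300))   # RNN hidden -> hidden
W_out  = 0.01 * rng.standard_normal((300, 3))     # RNN hidden -> labels

def forward(segments):
    """Map a sequence of 988-dim segment feature vectors to per-segment
    label scores through the 988-500-300-3 stack."""
    h_rnn = np.zeros(300)
    scores = []
    for x in segments:
        h_crbm = s(x @ W_crbm)                    # 500-dim high-order features
        h_rnn = s(h_crbm @ W_in + h_rnn @ W_rec)  # 300-dim recurrent state
        scores.append(s(h_rnn @ W_out))           # label-dimension output
    return np.array(scores)
```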
(4) Fine-tuning the recurrent neural network parameters
The parameters of the recurrent neural network are fine-tuned on the validation data set, with the number of iterations set to 200 and the convergence error set to 0.0001.
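The fine-tuning loop with the quoted iteration budget and convergence error can be sketched as a least-squares adjustment (matching the least-squares criterion named in claim 1). `H`, `Y` and the plain gradient loop are illustrative stand-ins for the actual network quantities:

```python
import numpy as np

def fine_tune(W, H, Y, lr=0.01, max_iter=200, tol=1e-4):
    """Adjust output weights W so that H @ W approaches the labels Y,
    stopping after max_iter iterations (200 in the text) or when the
    loss improvement falls below the convergence error tol (0.0001)."""
    prev_loss = np.inf
    for _ in range(max_iter):
        err = H @ W - Y
        loss = (err ** 2).mean()
        if prev_loss - loss < tol:              # convergence-error test
            break
        prev_loss = loss
        W -= lr * H.T @ err / len(H)            # mean-squared-error gradient
    return W
```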
Contrast experiments were carried out with the existing SVM method, the k-nearest-neighbor method, and the method of the present invention, yielding the experimental results shown in Table 3.
Table 3: Test results
As Table 3 shows, the algorithm of the present invention outperforms the SVM and k-nearest-neighbor methods in deception recognition: whether or not the speaker is lying, the recognition rates for the lying and the suspected/not-lying classes are all higher than those of the SVM and k-nearest-neighbor methods.
The experimental results show that the lie detection method based on a deep recursive conditional restricted Boltzmann machine used in this example can effectively identify whether a speech passage contains deceptive content, thereby realizing lie detection. Compared with empirical lie detection methods based on psychological knowledge and with common automatic recognition algorithms, the method of the present invention achieves comparatively good recognition performance.
Claims (7)
1. A lie detection method based on a deep recursive conditional restricted Boltzmann machine, characterized by comprising the following steps:
Step 1: obtain multiple speech recordings as training samples, each training sample having its own affective label; use a conditional restricted Boltzmann machine to extract the high-order feature information of the speech feature vectors of the training samples; perform unsupervised training on the training samples to obtain the conditional restricted Boltzmann machine parameters Wxh, Wx′h and Wx′x, where Wxh is the connection weight matrix between the visible-layer nodes and the hidden-layer nodes of the conditional restricted Boltzmann machine at time t, Wx′h is the connection weight matrix between the visible-layer nodes of the conditional restricted Boltzmann machine at time t−1 and the hidden-layer nodes, and Wx′x is the connection weight matrix between the visible-layer nodes of the conditional restricted Boltzmann machine at time t and the visible-layer nodes at time t−1;
Step 2: perform supervised training of a recurrent neural network using the labels of the training samples and the extracted high-order feature information, obtaining the initialized recurrent neural network parameters Wyn, Wnx and Wnn, where Wyn is the connection weight matrix between the visible-layer nodes and the hidden-layer nodes of the recurrent neural network at time t, Wnx is the connection weight matrix between the visible-layer nodes of the recurrent neural network at time t−1 and the hidden-layer nodes, and Wnn is the connection weight matrix between the visible-layer nodes of the recurrent neural network at time t and the visible-layer nodes at time t−1;
Step 3: after obtaining the conditional restricted Boltzmann machine parameters Wxh, Wx′h and Wx′x and the recurrent neural network parameters Wyn, Wnx and Wnn, stack the conditional restricted Boltzmann machine and the recurrent neural network from bottom to top to realize the mapping from speech features to affective labels, wherein the visible layer of the conditional restricted Boltzmann machine serves as the bottom of the whole network, the hidden layer of the conditional restricted Boltzmann machine is connected to the visible layer of the recurrent neural network, and the hidden layer of the recurrent neural network serves as the top layer of the whole network; then adjust the recurrent neural network parameters Wyn, Wnx and Wnn on the validation data set of the recurrent neural network according to the least-squares criterion;
Step 4: test the speech feature vector of a speaker using the newly established network.
2. The lie detection method based on a deep recursive conditional restricted Boltzmann machine according to claim 1, characterized in that the specific method of step 1 is:
(1) let the speech feature vector of a training sample be x = [x1, x2, ..., xm]T, where m is its dimensionality, and use the speech feature vector as the input of the conditional restricted Boltzmann machine;
(2) based on the maximum-likelihood estimation principle, perform unsupervised training on the training samples using the contrastive divergence method to obtain the conditional restricted Boltzmann machine parameters Wxh, Wx′h and Wx′x;
in this way, given the speech feature vector x(t) at time t and the speech feature vector x(t−1) at time t−1, the high-order feature information of x(t) and x(t−1) is obtained by the mean-field approximation,
where s is the sigmoid function and cx is the bias vector of the hidden-layer neurons.
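The mean-field computation of the high-order features in claim 2 can be sketched as follows. The formula itself is omitted in the source; the expression below is an assumed reconstruction based on the standard CRBM mean-field update, with `c` taken as the hidden-layer bias:

```python
import numpy as np

s = lambda z: 1.0 / (1.0 + np.exp(-z))  # sigmoid, as in the claim

def mean_field_features(x_t, x_prev, Wxh, Wx_h, c):
    """Mean-field approximation of the hidden units given x(t) and
    x(t-1): the hidden activation probabilities serve as the
    high-order feature information. Assumed reconstruction of the
    omitted formula from the standard CRBM."""
    return s(x_t @ Wxh + x_prev @ Wx_h + c)
```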
3. The lie detection method based on a deep recursive conditional restricted Boltzmann machine according to claim 2, characterized in that the method of obtaining the initialization parameters of the recurrent neural network in step 2 is:
given the labels of the training samples, use the extracted high-order feature information and the labels of the training samples as the input and the output, respectively, of the recurrent neural network, and perform supervised training to obtain the initialization parameters of the recurrent neural network.
4. The lie detection method based on a deep recursive conditional restricted Boltzmann machine according to any one of claims 1 to 3, characterized in that the affective labels are divided into three classes: lying, suspected lying, and not lying.
5. The lie detection method based on a deep recursive conditional restricted Boltzmann machine according to any one of claims 1 to 3, characterized in that the speech feature vector in step 1 is obtained as follows: 26 frame-level features are extracted with the openEAR toolbox, and 19 statistical functions are applied to these 26 frame-level features and their differences; the duration of each training sample is fixed, and a 26 × 2 × 19 = 988-dimensional speech feature vector is extracted from each training sample.
6. The lie detection method based on a deep recursive conditional restricted Boltzmann machine according to claim 1 or 2, characterized in that the energy function of the conditional restricted Boltzmann machine in step 1 is defined as:
where vi(t) is the value of visible-layer node i at time t and vk(t−p) is the value of visible-layer node k at time t−p; hj is the variable of the j-th node of the hidden layer h; σi is the variance of visible-layer node i; ai and cj are the biases of visible-layer node i and hidden-layer node j; one directed connection weight matrix links the visible-layer nodes at time t−p to those at time t, and another directed connection weight matrix links the visible-layer nodes at time t−p to the hidden-layer nodes; Wxh is the symmetric connection weight matrix between the visible-layer nodes at time t and the hidden-layer nodes. Given several past observations of the visible layer and the hidden-layer nodes, the observed value of the visible layer at the current time can be inferred. Analogously to the biases of the visible-layer and hidden-layer neurons of the restricted Boltzmann machine, the parameters Wx′h and Wxh of the conditional restricted Boltzmann machine can also be solved by the maximum-likelihood criterion, with the training objective function:
where V′(t) denotes the given past observations of the visible layer; so that the training process produces non-zero and non-negative solutions, the update gradient of the parameter zi is:
b and c are the bias parameters of the visible-layer and hidden-layer neurons respectively, with corresponding update gradients, and:
where <·>data denotes the expectation over the training data set and <·>model denotes the expectation over the model distribution; given a random hidden-layer vector h, the conditional probability that the state of visible-layer unit vi is v and, given a random training sample v, the conditional probability that the state of hidden-layer unit hj is 1 are respectively:
where N(μ, σ2) denotes the Gaussian distribution with mean μ and variance σ2, and s is the sigmoid function.
7. The lie detection method based on a deep recursive conditional restricted Boltzmann machine according to claim 3, characterized in that: the training in step 2 obtains the weights Wyn, Wnx and Wnn of the recurrent neural network and the bias bn by minimizing the estimation error between the network output and the labels; assuming that the number of hidden layers of the recurrent neural network is 1, the output is expressed as:
where s is the sigmoid function, n(t−1) is the output of the recurrent neural network at time t−1, and bn and by are the biases corresponding to the recurrent neural network weights Wnx and Wnn, respectively.
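The recurrent step described in claim 7 can be sketched as follows. The formula itself is omitted in the source, so the expression below is an assumed reconstruction: the hidden state n(t) is a sigmoid of the current input and the previous output, and the label estimate y(t) is a sigmoid of n(t):

```python
import numpy as np

s = lambda z: 1.0 / (1.0 + np.exp(-z))  # sigmoid

def rnn_step(x_t, n_prev, Wnx, Wnn, Wyn, bn, by):
    """Single-hidden-layer recurrent step in the spirit of claim 7
    (assumed reconstruction, not the patent's exact formula)."""
    n_t = s(x_t @ Wnx + n_prev @ Wnn + bn)   # hidden state n(t)
    y_t = s(n_t @ Wyn + by)                  # label estimate y(t)
    return n_t, y_t
```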
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711315604.5A CN108175426B (en) | 2017-12-11 | 2017-12-11 | Lie detection method based on deep recursion type conditional restricted Boltzmann machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108175426A true CN108175426A (en) | 2018-06-19 |
CN108175426B CN108175426B (en) | 2020-06-02 |
Family
ID=62546012
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711315604.5A Expired - Fee Related CN108175426B (en) | 2017-12-11 | 2017-12-11 | Lie detection method based on deep recursion type conditional restricted Boltzmann machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108175426B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6925455B2 (en) * | 2000-12-12 | 2005-08-02 | Nec Corporation | Creating audio-centric, image-centric, and integrated audio-visual summaries |
WO2013149123A1 (en) * | 2012-03-30 | 2013-10-03 | The Ohio State University | Monaural speech filter |
CN106251880A (en) * | 2015-06-03 | 2016-12-21 | 创心医电股份有限公司 | Identify method and the system of physiological sound |
CN106782518A (en) * | 2016-11-25 | 2017-05-31 | 深圳市唯特视科技有限公司 | A kind of audio recognition method based on layered circulation neutral net language model |
CN107146615A (en) * | 2017-05-16 | 2017-09-08 | 南京理工大学 | Audio recognition method and system based on the secondary identification of Matching Model |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109118763A (en) * | 2018-08-28 | 2019-01-01 | 南京大学 | Vehicle flowrate prediction technique based on corrosion denoising deepness belief network |
CN109118763B (en) * | 2018-08-28 | 2021-05-18 | 南京大学 | Vehicle flow prediction method based on corrosion denoising deep belief network |
CN110009025A (en) * | 2019-03-27 | 2019-07-12 | 河南工业大学 | A kind of semi-supervised additive noise self-encoding encoder for voice lie detection |
CN110009025B (en) * | 2019-03-27 | 2023-03-24 | 河南工业大学 | Semi-supervised additive noise self-encoder for voice lie detection |
CN110265063A (en) * | 2019-07-22 | 2019-09-20 | 东南大学 | A kind of lie detecting method based on fixed duration speech emotion recognition sequence analysis |
CN110265063B (en) * | 2019-07-22 | 2021-09-24 | 东南大学 | Lie detection method based on fixed duration speech emotion recognition sequence analysis |
CN111616702A (en) * | 2020-06-18 | 2020-09-04 | 北方工业大学 | Lie detection analysis system based on cognitive load enhancement |
Also Published As
Publication number | Publication date |
---|---|
CN108175426B (en) | 2020-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110556129B (en) | Bimodal emotion recognition model training method and bimodal emotion recognition method | |
Cernak et al. | Characterisation of voice quality of Parkinson’s disease using differential phonological posterior features | |
Koolagudi et al. | IITKGP-SEHSC: Hindi speech corpus for emotion analysis | |
CN106073706B (en) | A kind of customized information and audio data analysis method and system towards Mini-mental Status Examination | |
CN108175426A (en) | A kind of lie detecting method that Boltzmann machine is limited based on depth recursion type condition | |
CN108899049A (en) | A kind of speech-emotion recognition method and system based on convolutional neural networks | |
Travieso et al. | Detection of different voice diseases based on the nonlinear characterization of speech signals | |
Caponetti et al. | Biologically inspired emotion recognition from speech | |
CN109727608A (en) | A kind of ill voice appraisal procedure based on Chinese speech | |
Hahm et al. | Parkinson's condition estimation using speech acoustic and inversely mapped articulatory data | |
Petroni et al. | Classification of infant cry vocalizations using artificial neural networks (ANNs) | |
He | Stress and emotion recognition in natural speech in the work and family environments | |
CN107452370A (en) | A kind of application method of the judgment means of Chinese vowel followed by a nasal consonant dysphonia patient | |
CN112618911B (en) | Music feedback adjusting system based on signal processing | |
Sharma et al. | Processing and analysis of human voice for assessment of Parkinson disease | |
Shinde et al. | Automated Depression Detection using Audio Features | |
Chou et al. | Bird species recognition by comparing the HMMs of the syllables | |
Mijić et al. | Classification of cognitive load using voice features: a preliminary investigation | |
Patil et al. | A review on emotional speech recognition: resources, features, and classifiers | |
Firdausillah et al. | Implementation of neural network backpropagation using audio feature extraction for classification of gamelan notes | |
Safdar et al. | Prediction of Specific Language Impairment in Children using Cepstral Domain Coefficients | |
Hair et al. | Assessing Posterior-Based Mispronunciation Detection on Field-Collected Recordings from Child Speech Therapy Sessions. | |
Marck et al. | Identification, analysis and characterization of base units of bird vocal communication: The white spectacled bulbul (Pycnonotus xanthopygos) as a case study | |
Singh et al. | Analyzing machine learning algorithms for speech impairment related issues | |
Zheng et al. | The Extraction Method of Emotional Feature Based on Children's Spoken Speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20200602 |