CN105679333A - Vocal cord-larynx ventricle-vocal track linked physical model and mental pressure detection method - Google Patents

Vocal cord-larynx ventricle-vocal track linked physical model and mental pressure detection method

Info

Publication number
CN105679333A
Authority
CN
China
Prior art keywords
vocal
pressure
physical model
hilton
sac
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610123469.3A
Other languages
Chinese (zh)
Other versions
CN105679333B (en
Inventor
姚潇
徐宁
汤一彬
张�杰
高明生
蒋爱民
刘小峰
张学武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Campus of Hohai University
Original Assignee
Changzhou Campus of Hohai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Campus of Hohai University filed Critical Changzhou Campus of Hohai University
Priority to CN201610123469.3A priority Critical patent/CN105679333B/en
Publication of CN105679333A publication Critical patent/CN105679333A/en
Application granted granted Critical
Publication of CN105679333B publication Critical patent/CN105679333B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B5/16: Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/165: Evaluating the state of mind, e.g. depression, anxiety
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B5/48: Other medical applications
    • A61B5/4884: Other medical applications inducing physiological or psychological stress, e.g. applications for stress testing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Psychology (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Developmental Disabilities (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Physiology (AREA)
  • Educational Technology (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a coupled vocal cord-laryngeal ventricle-vocal tract physical model and a mental pressure (psychological stress) detection method. The physical model includes a set of mechanical equations describing vocal cord motion and a set of aerodynamic equations describing the pressure drop distribution along the glottal depth direction and along the laryngeal ventricle-false vocal cord-vocal tract direction. A physiological parameter estimation algorithm is designed on the basis of this physical model in order to study the physiological mechanism by which phonation varies under stress. Physiological feature parameters of the vocal cords and the laryngeal ventricle are extracted while a speaker phonates under stress, establishing a relation from real speech signals to physiological features. From the estimated physiological parameters, the variation of the vocal organs and of the airflow state within them under changing stress is obtained and used to detect mental stress, improving the accuracy and reliability of detection.

Description

Physical model coupling the vocal cords, laryngeal ventricle (Hilton's sac) and vocal tract, and mental stress detection method
Technical field
The present invention relates to the field of intelligent speech technology, and in particular to a method for detecting mental stress using speech technology.
Background
Mental stress is the combined physiological and psychological response that arises when a person perceives real or imagined demands as exceeding what they can cope with. Psychology shows that excessive stress causes passive and painful reactions and triggers negative emotions such as depression, tension, anxiety and anger, which in turn reduce work efficiency and quality of life. Psychologists hold that living under stress for a long time can lead to serious mental illness and increases the body's susceptibility to disease, in severe cases even to cancer. Emotion and stress are also significantly correlated: stress can induce a variety of emotions, such as excitement, dejection and boredom, and stressors of different intensity cause different degrees of tension. Emotion can therefore be regarded, to some extent, as an outward expression of stress, and stress can be quantified by degree.
One important manifestation of stress is the speaker's voice, and stress is a major factor affecting speech production. Abnormal changes may occur in the environment or in the speaker's own condition, or the user may be absorbed in a primary task while speech recognition serves only as an aid to it. In such cases the speaker is under workload stress, which strongly affects pronunciation and produces an abnormal state with altered voice quality. This abnormal state is usually reflected in the speaker's voice, giving rise to speech signals produced under stress (stressed speech).
However, stressed speech, particularly speech produced under multitask cognitive load, is hard to distinguish by ear, and general acoustic features cannot classify it correctly because they lack stability and robustness. Moreover, although the production mechanism of stressed speech differs significantly from that of normal speech, the variation is only weakly expressed at the level of acoustic features, so discriminability remains low. As a result, it is difficult to improve the reliability of stressed speech classification during detection.
Summary of the invention
It is an object of the invention to provide a physical model coupling the vocal cords, laryngeal ventricle and vocal tract, in order to obtain simulated speech data.
To solve the above technical problem, the invention provides a physical model coupling the vocal cords, laryngeal ventricle and vocal tract, including:
a set of mechanical equations describing the vibration of the vocal cords, and a set of aerodynamic equations describing the pressure drop distribution along the glottal depth direction and along the laryngeal ventricle, false vocal cord and vocal tract direction.
Further, the set of mechanical equations includes:
$$m_1 \frac{d^2 x_1}{dt^2} + r_1 \frac{dx_1}{dt} + s_1(x_1) + k_{c12}(x_1 - x_2) = F_1 \tag{1}$$
$$m_2 \frac{d^2 x_2}{dt^2} + r_2 \frac{dx_2}{dt} + s_2(x_2) + k_{c12}(x_2 - x_1) + k_{c23}(x_2 - x_3) = F_2 \tag{2}$$
$$m_3 \frac{d^2 x_3}{dt^2} + r_3 \frac{dx_3}{dt} + s_3(x_3) + k_{c23}(x_3 - x_2) = F_3 \tag{3}$$
In equations (1), (2) and (3):
$m_1$, $m_2$ and $m_3$ are the three masses used to build the vocal cord model, arranged in sequence;
$x_1$, $x_2$ and $x_3$ are the displacements of the three masses in the vertical direction;
$k_{c12}$ and $k_{c23}$ are the stiffnesses of the springs coupling adjacent pairs of the three masses;
$r_1$, $r_2$ and $r_3$ are the equivalent viscous resistances of the three masses;
$F_1$, $F_2$ and $F_3$ are the forces acting on the three masses; and
$s_1$, $s_2$ and $s_3$ are the springs attached to the three masses, expressed as:
$$s_i(x_i) = k_i \left( x_i + \eta x_i^3 \right), \quad i = 1, 2, 3 \tag{4}$$
In equation (4), $i$ denotes the i-th mass, $k_i$ is the stiffness coefficient of the spring attached to the i-th mass, and $\eta$ is the nonlinear coefficient of the spring.
Further, the set of aerodynamic equations includes:
the aerodynamic equations along the glottal depth direction, and the aerodynamic equations for the pressure drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction.
Further, the aerodynamic equations along the glottal depth direction include:
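$$P_s - P_{11} = (1.00 + 0.37) \frac{\rho U_g^2}{2 A_{g1}^2} \tag{5}$$
$$P_{i1} - P_{i2} = 12 \mu d_i l_g^2 \frac{U_g}{A_{gi}^3}, \quad i = 1, 2, 3 \tag{6}$$
$$P_{21} - P_{12} = \frac{\rho U_g^2}{2} \left( \frac{1}{A_{g1}^2} - \frac{1}{A_{g2}^2} \right) \tag{7}$$
$$P_{31} - P_{22} = \frac{\rho U_g^2}{2} \left( \frac{1}{A_{g1}^2} - \frac{1}{A_{g2}^2} \right) \tag{8}$$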
In equations (5), (6), (7) and (8):
$P_{i1}$ and $P_{i2}$ are the pressures at the inlet and outlet of the i-th mass;
$A_{gi}$ is the static glottal gap cross-sectional area corresponding to the i-th mass;
$U_g$ is the glottal wave, i.e. the airflow velocity through the glottis;
0.37 is the loss coefficient for the pressure drop at the junction of the vocal cords and the trachea, caused by the vena contracta that forms where the cross-section contracts abruptly at the glottal inlet; and
$P_s$ is the subglottal pressure, $\rho$ the air density, $\mu$ the shear viscosity coefficient, $l_g$ the length of the vocal cord model, and $d_i$ the thickness of the vocal cord model at the i-th mass.
Further, the aerodynamic equations for the pressure drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction include:
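$$P_{32} - P_v = -\frac{\rho}{2} \cdot \frac{2}{A_{g2} A_E} \left( 1 - \frac{A_{g2}}{A_E} \right) U_g^2 \tag{9}$$
$$P_v - P_{f1} = \frac{\rho}{2} \left( \frac{1}{A_f^2} - \frac{1}{A_v^2} \right) U_g^2 \tag{10}$$
$$P_{f1} - P_{f2} = \frac{12 \mu l_f^2 d_f}{A_f^3} U_g \tag{11}$$
$$P_{f2} - P_1 = -\frac{\rho}{2} \cdot \frac{2}{A_f A_1} \left( 1 - \frac{A_f}{A_1} \right) U_g^2 \tag{12}$$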
In equations (9), (10), (11) and (12):
$P_v$ and $A_v$ are the pressure in the laryngeal ventricle and the cross-sectional area of the laryngeal ventricle; $P_{f1}$ and $P_{f2}$ are the pressures at the two ends of the false vocal cords; $A_f$ is the cross-sectional area at the false vocal cords; $A_E$ is the cross-sectional area at the inlet of the laryngeal ventricle; and $A_1$ and $P_1$ are the cross-sectional area and the pressure at the vocal tract inlet.
In another aspect, building on the above physical model coupling the vocal cords, laryngeal ventricle and vocal tract, the invention also provides a stress detection method based on speech production modeling, so that mental stress can be tested from speech.
The stress detection method includes:
Step S1: establish the physical model coupling the vocal cords, laryngeal ventricle and vocal tract;
Step S2: use the physical model to generate simulated speech signals corresponding to the stress conditions encountered in the real world;
Step S3: using a physiological parameter estimation algorithm, estimate the speaker's physiological parameters during phonation under the corresponding stress state, so as to establish the relation between the speech signal and physiological features;
Step S4: detect mental stress according to this physiological feature relation.
Further, the physiological parameter estimation algorithm in step S3 includes the following steps:
Step S31: obtain the source information of the real speech, i.e. the residual signal, by linear prediction;
Step S32: apply the Fourier transform to the residual signal to obtain the spectrum of the real speech;
Step S32: separate the residual signal into a high-frequency component and a low-frequency component with band-pass filters, perform a first fit on each component, and use the parameters obtained from the first fit as the initial values of the second fit;
Step S33: perform the second fit, i.e. obtain the spectrum of the simulated speech signal and construct a cost function based on global features of the speech spectrum;
Step S34: to minimize the cost function, repeatedly vary the physiological parameters of the vocal cords and vocal tract so that the physical model generates new speech signals, searching the solution space for the optimum by minimizing the cost function and thereby estimating the physiological parameters.
In a third aspect, the invention also provides a speech-based physiological parameter estimation algorithm, which estimates physiological parameters from real speech data and simulated speech data.
The speech-based physiological parameter estimation algorithm includes the following steps:
Step S1': obtain the spectrum of the real speech and the initial values of the second fit;
Step S2': perform the second fit, to construct a cost function based on global features of the speech spectrum;
Step S3': estimate the physiological parameters according to the cost function.
Further, obtaining the spectrum of the real speech and the initial values of the second fit in step S1' includes the following steps:
Step S11': obtain the source information of the real speech, i.e. the residual signal, by linear prediction;
Step S12': apply the Fourier transform to the residual signal to obtain the spectrum of the real speech; separate the residual signal into a high-frequency component and a low-frequency component with band-pass filters; perform a first fit on each component; and use the parameters obtained from the first fit as the initial values of the second fit.
Performing the second fit in step S2' to construct the cost function based on global features of the speech spectrum includes: performing the second fit, i.e. producing a simulated speech signal with the physical model, obtaining the spectrum of this simulated speech signal, and then constructing the cost function based on global features of the speech spectrum; and
estimating the physiological parameters according to the cost function in step S3' includes:
to minimize the cost function, repeatedly varying the physiological parameters of the vocal cords and vocal tract so that the physical model generates new speech signals, searching the solution space for the optimum by minimizing the cost function and thereby estimating the physiological parameters.
Further, the cost function is
$$C = \frac{1}{N} \sum_{i=1}^{N} \left| \log S(\omega_i) - \log S^*(\omega_i) \right|^2$$
where $S^*(\omega)$ is the spectrum of the simulated speech signal and $S(\omega)$ is the spectrum of the real speech signal.
The beneficial effects of the invention are as follows. The invention establishes a physical model coupling the vocal cords, laryngeal ventricle and vocal tract, and designs a physiological parameter estimation algorithm from this physical model in order to study the physiological mechanism by which phonation varies under stress. Physiological feature parameters, such as those of the vocal cords, vocal tract and laryngeal ventricle, are extracted while the speaker phonates under stress, establishing a relation from real speech signals to physiological features. From the estimated physiological parameters, the variation of each vocal organ and of the airflow state within it under changing stress is obtained and finally used to detect mental stress, improving the accuracy and reliability of detection.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a structural diagram of the physical model coupling the vocal cords, laryngeal ventricle and vocal tract established by the invention;
Fig. 2 is a flow chart of the stress detection method based on speech production modeling;
Fig. 3 is a block diagram of the physiological parameter estimation algorithm;
Fig. 4 is a flow chart of the physiological parameter estimation algorithm;
Fig. 5 shows the experimental results verifying the effectiveness of the invention.
Detailed description
The invention is now described in further detail with reference to the drawings. The drawings are simplified schematics that illustrate only the basic structure of the invention, and therefore show only the components relevant to the invention.
Embodiment 1
As shown in Fig. 1, the physical model of the invention coupling the vocal cords, laryngeal ventricle and vocal tract includes:
a set of mechanical equations describing the vibration of the vocal cords, and a set of aerodynamic equations describing the pressure drop distribution along the glottal depth direction and along the laryngeal ventricle, false vocal cord and vocal tract direction.
Specifically, the set of mechanical equations includes:
$$m_1 \frac{d^2 x_1}{dt^2} + r_1 \frac{dx_1}{dt} + s_1(x_1) + k_{c12}(x_1 - x_2) = F_1 \tag{1}$$
$$m_2 \frac{d^2 x_2}{dt^2} + r_2 \frac{dx_2}{dt} + s_2(x_2) + k_{c12}(x_2 - x_1) + k_{c23}(x_2 - x_3) = F_2 \tag{2}$$
$$m_3 \frac{d^2 x_3}{dt^2} + r_3 \frac{dx_3}{dt} + s_3(x_3) + k_{c23}(x_3 - x_2) = F_3 \tag{3}$$
In equations (1), (2) and (3):
$m_1$, $m_2$ and $m_3$ are the three masses used to build the vocal cord model, arranged in sequence;
$x_1$, $x_2$ and $x_3$ are the displacements of the three masses in the vertical direction;
$k_{c12}$ and $k_{c23}$ are the stiffnesses of the springs coupling adjacent pairs of the three masses;
$r_1$, $r_2$ and $r_3$ are the equivalent viscous resistances of the three masses;
$F_1$, $F_2$ and $F_3$ are the forces acting on the three masses; and
$s_1$, $s_2$ and $s_3$ are the springs attached to the three masses, expressed as:
$$s_i(x_i) = k_i \left( x_i + \eta x_i^3 \right), \quad i = 1, 2, 3 \tag{4}$$
In equation (4), $i$ denotes the i-th mass, $k_i$ is the stiffness coefficient of the spring attached to the i-th mass, and $\eta$ is the nonlinear coefficient of the spring.
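As an illustrative sketch only, and not part of the disclosed method, equations (1) to (4) can be integrated numerically as shown below. The function names, the semi-implicit Euler scheme, the time step and every numeric value in `PARAMS` are assumptions chosen so that the example runs; the text gives no parameter values. The forces F1 to F3, which in the model follow from the aerodynamic equations, are supplied here by a caller-provided function.

```python
def spring_force(x, k, eta):
    """Nonlinear spring of equation (4): s_i(x_i) = k_i * (x_i + eta * x_i**3)."""
    return k * (x + eta * x ** 3)


def accelerations(x, v, F, m, r, k, kc12, kc23, eta):
    """Solve equations (1)-(3) for the three accelerations d2x_i/dt2."""
    coupling = (
        kc12 * (x[0] - x[1]),
        kc12 * (x[1] - x[0]) + kc23 * (x[1] - x[2]),
        kc23 * (x[2] - x[1]),
    )
    return [
        (F[i] - r[i] * v[i] - spring_force(x[i], k[i], eta) - coupling[i]) / m[i]
        for i in range(3)
    ]


def simulate(x0, v0, force, params, dt, steps):
    """Semi-implicit Euler integration (an assumed scheme).

    `force(t, x)` must return the tuple (F1, F2, F3) at time t.
    Returns the list of displacement triples, one per time step.
    """
    x, v = list(x0), list(v0)
    history = [tuple(x)]
    for n in range(steps):
        a = accelerations(x, v, force(n * dt, x), **params)
        v = [v[i] + dt * a[i] for i in range(3)]
        x = [x[i] + dt * v[i] for i in range(3)]
        history.append(tuple(x))
    return history


# Hypothetical values in arbitrary consistent units, for illustration only.
PARAMS = dict(m=(1.0e-4, 1.0e-4, 5.0e-5), r=(0.02, 0.02, 0.01),
              k=(80.0, 80.0, 40.0), kc12=25.0, kc23=25.0, eta=100.0)
```

For example, `simulate((1e-3, 0.0, 0.0), (0.0, 0.0, 0.0), lambda t, x: (0.0, 0.0, 0.0), PARAMS, 1e-5, 5000)` releases the first mass from a small displacement with no driving force, so the damped oscillation of the coupled masses can be inspected.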
Under time-varying conditions, taking the inertia of the air mass into account, the set of aerodynamic equations includes: the aerodynamic equations along the glottal depth direction, and the aerodynamic equations for the pressure drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction.
The aerodynamic equations along the glottal depth direction include:
$$P_s - P_{11} = (1.00 + 0.37) \frac{\rho U_g^2}{2 A_{g1}^2} \tag{5}$$
$$P_{i1} - P_{i2} = 12 \mu d_i l_g^2 \frac{U_g}{A_{gi}^3}, \quad i = 1, 2, 3 \tag{6}$$
$$P_{21} - P_{12} = \frac{\rho U_g^2}{2} \left( \frac{1}{A_{g1}^2} - \frac{1}{A_{g2}^2} \right) \tag{7}$$
$$P_{31} - P_{22} = \frac{\rho U_g^2}{2} \left( \frac{1}{A_{g1}^2} - \frac{1}{A_{g2}^2} \right) \tag{8}$$
In equations (5), (6), (7) and (8):
$P_{i1}$ and $P_{i2}$ are the pressures at the inlet and outlet of the i-th mass;
$A_{gi}$ is the static glottal gap cross-sectional area corresponding to the i-th mass;
$U_g$ is the glottal wave, i.e. the airflow velocity through the glottis;
0.37 is the loss coefficient for the pressure drop at the junction of the vocal cords and the trachea, caused by the vena contracta that forms where the cross-section contracts abruptly at the glottal inlet; and
$P_s$ is the subglottal pressure, $\rho$ the air density, $\mu$ the shear viscosity coefficient, $l_g$ the length of the vocal cord model, and $d_i$ the thickness of the vocal cord model at the i-th mass.
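The pressure drops of equations (5) to (8) can be evaluated directly once the flow and the areas are known. The following is a minimal sketch under stated assumptions: the function name, the returned dictionary keys and the default values of `rho` and `mu` (typical CGS values for air) are not taken from the text, and $U_g$ is treated as a volume flow because the equations divide it by areas. Equation (8) is coded with the areas exactly as printed.

```python
def glottal_pressure_drops(Ug, Ag, d, lg, rho=1.14e-3, mu=1.86e-4):
    """Evaluate equations (5)-(8) for one value of the glottal flow Ug.

    Ag and d hold the three glottal areas A_g1..A_g3 and thicknesses d_1..d_3.
    rho and mu are assumed CGS values for air, not values from the text.
    """
    # Equation (5): Ps - P11, inlet contraction with loss coefficient 0.37.
    entry = (1.00 + 0.37) * rho * Ug ** 2 / (2.0 * Ag[0] ** 2)
    # Equation (6): viscous drop P_i1 - P_i2 along each mass.
    viscous = [12.0 * mu * d[i] * lg ** 2 * Ug / Ag[i] ** 3 for i in range(3)]
    # Equation (7): P21 - P12 at the junction of masses 1 and 2.
    step_1_2 = rho * Ug ** 2 / 2.0 * (1.0 / Ag[0] ** 2 - 1.0 / Ag[1] ** 2)
    # Equation (8): P31 - P22, with the areas as printed in the text.
    step_2_3 = rho * Ug ** 2 / 2.0 * (1.0 / Ag[0] ** 2 - 1.0 / Ag[1] ** 2)
    return {"entry": entry, "viscous": viscous,
            "step_1_2": step_1_2, "step_2_3": step_2_3}
```

Calling `glottal_pressure_drops(100.0, (0.05, 0.1, 0.1), (0.1, 0.1, 0.1), 1.4)` with these hypothetical inputs returns the individual drops, which can then be chained from $P_s$ downstream to obtain the pressure at each section.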
When the speaker is under stress, the airflow pattern changes markedly, especially near the laryngeal ventricle and the false vocal cords. The physical model therefore needs to be extended: corresponding equations must be set up for the laryngeal ventricle, false vocal cord and vocal tract direction (the supraglottal airflow in the larynx).
Specifically, the aerodynamic equations for the pressure drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction include:
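$$P_{32} - P_v = -\frac{\rho}{2} \cdot \frac{2}{A_{g2} A_E} \left( 1 - \frac{A_{g2}}{A_E} \right) U_g^2 \tag{9}$$
$$P_v - P_{f1} = \frac{\rho}{2} \left( \frac{1}{A_f^2} - \frac{1}{A_v^2} \right) U_g^2 \tag{10}$$
$$P_{f1} - P_{f2} = \frac{12 \mu l_f^2 d_f}{A_f^3} U_g \tag{11}$$
$$P_{f2} - P_1 = -\frac{\rho}{2} \cdot \frac{2}{A_f A_1} \left( 1 - \frac{A_f}{A_1} \right) U_g^2 \tag{12}$$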
In equations (9), (10), (11) and (12):
$P_v$ and $A_v$ are the pressure in the laryngeal ventricle and the cross-sectional area of the laryngeal ventricle; $P_{f1}$ and $P_{f2}$ are the pressures at the two ends of the false vocal cords; $A_f$ is the cross-sectional area at the false vocal cords; $A_E$ is the cross-sectional area at the inlet of the laryngeal ventricle; and $A_1$ and $P_1$ are the cross-sectional area and the pressure at the vocal tract inlet.
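Equations (9) to (12) can be evaluated in the same way. The sketch below is an assumption-laden illustration: the function name and dictionary keys are invented, `rho` and `mu` default to typical CGS values for air, and `lf` and `df` are taken to be the length and thickness of the false vocal cord section by analogy with $l_g$ and $d_i$, since the text does not define them. The expansion terms (9) and (12) are read with $2/(A_{g2} A_E)$ and $2/(A_f A_1)$ as single fractions, which keeps both sides of each equation in units of pressure.

```python
def supraglottal_pressure_drops(Ug, Ag2, AE, Av, Af, A1, lf, df,
                                rho=1.14e-3, mu=1.86e-4):
    """Evaluate equations (9)-(12) for one value of the glottal flow Ug."""
    # Equation (9): P32 - Pv, expansion into the laryngeal ventricle.
    into_ventricle = -rho / 2.0 * 2.0 / (Ag2 * AE) * (1.0 - Ag2 / AE) * Ug ** 2
    # Equation (10): Pv - Pf1, contraction from the ventricle to the false cords.
    ventricle_to_false = rho / 2.0 * (1.0 / Af ** 2 - 1.0 / Av ** 2) * Ug ** 2
    # Equation (11): Pf1 - Pf2, viscous drop along the false vocal cords.
    along_false = 12.0 * mu * lf ** 2 * df / Af ** 3 * Ug
    # Equation (12): Pf2 - P1, expansion into the vocal tract inlet.
    into_tract = -rho / 2.0 * 2.0 / (Af * A1) * (1.0 - Af / A1) * Ug ** 2
    return {"into_ventricle": into_ventricle,
            "ventricle_to_false": ventricle_to_false,
            "along_false": along_false,
            "into_tract": into_tract}
```

The negative sign of the two expansion terms means the pressure rises across a widening section, so a larger ventricle inlet area `AE` changes how much pressure is recovered above the glottis.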
Based on aerodynamic theory, the working mechanism of the vocal organs and the distribution of air pressure within them under stress are derived and analyzed, and an acoustic equivalent circuit is constructed accordingly, establishing the coupled vocal cord-laryngeal ventricle-vocal tract physical model of speech production. This physical model simulates the motion of the vocal cords. From the glottal airflow velocity $U_g$, the airflow velocity at the lips can be derived, and the simulated speech signal is the sound pressure given by the derivative of the airflow velocity at the lips. Once different speech signals have been simulated, the spectrum of the simulated speech signal can be obtained, denoted $S^*(\omega)$.
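The last two steps of this paragraph, differentiating the lip flow and taking a spectrum, can be sketched as follows. This is an illustration under assumptions: the function names are invented, the derivative is approximated by a first difference, constant radiation factors are omitted, and a plain discrete Fourier transform stands in for whatever transform implementation is actually used.

```python
import cmath
import math


def radiated_pressure(lip_flow, fs):
    """Approximate d(U_lip)/dt by a first difference scaled by the sampling rate fs."""
    return [(lip_flow[n] - lip_flow[n - 1]) * fs for n in range(1, len(lip_flow))]


def magnitude_spectrum(signal):
    """Magnitude of the DFT for bins 0..N/2 (direct O(N^2) evaluation)."""
    N = len(signal)
    return [
        abs(sum(signal[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
        for k in range(N // 2 + 1)
    ]
```

Applying `magnitude_spectrum(radiated_pressure(lip_flow, fs))` to a simulated lip flow gives a sampled version of the spectrum that the text denotes $S^*(\omega)$.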
Embodiment 2
Building on Embodiment 1, this Embodiment 2 provides a stress detection method based on speech production modeling, including:
Step S1: establish the physical model coupling the vocal cords, laryngeal ventricle and vocal tract;
Step S2: use the physical model to generate simulated speech signals corresponding to the stress conditions encountered in the real world;
Step S3: using a physiological parameter estimation algorithm, estimate the speaker's physiological parameters during phonation under the corresponding stress state, so as to establish the relation between the speech signal and physiological features;
Step S4: detect mental stress according to this physiological feature relation.
This stress detection method uses Analysis-by-Synthesis fitting to link the model with real speech data: the waveform generated by the model is compared with the waveform of the real speech signal, the optimum is searched for in the solution space by minimizing a cost function, the required parameters are estimated by fitting the model, and mental stress is then detected.
The physiological parameter estimation algorithm in step S3 includes the following steps:
Step S31: obtain the source information of the real speech, i.e. the residual signal, by linear prediction (LPC);
Step S32: apply the Fourier transform (FFT) to the residual signal to obtain the spectrum of the real speech;
Step S32: separate the residual signal into a high-frequency component and a low-frequency component with band-pass filters, perform a first fit on each component, and use the parameters obtained from the first fit as the initial values of the second fit;
Specifically, the local cost functions used in the first fit (a high-frequency cost function and a low-frequency cost function) are built mainly from local features of the signal. They allow the speech excitation source, which determines fundamental frequency and brightness, to be analyzed. By capturing and analyzing the high-frequency components of the speech and taking into account the flatness and irregularity of the harmonic structure in the high band of the spectrum, the resulting cost functions simulate stressed speech more effectively. The low-frequency cost function is built from fundamental frequency estimation, i.e. the fundamental frequency (F0). The high-frequency cost function is built from a brightness analysis, i.e. the Spectral Flatness Measure (SFM), which reflects the irregularity of the harmonic structure in the band.
For the fit to the low-frequency component, the parameters estimated are mainly $k_1$ and $k_{c12}$, which affect the fundamental frequency of the speech signal (experiments show that $k_2$, $k_3$ and $k_{c23}$ have little influence on the production of stressed speech). For the fit to the high-frequency component, the parameter estimated is the laryngeal ventricle cross-sectional area $A_v$, which reflects the brightness (SFM) of the voice. The resulting $k_1$, $k_{c12}$ and $A_v$ are carried into the second fit as initial parameter values. For the definitions of $k_1$ and $k_{c12}$, see Embodiment 1.
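The two local features named here, F0 and SFM, can be computed in several ways, and the text does not say which definitions are used. The sketch below assumes two common ones: F0 from the lag of the largest autocorrelation value within a search range, and SFM as the ratio of the geometric mean to the arithmetic mean of a power spectrum. The function names and the default search range of 60 to 400 Hz are assumptions.

```python
import math


def spectral_flatness(power):
    """Geometric mean over arithmetic mean of a strictly positive power spectrum.

    Values near 1 indicate a flat spectrum; values near 0 a peaky one.
    """
    n = len(power)
    log_mean = sum(math.log(p) for p in power) / n
    return math.exp(log_mean) / (sum(power) / n)


def estimate_f0(signal, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 as fs divided by the lag with the largest autocorrelation."""
    lo, hi = int(fs / fmax), int(fs / fmin)

    def autocorrelation(lag):
        return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))

    best_lag = max(range(lo, hi + 1), key=autocorrelation)
    return fs / best_lag
```

A low-frequency cost could then compare `estimate_f0` of the real and simulated low bands, and a high-frequency cost could compare `spectral_flatness` of the two high bands; how these differences are weighted is left open here because the text does not specify it.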
Step S33: perform the second fit, i.e. obtain the spectrum of the simulated speech signal and construct a cost function based on global features of the speech spectrum;
Specifically, the spectrum of the simulated speech signal is obtained by applying the Fourier transform to its linear prediction residual. The real and simulated speech signals are then compared to construct the global cost function based on the speech spectrum. The global cost function is built directly from the whole frequency-domain spectrum of the target speech, so as to capture more essential and more stable information.
The cost function is
$$C = \frac{1}{N} \sum_{i=1}^{N} \left| \log S(\omega_i) - \log S^*(\omega_i) \right|^2$$
where $S^*(\omega)$ is the spectrum of the simulated speech signal and $S(\omega)$ is the spectrum of the real speech signal.
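The global cost function above is a mean squared log-spectral distance and translates almost directly into code. In this sketch the function name is invented, both spectra are assumed to be sampled on the same N frequencies with strictly positive values, and the natural logarithm is assumed because the text does not state the base.

```python
import math


def spectral_cost(real_spectrum, simulated_spectrum):
    """C = (1/N) * sum_i |log S(w_i) - log S*(w_i)|**2 over N shared frequency bins."""
    if len(real_spectrum) != len(simulated_spectrum):
        raise ValueError("both spectra must be sampled on the same frequencies")
    n = len(real_spectrum)
    return sum(
        (math.log(s) - math.log(s_star)) ** 2
        for s, s_star in zip(real_spectrum, simulated_spectrum)
    ) / n
```

Because the difference is taken between logarithms, a constant gain mismatch between the two spectra contributes the same penalty at every bin, while the overall shape of the spectrum drives the rest of the cost.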
Step S34: to minimize the cost function (the global-feature cost function), repeatedly vary the physiological parameters of the vocal cords and vocal tract so that the physical model generates new speech signals, searching the solution space for the optimum by minimizing the cost function and thereby estimating the physiological parameters.
Specifically, the physiological parameters estimated include the stiffness coefficients of the vocal cords, the subglottal pressure, the cross-sectional area of the laryngeal ventricle, and the supraglottal vocal tract area. For the search, the Nelder-Mead simplex method can be used; its strong local search capability speeds up convergence and makes the search efficient.
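To make the search step concrete, the following is a compact, simplified Nelder-Mead routine written for illustration; it is not the implementation used in the patent. The function name, the reflection, expansion, contraction and shrink coefficients, and the stopping rule on simplex size are assumptions. In the method described above, `cost` would run the physical model with the candidate parameters and return the global spectral cost; here it is any callable that maps a parameter list to a number.

```python
def nelder_mead(cost, start, step=0.5, xtol=1e-8, max_iter=5000):
    """Minimize cost(p) over a list of parameters p with a basic simplex search."""
    n = len(start)
    # Initial simplex: the start point plus one offset point per parameter.
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=cost)
        best, worst = simplex[0], simplex[-1]
        size = max(abs(p[j] - best[j]) for p in simplex for j in range(n))
        if size < xtol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            return [centroid[j] + t * (worst[j] - centroid[j]) for j in range(n)]

        reflected = along(-1.0)
        if cost(reflected) < cost(best):
            expanded = along(-2.0)
            simplex[-1] = expanded if cost(expanded) < cost(reflected) else reflected
        elif cost(reflected) < cost(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(0.5)
            if cost(contracted) < cost(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [(p[j] + best[j]) / 2.0 for j in range(n)] for p in simplex[1:]
                ]
    return min(simplex, key=cost)
```

As a usage example with a hypothetical stand-in cost, `nelder_mead(lambda p: (p[0] - 3.0) ** 2 + 2.0 * (p[1] - 1.0) ** 2, [0.0, 0.0])` searches a two-parameter space. Starting the search from the first-fit values of $k_1$, $k_{c12}$ and $A_v$ matters because the simplex method is a local search.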
In step S4, mental stress is detected according to the physiological feature relation.
Specifically, the sensitivity of the estimated physiological parameters to the stress state makes it possible to analyze how the phonation system is affected under stress. A KNN classifier is used to discriminate stressed speech from speech in the normal state, which evaluates how sensitive the physiological parameters of the proposed method are to the stress state and thereby verifies the effectiveness of the method.
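A KNN classifier over estimated physiological parameters can be sketched in a few lines. The function name, the Euclidean distance, the majority vote and the default `k=3` are assumptions; the text names only the KNN algorithm. Each feature vector is taken to hold estimated parameters such as $k_1$, $k_{c12}$ and $A_v$ for one utterance, and in practice the features would probably need scaling to comparable ranges before distances are computed.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

With labelled training vectors from neutral and stressed recordings, `knn_predict(train_features, train_labels, new_vector)` returns the predicted state for a new utterance; the recognition rate over a held-out set is then the kind of figure the verification experiment compares.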
Embodiment 3
Building on Embodiments 1 and 2, this Embodiment 3 also provides a speech-based physiological parameter estimation algorithm, including the following steps:
Step S1': obtain the spectrum of the real speech and the initial values of the second fit;
Step S2': perform the second fit, to construct a cost function based on global features of the speech spectrum;
Step S3': estimate the physiological parameters according to the cost function.
Obtaining the spectrum of the real speech and the initial values of the second fit in step S1' includes the following steps:
Step S11': obtain the source information of the real speech, i.e. the residual signal, by linear prediction;
Step S12': apply the Fourier transform to the residual signal to obtain the spectrum of the real speech; separate the residual signal into a high-frequency component and a low-frequency component with band-pass filters; perform a first fit on each component; and use the parameters obtained from the first fit as the initial values of the second fit.
Performing the second fit in step S2' to construct the cost function based on global features of the speech spectrum includes: performing the second fit, i.e. producing a simulated speech signal with the physical model, obtaining the spectrum of this simulated speech signal, and then constructing the cost function based on global features of the speech spectrum; and
estimating the physiological parameters according to the cost function in step S3' includes:
to minimize the cost function, repeatedly varying the physiological parameters of the vocal cords and vocal tract so that the physical model generates new speech signals, searching the solution space for the optimum by minimizing the cost function and thereby estimating the physiological parameters.
The cost function is
$$C = \frac{1}{N} \sum_{i=1}^{N} \left| \log S(\omega_i) - \log S^*(\omega_i) \right|^2$$
where $S^*(\omega)$ is the spectrum of the simulated speech signal and $S(\omega)$ is the spectrum of the real speech signal.
For the steps of the physiological parameter estimation algorithm in this Embodiment 3, refer to the corresponding description in Embodiment 2; they are not repeated here.
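The search in step S3' can be sketched as follows, under strong simplifying assumptions. The physical model is replaced by a hypothetical two-parameter stand-in (`toy_model_spectrum`, with a gain and a cutoff), and an exhaustive grid search stands in for whatever optimizer is actually used; neither is taken from the text. Only the overall loop (vary parameters, regenerate the spectrum, keep the parameters with the lowest cost) follows the description.

```python
import itertools
import math


def log_spectral_cost(real_spectrum, simulated_spectrum):
    """Mean squared log-spectral difference (natural log assumed)."""
    n = len(real_spectrum)
    return sum((math.log(s) - math.log(s_sim)) ** 2
               for s, s_sim in zip(real_spectrum, simulated_spectrum)) / n


def toy_model_spectrum(params, freqs):
    """Hypothetical stand-in for the physical model: a smooth low-pass shape."""
    gain, cutoff = params
    return [gain / (1.0 + (w / cutoff) ** 2) for w in freqs]


def grid_search(real_spectrum, freqs, candidate_grids):
    """Try every parameter combination and keep the one with the lowest cost."""
    best_params, best_cost = None, float("inf")
    for params in itertools.product(*candidate_grids):
        cost = log_spectral_cost(real_spectrum,
                                 toy_model_spectrum(params, freqs))
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost


# Usage: recover the parameters that generated a target spectrum.
freqs = [1.0, 2.0, 4.0, 8.0]
target = toy_model_spectrum((2.0, 3.0), freqs)
grid = [0.5 * k for k in range(1, 11)]          # 0.5, 1.0, ..., 5.0
best_params, best_cost = grid_search(target, freqs, [grid, grid])
```

In practice the initial values from the first fitting stage would narrow the region to search, and a gradient-free local optimizer would replace the exhaustive grid.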
Embodiment 4
The effectiveness of the stress detection method based on speech production modeling was tested to further demonstrate the feasibility of detecting stress states from speech.
The validation data used in the present invention all come from telephone conversations, with 100 subjects (50 male, 50 female) taking part in the experiment. In the experiment, an operator chatted with each subject by telephone, on average four conversations per subject, each lasting 10 minutes, and the natural voice communication data were recorded. Of the four conversations, two were casual chats in a relaxed state; in the other two, different types of stress were applied to the subject: (1) multitasking; (2) time pressure; (3) risk-taking. Details are given in Table 1. The real speech data recorded while the subjects spoke under stress were used to verify the effectiveness of the stress detection method.
Table 1
To verify the effectiveness of the proposed method, the present invention is compared with traditional speech-based detection methods. Using the physical model, the present invention simulates the stressed speech signals produced under real-world stress and, with the physiological parameter estimation algorithm, estimates the physiological feature parameters of the vocal cords, vocal tract and laryngeal ventricle during phonation while the speaker is under stress. Comparing the average stress-detection recognition rate obtained with these physiological feature parameters against that obtained with the acoustic feature parameters proposed by traditional methods shows that the method based on speech production modeling has a clear advantage in stress detection.
Taking the above preferred embodiments of the present invention as guidance, those skilled in the relevant art can, from the above description, make various changes and modifications without departing from the technical concept of this invention. The technical scope of this invention is not limited to the content of the description and must be determined according to the scope of the claims.

Claims (10)

1. A vocal cord-laryngeal ventricle-vocal tract linked physical model, characterized by comprising:
a set of mechanical equations describing the vibration pattern of the vocal cords, and sets of aerodynamic equations describing the pressure drop distribution along the glottal depth direction and along the laryngeal ventricle, false vocal cord and vocal tract direction.
2. The vocal cord-laryngeal ventricle-vocal tract linked physical model according to claim 1, characterized in that
the set of mechanical equations comprises:
$$m_1 \frac{d^2 x_1}{dt^2} + r_1 \frac{dx_1}{dt} + s_1(x_1) + k_{c12}(x_1 - x_2) = F_1 \tag{1}$$
$$m_2 \frac{d^2 x_2}{dt^2} + r_2 \frac{dx_2}{dt} + s_2(x_2) + k_{c12}(x_2 - x_1) + k_{c23}(x_2 - x_3) = F_2 \tag{2}$$
$$m_3 \frac{d^2 x_3}{dt^2} + r_3 \frac{dx_3}{dt} + s_3(x_3) + k_{c23}(x_3 - x_2) = F_3 \tag{3}$$
In formulas (1), (2) and (3) above,
$m_1$, $m_2$ and $m_3$ are the three masses used to build the vocal cord model, arranged in sequence;
$x_1$, $x_2$ and $x_3$ are the displacements of the three masses in the vertical direction;
$k_{c12}$ and $k_{c23}$ are the stiffnesses of the springs coupling adjacent masses;
$r_1$, $r_2$ and $r_3$ are the equivalent viscous damping coefficients of the three masses;
$F_1$, $F_2$ and $F_3$ are the forces acting on the three masses; and
$s_1$, $s_2$ and $s_3$ denote the springs associated with the three masses, expressed as:
$$s_i(x_i) = k_i\left(x_i + \eta x_i^{3}\right), \quad i = 1, 2, 3 \tag{4}$$
In formula (4) above, $i$ denotes the i-th mass, $k_i$ is the stiffness coefficient of the spring associated with the i-th mass, and $\eta$ is the nonlinear coefficient of the spring.
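For illustration, formulas (1) to (4) can be integrated numerically as in the sketch below. The semi-implicit Euler scheme, the function names, and the user-supplied `force` callback (which here replaces the pressure-derived forces $F_1$, $F_2$, $F_3$) are assumptions of this example and are not specified by the claims; the parameter values in the usage lines are arbitrary test numbers, not physiological data.

```python
def spring_force(k, eta, x):
    """Nonlinear spring of formula (4): s_i(x_i) = k_i * (x_i + eta * x_i**3)."""
    return k * (x + eta * x ** 3)


def simulate_three_mass(m, r, k, kc12, kc23, eta, force, dt, steps,
                        x0=(0.0, 0.0, 0.0), v0=(0.0, 0.0, 0.0)):
    """Integrate formulas (1)-(3) with a semi-implicit Euler step.

    force(t, x, v) is a hypothetical callback returning (F1, F2, F3).
    Returns the final displacements and velocities of the three masses.
    """
    x, v = list(x0), list(v0)
    for n in range(steps):
        f = force(n * dt, x, v)
        # Coupling terms between adjacent masses, as in formulas (1)-(3).
        coupling = [kc12 * (x[0] - x[1]),
                    kc12 * (x[1] - x[0]) + kc23 * (x[1] - x[2]),
                    kc23 * (x[2] - x[1])]
        acc = [(f[i] - r[i] * v[i] - spring_force(k[i], eta, x[i]) - coupling[i]) / m[i]
               for i in range(3)]
        v = [v[i] + dt * acc[i] for i in range(3)]   # update velocity first
        x = [x[i] + dt * v[i] for i in range(3)]     # then position
    return x, v


# Usage: a constant force chosen so that x1 = x2 = x3 = 0.5 is the static equilibrium.
k = (100.0, 100.0, 100.0)
eta, x_eq = 0.1, 0.5
f_const = [ki * (x_eq + eta * x_eq ** 3) for ki in k]
x_final, v_final = simulate_three_mass(
    m=(1.0, 1.0, 1.0), r=(20.0, 20.0, 20.0), k=k, kc12=50.0, kc23=50.0,
    eta=eta, force=lambda t, x, v: f_const, dt=1e-3, steps=20000)
```

With equal displacements the coupling terms vanish, so the damped system settles where each nonlinear spring balances its own force.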
3. The vocal cord-laryngeal ventricle-vocal tract linked physical model according to claim 2, characterized in that the sets of aerodynamic equations comprise:
a set of aerodynamic equations along the glottal depth direction, and
a set of aerodynamic equations for the pressure drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction.
4. The vocal cord-laryngeal ventricle-vocal tract linked physical model according to claim 3, characterized in that the set of aerodynamic equations along the glottal depth direction comprises:
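$$P_s - P_{11} = (1.00 + 0.37)\,\frac{\rho U_g^{2}}{2 A_{g1}^{2}} \tag{5}$$
$$P_{i1} - P_{i2} = \frac{12 \mu d_i l_g^{2} U_g}{A_{gi}^{3}}, \quad i = 1, 2, 3 \tag{6}$$
$$P_{21} - P_{12} = \frac{\rho U_g^{2}}{2}\left(\frac{1}{A_{g1}^{2}} - \frac{1}{A_{g2}^{2}}\right) \tag{7}$$
$$P_{31} - P_{22} = \frac{\rho U_g^{2}}{2}\left(\frac{1}{A_{g1}^{2}} - \frac{1}{A_{g2}^{2}}\right) \tag{8}$$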
In formulas (5), (6), (7) and (8) above,
$P_{i1}$ and $P_{i2}$ are the pressures at the inlet and the outlet of the i-th mass, respectively;
$A_{gi}$ is the static glottal gap cross-sectional area corresponding to the i-th mass;
$U_g$ is the glottal wave, i.e. the airflow velocity through the glottis;
0.37 is the loss coefficient for the pressure drop at the junction of the vocal cords and the trachea, caused by the vena contracta that forms when the cross-sectional area drops abruptly at the glottal inlet; and
$P_s$ is the subglottal pressure, $\rho$ is the air density, $\mu$ is the shear viscosity coefficient, $l_g$ is the length of the vocal cord model, and $d_i$ is the thickness of the vocal cord model corresponding to the i-th mass.
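The pressure drops of formulas (5) to (8) can be evaluated directly once the flow and the geometry are known. The sketch below is illustrative only: the function name, the argument layout and the returned dictionary keys are assumptions, and formula (8) is coded exactly as printed, that is, with the same areas $A_{g1}$ and $A_{g2}$ as formula (7).

```python
def glottal_pressure_drops(u_g, areas, thicknesses, l_g, rho, mu):
    """Evaluate formulas (5)-(8) for a given glottal flow.

    u_g         -- glottal flow U_g
    areas       -- (A_g1, A_g2, A_g3), glottal gap areas at the three masses
    thicknesses -- (d_1, d_2, d_3), thickness of the model at each mass
    l_g         -- length of the vocal cord model
    rho, mu     -- air density and shear viscosity coefficient
    """
    a_g1, a_g2, _ = areas
    # Formula (5): inlet drop P_s - P_11, including the 0.37 loss coefficient.
    entry = (1.00 + 0.37) * rho * u_g ** 2 / (2.0 * a_g1 ** 2)
    # Formula (6): viscous drop P_i1 - P_i2 along each mass.
    viscous = [12.0 * mu * d * l_g ** 2 * u_g / a ** 3
               for d, a in zip(thicknesses, areas)]
    # Formulas (7) and (8): drops between adjacent masses, as printed.
    junction = rho * u_g ** 2 / 2.0 * (1.0 / a_g1 ** 2 - 1.0 / a_g2 ** 2)
    return {"entry": entry, "viscous": viscous,
            "junction_1_2": junction, "junction_2_3": junction}


# Usage with arbitrary test numbers (not physiological values).
drops = glottal_pressure_drops(u_g=1.0, areas=(1.0, 2.0, 2.0),
                               thicknesses=(1.0, 1.0, 1.0),
                               l_g=1.0, rho=2.0, mu=1.0)
```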
5. The vocal cord-laryngeal ventricle-vocal tract linked physical model according to claim 4, characterized in that the set of aerodynamic equations for the pressure drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction comprises:
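$$P_{32} - P_v = -\frac{\rho}{2}\cdot\frac{2}{A_{g2} A_E}\left(1 - \frac{A_{g2}}{A_E}\right) U_g^{2} \tag{9}$$
$$P_v - P_{f1} = \frac{\rho}{2}\left(\frac{1}{A_f^{2}} - \frac{1}{A_v^{2}}\right) U_g^{2} \tag{10}$$
$$P_{f1} - P_{f2} = \frac{12 \mu l_f^{2} d_f}{A_f^{3}}\, U_g \tag{11}$$
$$P_{f2} - P_1 = -\frac{\rho}{2}\cdot\frac{2}{A_f A_1}\left(1 - \frac{A_f}{A_1}\right) U_g^{2} \tag{12}$$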
In formulas (9), (10), (11) and (12) above,
$P_v$ and $A_v$ are the pressure in the laryngeal ventricle and the cross-sectional area of the laryngeal ventricle, respectively; $P_{f1}$ and $P_{f2}$ are the pressures at the two ends of the false vocal cords; $A_f$ is the cross-sectional area at the false vocal cords; $A_E$ is the cross-sectional area at the inlet of the laryngeal ventricle; and $A_1$ and $P_1$ are the cross-sectional area and the pressure at the inlet of the vocal tract, respectively.
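Formulas (9) to (12) can be evaluated in the same way, as in the following minimal sketch. The function and key names are assumptions. The symbols $l_f$ and $d_f$ are not defined in the claim; by analogy with $l_g$ and $d_i$ they are assumed here to be the length and thickness of the false vocal cord section.

```python
def supraglottal_pressure_drops(u_g, a_g2, a_e, a_v, a_f, a_1,
                                l_f, d_f, rho, mu):
    """Evaluate formulas (9)-(12) for a given glottal flow U_g."""
    half_rho_u2 = rho / 2.0 * u_g ** 2
    # Formula (9): P_32 - P_v, expansion into the laryngeal ventricle inlet.
    glottis_to_ventricle = -half_rho_u2 * 2.0 / (a_g2 * a_e) * (1.0 - a_g2 / a_e)
    # Formula (10): P_v - P_f1, contraction from the ventricle to the false cords.
    ventricle_to_false = half_rho_u2 * (1.0 / a_f ** 2 - 1.0 / a_v ** 2)
    # Formula (11): P_f1 - P_f2, viscous drop along the false vocal cords.
    along_false = 12.0 * mu * l_f ** 2 * d_f / a_f ** 3 * u_g
    # Formula (12): P_f2 - P_1, expansion into the vocal tract inlet.
    false_to_tract = -half_rho_u2 * 2.0 / (a_f * a_1) * (1.0 - a_f / a_1)
    return {"glottis_to_ventricle": glottis_to_ventricle,
            "ventricle_to_false": ventricle_to_false,
            "along_false": along_false,
            "false_to_tract": false_to_tract}


# Usage with arbitrary test numbers (not physiological values).
drops = supraglottal_pressure_drops(u_g=1.0, a_g2=1.0, a_e=2.0, a_v=2.0,
                                    a_f=1.0, a_1=2.0, l_f=1.0, d_f=1.0,
                                    rho=2.0, mu=1.0)
```

The negative values of formulas (9) and (12) correspond to pressure recovery where the flow expands into a wider section.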
6. A stress detection method based on speech production modeling, characterized by comprising:
Step S1: establishing a vocal cord-laryngeal ventricle-vocal tract linked physical model;
Step S2: generating, by means of the physical model, simulated speech signals corresponding to speech produced under stress in the real world;
Step S3: estimating, according to a physiological parameter estimation algorithm, the speaker's physiological parameters during phonation under the corresponding stress state, so as to establish the physiological feature relationship corresponding to the speech signal;
Step S4: detecting mental stress according to the physiological feature relationship.
7. The stress detection method according to claim 6, characterized in that the physiological parameter estimation algorithm in step S3 comprises the following steps:
Step S31: obtaining the sound source information of the real speech, i.e. the residual signal, by linear prediction;
Step S32: applying a Fourier transform to the residual signal to obtain the spectrum of the real speech;
Step S32: separating the residual signal into a high-frequency component and a low-frequency component with a band-pass filter, performing an initial fit on each of the two components, and taking the parameters obtained from these initial fits as the initial values for the second-stage fit;
Step S33: performing the second-stage fit, i.e. obtaining the spectrum of the simulated speech signal, and then constructing a cost function based on the global features of the speech spectrum;
Step S34: continuously varying the physiological parameters of the vocal cords and vocal tract so that the physical model generates new speech signals, and searching the solution space for the optimal solution by minimizing the cost function, thereby estimating the physiological parameters.
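The linear-prediction and Fourier-transform steps of claim 7 can be sketched with the standard library alone. The autocorrelation method with the Levinson-Durbin recursion and the naive discrete Fourier transform used below are common textbook choices assumed for this example; the claim does not say which linear-prediction variant or transform implementation is used, and all function names are hypothetical. The band-pass splitting and initial fitting steps are not shown.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def lpc_coefficients(x, order):
    """Levinson-Durbin recursion; returns a[1..order] with x[t] ~ sum_k a[k] * x[t-k]."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # degenerate (e.g. all-zero) input: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lpc_residual(x, coeffs):
    """Prediction error e[t] = x[t] - sum_k a[k] * x[t-k]; samples before t = 0 count as zero."""
    residual = []
    for t in range(len(x)):
        pred = sum(a_k * x[t - k]
                   for k, a_k in enumerate(coeffs, start=1) if t - k >= 0)
        residual.append(x[t] - pred)
    return residual


def magnitude_spectrum(x):
    """Naive O(N^2) discrete Fourier transform magnitude for all N bins."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


# Usage: a decaying exponential is predicted almost perfectly by a first-order model.
frame = [0.9 ** t for t in range(200)]
coeffs = lpc_coefficients(frame, order=1)
residual = lpc_residual(frame, coeffs)
spectrum = magnitude_spectrum(residual)
```

For real speech the frame would be windowed and a higher prediction order used, so that the residual approximates the glottal source.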
8. A speech-based physiological parameter estimation algorithm, characterized by comprising the following steps:
Step S1': obtaining the spectrum of the real speech and the initial values for the second-stage fit;
Step S2': performing the second-stage fit to construct a cost function based on the global features of the speech spectrum;
Step S3': estimating the physiological parameters according to the cost function.
9. The physiological parameter estimation algorithm according to claim 8, characterized in that
in step S1', the spectrum of the real speech and the initial values for the second-stage fit are obtained by the following steps:
Step S11': obtaining the sound source information of the real speech, i.e. the residual signal, by linear prediction;
Step S12': applying a Fourier transform to the residual signal to obtain the spectrum of the real speech; separating the residual signal into a high-frequency component and a low-frequency component with a band-pass filter; performing an initial fit on each of the two components; and taking the parameters obtained from these initial fits as the initial values for the second-stage fit;
in step S2', the second-stage fit is performed as follows: a simulated speech signal is generated by the physical model according to claim 1, the spectrum of this simulated speech signal is obtained, and a cost function based on the global features of the speech spectrum is then constructed; and
in step S3', the physiological parameters are estimated according to the cost function as follows:
the physiological parameters of the vocal cords and vocal tract are varied continuously so that the physical model generates new speech signals, and the optimal solution is searched for in the solution space by minimizing the cost function, thereby estimating the physiological parameters.
10. The physiological parameter estimation algorithm according to claim 9, characterized in that the cost function is
$$C = \frac{1}{N}\sum_{i=1}^{N}\left|\log S(\omega_i) - \log S^{*}(\omega_i)\right|^{2},$$
where $S^{*}(\omega)$ is the spectrum of the simulated speech signal and $S(\omega)$ is the spectrum of the real speech signal.
CN201610123469.3A 2016-03-03 2016-03-03 Vocal cord-laryngeal ventricle-vocal tract linked physical model and stress detection method Expired - Fee Related CN105679333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610123469.3A CN105679333B (en) 2016-03-03 2016-03-03 Vocal cord-laryngeal ventricle-vocal tract linked physical model and stress detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610123469.3A CN105679333B (en) 2016-03-03 2016-03-03 Vocal cord-laryngeal ventricle-vocal tract linked physical model and stress detection method

Publications (2)

Publication Number Publication Date
CN105679333A true CN105679333A (en) 2016-06-15
CN105679333B CN105679333B (en) 2019-04-12

Family

ID=56306752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610123469.3A Expired - Fee Related CN105679333B (en) 2016-03-03 2016-03-03 Vocal cord-laryngeal ventricle-vocal tract linked physical model and stress detection method

Country Status (1)

Country Link
CN (1) CN105679333B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133713A (en) * 2017-11-27 2018-06-08 Soochow University Method for estimating vocal tract area in the glottal closed phase
CN108601566A (en) * 2016-11-17 2018-09-28 Huawei Technologies Co., Ltd. Mental stress evaluation method and device
CN110367934A (en) * 2019-07-25 2019-10-25 Shenzhen University Health monitoring method and system based on non-speech body sounds

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003058175A (en) * 2001-08-13 2003-02-28 Nippon Telegr & Teleph Corp <Ntt> Method of synthesizing pharyngeal sound source and apparatus for implementing this method
CN101502425A (en) * 2009-03-09 2009-08-12 Xi'an Jiaotong University System and method for detecting mechanical characteristics of vocal cord vibration
CN103050042A (en) * 2012-12-04 2013-04-17 East China Normal University Vocal cord mass distribution model and method for building it

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
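Zhang Jiwei et al.: "Research on a three-mass vibration model of the vocal cords", Journal of Shaanxi Normal University *
Zhang Lihe et al.: "Analysis method for hoarse voice based on a three-mass vocal cord model", Chinese Journal of Biomedical Engineering *
Cheng Qiming et al.: "Hoarse voice synthesis method based on the inverse solution of speech production", Bulletin of Science and Technology *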

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108601566A (en) * 2016-11-17 2018-09-28 Huawei Technologies Co., Ltd. Mental stress evaluation method and device
CN108601566B (en) * 2016-11-17 2020-06-26 Huawei Technologies Co., Ltd. Mental stress evaluation method and device
US11547334B2 (en) 2016-11-17 2023-01-10 Huawei Technologies Co., Ltd. Psychological stress estimation method and apparatus
CN108133713A (en) * 2017-11-27 2018-06-08 Soochow University Method for estimating vocal tract area in the glottal closed phase
CN108133713B (en) * 2017-11-27 2020-10-02 Soochow University Method for estimating vocal tract area in the glottal closed phase
CN110367934A (en) * 2019-07-25 2019-10-25 Shenzhen University Health monitoring method and system based on non-speech body sounds
CN110367934B (en) * 2019-07-25 2023-02-03 Shenzhen University Health monitoring method and system based on non-speech body sounds

Also Published As

Publication number Publication date
CN105679333B (en) 2019-04-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190412