CN105679333A - Vocal cord-larynx ventricle-vocal track linked physical model and mental pressure detection method - Google Patents
- Publication number
- CN105679333A (application CN201610123469.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/16—Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
- A61B5/165—Evaluating the state of mind, e.g. depression, anxiety
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4884—Other medical applications inducing physiological or psychological stress, e.g. applications for stress testing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The invention relates to a linked vocal cord-laryngeal ventricle-vocal tract physical model and a mental stress detection method. The physical model comprises a set of mechanical equations describing vocal cord motion and a set of aerodynamic equations describing the pressure-drop distribution along the glottal depth direction and along the laryngeal ventricle-false vocal cord-vocal tract direction. A physiological parameter estimation algorithm is designed on the basis of this model in order to study how phonation changes physiologically under stress. Physiological feature parameters of the vocal cords and the laryngeal ventricle are extracted while a speaker phonates under stress, establishing a relation from real speech signals to physiological features. From the estimated physiological parameters, the changes in each vocal organ and in the airflow through it under stress are obtained and used to detect mental stress, improving the accuracy and reliability of detection.
Description
Technical field
The present invention relates to the technical field of intelligent speech, and in particular to a method for detecting mental stress using speech technology.
Background art
Mental stress is the combined physiological and psychological response that arises when a person finds that real or imagined events exceed what they expected. Psychology shows that excessive stress can cause negative, painful stress responses and give rise to negative emotions such as depression, tension, anxiety and anger, reducing work efficiency and quality of life. Psychologists hold that living under stress for a long time can cause serious mental illness and increase the body's susceptibility to disease, in severe cases even causing cancer. Emotion and stress are also significantly correlated: mental stress can induce various emotions, such as excitement, dejection and boredom, and stressors of different intensity cause different degrees of tension. Emotion can therefore be regarded, to some extent, as an outward expression of stress, and stress can be quantified by degree.
One important way stress manifests itself is in the speaker's voice, where it becomes a major factor affecting speech production. When the environment or the speaker's own condition changes abnormally, or when the user is mostly absorbed in some task and speech recognition is only an aid to that task, the workload stresses the speaker and strongly affects pronunciation. The resulting abnormal state and voice variation are usually reflected in the speaker's speech, producing a speech signal in an abnormal, stressed state.
However, stressed speech, particularly speech produced under multitasking cognitive load, is hard to distinguish acoustically. General acoustic features cannot classify it correctly and lack stability and robustness. In addition, because stressed speech is generated by a mechanism that differs significantly from that of normal speech, acoustic features express the variation only weakly and discriminate poorly. It is therefore difficult to improve the reliability of stressed-speech classification during detection.
Summary of the invention
An object of the invention is to provide a linked vocal cord-laryngeal ventricle-vocal tract physical model for generating simulated speech data.
To solve the above technical problem, the invention provides a linked vocal cord-laryngeal ventricle-vocal tract physical model, comprising:
a set of mechanical equations describing the vocal cord motion pattern, and a set of aerodynamic equations describing the pressure-drop distribution along the glottal depth direction and along the laryngeal ventricle, false vocal cord and vocal tract direction.
Further, the set of mechanical equations comprises:
In equations (1), (2) and (3) above,
$m_1$, $m_2$ and $m_3$ denote the three masses used to build the vocal cord model, arranged in sequence;
$x_1$, $x_2$ and $x_3$ are the displacements of the three masses in the vertical direction;
$k_{c12}$ and $k_{c23}$ are the stiffnesses of the springs coupling adjacent pairs of masses;
$r_1$, $r_2$ and $r_3$ are the equivalent viscous resistances of the three masses;
$F_1$, $F_2$ and $F_3$ are the forces acting on the three masses; and
$s_1$, $s_2$ and $s_3$ are the restoring forces of the springs attached to the three masses, expressed as:
$$s_i(x_i) = k_i\left(x_i + \eta x_i^{3}\right), \qquad i = 1, 2, 3 \tag{4}$$
In equation (4), $i$ indexes the masses, $k_i$ is the stiffness coefficient of the spring attached to the $i$-th mass, and $\eta$ is the nonlinear coefficient of the spring.
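Equations (1) to (3) themselves are not reproduced in this text. As a hedged reading aid only, the symbol definitions above are consistent with the usual coupled mass-spring-damper form below; this form is an assumption inferred from the definitions, not a transcription of the patent's equations.

```latex
\begin{aligned}
m_1\ddot{x}_1 + r_1\dot{x}_1 + s_1(x_1) + k_{c12}\,(x_1 - x_2) &= F_1 \\
m_2\ddot{x}_2 + r_2\dot{x}_2 + s_2(x_2) + k_{c12}\,(x_2 - x_1) + k_{c23}\,(x_2 - x_3) &= F_2 \\
m_3\ddot{x}_3 + r_3\dot{x}_3 + s_3(x_3) + k_{c23}\,(x_3 - x_2) &= F_3
\end{aligned}
```

In this reading each mass has its own damper and nonlinear spring, and only adjacent masses are coupled, which matches the two coupling stiffnesses $k_{c12}$ and $k_{c23}$.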
Further, the set of aerodynamic equations comprises:
a set of aerodynamic equations along the glottal depth direction, and a set of aerodynamic equations for the pressure-drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction.
Further, the set of aerodynamic equations along the glottal depth direction comprises:
In equations (5), (6), (7) and (8) above,
$P_{i1}$ and $P_{i2}$ are the pressures at the inlet and outlet of the $i$-th mass;
$A_{gi}$ is the static glottal gap cross-sectional area corresponding to the $i$-th mass;
$U_g$ is the glottal wave, i.e. the airflow velocity through the glottis;
0.37 is the loss coefficient for the pressure drop at the junction of the vocal cords and the trachea, caused by the vena contracta that forms where the cross-section narrows abruptly at the glottal inlet; and
$P_s$ is the subglottal pressure, $\rho$ the air density, $\mu$ the shear viscosity coefficient, $l_g$ the length of the vocal cord model, and $d_i$ the thickness of the vocal cord model at the $i$-th mass.
Further, the set of aerodynamic equations for the pressure-drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction comprises:
In equations (9), (10), (11) and (12) above,
$P_v$ and $A_v$ are the pressure in the laryngeal ventricle and its cross-sectional area; $P_{f1}$ and $P_{f2}$ are the pressures at the two ends of the false vocal cords; $A_f$ is the cross-sectional area of the false vocal cords; $A_E$ is the cross-sectional area at the laryngeal ventricle inlet; and $A_1$ and $P_1$ are the cross-sectional area and pressure at the vocal tract inlet.
In another aspect, building on the above linked vocal cord-laryngeal ventricle-vocal tract physical model, the invention also provides a stress detection method based on speech production modeling, so that mental stress can be detected from speech.
The stress detection method comprises:
Step S1, establishing the linked vocal cord-laryngeal ventricle-vocal tract physical model;
Step S2, generating, with the physical model, simulated speech signals corresponding to stress conditions in the real world;
Step S3, estimating, with a physiological parameter estimation algorithm, the physiological parameters of the speaker when phonating under the corresponding stress state, so as to establish the relation between speech signals and physiological features;
Step S4, detecting mental stress according to that relation.
Further, the physiological parameter estimation algorithm in step S3 comprises the following steps:
Step S31, obtaining the source information of the real speech, i.e. the residual signal, by linear prediction;
Step S32, applying a Fourier transform to the residual signal to obtain the spectrum of the real speech;
Step S32, separating the residual signal into a high-frequency component and a low-frequency component with band-pass filters, performing a first, preliminary fit on each component, and using the parameters obtained from the preliminary fits as the initial values for the second fit;
Step S33, performing the second fit, i.e. obtaining the spectrum of the simulated speech signal and then constructing a cost function based on global features of the speech spectrum;
Step S34, repeatedly varying the vocal cord and vocal tract physiological parameters so that the physical model generates new speech signals, and searching the solution space for the optimal solution by minimizing the cost function, thereby estimating the physiological parameters.
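Step S31 obtains the source signal as a linear-prediction residual. The sketch below shows one common way to do that, the autocorrelation method with the Levinson-Durbin recursion, in plain Python. The function names, the prediction order and the choice of the autocorrelation method are assumptions made for illustration; the text does not say which LPC variant is used.

```python
def autocorrelation(x, max_lag):
    """Return r[k] = sum_t x[t] * x[t - k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) where a[0] == 1 and the prediction error is
    e[n] = sum_k a[k] * x[n - k]; err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_residual(x, order):
    """Inverse-filter x with its own LPC polynomial to get the residual."""
    a, _ = levinson_durbin(autocorrelation(x, order), order)
    return [
        sum(a[k] * x[n - k] for k in range(order + 1) if n - k >= 0)
        for n in range(len(x))
    ]
```

The residual returned by `lpc_residual` is what the later steps would transform to the frequency domain and split into bands.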
In a third aspect, the invention also provides a speech-based physiological parameter estimation algorithm, which estimates physiological parameters from real speech data and simulated speech data.
The speech-based physiological parameter estimation algorithm comprises the following steps:
Step S1', obtaining the spectrum of the real speech and the initial values for the second fit;
Step S2', performing the second fit so as to construct a cost function based on global features of the speech spectrum;
Step S3', estimating the physiological parameters according to the cost function.
Further, obtaining the spectrum of the real speech and the initial values for the second fit in step S1' comprises the following steps:
Step S11', obtaining the source information of the real speech, i.e. the residual signal, by linear prediction;
Step S12', applying a Fourier transform to the residual signal to obtain the spectrum of the real speech, separating the residual signal into a high-frequency component and a low-frequency component with band-pass filters, performing a first, preliminary fit on each component, and using the parameters obtained from the preliminary fits as the initial values for the second fit;
In step S2', performing the second fit so as to construct the cost function based on global features of the speech spectrum comprises: performing the second fit, i.e. generating a simulated speech signal with the physical model, obtaining the spectrum of that signal, and then constructing the cost function based on global features of the speech spectrum; and
In step S3', estimating the physiological parameters according to the cost function comprises:
to minimize the cost function, repeatedly varying the vocal cord and vocal tract physiological parameters so that the physical model generates new speech signals, and searching the solution space for the optimal solution by minimizing the cost function, thereby estimating the physiological parameters.
Further, the cost function is expressed in terms of $S^*(\omega)$, the spectrum of the simulated speech signal, and $S(\omega)$, the spectrum of the real speech signal.
The beneficial effects of the invention are as follows. The invention establishes a linked vocal cord-laryngeal ventricle-vocal tract physical model and designs a physiological parameter estimation algorithm on the basis of that model in order to study how phonation changes physiologically under stress. Physiological feature parameters of the vocal cords, vocal tract and laryngeal ventricle are extracted while the speaker phonates under stress, establishing a relation from real speech signals to physiological features. From the estimated physiological parameters, the changes in each vocal organ and in the airflow through it under stress are obtained and finally used to detect mental stress, improving the accuracy and reliability of detection.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a structural diagram of the linked vocal cord-laryngeal ventricle-vocal tract physical model established by the invention;
Fig. 2 is a flow chart of the stress detection method based on speech production modeling;
Fig. 3 is a block diagram of the physiological parameter estimation algorithm;
Fig. 4 is a flow chart of the physiological parameter estimation algorithm;
Fig. 5 shows the experimental results verifying the effectiveness of the invention.
Detailed description of the invention
The invention is now described in further detail with reference to the drawings. The drawings are simplified schematic diagrams that illustrate only the basic structure of the invention, and therefore show only the components relevant to the invention.
Embodiment 1
As shown in Fig. 1, a linked vocal cord-laryngeal ventricle-vocal tract physical model according to the invention comprises:
a set of mechanical equations describing the vocal cord motion pattern, and a set of aerodynamic equations describing the pressure-drop distribution along the glottal depth direction and along the laryngeal ventricle, false vocal cord and vocal tract direction.
Specifically, the set of mechanical equations comprises:
In equations (1), (2) and (3) above,
$m_1$, $m_2$ and $m_3$ denote the three masses used to build the vocal cord model, arranged in sequence;
$x_1$, $x_2$ and $x_3$ are the displacements of the three masses in the vertical direction;
$k_{c12}$ and $k_{c23}$ are the stiffnesses of the springs coupling adjacent pairs of masses;
$r_1$, $r_2$ and $r_3$ are the equivalent viscous resistances of the three masses;
$F_1$, $F_2$ and $F_3$ are the forces acting on the three masses; and
$s_1$, $s_2$ and $s_3$ are the restoring forces of the springs attached to the three masses, expressed as:
$$s_i(x_i) = k_i\left(x_i + \eta x_i^{3}\right), \qquad i = 1, 2, 3 \tag{4}$$
In equation (4), $i$ indexes the masses, $k_i$ is the stiffness coefficient of the spring attached to the $i$-th mass, and $\eta$ is the nonlinear coefficient of the spring.
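To make the role of the nonlinear spring in equation (4) concrete, the following sketch integrates a three-mass chain numerically. It is an illustration under stated assumptions: equations (1) to (3) are not reproduced in this text, so the code assumes the usual coupled form m·x'' + r·x' + s(x) + coupling = F with coupling only between adjacent masses. The integrator (semi-implicit Euler), the function names and all numeric values are hypothetical.

```python
def spring_force(x, k, eta):
    """Nonlinear restoring force of equation (4): s(x) = k * (x + eta * x**3)."""
    return k * (x + eta * x ** 3)


def simulate(m, r, k, kc12, kc23, eta, force, dt, steps):
    """Integrate the assumed three-mass system from rest.

    m, r, k are length-3 lists (mass, viscous resistance, spring stiffness);
    force(t) returns the three driving forces (F1, F2, F3) at time t.
    Returns the list of (x1, x2, x3) displacements after each step.
    """
    x = [0.0, 0.0, 0.0]
    v = [0.0, 0.0, 0.0]
    history = []
    for n in range(steps):
        f = force(n * dt)
        # Coupling springs act only between neighbouring masses.
        coupling = [
            kc12 * (x[0] - x[1]),
            kc12 * (x[1] - x[0]) + kc23 * (x[1] - x[2]),
            kc23 * (x[2] - x[1]),
        ]
        # Semi-implicit Euler: update velocities first, then positions.
        for i in range(3):
            acc = (f[i] - r[i] * v[i] - spring_force(x[i], k[i], eta) - coupling[i]) / m[i]
            v[i] += acc * dt
        for i in range(3):
            x[i] += v[i] * dt
        history.append(tuple(x))
    return history
```

With a constant driving force the displacements settle where the spring force of equation (4) balances the load; in the full model the driving forces would instead come from the aerodynamic pressures.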
Under time-varying conditions, and taking the inertia of the air mass into account, the set of aerodynamic equations comprises: a set of aerodynamic equations along the glottal depth direction, and a set of aerodynamic equations for the pressure-drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction.
The set of aerodynamic equations along the glottal depth direction comprises:
In equations (5), (6), (7) and (8) above,
$P_{i1}$ and $P_{i2}$ are the pressures at the inlet and outlet of the $i$-th mass;
$A_{gi}$ is the static glottal gap cross-sectional area corresponding to the $i$-th mass;
$U_g$ is the glottal wave, i.e. the airflow velocity through the glottis;
0.37 is the loss coefficient for the pressure drop at the junction of the vocal cords and the trachea, caused by the vena contracta that forms where the cross-section narrows abruptly at the glottal inlet; and
$P_s$ is the subglottal pressure, $\rho$ the air density, $\mu$ the shear viscosity coefficient, $l_g$ the length of the vocal cord model, and $d_i$ the thickness of the vocal cord model at the $i$-th mass.
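Equations (5) to (8) are not reproduced in this text. For orientation only, the classic two-mass formulation of Ishizaka and Flanagan uses the same symbols and the same 0.37 loss coefficient, and writes the inlet pressure drop and the drop along one mass as shown below. Whether the patent's equations match these term for term is an assumption.

```latex
\begin{aligned}
P_s - P_{11} &= (1 + 0.37)\,\frac{\rho}{2}\left(\frac{U_g}{A_{g1}}\right)^{2} \\
P_{i1} - P_{i2} &= \frac{12\,\mu\,l_g^{2}\,d_i}{A_{gi}^{3}}\,U_g + \frac{\rho\,d_i}{A_{gi}}\,\frac{dU_g}{dt}
\end{aligned}
```

The first line combines the Bernoulli drop with the vena contracta loss at the glottal inlet. The second line combines a viscous loss with the inertia of the air mass, which is the time-dependent term referred to above.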
When a speaker is under stress, the airflow pattern near the laryngeal ventricle and the false vocal cords in particular changes markedly. The physical model therefore needs to be extended: a corresponding set of equations must be established for the laryngeal ventricle, false vocal cord and vocal tract direction (the airflow in the supraglottal part of the larynx).
Specifically, the set of aerodynamic equations for the pressure-drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction comprises:
In equations (9), (10), (11) and (12) above,
$P_v$ and $A_v$ are the pressure in the laryngeal ventricle and its cross-sectional area; $P_{f1}$ and $P_{f2}$ are the pressures at the two ends of the false vocal cords; $A_f$ is the cross-sectional area of the false vocal cords; $A_E$ is the cross-sectional area at the laryngeal ventricle inlet; and $A_1$ and $P_1$ are the cross-sectional area and pressure at the vocal tract inlet.
Based on aerodynamic theory, the working mechanism of the vocal organs under stress and the pressure distribution of the airflow through them are derived and analyzed, and an acoustic equivalent circuit is constructed from them, establishing the linked vocal cord-laryngeal ventricle-vocal tract physical model of speech production. The model simulates the motion of the vocal cords, and the airflow velocity at the lips can be derived from the glottal airflow velocity $U_g$. The simulated speech signal is the derivative of the airflow velocity at the lips, i.e. the sound pressure. Once different speech signals have been simulated, the spectrum of the simulated speech signal can be obtained, denoted $S^*(\omega)$.
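The last modeling step, taking the simulated speech signal as the time derivative of the airflow velocity at the lips, can be sketched with a first difference. The function name and the use of a simple backward difference are illustrative assumptions.

```python
def lip_radiation(u_lip, dt):
    """Approximate the time derivative of the lip airflow velocity.

    u_lip is a list of airflow-velocity samples spaced dt seconds apart;
    the result has one sample fewer than the input.
    """
    return [(u_lip[n] - u_lip[n - 1]) / dt for n in range(1, len(u_lip))]
```

For example, `lip_radiation([0.0, 1.0, 3.0], 0.5)` gives the two slopes between consecutive samples.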
Embodiment 2
On the basis of Embodiment 1, Embodiment 2 provides a stress detection method based on speech production modeling, comprising:
Step S1, establishing the linked vocal cord-laryngeal ventricle-vocal tract physical model;
Step S2, generating, with the physical model, simulated speech signals corresponding to stress conditions in the real world;
Step S3, estimating, with a physiological parameter estimation algorithm, the physiological parameters of the speaker when phonating under the corresponding stress state, so as to establish the relation between speech signals and physiological features;
Step S4, detecting mental stress according to that relation.
The stress detection method uses an analysis-by-synthesis fitting approach to link the model to real speech data: the waveform generated by the model is compared with the waveform of the real speech signal, the solution space is searched for the optimal solution by minimizing a cost function, the required parameters are estimated by fitting the model, and mental stress is thereby detected.
The physiological parameter estimation algorithm in step S3 comprises the following steps:
Step S31, obtaining the source information of the real speech, i.e. the residual signal, by linear prediction (LPC);
Step S32, applying a Fourier transform (FFT) to the residual signal to obtain the spectrum of the real speech;
Step S32, separating the residual signal into a high-frequency component and a low-frequency component with band-pass filters, performing a first, preliminary fit on each component, and using the parameters obtained from the preliminary fits as the initial values for the second fit;
Specifically, the local cost functions used in the preliminary fit (a high-frequency cost function and a low-frequency cost function) are constructed mainly from local features of the signal. A local cost function allows analysis of the excitation source that affects the fundamental frequency and brightness of the voice. Capturing and analyzing the high-frequency components of speech, together with the flatness and irregularity of the harmonic structure in the high band of the spectrum, yields a cost function that models stressed speech more effectively. The low-frequency cost function is constructed from fundamental-frequency estimation, i.e. the fundamental frequency (F0), and the high-frequency cost function from a brightness analysis, i.e. the Spectral Flatness Measure (SFM), which reflects the irregularity of the harmonic structure in the band.
For the model fit to the low-frequency component, the parameters estimated are mainly $k_1$ and $k_{c12}$, which affect the fundamental frequency of the speech signal (experiments show that $k_2$, $k_3$ and $k_{c23}$ have little influence on the production of stressed speech). For the model fit to the high-frequency component, the parameter estimated is the laryngeal ventricle cross-sectional area $A_v$, which is reflected in the brightness (SFM) of the voice. The resulting $k_1$, $k_{c12}$ and $A_v$ are used as initial parameter values in the second fit. For the definitions of $k_1$ and $k_{c12}$, see the discussion in Embodiment 1.
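The Spectral Flatness Measure used for the high-frequency cost function is conventionally the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below computes it in plain Python with a direct DFT. The function names, the exclusion of the DC bin and the small floor that guards the logarithm are illustrative choices, and the text does not specify how the SFM enters the cost function.

```python
import cmath
import math


def power_spectrum(x):
    """Power at DFT bins 1 .. n/2 - 1 of x (direct DFT, DC bin excluded)."""
    n = len(x)
    spectrum = []
    for k in range(1, n // 2):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def spectral_flatness(power, floor=1e-12):
    """Geometric mean over arithmetic mean of the power values.

    Values lie in (0, 1]: close to 1 for a flat, noise-like spectrum and
    close to 0 for a spectrum dominated by a few strong harmonics.
    """
    p = [max(v, floor) for v in power]
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic
```

A single sinusoid gives a value near 0 and an impulse gives a value near 1, which is why the measure is sensitive to irregularity in the harmonic structure.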
Step S33, performing the second fit, i.e. obtaining the spectrum of the simulated speech signal and then constructing a cost function based on global features of the speech spectrum;
Specifically, the residual signal of the simulated speech signal is obtained by linear prediction and Fourier-transformed to give the spectrum of the simulated speech signal. The real and simulated speech signals are then compared, and a global-feature cost function based on the speech spectrum is constructed. The global cost function is built directly over the whole frequency-domain spectrum of the target speech so as to capture more essential and more stable information.
The cost function is expressed in terms of $S^*(\omega)$, the spectrum of the simulated speech signal, and $S(\omega)$, the spectrum of the real speech signal.
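The exact formula of the global cost function is not reproduced in this text. Purely as an illustration of comparing $S^*(\omega)$ with $S(\omega)$ over the whole spectrum, the sketch below uses a mean squared difference between the two spectra sampled on the same frequency grid. This particular distance and the function name are assumptions, not the patent's definition.

```python
def global_spectral_cost(simulated, real):
    """Mean squared difference between two spectra on the same frequency grid.

    simulated and real are equal-length lists of spectral values, standing in
    for S*(w) and S(w). This distance is an assumed stand-in for the cost
    function, whose formula is not given in the text.
    """
    if len(simulated) != len(real):
        raise ValueError("spectra must be sampled on the same frequency grid")
    return sum((a - b) ** 2 for a, b in zip(simulated, real)) / len(real)
```

The cost is zero only when the simulated spectrum matches the real one at every sampled frequency, which is the property the parameter search relies on.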
Step S34, repeatedly varying the vocal cord and vocal tract physiological parameters so that the physical model generates new speech signals, and searching the solution space for the optimal solution by minimizing the cost function (the global-feature cost function), thereby estimating the physiological parameters.
Specifically, the estimated physiological parameters include the stiffness coefficients of the vocal cords, the subglottal pressure, the laryngeal ventricle cross-sectional area and the supraglottal vocal tract area. The search for the optimal solution can use the Nelder-Mead simplex method, whose strong local search ability speeds up convergence and makes the search efficient.
Step S4, detecting mental stress according to the relation between speech signals and physiological features.
Specifically, the sensitivity of the estimated physiological parameters to the stress state makes it possible to analyze how the phonatory system is affected under stress. A KNN classifier is used to distinguish stressed speech from normal speech, which evaluates how sensitive the physiological parameters of the proposed method are to the stress state and thereby verifies the effectiveness of the method.
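The classification step can be sketched with a minimal k-nearest-neighbour classifier over estimated parameter vectors. The feature layout, the label names, the Euclidean distance and the value of k are illustrative assumptions; the text only states that a KNN classifier is used.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label query by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: each vector holds two estimated physiological
# parameters (for example a stiffness coefficient and a ventricle area).
train_x = [(1.0, 0.5), (1.1, 0.6), (0.9, 0.4), (2.0, 1.5), (2.1, 1.6), (1.9, 1.4)]
train_y = ["normal", "normal", "normal", "stressed", "stressed", "stressed"]
```

Calling `knn_predict(train_x, train_y, (1.05, 0.55))` labels a new utterance from its estimated parameters by voting among the three closest training examples.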
Embodiment 3
On the basis of embodiment 1 and 2, the present embodiment 3 additionally provides a kind of voice-based physiological parameter algorithm for estimating, comprises the steps:
Step S1 ', it is thus achieved that the frequency spectrum of real speech and the initial value of quadratic fit;
Step S2 ', performs quadratic fit, to construct the cost function of the feature of overall importance based on voice spectrum;
Step S3 ', estimates physiological parameter according to cost function.
The method obtaining the frequency spectrum of real speech and the initial value of quadratic fit in described step S1 ' comprises the steps:
Step S11 ', obtains the sound source information of real speech, i.e. residual signals by linear prediction;
Step S12 ', residual signals is carried out Fourier transformation and obtains the frequency spectrum of real speech, and residual signals is isolated by band filter high fdrequency component and low frequency component, and this high fdrequency component and low frequency component carry out once just matching respectively, and the relevant parameter corresponding just matching obtained is as the initial value of quadratic fit;
Described step S2 ' performs quadratic fit, include based on the method for the cost function of the feature of overall importance of voice spectrum with structure: perform quadratic fit, namely analog voice signal is produced by described physical model, and obtain the frequency spectrum of this analog voice signal, and then structure is based on the cost function of the feature of overall importance of voice spectrum; And
The method according to cost function, physiological parameter estimated in described step S3 ' includes:
In order to minimize cost function, it is continually changing vocal cords sound channel physiological parameter so that described physical model generates new voice signal, searches for optimal solution thereby through cost function minimization, it is achieved physiological parameter is estimated in solution space.
Described cost function is Wherein
S*(ω) for the frequency spectrum of analog voice signal, and the frequency spectrum that S (ω) is actual speech signal.
For the relevant steps of the physiological parameter estimation algorithm in Embodiment 3, refer to the corresponding description in Embodiment 2; they are not repeated here.
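The parameter search of step S3' can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the squared-difference form of the cost, the one-parameter toy function `model_spectrum` that stands in for the physical model, and the exhaustive grid search. The text above does not reproduce the cost formula and does not name a search strategy.

```python
import math

def model_spectrum(stiffness, freqs):
    """Hypothetical stand-in for the physical model: a single resonance
    whose centre frequency grows with the square root of the stiffness."""
    f0 = 100.0 * math.sqrt(stiffness)
    return [1.0 / (1.0 + ((f - f0) / 50.0) ** 2) for f in freqs]

def spectral_cost(simulated, actual):
    """Assumed cost: sum of squared differences between S*(w) and S(w)."""
    return sum((s_sim - s_act) ** 2 for s_sim, s_act in zip(simulated, actual))

def estimate_parameter(actual, freqs, candidates):
    """Vary the parameter, regenerate the simulated spectrum each time,
    and keep the candidate with the smallest cost."""
    best, best_cost = None, float("inf")
    for k in candidates:
        cost = spectral_cost(model_spectrum(k, freqs), actual)
        if cost < best_cost:
            best, best_cost = k, cost
    return best, best_cost

freqs = [10.0 * i for i in range(1, 101)]      # 10 Hz to 1000 Hz
target = model_spectrum(4.0, freqs)            # plays the role of the real spectrum
candidates = [0.5 * i for i in range(1, 21)]   # 0.5 to 10.0
k_hat, cost = estimate_parameter(target, freqs, candidates)
```

In a real setting the candidate set would cover several vocal cord and vocal tract parameters at once, and the initial values from the preliminary fitting would be used to narrow the search region.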
Embodiment 4
The effectiveness of the stress detection method based on speech production modeling is tested, to further demonstrate the feasibility of detecting the stress state from speech.
All validation data used in the present invention come from telephone conversations. A total of 100 subjects (50 male, 50 female) took part in the experiment. An operator chatted with each subject by phone, with on average four dialogues per subject, each lasting 10 minutes, and the most realistic voice communication data were recorded. Of the four dialogues, two were casual chats in a relaxed state; in the other two, different types of stress were applied to the subject: (1) multitasking; (2) time pressure; (3) risk taking. Details are given in Table 1. The real speech produced by the subjects under stress was recorded and used to verify the effectiveness of the stress detection method.
Table 1
To verify the effectiveness of the proposed method, the present invention is compared with traditional speech-based detection methods. Using the physical model, the invention simulates the stressed speech signals produced under real-world stress and, using the physiological parameter estimation algorithm, estimates the physiological feature parameters of the vocal cords, the vocal tract and the laryngeal ventricle during phonation under stress. Comparing the average stress-detection recognition rate obtained with these physiological feature parameters against that obtained with the acoustic feature parameters used by traditional methods shows that the method based on speech production modeling has a clear advantage for stress detection.
Taking the above preferred embodiments of the present invention as guidance, and in light of the above description, persons skilled in the relevant art can make various changes and modifications without departing from the technical idea of this invention. The technical scope of this invention is not limited to the content of the description and must be determined according to the claims.
Claims (10)
1. A physical model of vocal cord-laryngeal ventricle-vocal tract linkage, characterized by comprising:
a set of mechanical equations describing the motion pattern of the vocal cords, and sets of aerodynamic equations describing the pressure drop distribution along the glottal depth direction and along the laryngeal ventricle, false vocal cord and vocal tract direction.
2. The physical model of vocal cord-laryngeal ventricle-vocal tract linkage according to claim 1, characterized in that
the set of mechanical equations comprises formulas (1), (2) and (3) (formulas not reproduced in this text).
In formulas (1), (2) and (3):
$m_1$, $m_2$ and $m_3$ are the three masses used to build the vocal cord model, arranged in sequence;
$x_1$, $x_2$ and $x_3$ are the displacements of the three masses in the vertical direction;
$k_{c12}$ and $k_{c23}$ are the stiffness coefficients of the springs coupling adjacent pairs of the three masses;
$r_1$, $r_2$ and $r_3$ are the equivalent viscous resistances of the three masses;
$F_1$, $F_2$ and $F_3$ are the driving forces acting on the three masses; and
$s_1$, $s_2$ and $s_3$ represent the springs attached to the three masses, expressed as:
$s_i(x_i) = k_i\,(x_i + \eta x_i^3), \quad i = 1, 2, 3 \qquad (4)$
In formula (4), $i$ denotes the $i$-th mass, $k_i$ is the stiffness coefficient of the spring attached to the $i$-th mass, and $\eta$ is the nonlinear coefficient of the spring.
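As an illustration of how the nonlinear spring of formula (4) enters a three-mass model, the sketch below integrates a chain of three coupled masses. Formulas (1) to (3) are not reproduced in this text, so the equations of motion used here are an assumption: the usual coupled mass-spring-damper form $m_i \ddot{x}_i + r_i \dot{x}_i + s_i(x_i) + \text{coupling} = F_i$. All parameter values are hypothetical and dimensionless, and the function names are invented for the example.

```python
def spring_force(x, k, eta):
    """Nonlinear spring of formula (4): s(x) = k * (x + eta * x**3)."""
    return k * (x + eta * x ** 3)

def accelerations(x, v, m, r, k, eta, kc12, kc23, F):
    """Assumed equations of motion for three masses in a chain:
    m_i * a_i = F_i - r_i * v_i - s_i(x_i) - coupling_i."""
    coupling = [
        kc12 * (x[0] - x[1]),
        kc12 * (x[1] - x[0]) + kc23 * (x[1] - x[2]),
        kc23 * (x[2] - x[1]),
    ]
    return [
        (F[i] - r[i] * v[i] - spring_force(x[i], k[i], eta) - coupling[i]) / m[i]
        for i in range(3)
    ]

def step(x, v, dt, **params):
    """One semi-implicit Euler step: update velocities first, then positions."""
    a = accelerations(x, v, **params)
    v_new = [v[i] + dt * a[i] for i in range(3)]
    x_new = [x[i] + dt * v_new[i] for i in range(3)]
    return x_new, v_new

# Hypothetical parameter values chosen only for the demonstration.
params = dict(m=[1.0, 1.0, 1.0], r=[0.5, 0.5, 0.5], k=[10.0, 10.0, 10.0],
              eta=100.0, kc12=2.0, kc23=2.0, F=[0.0, 0.0, 0.0])
x, v = [0.1, 0.0, 0.0], [0.0, 0.0, 0.0]   # first mass displaced, all at rest
for _ in range(20000):                    # 20 time units with dt = 0.001
    x, v = step(x, v, dt=0.001, **params)
```

With no driving force the displaced mass rings down through the coupling springs and the damping; in the linked model the forces $F_i$ would instead come from the aerodynamic pressure equations.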
3. The physical model of vocal cord-laryngeal ventricle-vocal tract linkage according to claim 2, characterized in that the sets of aerodynamic equations comprise:
a set of aerodynamic equations along the glottal depth direction, and
a set of aerodynamic equations for the pressure drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction.
4. The physical model of vocal cord-laryngeal ventricle-vocal tract linkage according to claim 3, characterized in that the set of aerodynamic equations along the glottal depth direction comprises formulas (5), (6), (7) and (8) (formulas not reproduced in this text).
In formulas (5), (6), (7) and (8):
$P_{i1}$ and $P_{i2}$ denote the pressures at the inlet and the outlet of the $i$-th mass;
$A_{gi}$ denotes the static glottal gap cross-sectional area corresponding to the $i$-th mass;
$U_g$ denotes the glottal wave, i.e. the airflow velocity through the glottis;
0.37 is the loss coefficient for the pressure drop at the junction of the vocal cords and the trachea, caused by the vena contracta that forms when the cross-sectional area at the glottal entrance decreases abruptly; and
$P_s$ denotes the subglottal pressure, $\rho$ the air density, $\mu$ the shear viscosity coefficient, $l_g$ the length of the vocal cord model, and $d_i$ the thickness of the vocal cord model corresponding to the $i$-th mass.
5. The physical model of vocal cord-laryngeal ventricle-vocal tract linkage according to claim 4, characterized in that the set of aerodynamic equations for the pressure drop distribution along the laryngeal ventricle, false vocal cord and vocal tract direction comprises formulas (9), (10), (11) and (12) (formulas not reproduced in this text).
In formulas (9), (10), (11) and (12):
$P_v$ and $A_v$ denote the pressure in the laryngeal ventricle and the cross-sectional area of the laryngeal ventricle, respectively; $P_{f1}$ and $P_{f2}$ denote the pressures at the two ends of the false vocal cords; $A_f$ denotes the cross-sectional area of the false vocal cords; $A_E$ denotes the cross-sectional area at the entrance of the laryngeal ventricle; and $A_1$ and $P_1$ denote the cross-sectional area and the pressure at the entrance of the vocal tract, respectively.
6. A stress detection method based on speech production modeling, characterized by comprising:
Step S1: establish the physical model of vocal cord-laryngeal ventricle-vocal tract linkage;
Step S2: using the physical model, generate simulated speech signals corresponding to the respective stress conditions in the real world;
Step S3: using a physiological parameter estimation algorithm, estimate the speaker's physiological parameters during phonation under the respective stress state, so as to establish the physiological feature relationship corresponding to the speech signal;
Step S4: detect stress according to the physiological feature relationship.
7. The stress detection method according to claim 6, characterized in that the physiological parameter estimation algorithm in step S3 comprises the following steps:
Step S31: obtain the sound source information of the real speech, i.e. the residual signal, by linear prediction;
Step S32: apply a Fourier transform to the residual signal to obtain the spectrum of the real speech;
Step S32: separate the residual signal into a high-frequency component and a low-frequency component with a band-pass filter, perform a first, preliminary fitting on each component, and take the parameters obtained from the preliminary fitting as the initial values for the second fitting;
Step S33: perform the second fitting, i.e. obtain the spectrum of the simulated speech signal, and then construct the cost function based on the global features of the speech spectrum;
Step S34: to minimize the cost function, vary the vocal cord and vocal tract physiological parameters continually so that the physical model generates new speech signals, and search the solution space for the optimal solution by minimizing the cost function, thereby estimating the physiological parameters.
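The first two steps of this claim, linear prediction followed by a Fourier transform of the residual, can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion, uses a naive DFT, and runs on a synthetic two-pole signal invented for the example. The band-pass split and the fitting steps are omitted, and the function names are hypothetical.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns a[0..order-1] such that
    x[t] is predicted by sum(a[j] * x[t - 1 - j])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lpc_residual(x, order):
    """Residual (source estimate): the signal minus its linear prediction."""
    a = levinson_durbin(autocorrelation(x, order), order)
    residual = []
    for t in range(len(x)):
        pred = sum(a[j] * x[t - 1 - j] for j in range(order) if t - 1 - j >= 0)
        residual.append(x[t] - pred)
    return a, residual

def magnitude_spectrum(x):
    """Naive DFT magnitude for bins 0..N/2 (adequate for short frames)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

# Synthetic test signal: an impulse train passed through a two-pole filter.
n, period = 400, 50
excitation = [1.0 if t % period == 0 else 0.0 for t in range(n)]
speech = []
for t in range(n):
    y = excitation[t]
    if t >= 1:
        y += 1.3 * speech[t - 1]
    if t >= 2:
        y -= 0.6 * speech[t - 2]
    speech.append(y)

coeffs, residual = lpc_residual(speech, order=2)
spectrum = magnitude_spectrum(residual)
```

On this synthetic signal the estimated coefficients come out close to the filter coefficients used to generate it, and the residual approximates the impulse-train excitation, which is the sense in which the residual carries the sound source information.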
8. A speech-based physiological parameter estimation algorithm, characterized by comprising the following steps:
Step S1': obtain the spectrum of the real speech and the initial values for the second fitting;
Step S2': perform the second fitting to construct a cost function based on the global features of the speech spectrum;
Step S3': estimate the physiological parameters from the cost function.
9. The physiological parameter estimation algorithm according to claim 8, characterized in that
in step S1', the spectrum of the real speech and the initial values for the second fitting are obtained by the following steps:
Step S11': obtain the sound source information of the real speech, i.e. the residual signal, by linear prediction;
Step S12': apply a Fourier transform to the residual signal to obtain the spectrum of the real speech; separate the residual signal into a high-frequency component and a low-frequency component with a band-pass filter; perform a first, preliminary fitting on each component; and take the parameters obtained from the preliminary fitting as the initial values for the second fitting;
in step S2', the second fitting is performed to construct the cost function based on the global features of the speech spectrum as follows: a simulated speech signal is produced by the physical model according to claim 1, its spectrum is obtained, and the cost function based on the global features of the speech spectrum is then constructed; and
in step S3', the physiological parameters are estimated from the cost function as follows:
to minimize the cost function, the vocal cord and vocal tract physiological parameters are varied continually so that the physical model generates new speech signals, and the optimal solution is searched for in the solution space by minimizing the cost function, thereby estimating the physiological parameters.
10. The physiological parameter estimation algorithm according to claim 9, characterized in that the cost function is given by a formula (not reproduced in this text), where $S^*(\omega)$ is the spectrum of the simulated speech signal and $S(\omega)$ is the spectrum of the actual speech signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610123469.3A CN105679333B (en) | 2016-03-03 | 2016-03-03 | Vocal cords-Hilton's sac-sound channel linkage physical model and stress detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610123469.3A CN105679333B (en) | 2016-03-03 | 2016-03-03 | Vocal cords-Hilton's sac-sound channel linkage physical model and stress detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105679333A (en) | 2016-06-15 |
CN105679333B (en) | 2019-04-12 |
Family
ID=56306752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610123469.3A Expired - Fee Related CN105679333B (en) | 2016-03-03 | 2016-03-03 | Vocal cords-Hilton's sac-sound channel linkage physical model and stress detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105679333B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108133713A (en) * | 2017-11-27 | 2018-06-08 | 苏州大学 | A kind of method that sound channel area is estimated in the case where glottis closes phase |
CN108601566A (en) * | 2016-11-17 | 2018-09-28 | 华为技术有限公司 | A kind of stress evaluating method and device |
CN110367934A (en) * | 2019-07-25 | 2019-10-25 | 深圳大学 | A kind of health monitor method and monitoring system based on non-voice body sounds |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003058175A (en) * | 2001-08-13 | 2003-02-28 | Nippon Telegr & Teleph Corp <Ntt> | Method of synthesizing pharyngeal sound source and apparatus for implementing this method |
CN101502425A (en) * | 2009-03-09 | 2009-08-12 | 西安交通大学 | System and method for detecting characteristic of vocal cord vibration mechanics |
CN103050042A (en) * | 2012-12-04 | 2013-04-17 | 华东师范大学 | Vocal cord quality distribution model and building method thereof |
Non-Patent Citations (3)
Title |
---|
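Zhang Jiwei et al.: "Research on the three-mass vibration model of the vocal cords", Journal of Shaanxi Normal University * |
Zhang Lihe et al.: "Three-mass vocal cord model analysis method for hoarse voice", Chinese Journal of Biomedical Engineering * |
Cheng Qiming et al.: "Hoarse voice synthesis method based on the inverse solution of speech production", Bulletin of Science and Technology * |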
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108601566A (en) * | 2016-11-17 | 2018-09-28 | 华为技术有限公司 | A kind of stress evaluating method and device |
CN108601566B (en) * | 2016-11-17 | 2020-06-26 | 华为技术有限公司 | Mental stress evaluation method and device |
US11547334B2 (en) | 2016-11-17 | 2023-01-10 | Huawei Technologies Co., Ltd. | Psychological stress estimation method and apparatus |
CN108133713A (en) * | 2017-11-27 | 2018-06-08 | 苏州大学 | A kind of method that sound channel area is estimated in the case where glottis closes phase |
CN108133713B (en) * | 2017-11-27 | 2020-10-02 | 苏州大学 | Method for estimating sound channel area under glottic closed phase |
CN110367934A (en) * | 2019-07-25 | 2019-10-25 | 深圳大学 | A kind of health monitor method and monitoring system based on non-voice body sounds |
CN110367934B (en) * | 2019-07-25 | 2023-02-03 | 深圳大学 | Health monitoring method and system based on non-voice body sounds |
Also Published As
Publication number | Publication date |
---|---|
CN105679333B (en) | 2019-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106205633B (en) | It is a kind of to imitate, perform practice scoring system | |
Prasomphan | Improvement of speech emotion recognition with neural network classifier by using speech spectrogram | |
CN109308731A (en) | The synchronous face video composition algorithm of the voice-driven lip of concatenated convolutional LSTM | |
US11786171B2 (en) | Method and system for articulation evaluation by fusing acoustic features and articulatory movement features | |
US20170154640A1 (en) | Method and electronic device for voice recognition based on dynamic voice model selection | |
CN105206258A (en) | Generation method and device of acoustic model as well as voice synthetic method and device | |
Patil et al. | The physiological microphone (PMIC): A competitive alternative for speaker assessment in stress detection and speaker verification | |
CN102332263A (en) | Close neighbor principle based speaker recognition method for synthesizing emotional model | |
CN103456302B (en) | A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight | |
CN109887489A (en) | Speech dereverberation method based on the depth characteristic for generating confrontation network | |
CN105679333A (en) | Vocal cord-larynx ventricle-vocal track linked physical model and mental pressure detection method | |
Prasomphan | Detecting human emotion via speech recognition by using speech spectrogram | |
Bozkurt et al. | Improving automatic emotion recognition from speech signals | |
Chamoli et al. | Detection of emotion in analysis of speech using linear predictive coding techniques (LPC) | |
Godin et al. | Glottal waveform analysis of physical task stress speech | |
Srinivas et al. | Optimization-based support vector neural network for speaker recognition | |
CN108175426A (en) | A kind of lie detecting method that Boltzmann machine is limited based on depth recursion type condition | |
Khaki et al. | Continuous emotion tracking using total variability space. | |
CN105845131A (en) | Far-talking voice recognition method and device | |
Gomes et al. | i-vector algorithm with Gaussian Mixture Model for efficient speech emotion recognition | |
Ben-Youssef et al. | Speech driven talking head from estimated articulatory features | |
JP4381404B2 (en) | Speech synthesis system, speech synthesis method, speech synthesis program | |
Folorunso et al. | Laughter signature: a novel biometric trait for person identification | |
Galvan et al. | Audiovisual affect recognition in spontaneous filipino laughter | |
Sampath Kumar et al. | A Real-Time Demo for Acoustic Event Classification in Ambient Assisted Living Contexts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20190412 |