WO2014178749A1 - Method for determining the risk of developing diseases in an individual from his voice, and a hardware-software complex for implementing this method


Info

Publication number
WO2014178749A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice signal
individual
signal
voice
parameters
Prior art date
Application number
PCT/RU2013/000672
Other languages
English (en)
Russian (ru)
Inventor
Anton Pavlovich Lysak (Антон Павлович ЛЫСАК)
Original Assignee
MD Voice LLC (Общество С Ограниченной Ответственностью "Эм Ди Войс")
Priority date
Filing date
Publication date
Application filed by MD Voice LLC (Общество С Ограниченной Ответственностью "Эм Ди Войс")
Publication of WO2014178749A1


Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/48 Other medical applications
    • A61B5/4803 Speech analysis specially adapted for diagnostic purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271 Specific aspects of physiological measurement analysis
    • A61B5/7275 Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor

Definitions

  • the invention relates to medicine and is intended for studying the functional state of the vocal folds.
  • the invention also relates to information and network technologies used in medicine, namely to an electronic information system that generates and visually displays, on the screen of the terminal device of an individual (user) of the system, information about the state of his vocal folds derived from the parameters of the voice signal.
  • the claimed invention is an expandable, modifiable, modular and interactive tool for the analysis and visualization of the functional state of the vocal folds, designed to inform the user about the current state of his voice and the likelihood of a disease.
  • the user can monitor the functional state of his vocal folds with the aim of early diagnosis of diseases of the larynx and timely prevention of chronic diseases of the vocal folds, as well as of some diseases of the upper respiratory tract, the nervous system and other diseases whose marker may be a change in the state of the vocal folds.
  • the technology is implemented in a simple and understandable form, and the likelihood of a disease is assessed from the results of a simple speech test, which the user of the system can take independently, without involving a phoniatrician, outside the clinic and at a time convenient for him.
  • diagnosis of throat diseases is based on assessing voice changes associated with changes in the vibrations of the vocal folds, which lead to a change in voice quality, for example a voice becoming hoarse, rough, etc.
  • the reason for such changes may be disturbances in the vibration of the vocal folds that arise from a pathology which changes the behavior of the folds during phonation.
  • certain methods for assessing the probability of diseases of the vocal folds are known, based on recording a voice signal and then analyzing the frequency track of the fundamental tone of the voice signal to obtain various speech parameters that describe the functional state of the individual's vocal folds.
  • one such method is based on the study of random fluctuations in the pitch period, the Jitter effect ("Use of Periodicity and Jitter as Speech Recognition Features", D. L. Thomson and R.
  • closest to the claimed solution is a system and method of voice analysis for the diagnosis of diseases of the vocal folds (application US 2008/0300867, IPC: G10L11/04), based on assessing quantitative indicators of vocal fold vibration by analyzing recordings of images of the larynx obtained with endoscopic equipment during speech production, and on analyzing the waveform of the acoustic signal obtained with sound recording devices.
  • the analytical processing of the acoustic signal is carried out using hardware and software and methods for calculating indicators characterizing, among others, the Jitter effect, the Shimmer effect and the turbulent noise level.
  • the objective of the invention is the creation of a new method and hardware-software complex (system) for assessing the physiological parameters of the vocal folds by processing the parameters of their functional state and displaying the results in a clear and intuitive way.
  • the technical result is an increase in the reliability of determining the risk of disease.
  • the results of visualization of the indicators of the functional state of the vocal folds are presented in a form accessible for perception by a person who does not have special medical training.
  • Indicators of the functional state of the vocal folds can be presented in the form of graphic images or in combination with text information that does not require a medical education to understand.
  • the hardware-software complex (system) for determining the risk of developing a disease in an individual from his voice includes the individual's terminal device with a voice recording module located in it; a recording control module configured to select the sampling frequency and duration of the voice signal recording; a computing module configured to convert the recorded voice signal from analog to digital form; a module for displaying, on the monitor of the individual's terminal device, information obtained from the voice signal analysis unit; and the voice signal analysis unit itself, configured to determine for the recorded voice signal at least one parameter from the groups characterizing the Jitter effect and/or the Shimmer effect and/or the physiological properties of the vocal folds and/or the noise level in the voice signal and a parameter characterizing the nonlinearity of the voice signal, with subsequent construction of a vector in the N-dimensional space of the individual's voice signal parameters, where N is the number of groups used, and determination of the posterior probability that this vector belongs to the multidimensional spaces preliminarily formed for norm and pathology, by calculating the probability density functions for norm and pathology.
  • the voice signal analysis unit is configured to form a multidimensional space for norm and pathology using aggregating functions for each group of parameters.
  • the voice signal analysis unit may be located in the terminal device of the individual.
  • the voice signal analysis unit can be located on a remote-access server, in which case the hardware-software complex further comprises an Internet connection module, which is located in the terminal device of the individual and is configured to receive and transmit a digital signal to the voice signal analysis unit.
  • the voice signal analysis unit includes a database with probability density distribution functions for voice signals in normal and pathological conditions.
  • an individual can use a mobile phone, smartphone, personal computer, laptop or tablet computer as the terminal device, and the voice signal analysis unit is configured to calculate parameters on the x86, x64, ARM and MIPS platforms under the Windows, Linux, MacOS, iOS and Android families of operating systems.
  • the computing module is configured to generate the voice signal from recorded continuous speech by extracting individual stressed vowels from it.
  • the method of determining the risk of developing a disease in an individual from his voice using a hardware-software complex includes recording the individual's voice signal, consisting of a set of vowels, or forming the said voice signal from recorded continuous speech, followed by its analysis, including determining for the recorded voice signal at least one parameter from the groups characterizing the Jitter effect and/or the Shimmer effect and/or the physiological properties of the vocal folds and/or the noise level in the voice signal and the nonlinearity of the voice signal, with subsequent construction of a vector in the N-dimensional space of the individual's voice signal parameters, where N is the number of groups used, and determination of the posterior probability that the resulting vector belongs to the multidimensional spaces pre-formed for norm and pathology, by calculating the probability density functions for norm and pathology; when a group contains more than one parameter, the multidimensional spaces for norm and pathology are formed using aggregating functions, which are calculated for each group of parameters.
  • An individual’s voice signal is recorded using a microphone, and the recorded signal is sent to the voice signal analysis unit located in the terminal device of the individual and / or on the remote server.
  • An analysis of an individual's voice signal is carried out using a hardware-software complex made on x86, x64, ARM, MIPS platforms using the families of operating systems: Windows, Linux, MacOS, iOS, Android.
  • the voice signal is formed from the recorded continuous speech on the computing module of the individual's terminal device by extracting individual stressed vowels from the continuous speech.
  • at least two vowels are selected, one of which is closed and the other open, and the duration of each vowel is at least five seconds.
  • the total duration of the vowel sound in the set composed of the fragments selected from continuous speech is at least 10 s.
  • Aggregate functions are determined using the principal component method.
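As an illustration only (the patent does not disclose its exact aggregating functions), the first-principal-component projection of one parameter group can be sketched in NumPy; the input matrix `X` of samples by group parameters is an assumed layout:

```python
import numpy as np

def first_principal_component(X):
    """Project each row of X (samples x parameters of one group) onto the
    first principal component, yielding one aggregated value per sample.

    Generic PCA sketch: the patent only states that aggregating functions
    are obtained by the principal component method.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # center each parameter
    # SVD of the centered data: rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]                    # scores along the first component
```

For a rank-one group (all parameters perfectly correlated), the scores carry the whole variance of the group.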
  • the N-dimensional spaces for norm and pathology are formed using databases of sound signals of individuals' voices in norm and in pathology, respectively.
  • an individual's voice signal is recorded as a sound wave in pulse-code modulation format, "Mono", with a sampling frequency of not less than 16 kHz.
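The required recording format (mono PCM at 16 kHz or more) can be written and read with Python's standard `wave` module; the synthetic test tone below merely stands in for a real voice recording:

```python
import wave, struct, math

def write_test_tone(path, freq=200.0, seconds=1.0, rate=16000):
    """Write a mono 16-bit PCM WAV at the sampling rate the method requires
    (>= 16 kHz). A synthetic tone stands in for a real voice recording."""
    n = int(seconds * rate)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)        # "Mono"
        w.setsampwidth(2)        # 16-bit PCM
        w.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(20000 * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(n))
        w.writeframes(frames)

def read_pcm(path):
    """Return (sample_rate, list of int samples) from a mono 16-bit PCM WAV."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return rate, samples
```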
  • the method provides for the re-recording of an individual's voice signal and its analysis to obtain parameters that are compared with previously obtained parameters with the determination of the level of deviation, which is used to judge the dynamics of the probability of illness of the vocal folds of an individual.
  • the parameters of the Jitter effect are obtained by determining the fundamental frequency track from the recorded voice signal, followed by analysis of the fundamental frequency oscillations.
  • the parameters of the “Shimmer” effect are obtained by determining the track of maximum amplitudes on the periods of the fundamental tone, followed by analysis of fluctuations in the amplitude characteristics of the signal.
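A minimal sketch of shimmer measures built from such an amplitude track, assuming the standard cycle-to-cycle definitions (the patent does not spell out the formulas at this point):

```python
import numpy as np

def shimmer_metrics(amp_track):
    """Shimmer measures from the track of peak amplitudes per pitch period.

    amp_track: 1-D array of positive maximum signal amplitudes, one per
    fundamental period. Standard cycle-to-cycle definitions are assumed.
    """
    a = np.asarray(amp_track, dtype=float)
    diffs = np.abs(np.diff(a))
    shimmer_abs = diffs.mean()                    # mean absolute amplitude perturbation
    shimmer_pct = 100.0 * shimmer_abs / a.mean()  # relative shimmer, %
    # shimmer in dB: mean absolute log-ratio of consecutive cycle amplitudes
    shimmer_db = np.mean(np.abs(20.0 * np.log10(a[1:] / a[:-1])))
    return {"shimmer_abs": shimmer_abs,
            "shimmer_pct": shimmer_pct,
            "shimmer_db": shimmer_db}
```

A perfectly steady phonation (constant amplitude track) yields zero for all three measures.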
  • the parameters characterizing the physiological properties of the vocal folds are obtained by inverse filtering of the voice signal, followed by analysis of the residual signal.
  • the parameters of the noise level in a voice signal are determined over the intervals of the fundamental tone. Parameters characterizing the nonlinearity of the voice signal are obtained by constructing the phase space of the voice signal.
  • as parameters characterizing the Jitter effect, the following are used: the mean absolute value of the Jitter effect (Mean Absolute Jitter), and/or the standard deviation of the fundamental frequency (Standard Deviation of F0 Contour), and/or the voice frequency range (Phonatory Frequency Range), and/or the pitch perturbation factor (Pitch Perturbation Factor), and/or the relative value of the Jitter effect expressed in % (Jitter (%)), and/or the pitch perturbation quotient (Pitch Perturbation Quotient), and/or the smoothed pitch perturbation quotient (Smoothed Pitch Perturbation Quotient), and/or the relative average perturbation (Relative Average Perturbation).
  • as parameters characterizing the Jitter effect, a short-term estimate of the Jitter effect (Short-term Jitter Estimation) may additionally be used.
  • as parameters characterizing the noise level over the intervals of the fundamental tone, the following are used: the parameter characterizing the level of turbulent noise in the period of the fundamental tone (Turbulent Noise Index (TNI)), and/or the parameter characterizing the degree of closure of the vocal folds (Soft Phonation Index (SPI)), and/or the indicator of the noise level relative to the level of the voiced component (Voice Turbulence Index (VTI)), and/or the ratio of the harmonic component of the signal to the non-harmonic component (Harmonic to Noise Ratio (HNR)), and/or the ratio of the excitation energy to the noise energy (Glottal to Noise Excitation Ratio (GNE)).
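Of these, HNR has a particularly compact estimator. The sketch below uses the peak of the normalized autocorrelation within the plausible pitch-lag range; this is one common textbook method, not necessarily the one the patent intends:

```python
import numpy as np

def hnr_autocorr(signal, rate, f0_min=70.0, f0_max=400.0):
    """Rough Harmonic-to-Noise Ratio (dB) via normalized autocorrelation.

    The autocorrelation peak r within the plausible pitch-lag range
    estimates the harmonic energy fraction; HNR = 10*log10(r/(1-r)).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]                         # normalize so ac[0] == 1
    lo = int(rate / f0_max)                 # shortest plausible period
    hi = int(rate / f0_min)                 # longest plausible period
    r = float(np.clip(ac[lo:hi].max(), 1e-6, 1 - 1e-6))
    return 10.0 * np.log10(r / (1.0 - r))
```

A clean periodic tone scores well above a white-noise frame, matching HNR's role as a hoarseness indicator.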
  • TNI Turbulent noise index
  • SPI Soft phonation index
  • VTI voice turbulence index
  • HNR Harmonic to Noise Ratio
  • GNE Glottal to Noise Excitation Ratio
  • as one of the parameters, the level of turbulent noise (Glottal to Noise Distribution Ratio) is used, which is determined as follows: a signal in pulse-code modulation format, "Mono", with a sampling frequency of 16 kHz is fed to the input, and its fundamental frequency track is determined; the input signal is then inverse filtered (the residual signal is calculated), after which the cochlear spectrum of the residual signal is computed; the spectral energy of the residual signal is weighted in the range from 1.5 kHz to 2 kHz using the average energy in this frequency range; the spectral energy of the residual signal is then weighted in the fundamental frequency range using the average energy in the range from the minimum to the maximum fundamental frequency; from the obtained values the ratio of the weighted energies is determined, and the distribution of the energy ratio is obtained by plotting a histogram.
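The energy-weighting step at the heart of this procedure can be illustrated with a simplified sketch that replaces the cochlear spectrum and the histogram with a single FFT-frame band-energy ratio; the band edges are taken from the description, everything else is an assumption:

```python
import numpy as np

def band_energy_ratio(residual, rate, f0_min=70.0, f0_max=400.0):
    """Simplified weighted-energy ratio: average spectral energy of the
    residual in 1.5-2 kHz divided by its average energy in the
    fundamental-frequency range.

    A plain FFT magnitude spectrum of one frame stands in for the cochlear
    spectrum and the frame-wise histogram, purely as an illustration.
    """
    x = np.asarray(residual, dtype=float)
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    hi_band = spec[(freqs >= 1500.0) & (freqs <= 2000.0)].mean()
    f0_band = spec[(freqs >= f0_min) & (freqs <= f0_max)].mean()
    return float(hi_band / f0_band)
```

Energy concentrated near the fundamental drives the ratio toward zero; energy in the 1.5-2 kHz turbulence band drives it up.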
  • the level of turbulent noise Glottal to Noise Distribution Ratio
  • as parameters characterizing the nonlinearity of the voice signal, the following are used: the Shannon entropy method, and/or the Rényi entropy method, and/or the value of the first minimum of the mutual information function (Value of First Minimum of Mutual Information Function), and/or the signal periodicity indicator (Recurrence Period Density Entropy), and/or an indicator obtained by signal analysis with the internal trend removed (Detrended Fluctuation Analysis), and/or an indicator obtained by the Takens' estimator method, and/or an indicator obtained by empirical decomposition of the signal into levels (Empirical Mode Decomposition Excitation Ratios).
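As an example of one of these measures, a minimal Detrended Fluctuation Analysis can be written as follows (standard first-order DFA; the window sizes are illustrative choices, not the patent's):

```python
import numpy as np

def dfa_alpha(signal, scales=(16, 32, 64, 128, 256)):
    """Detrended Fluctuation Analysis scaling exponent (alpha).

    For each window size n, the integrated signal is linearly detrended
    per window and the RMS fluctuation F(n) is computed; alpha is the
    slope of log F(n) versus log n. White noise gives alpha near 0.5.
    """
    x = np.asarray(signal, dtype=float)
    y = np.cumsum(x - x.mean())                # integrated profile
    flucts = []
    for n in scales:
        m = len(y) // n                        # number of full windows
        segs = y[:m * n].reshape(m, n)
        t = np.arange(n)
        detrended = []
        for seg in segs:
            a, b = np.polyfit(t, seg, 1)       # linear trend per window
            detrended.append(seg - (a * t + b))
        flucts.append(np.sqrt(np.mean(np.concatenate(detrended) ** 2)))
    alpha, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return alpha
```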
  • FIG. 1 shows a variant of the hardware architecture of the inventive system, according to which, the analysis unit is located in the cloud infrastructure
  • FIG. 2 is a block diagram of a system implementation
  • FIG. 3 schematically shows the algorithm of the learning process of the system
  • FIG. 4-11 show the results of the steps for determining the level of turbulent noise of a voice source; in particular, FIG. 4 is a fragment of a speech signal
  • FIG. 5 is a pitch track
  • FIG. 6 shows the residual signal, FIG. 7 the cochlear spectrum of the residual signal
  • FIG. 8 shows the weighted energy of the residual signal in the range of 1.5-2.5 kHz
  • FIG. 9 shows the weighted energy of the residual signal in the range from the minimum to the maximum frequency of the fundamental tone of the voice signal
  • FIG. 10 is the ratio of the weighted energies shown in FIG. 9 and FIG. 8, and FIG. 11 shows a spectrogram of the distribution of the noise level
  • FIG. 12 is a flowchart of an algorithm for calculating the turbulent noise level of a voice source
  • FIG. 13 schematically shows an algorithm for implementing the analysis unit of the proposed method
  • FIG. 14-15 are diagrams showing the results of assessing the likelihood of the presence of diseases of the vocal folds on a user terminal device; in particular, FIG. 14 shows a variant of displaying information as increments of the parameters relative to the previous test, and FIG. 15 shows a variant of displaying information as absolute parameter values
  • FIG. 16 is a schematic representation of the information output by the information display module before testing an individual; FIG. 17, before starting to record a vowel; FIG. 18, after recording all vowels.
  • Database: a set of independent materials presented in objective form on a digital medium, systematized in such a way that these materials can be found and processed using an electronic computer.
  • PCM (Eng. Pulse Code Modulation): pulse-code modulation
  • the method for diagnosing a disease of the vocal folds can be implemented using the system shown in FIG. 1-2.
  • the system contains the following modules: 1 - information display module; 2 - control module; 3 - sound recording module (individual's voice signal); 4 - network connection module; 5 - computing module, which includes a processor device and all the subsystems necessary for the full functioning of blocks 1-4; 6 - external interface to the "cloud" service, which includes a set of servers and virtual machines based on the x32, x64 and ARM platforms, supporting the following families of operating systems: Windows, Linux, MacOS, iOS, Android.
  • if the client application module 5 is executed on the x32, x64 or ARM platforms, with support for the Windows, Linux, MacOS, iOS or Android operating system families, then the software implementing the algorithm of the proposed method (see FIG. 2) can be fully installed into this module, which eliminates the need for the cloud service and module 4.
  • Modules 1-5 can be implemented on the basis of any devices with these functions, including personal terminal devices of an individual, for example, a mobile phone, smartphone, personal computer, laptop, tablet computer, etc.
  • the system contains a client application 7 (see FIG. 2) that implements a graphical interface for user interaction; an external interface 6 (in the case of a cloud architecture, a remote service can be used as the interface); and an analysis unit 8 consisting of three main elements: module 9, a database containing probability density functions for voices in norm (characterized by the absence of any diseases of the vocal folds) and in pathology (characterized by the presence of a functional or organic disorder of the vocal folds);
  • signal analysis module 10, which calculates the signal parameters for the subsequent classification of the signal;
  • statistics module 11, which classifies the signal according to the parameters obtained from analysis module 10 by the probability of norm/pathology. If the hardware of the client application meets the above requirements, then unit 8 can be integrated into the client application.
  • the method is as follows.
  • a software module that implements the algorithm of the proposed method is downloaded to an individual’s personal device, for example, a telephone, or to a cloud service, for example, Microsoft Windows Azure.
  • the analysis unit 8 for the recorded signal, which implements the analytical part of the proposed method, can either be embedded in the terminal device of the individual or located remotely, for example on the server of the organization serving the terminal devices via the Internet.
  • the user launches the software module and records the voice signal for the purpose of its subsequent analysis by the hardware-software complex.
  • the terminal device must support the recording format of the voice signal, with a sampling frequency of 16 kHz.
  • vowel segments of speech are selected, for example, by the method described in "Analysis and automatic segmentation of a speech signal" (A. Tsyplikhin, candidate of technical sciences thesis, 2006). Moreover, to assess the likelihood of disease, it is necessary to accumulate a total segment duration of about 10 seconds for each type of vowel.
  • the length of the voice recording is about 5 seconds for each type of vowel.
  • the recorded signals are transmitted to the data analysis unit, where the received signal is analyzed according to the algorithm shown in FIG. 13.
  • the incoming audio signal is subjected to preliminary analysis, which determines the signal balance, the frequency track of the pitch and the track of the signal amplitudes over the pitch periods (the method of determination is presented in more detail below); on this basis the groups of parameters are calculated characterizing: the Jitter effect and/or the Shimmer effect and/or the level of turbulent noise in the voice signal and/or the physiological properties of the vocal folds and the nonlinearity of the voice signal.
  • the following is a description of the operation of the system using five groups of parameters.
  • the main component is calculated for each of the groups of parameters.
  • the calculation of the probability density distribution functions is carried out at the stage of training the system, which is a preliminary step before the replication of the system (hardware and software complex) according to the scheme shown in FIG. 3.
  • oS is the main component for the parameters characterizing the Shimmer effect (S)
  • oJ is the main component for parameters characterizing the "Jitter” effect (J)
  • oN is the main component for the parameters characterizing the level of turbulent noise (N)
  • oG is the main component for the parameters characterizing the parameters of the voice source (G)
  • oP is the main component for the parameters characterizing the nonlinearity of the phonation process (P).
  • next, the posterior probability is determined that the obtained five-dimensional vector belongs to the probability density distribution function for voices in norm and to the probability density distribution function for voices in pathology.
  • this measure can be calculated, for example, using the methods described in the literature: "The Optimality of Naive Bayes", H. Zhang, American Association for Artificial Intelligence, 2004; Caruana, R.; Niculescu-Mizil, A. (2006), "An empirical comparison of supervised learning algorithms", Proceedings of the 23rd International Conference on Machine Learning.
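A minimal Gaussian naive-Bayes posterior in the spirit of the cited method might look like this; the per-dimension means and variances for norm and pathology are assumed to be already estimated from the training databases:

```python
import numpy as np

def naive_bayes_posterior(v, mean_n, var_n, mean_p, var_p, prior_p=0.5):
    """Posterior probability of pathology for a feature vector v under a
    Gaussian naive-Bayes model (independent dimensions).

    mean_n/var_n and mean_p/var_p: per-dimension means and variances for
    the norm and pathology classes, assumed pre-estimated.
    """
    v = np.asarray(v, dtype=float)
    def log_lik(mu, var):
        mu, var = np.asarray(mu, float), np.asarray(var, float)
        return float(np.sum(-0.5 * np.log(2 * np.pi * var)
                            - (v - mu) ** 2 / (2 * var)))
    log_p = np.log(prior_p) + log_lik(mean_p, var_p)
    log_n = np.log(1 - prior_p) + log_lik(mean_n, var_n)
    m = max(log_p, log_n)                      # log-sum-exp for stability
    return np.exp(log_p - m) / (np.exp(log_p - m) + np.exp(log_n - m))
```

A vector at the pathology-class mean scores high; one at the norm-class mean scores low.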
  • the likelihood of vocal fold pathology can be calculated, for example, using the logistic regression algorithm (Hosmer, David W.; Lemeshow, Stanley (2000), Applied Logistic Regression, 2nd ed., Wiley), having previously conducted joint training of the system and this algorithm.
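A bare-bones logistic-regression trainer, as a sketch of this calibration step; any standard solver would do in practice, and the gradient-descent settings here are illustrative:

```python
import numpy as np

def train_logistic(X, y, lr=0.1, steps=2000):
    """Fit logistic-regression weights (with bias) by batch gradient
    descent on the log-loss. Minimal sketch, not a production solver."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # sigmoid predictions
        w -= lr * Xb.T @ (p - y) / len(y)       # log-loss gradient step
    return w

def predict_proba(X, w):
    """Predicted pathology probabilities for rows of X."""
    Xb = np.hstack([np.asarray(X, float), np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))
```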
  • the total information containing the data obtained after analyzing the voice signal is displayed on the terminal device of the user.
  • Data can be presented in the form of indicators showing the functional state of the vocal folds and voice quality.
  • the absolute value of the parameter (see FIG. 15) and its increment compared to the previous value (see FIG. 14) are displayed alternately.
  • the following parameters are used as output: the main component of the group of parameters describing the Shimmer effect, displayed as a parameter called "Respiration stability"; the main component of the group of parameters describing the Jitter effect, displayed under the name "Voice jitter"; the main component of the group of parameters describing the level of turbulent noise, displayed under the name "Voice hoarseness"; the main component of the group of parameters describing the nonlinearity of the oscillation process, displayed under the name "Harmony of voice"; and the probability of the presence of vocal fold pathology in the individual, displayed under the name "Probability of the presence of pathology".
  • the purpose of the system learning process is to identify the main components for each of the groups of parameters and to obtain the values of the probability density functions for voices in norm and in pathology. Training of the system is performed on an existing database of voices in norm and pathology. The database must satisfy the following conditions:
  • the database should include an individual’s voice recording, presented in the form of a PCM sound wave, “Mono”, with a sampling frequency of not less than 16 kHz; Also, the database may contain records in a format that can be converted to the desired format without losing data, for example: * .wav, * .nsp, etc.
  • the database must contain data on which category the recording of an individual's voice belongs to (norm/pathology); data on the gender of the individual (male/female); the sampling rate at which the voice recording was made; and data on which vowel the voice recording refers to, e.g. /a:/, /o:/, /i:/.
  • commercially available databases can be used for the invention, for example The Disordered Voice Database of the Massachusetts Eye and Ear Infirmary (MEEI) Voice and Speech Lab.
  • a preliminary analysis is performed (a detailed description is presented below), after which the parameters are calculated that relate to one or another of the groups characterizing: the Jitter effect and/or the Shimmer effect, and/or the level of turbulent noise in the voice signal, and/or the physiological properties of the vocal folds and the nonlinearity of the voice signal.
  • the following is a description of the learning process using five groups of parameters.
  • a reliable result of assessing the probability of the risk of vocal fold disease can also be obtained using a smaller number of groups of parameters (from two to five) characterizing the above effects (the same groups are used both in training the system and in carrying out the method).
  • oS is the main component for the parameters characterizing the Shimmer effect (S)
  • oJ is the main component for the parameters characterizing the Jitter effect (J)
  • oN is the main component for parameters characterizing the level of turbulent noise (N)
  • oG is the main component for parameters characterizing the parameters of the voice source (G)
  • oP is the main component for parameters characterizing the nonlinearity of the phonation process (P).
  • the determination of the frequency of the fundamental tone can be implemented by known methods: L. R. Rabiner, M. J. Cheng, A. E. Rosenberg and C. A. McGonegal, "A comparative performance study of several pitch detection algorithms," IEEE Trans. Audio Electroacoust., pp. 399-417, 1976; D. Gerhard, "Pitch extraction and fundamental frequency: history and current techniques," University of Regina, Saskatchewan, Canada, 2003; A. de Cheveigné, "Pitch perception models from origins to today," International Conference on Acoustics, Kyoto, 2004; V. N. Sorokin and V. P. Trifonenkov, "Autocorrelational Analysis of Speech Signal," Vol. 3, No. 42, 1996.
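The simplest member of these pitch-detection families, single-frame autocorrelation, can be sketched as follows; production trackers such as YIN or TWIN add continuity constraints and octave-error handling that this toy version omits:

```python
import numpy as np

def estimate_f0_autocorr(frame, rate, f0_min=70.0, f0_max=400.0):
    """Single-frame F0 estimate (Hz) by the autocorrelation method.

    The best period is the autocorrelation peak within the lag range
    corresponding to plausible fundamental frequencies.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(rate / f0_max), int(rate / f0_min)
    lag = lo + int(np.argmax(ac[lo:hi]))        # best period in range
    return rate / lag
```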
  • the invention can be implemented using the following algorithms:
  • ITU G.726 ("ITU-T Recommendation G.726," [Online]. http://www.itu.int/rec/T-REC-G.726/en), YIN (A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," JASA, 111, pp. 1917-1930, 2002), TWIN (A. I. Tsyplikhin, "Impulse analysis of a voice source," Acoustic Journal, Vol. 53, pp. 119-133, 2007).
  • the track of maximum signal amplitudes on the periods of the fundamental tone is preliminarily calculated.
  • One of the possible methods for isolating the track of maximum signal amplitudes is described in TWIN (A. I. Tsyplikhin, “Analysis of Impulses of a Voice Source,” Acoustic Journal, vol. 53, pp. 119-133, 2007.)
  • the residual signal is calculated (FIG. 6) ("Initial conditions in the problem of voice source identification", V. N. Sorokin, A. A. Tananykin, Information Processes, Vol. 10, pp. 1-10) by inverse filtering of the original signal (D. Wong, J. Markel, A. Gray, "Least Squares Glottal Inverse Filtering from the Acoustic Speech Waveform", IEEE Trans. Acoust., Speech, Signal Process., Vol. ASSP-27, No. 4, pp. 350-355, 1979).
  • the main parameters of the “Jitter” effect are based on the pitch track and can be calculated using the formulas presented in Table 1.
  • Fi is the i-th value of the fundamental frequency obtained from the fundamental frequency track
  • F0_av is the average fundamental frequency over the entire fundamental frequency track.
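Table 1 itself is not reproduced above, but the usual jitter measures built from Fi and F0_av can be sketched as follows (standard MDVP-style definitions are assumed, not necessarily the patent's exact formulas):

```python
import numpy as np

def jitter_metrics(f0_track):
    """Common jitter measures from a fundamental-frequency track.

    f0_track: 1-D array of per-period F0 estimates Fi (Hz), voiced only.
    """
    f0 = np.asarray(f0_track, dtype=float)
    f0_av = f0.mean()                      # average F0 over the track
    diffs = np.abs(np.diff(f0))            # |F_{i+1} - F_i|
    mean_abs_jitter = diffs.mean()         # Mean Absolute Jitter (Hz)
    jitter_pct = 100.0 * mean_abs_jitter / f0_av   # Jitter (%)
    std_f0 = f0.std()                      # Standard deviation of F0 contour
    # Relative Average Perturbation: deviation from a 3-point moving average
    smoothed = (f0[:-2] + f0[1:-1] + f0[2:]) / 3.0
    rap = 100.0 * np.mean(np.abs(f0[1:-1] - smoothed)) / f0_av
    return {"mean_abs_jitter": mean_abs_jitter,
            "jitter_pct": jitter_pct,
            "std_f0": std_f0,
            "rap": rap}
```

A perfectly steady F0 track yields zero jitter by every measure.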
  • a parameter can also be used that determines the short-term change in the Jitter effect (Short-term Jitter Estimation), the calculation method of which is presented in "Voice Pathology Detection Based on Short-Term Jitter Estimations in Running Speech", M. Vasilakis, Y. Stylianou, Folia Phoniatr Logop.
  • the noise level determines the quality of the signal and voice in general, in particular, the presence of wheezing and hoarseness of the voice.
  • the noise level in the signal is a good indicator for determining the presence of pathology of the larynx.
  • the invention proposes to use one or more of the following parameters to characterize the noise level in the signal.
  • Parameter characterizing the level of turbulent noise in the period of the fundamental tone (Turbulent Noise Index), where Tn is the pitch track and R(tn, Tn) is the normalized autocorrelation function.
  • the TNI parameter can be determined, for example, by the method presented in P. Mitev and S. Hadjitodorov, "A method for turbulent noise estimation in voiced signal," Med Biol Eng Comput., Vol. 38, No. 6, pp. 625-631, 2000; "SOFTWARE INSTRUCTION MANUAL Multi-Dimensional Voice Program (MDVP) Model 5105", KayPentax.
  • the parameter characterizing the degree of closure of the vocal folds is determined by the ratio of the harmonic energy in the frequency range from 70 Hz to 1600 Hz to the harmonic energy in the frequency range from 1600 Hz to 4500 Hz (S. An Xue, "Effects of aging on selected acoustic voice parameters: preliminary normative data and educational implications", 2001; "SOFTWARE INSTRUCTION MANUAL Multi-Dimensional Voice Program (MDVP) Model 5105", KayPentax).
  • the indicator of the noise level relative to the level of the voiced component is a parameter defined as the average ratio of the non-harmonic signal energy in the frequency range from 2800 Hz to 5800 Hz to the harmonic energy in the frequency range from 70 Hz to 4500 Hz.
  • the harmonic energy level must be calculated in the region with the minimum fluctuation of the harmonic frequency and signal amplitude and the minimum energy of the subharmonic component of the signal.
  • the definition of this parameter can be implemented according to the methodology presented in the following information sources: S. An Xue, “Effects of aging on selected acoustic voice parameters: preliminary normative data and educational implications” 2001; V. D. Nicola, M. I. Fiorella, D. A.
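For illustration, the band-limited energy ratios above can be approximated with a simple FFT band-energy sketch. This is only an assumption-laden stand-in: it measures total spectral energy per band and does not separate harmonic from non-harmonic energy the way the referenced MDVP methods do, and the function names, windowing, and test signal are ours.

```python
import numpy as np

def band_energy(signal, fs, f_lo, f_hi):
    """Spectral energy of `signal` between f_lo and f_hi (Hz), via the FFT."""
    spectrum = np.fft.rfft(signal * np.hanning(len(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs < f_hi)
    return float(np.sum(np.abs(spectrum[band]) ** 2))

def band_energy_ratio(signal, fs, num_band, den_band):
    """Ratio of band energies, e.g. (2800-5800 Hz) / (70-4500 Hz)."""
    return band_energy(signal, fs, *num_band) / band_energy(signal, fs, *den_band)

# Example: a 200 Hz tone (voiced-like) plus weak broadband noise at fs = 16 kHz.
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 200 * t)
noisy = tone + 0.05 * rng.standard_normal(fs)
# Ratio of 2800-5800 Hz energy to 70-4500 Hz energy, as in the description.
r = band_energy_ratio(noisy, fs, (2800, 5800), (70, 4500))
```

For a mostly harmonic signal the high-band/low-band ratio stays small; it grows as turbulent noise displaces voiced energy.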
  • the ratio of the harmonic component of the signal to the non-harmonic component (HNR, Harmonic-to-Noise Ratio) is a parameter that characterizes the relative noise level in the speech signal.
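A minimal HNR sketch follows. It uses Boersma's well-known autocorrelation formulation as an illustrative assumption, since the patent does not specify which HNR estimator is used; the function name and pitch-range defaults are ours.

```python
import numpy as np

def hnr_db(frame, fs, f0_min=75.0, f0_max=400.0):
    """Harmonic-to-Noise Ratio (dB) of one voiced frame, in the spirit of
    Boersma's autocorrelation method: HNR = 10*log10(r_max / (1 - r_max)),
    where r_max is the normalized autocorrelation peak in the pitch-lag range."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]                     # normalize so r(0) = 1
    lo = int(fs / f0_max)               # shortest plausible pitch period
    hi = int(fs / f0_min)               # longest plausible pitch period
    r_max = ac[lo:hi].max()
    r_max = min(r_max, 1.0 - 1e-12)     # guard against log of zero
    return 10.0 * np.log10(r_max / (1.0 - r_max))

fs = 16000
t = np.arange(4096) / fs
clean = np.sin(2 * np.pi * 150 * t)                       # near-periodic voice
rng = np.random.default_rng(1)
noisy = clean + 0.3 * rng.standard_normal(t.size)         # hoarse/noisy voice
```

A clean periodic frame scores a much higher HNR than the same frame with added noise, which is exactly the contrast the pathology indicator exploits.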
  • the ratio of the excitation energy to the noise energy is a parameter that characterizes the quality of the speech signal.
  • the parameter is calculated as the maximum correlation coefficient between the Hilbert envelopes of the speech signal in different frequency ranges.
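The Hilbert-envelope correlation idea above can be sketched as follows, in the spirit of the Glottal-to-Noise Excitation ratio. The band choices, the ideal FFT-mask band-pass, and all function names are our illustrative assumptions, not the patent's method.

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed via the FFT."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * h))

def band_envelope(x, fs, f_lo, f_hi):
    """Hilbert envelope of x after ideal (FFT-mask) band-pass filtering."""
    n = len(x)
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return hilbert_envelope(np.fft.irfft(spec, n))

def envelope_correlation(x, fs, band_a, band_b):
    """Correlation between envelopes in two bands: high for a common
    pulse-like (glottal) excitation, low for turbulent noise."""
    ea = band_envelope(x, fs, *band_a)
    eb = band_envelope(x, fs, *band_b)
    return float(np.corrcoef(ea, eb)[0, 1])

fs = 8000
pulses = np.zeros(4000)
pulses[::100] = 1.0                      # glottal-like impulse train
rng = np.random.default_rng(1)
noise = rng.standard_normal(4000)        # turbulence-like excitation
c_pulses = envelope_correlation(pulses, fs, (500, 1500), (2000, 3000))
c_noise = envelope_correlation(noise, fs, (500, 1500), (2000, 3000))
```

Glottal pulses excite all bands simultaneously, so their envelopes correlate strongly across bands; turbulent noise does not, which is why the maximum cross-band envelope correlation separates excitation from noise.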
  • the voice-source turbulent-noise level is a parameter characterizing the ratio of the voice-source energy to the turbulent-noise energy.
  • the parameter calculation algorithm is shown in FIG. 12.
  • a voice signal with a sampling frequency of 16 kHz is fed to the input of the individual's terminal module (Fig. 4). The pitch (fundamental-frequency) track of the input signal is then found (Fig. 5), and the residual signal is obtained by inverse filtering (Fig. 6).
  • the cochlear spectrum (Fig. 7) is calculated by the method described in "An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank", M. Slaney, Apple Computer Technical Report #35, Perception Group, Advanced Technology Group.
  • the signal periodicity index (Recurrence Period Density Entropy) is a parameter calculated from the phase space of the signal that characterizes the periodicity of the signal.
  • this parameter can be determined according to the methodology presented in "Nonlinear, Biophysically-Informed Speech Pathology Detection", Max Little, Patrick McSharry, Irene Moroz and Stephen Roberts, Mathematical Institute and Engineering Science, Oxford University, UK. The signal fluctuation with the internal trend removed (Detrended Fluctuation Analysis) can be determined by the methodology presented in the same work.
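A compact Detrended Fluctuation Analysis sketch is given below. It is a generic textbook implementation under our own assumptions (scale choices, linear detrending), not the exact procedure of the cited paper.

```python
import numpy as np

def dfa_alpha(x, scales=(16, 32, 64, 128, 256)):
    """DFA scaling exponent alpha: integrate the signal, detrend it in
    windows of each scale, and fit the RMS fluctuation F(s) ~ s**alpha
    on a log-log scale."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                    # integrated profile
    flucts = []
    for s in scales:
        n_win = len(y) // s
        f2 = []
        for i in range(n_win):
            seg = y[i * s:(i + 1) * s]
            t = np.arange(s)
            coef = np.polyfit(t, seg, 1)           # per-window linear trend
            f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        flucts.append(np.sqrt(np.mean(f2)))
    alpha, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return float(alpha)

rng = np.random.default_rng(2)
white = rng.standard_normal(8192)
a_white = dfa_alpha(white)             # white noise: alpha near 0.5
a_brown = dfa_alpha(np.cumsum(white))  # Brownian motion: alpha near 1.5
```

The exponent distinguishes uncorrelated from strongly correlated fluctuations, which is why it is useful as a voice irregularity measure.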
  • the value of the first minimum of the mutual information function is a parameter characterizing the phase shift of the signal by 180°.
  • this parameter can be determined according to the methodology presented in "Characterization of Healthy and Pathological Voice Through Measures Based on Nonlinear Dynamics", Patricia Henriquez, Jesus B. Alonso, Miguel A. Ferrer, Carlos M. Travieso, Juan I. Godino-Llorente, and Fernando Diaz-de-Maria.
  • the indicator obtained by analyzing the signal with Takens' method (Takens' estimator) characterizes the correlation dimension of the signal.
  • Shannon Entropy is a measure of the uncertainty or unpredictability of a signal.
  • this parameter can be determined according to the methodology presented in "Characterization of Healthy and Pathological Voice Through Measures Based on Nonlinear Dynamics", Patricia Henriquez, Jesus B. Alonso, Miguel A. Ferrer, Carlos M. Travieso, Juan I. Godino-Llorente, and Fernando Diaz-de-Maria.
  • Renyi entropies form a family of parameters generalizing Shannon entropy that quantify the uncertainty of the signal at different orders.
  • this parameter can be determined according to the methodology presented in "Characterization of Healthy and Pathological Voice Through Measures Based on Nonlinear Dynamics", Patricia Henriquez, Jesus B. Alonso, Miguel A. Ferrer, Carlos M. Travieso, Juan I. Godino-Llorente, and Fernando Diaz-de-Maria.
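Histogram-based sketches of both entropy measures follow. The amplitude-histogram binning is our assumption; the cited work may compute these entropies on a different signal representation.

```python
import numpy as np

def shannon_entropy(x, bins=32):
    """Shannon entropy (bits) of the amplitude histogram of x."""
    p, _ = np.histogram(x, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def renyi_entropy(x, order=2.0, bins=32):
    """Renyi entropy of order `order` (bits); tends to Shannon entropy
    as order approaches 1."""
    p, _ = np.histogram(x, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return float(np.log2(np.sum(p ** order)) / (1.0 - order))

rng = np.random.default_rng(3)
uniform = rng.uniform(-1, 1, 10000)   # spread-out amplitudes: high entropy
peaked = np.zeros(10000)              # concentrated amplitudes: zero entropy
```

An unpredictable (noise-dominated) voice signal spreads probability mass over many amplitude bins and scores high; a degenerate signal scores zero, and Renyi entropy of order 2 never exceeds the Shannon value.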
  • the glottis quotient (opening coefficient) characterizes random variations of the period during which the glottis is open.
  • this parameter can be determined according to the methodology presented in the Journal of the Royal Society Interface electronic supplementary material to "Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson's disease symptom severity", Athanasios Tsanas, Max A. Little, Patrick E. McSharry, Lorraine O. Ramig.
  • the Vocal Fold Excitation Ratios characterize the energy level of the voice-source impulses relative to the level of turbulent noise.
  • this parameter can be determined according to the methodology presented in the Journal of the Royal Society Interface electronic supplementary material to "Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson's disease symptom severity", Athanasios Tsanas, Max A. Little, Patrick E. McSharry, Lorraine O. Ramig.
  • parameters of the single-mass model of the vocal folds are parameters characterizing the mass and stiffness of the vocal folds. They can be obtained by the method described in P. Gomez-Vilda, R. Fernandez-Baillo, V. Rodellar-Biarge, V. Nieto-Lluis, A. Alvarez-Marquina, L.M. Mazaira-Fernandez, R. Martinez-Olalla and J.I. Godino-Llorente, "Glottal source biomedical signature for voice pathology detection", Speech Communication, 2008.
  • the inventive system was implemented on the basis of a Windows cloud service.
  • the user starts the system by pressing the appropriate button on the smartphone.
  • the system displays information about the testing process and provides the user with the opportunity to start testing by clicking the "Start Testing” button (see Fig. 17).
  • the system displays the vowel sound / a: / as text and allows the user to start recording the displayed vowel sound (see Fig. 18).
  • the user pronounces this sound until the system unlocks the “Record” button and changes the information about the sound that the user needs to pronounce.
  • the user continues in the same way to record all the required vowel sounds (/o:/, /i:/, etc.), then clicks the "Next" button (see Fig. 18), thereby transferring all recorded files to the analysis module.
  • if the continuous-speech analysis option is used, the system runs in the background and is activated, for example, when the individual makes a phone call.
  • the individual's terminal device records the voice signal, which is then segmented to extract vowel sections. Once a total duration of 10 seconds has been accumulated for each vowel sound, the terminal device transmits the data to the analysis module.
  • the analysis module receives all voice signals recorded by the user, which are either recorded on the terminal device in the required format or converted by the computing module of the terminal device into a single-channel signal with a sampling frequency of 16 kHz (see Fig. 4).
  • the analysis module performs a preliminary analysis of each received signal, during which it calculates the fundamental-frequency (pitch) track (see Fig. 5), the track of the maximum signal amplitudes over the pitch periods, and the residual signal (see Fig. 6).
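The pitch-track extraction step of this preliminary analysis can be sketched with a basic frame-wise autocorrelation pitch estimator. This is a simplified stand-in under our own assumptions (frame and hop sizes, pitch range); the patent does not specify which tracker is used.

```python
import numpy as np

def pitch_track(signal, fs, frame_len=1024, hop=256, f0_min=75.0, f0_max=400.0):
    """Per-frame F0 estimate: the autocorrelation peak within the
    plausible pitch-lag range, converted back to Hz."""
    x = np.asarray(signal, dtype=float)
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    track = []
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len]
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lag = lo + int(np.argmax(ac[lo:hi]))       # best candidate period
        track.append(fs / lag)
    return np.array(track)

# A synthetic vowel-like signal: 150 Hz fundamental plus its second harmonic.
fs = 16000
t = np.arange(fs // 2) / fs
vowel_like = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
f0 = pitch_track(vowel_like, fs)
```

The resulting per-frame F0 values are exactly the kind of track on which the jitter, shimmer, and noise parameters described earlier operate.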
  • oscillation of the amplitude relative to the average calculated over three pitch periods
  • oscillation of the amplitude relative to the average calculated over five pitch periods
  • oscillation of the amplitude relative to the average calculated over eleven pitch periods
  • relative value of the "Shimmer" effect, expressed in %
  • parameter characterizing the degree of collapse of the vocal folds
  • parameter characterizing the level of turbulent noise within a pitch period
  • ratio of the harmonic component of the signal to the non-harmonic component
  • ratio of the excitation energy to the noise energy
  • relative value of the "Jitter" effect, expressed in %
  • oscillation coefficient of the fundamental frequency calculated over three pitch periods
  • oscillation coefficient of the fundamental frequency calculated over five pitch periods
  • oscillation coefficient of the fundamental frequency calculated over eleven pitch periods
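The amplitude-perturbation entries in the list above can be sketched as follows. Since the exact MDVP-style formulas are not reproduced in the source, standard definitions of relative shimmer and a windowed oscillation measure are assumed, and the function names are ours.

```python
def relative_shimmer_percent(amp_track):
    """Relative shimmer (%): mean absolute difference between consecutive
    per-period peak amplitudes, normalized by the mean amplitude."""
    diffs = [abs(a - b) for a, b in zip(amp_track, amp_track[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(amp_track) / len(amp_track))

def oscillation_over_window(track, window):
    """Mean relative deviation of each value from the local average over
    `window` periods (the 3-, 5- and 11-period variants in the list)."""
    devs = []
    for i in range(len(track) - window + 1):
        w = track[i:i + window]
        avg = sum(w) / window
        devs.append(abs(w[window // 2] - avg) / avg)
    return 100.0 * sum(devs) / len(devs)

# Per-period peak-amplitude tracks: steady phonation vs an unstable one.
steady = [1.0] * 12
wobbly = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.08, 0.92, 1.0, 1.1, 0.9, 1.0]
```

Both measures are zero for a perfectly steady amplitude track and grow with period-to-period instability, mirroring the jitter-style coefficients computed on the F0 track.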
  • the analysis module transmits the resulting data to the individual's terminal device, where they are displayed to the individual in an understandable and easily interpreted form (see Fig. 14-15). Using the data provided by the system, the individual concludes that the condition of the vocal folds is close to normal and that there is no need to visit a specialist doctor.
  • the inventive method and the system implementing it allow the functional state of an individual's vocal folds to be monitored at any time convenient for him, without requiring the presence of a specialist doctor, and make it possible to undergo regular "screening" examinations in order to detect changes in the individual's voice.
  • this approach saves the individual money and time while increasing the likelihood of early detection of pathologies of the vocal folds or of other diseases for which a change in the state of the vocal folds may be a marker.

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Epidemiology (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Veterinary Medicine (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the field of medicine and is intended for studying the functional state of the vocal folds. The aim of the present invention is to create a new method and a new hardware-software system for estimating the physiological phonation parameters of the vocal folds by processing parameters of their functional state and presenting the results in a visible and intuitively understandable form. The hardware-software system for determining the risks of disease development in an individual based on his voice comprises an individual's terminal device housing a module for recording the individual's voice signal, a voice-signal recording control module for selecting the sampling frequency and the recording length of the voice signal, a computing module capable of converting the recorded voice signal from an analog signal to a digital signal, and a module for displaying on a monitor of the individual's terminal device the information received from the voice-signal analysis unit.
PCT/RU2013/000672 2013-04-29 2013-08-05 Method for determining the risk of disease development in an individual based on his voice, and hardware-software system for implementing this method WO2014178749A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2013119828 2013-04-29
RU2013119828/08A RU2559689C2 (ru) 2013-04-29 2013-04-29 Method for determining the risk of disease development in an individual based on his voice, and hardware-software complex for implementing the method

Publications (1)

Publication Number Publication Date
WO2014178749A1 true WO2014178749A1 (fr) 2014-11-06

Family

ID=51843751

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2013/000672 WO2014178749A1 (fr) 2013-04-29 2013-08-05 Method for determining the risk of disease development in an individual based on his voice, and hardware-software system for implementing this method

Country Status (2)

Country Link
RU (1) RU2559689C2 (fr)
WO (1) WO2014178749A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116473521A (zh) * 2023-06-21 2023-07-25 West China Hospital of Sichuan University Sound spectrum recognition method and system for suspected cricoarytenoid joint dislocation
EP4101370A4 (fr) * 2020-03-05 2024-03-06 The Catholic University Of Korea Industry-Academic Cooperation Foundation Apparatus for diagnosing a disease causing voice and swallowing disorders, and diagnostic method thereof

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2582050C1 (ru) * 2015-01-28 2016-04-20 Penza State University Method for adaptive processing of speech signals under unstable operation of the speech apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008054162A1 (fr) * 2006-11-03 2008-05-08 Min Hwa Lee Method, apparatus and system for diagnosing the health status of mobile terminal users
US20080300867A1 (en) * 2007-06-03 2008-12-04 Yan Yuling System and method of analyzing voice via visual and acoustic data
US20120220899A1 (en) * 2011-02-28 2012-08-30 Samsung Electronics Co., Ltd. Apparatus and method of diagnosing health by using voice

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2313280C1 (ru) * 2006-05-16 2007-12-27 Kursk State Technical University Method for studying the functional state of the vocal folds


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4101370A4 (fr) * 2020-03-05 2024-03-06 The Catholic University Of Korea Industry-Academic Cooperation Foundation Apparatus for diagnosing a disease causing voice and swallowing disorders, and diagnostic method thereof
CN116473521A (zh) * 2023-06-21 2023-07-25 West China Hospital of Sichuan University Sound spectrum recognition method and system for suspected cricoarytenoid joint dislocation
CN116473521B (zh) * 2023-06-21 2023-08-18 West China Hospital of Sichuan University Sound spectrum recognition method and system for suspected cricoarytenoid joint dislocation

Also Published As

Publication number Publication date
RU2013119828A (ru) 2014-11-10
RU2559689C2 (ru) 2015-08-10

Similar Documents

Publication Publication Date Title
Kadiri et al. Analysis and detection of pathological voice using glottal source features
Kreiman et al. Variability in the relationships among voice quality, harmonic amplitudes, open quotient, and glottal area waveform shape in sustained phonation
Basilakos et al. A multivariate analytic approach to the differential diagnosis of apraxia of speech
Rusz et al. Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease
Hlavnička et al. Acoustic tracking of pitch, modal, and subharmonic vibrations of vocal folds in Parkinson’s disease and parkinsonism
AU2013274940B2 (en) Cepstral separation difference
Mittal et al. Analysis of production characteristics of laughter
Pah et al. Phonemes based detection of parkinson’s disease for telehealth applications
Lansford et al. Free-classification of perceptually similar speakers with dysarthria
Khan et al. Cepstral separation difference: A novel approach for speech impairment quantification in Parkinson's disease
Vojtech et al. Refining algorithmic estimation of relative fundamental frequency: Accounting for sample characteristics and fundamental frequency estimation method
Cordella et al. Classification-based screening of Parkinson’s disease patients through voice signal
Kopf et al. Pitch strength as an outcome measure for treatment of dysphonia
Reddy et al. Exemplar-Based Sparse Representations for Detection of Parkinson's Disease From Speech
Dubey et al. Detection and assessment of hypernasality in repaired cleft palate speech using vocal tract and residual features
Jalali-najafabadi et al. Acoustic analysis and digital signal processing for the assessment of voice quality
RU2559689C2 (ru) Method for determining the risk of disease development in an individual based on his voice, and hardware-software complex for implementing the method
Mittapalle et al. Glottal flow characteristics in vowels produced by speakers with heart failure
Narendra et al. Automatic intelligibility assessment of dysarthric speech using glottal parameters
Ekström et al. PREQUEL: Supervised phonetic approaches to analyses of great ape quasi-vowels
Cordeiro et al. Spectral envelope first peak and periodic component in pathological voices: A spectral analysis
Selvakumari et al. A voice activity detector using SVM and Naïve Bayes classification algorithm
Schultz et al. A tutorial review on clinical acoustic markers in speech science
Dubey et al. Hypernasality Severity Detection Using Constant Q Cepstral Coefficients.
Le The use of spectral information in the development of novel techniques for speech-based cognitive load classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13883386

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13883386

Country of ref document: EP

Kind code of ref document: A1