CN112599119B - Method for establishing and analyzing a motor dysarthria voice library under a big data background - Google Patents


Info

Publication number
CN112599119B
Authority
CN
China
Prior art keywords
voice
corpus
labeling
data
big data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011546906.5A
Other languages
Chinese (zh)
Other versions
CN112599119A (en)
Inventor
马春
杜炜
金力
阚峻岭
Current Assignee
Anhui University of Traditional Chinese Medicine AHUTCM
Original Assignee
Anhui University of Traditional Chinese Medicine AHUTCM
Priority date
Filing date
Publication date
Application filed by Anhui University of Traditional Chinese Medicine AHUTCM
Publication of CN112599119A
Application granted
Publication of CN112599119B
Legal status: Active

Classifications

    • G10L15/08 — Speech recognition: speech classification or search
    • G06F16/65 — Information retrieval of audio data: clustering; classification
    • G10L15/02 — Feature extraction for speech recognition; selection of recognition unit
    • G10L15/04 — Segmentation; word boundary detection
    • G10L15/34 — Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G10L25/66 — Speech or voice analysis specially adapted for extracting parameters related to health condition
    • G16H50/20 — ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a method for establishing and analyzing a motor dysarthria voice library under a big data background, which comprises the following steps: designing a pronunciation text; recording voice; parameter analysis of the voice files; establishing a database management system; and data analysis with big data technology. The invention studies the voice characteristics of patients whose motor dysarthria is caused by nervous system diseases; by virtue of an open network platform, it can cover large-scale groups for measurement and collection of related information, establishes voice libraries for Mandarin, dialects, healthy speakers and patients, and on this basis builds word libraries suitable for diagnosing the condition of patients with motor dysarthria.

Description

Method for establishing and analyzing a motor dysarthria voice library under a big data background
Technical Field
The invention relates to a method for establishing and analyzing a motor dysarthria voice library under a big data background.
Background
(1) Current research status of motor dysarthria:
Dysarthria refers to a group of speech disorders resulting from impaired muscular control due to damage to the central or peripheral nervous system. Motor dysarthria typically manifests as slow, weakened, imprecise and uncoordinated movement of the speech-related musculature, and may also affect respiration, resonance, laryngeal phonation, articulation and prosody; clinically it is often referred to simply as dysarthria. Common causes include brain trauma, cerebral palsy, amyotrophic lateral sclerosis, multiple sclerosis, stroke, Parkinson's disease and spinocerebellar ataxia. According to neuroanatomical and speech-acoustic characteristics, dysarthria can be classified into flaccid, spastic, ataxic, hypokinetic, hyperkinetic and mixed types. Among communication disorders associated with brain damage, dysarthria accounts for up to 54%. Clinically, examination of phonation, resonance, prosody and the like can reflect the speech-acoustic characteristics of dysarthria from both subjective and objective aspects, which helps provide targeted treatment and a comprehensive, scientific clarification of its speech-acoustic pathological mechanism.
Reports on the overall incidence of motor dysarthria are scarce both in China and abroad. In a study of 125 Parkinson's disease patients, Miller et al. found that 69.6% of patients had a mean speech intelligibility below that of a normal control group, with 51.2% more than one standard deviation lower, indicating a high incidence of dysarthria in Parkinson's disease. Bogoussplavsky et al. screened 1000 first-ever stroke patients and found that up to 46% of those with speech impairment were dysarthric, with a confirmed diagnosis in 12.4%. Hartelius et al. likewise reported a 51% incidence of dysarthria in multiple sclerosis patients. The incidence of dysarthria is therefore high. At present there is no unified assessment method in China and no dedicated assessment standard for motor dysarthria; most clinicians and rehabilitation physicians use existing dysarthria assessment methods, their modified versions, or the dysarthria examination form of the China Rehabilitation Research Center to examine, score, record and grade the degree and type of dysarthria.
(2) Research status of domestic voice libraries:
With the development of information technology and computer science, speech technology enables interaction between machine behavior and human natural language, and research on speech synthesis, speech recognition and speaker recognition all necessarily depends on the construction of an excellent back-end speech corpus. Foreign voice libraries are well developed, and Chinese voice library research has also advanced rapidly in recent decades, with voice libraries researched and established across different languages and cultural contexts. The construction of voice libraries for motor dysarthria, however, is still at the exploratory stage.
Domestic research on articulation function evaluation has mainly focused on subjective assessment, and only a few researchers distinguish the concept of articulated speech from that of voice. Huang Zhaoming et al. proposed a "Chinese Articulation Ability Test Word List" of 50 words, with which a speech rehabilitation therapist can comprehensively evaluate a subject's articulation of 21 consonants and 4 tones by assessing the subject's production of the 50 words, while evaluating the subject's phoneme-contrast ability through 18 phoneme contrasts and 37 minimal pairs. Chen Sanding et al. evaluated the Mandarin initials, finals and tones of 50 deaf children, revealed the developmental pattern of articulated speech in Mandarin-speaking deaf children, and accordingly proposed the principles of early, sequential, fault-tolerant and consolidated speech rehabilitation education. Dr. Zhang Jing of East China Normal University studied the main error tendencies of hearing-impaired children in consonant articulation, analyzed their causes, and proposed a corresponding consonant place-of-articulation treatment framework for hearing-impaired children.
(3) Research status of big data in the medical field:
currently, it is popular for big data definitions to be: data exceeding the capabilities of typical database software tools can be retrieved, stored, processed, and analyzed. Big data is different from traditional data concepts such as very large scale data, mass data and the like, and has four basic characteristics: a large quantity, diversity, aging and value. Kayyali B et al studied the impact of big data in the united states medical industry, indicating that over time the value of big data to the medical industry will become more and more significant. Big data in the current medical field mainly come from pharmaceutical enterprises, clinical diagnosis data, patient hospitalization data, health management and social network data. For example, drug development is a relatively intensive process, and even for small and medium-sized enterprises, one item of drug development data is above TB; the data of hospitals are also very fast growing every day, double-source CT examination of a patient is imaged at 3000 times, 1.5GB of image data is probably generated, a standard pathological examination image has almost 5GB of image, and the data of patient hospitalization, electronic medical records and the like are rapidly growing every day. Research methods based on mass big data analysis have led to thinking about scientific methodologies. Research does not need to directly contact a research object, and new research findings can be obtained by directly analyzing and mining mass data, which probably promotes a new scientific research mode.
Establishing a speech corpus is a complicated undertaking, and its later refinement remains to be improved; for example, existing inter-word tone-sandhi rules should be fully used so that tone sandhi and neutral tones are reflected as faithfully as possible. Where the corpus is deficient, the utilization of existing material can be improved in the preprocessing stage. For these reasons, the speech library should take the form of an open database that can be extended and modified at any time. Because speech conditions differ, various difficulties arise in establishing any specific speech corpus; the issues discussed here are only one exploration of corpus construction, which is expected to provide data support for speech research and play a role in better developing the language and perfecting the speech corpus.
In addition, while a large data volume is undoubtedly a great advantage of network big data analysis, ensuring the quality of massive data and realizing its cleaning, management and analysis are major technical difficulties of this research. Massive network big data is multi-source and heterogeneous, interactive, time-sensitive, bursty and noisy, so that despite its huge value it has high noise and low value density. This poses a significant challenge to ensuring data quality in network big data analysis research.
Disclosure of Invention
The invention provides a method for establishing and analyzing a motor dysarthria voice library under a big data background, addressing the technical problem that, although a large data volume is a great advantage of network big data analysis, ensuring the quality of massive data and realizing its cleaning, management and analysis remain major technical difficulties.
In order to solve the technical problems, the invention adopts the following scheme:
a method for establishing and analyzing a mobility dysarthria voice library under a big data background comprises the following steps: step 1, designing a pronunciation text;
step 2, recording voice;
step 3, labeling the voice file;
step 4, analyzing acoustic parameters of the voice file;
step 5, establishing a database management system;
step 6, data analysis with big data technology.
Preferably, the data analysis with big data technology in step 6 is based on a voice classification mechanism on the Hadoop platform and specifically includes the following sub-steps:
step 61, collecting a plurality of patient voice files, segmenting and labeling the voice segments, constructing a voice database, and analyzing the extracted acoustic parameters to obtain features effective for voice classification;
step 62, subdividing the big data voice classification problem with a Map function on the Hadoop platform, and solving the sub-problems in parallel and distributed fashion on multiple nodes to obtain the corresponding voice classification results;
step 63, finally combining the voice classification results of the sub-problems with a Reduce function, so as to meet the online requirements of big data voice classification.
Preferably, designing the pronunciation text in step 1 includes selecting a pronunciation text, and the selection principles for the corpus include one or more of the following:
a. single words in the corpus should contain all initial (shengmu) and final (yunmu) phenomena as far as possible, so as to better reflect the phonological characteristics of different patients' speech;
b. vocabulary in the corpus is based on the common Chinese survey word table, for convenient comparison with Mandarin;
c. sentences in the corpus are mainly obtained through dialogue with the patient on several related topics, so that they better match the real conditions of speech recognition; "several related topics" include daily-life topics or medical-history topics, such as asking about the time of first onset and the medical history;
d. sentences in the corpus are complete in content and semantics, so that the prosodic information of a whole sentence can be reflected as far as possible;
e. triphones are not classified, which effectively alleviates the problem of sparse training data.
Preferably, designing the pronunciation text in step 1 further includes preparing the pronunciation text, and the preparation principles include one or more of the following:
a. single-word part: common words covering the initials and finals listed in the survey word list serve as the main recording corpus of the voice library;
b. vocabulary part: based on, but not limited to, a four-thousand-word vocabulary, related words are recorded according to prior conclusions about the relevant phonological system, aiming to comprehensively reflect the voice characteristics, including sound quality and suprasegmental features, with example words added to reflect some highly characteristic speech phenomena; "recording related words according to conclusions about the relevant phonological system" refers to common vocabulary summarized according to the sounds, combination rules, rhythm and intonation used within the same language.
A characteristic speech phenomenon refers to cases where the dialect is prone to mispronunciation, such as flat-tongue and retroflex sounds being hard to distinguish, or f and h not being differentiated.
c. Sentence material part: the amount of corpus is determined according to different speakers' mastery of the language; the selected corpus should be representative while keeping its scope as wide as possible; "representative" here refers to sentences that can exhibit the language characteristics of motor dysarthria and are common.
d. Natural dialogue part: with daily life as the topic, in the form of question-answering and free conversation, 20-40 minutes of speech material is recorded per speaker, covering vocabulary of daily spoken language that differs from ordinary speech, with the speaker required to speak in dialect.
Preferably, the voice recording in step 2 includes determining the speakers: native speakers are selected who have clear articulation, a moderate speech rate ("moderate speech rate" meaning controlled between 120 and 150 words per minute), proficient command of the local language, and willingness to cooperate actively with the investigation; it is further ensured that the speaker's language environment is relatively stable and that the speaker has a certain level of education. Or/and, the voice recording further includes voice collection by a voice collector, adopting two modes: one is reading aloud with a prompt text, where the prompt is written Chinese material that the speaker converts into his or her own native language and reads aloud; the other is natural speech, where the speaker uses cues to tell folk stories, describe local life, and hum local folk songs.
Preferably, the acoustic parameter analysis of the voice files in step 4 includes phonetic labeling of the voice library. Basic phonetic labeling includes segmentation and alignment of the initials and finals of syllables and labeling of initials, finals and tones, in two parts. The first part is text labeling: Chinese characters plus pinyin constitute the grapheme-phoneme transcription; the voice information is recorded in Chinese characters for use by the recognition system and as material for linguistic study. The text labels must mark the basic textual information and paralinguistic phenomena; paralinguistic phenomena in the basic labeling can be represented by general paralinguistic symbols. The second part is syllable labeling: standard Mandarin syllable labeling is adopted, and syllable labeling is tone labeling. In the tone labels, 0 denotes the neutral tone, 1 the yin-ping (high level) tone, 2 the yang-ping (rising) tone, 3 the shang (dipping) tone, and 4 the qu (falling) tone.
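The tone-digit convention above can be captured in a small lookup table. This is only a sketch: the helper `label_syllable` and the English tone glosses are illustrative additions, not part of the patent.

```python
# Tone-digit convention used in the syllable labeling described above.
# English glosses in parentheses are illustrative.
TONE_LABELS = {
    0: "neutral tone (qingsheng)",
    1: "yin-ping (high level)",
    2: "yang-ping (rising)",
    3: "shang (dipping)",
    4: "qu (falling)",
}

def label_syllable(pinyin: str) -> str:
    """Split a digit-suffixed pinyin syllable such as 'bao4' into
    its segmental part and a human-readable tone label."""
    base, tone = pinyin[:-1], int(pinyin[-1])
    return f"{base}: {TONE_LABELS[tone]}"

print(label_syllable("bao4"))  # -> bao: qu (falling)
```

The digit suffix follows the common pinyin-with-tone-number transcription, matching the "bao" example in the detailed description.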
Preferably, the acoustic parameter analysis of the voice files in step 4 further includes extracting acoustic parameters: first, the recorded voice is cut and its silent segments removed, ensuring that the analyzed objects are single words, phrases, sentences and dialogues; next, the start and end points of the voice signal are determined in the voice waveform data and the voice is labeled; finally, the corresponding fundamental frequency and formant acoustic parameters are obtained with an autocorrelation algorithm.
Preferably, establishing the database management system in step 5 includes selecting a database; a SQL database management system, which is easier to implement, is selected.
A big data voice classification method based on the Hadoop platform comprises the following steps: constructing a voice library by the method described above; on the basis of the voice library, subdividing the big data voice classification problem with a Map function on the Hadoop platform, and solving the sub-problems in parallel and distributed fashion on multiple nodes to obtain the corresponding voice classification results; and finally combining the voice classification results of the sub-problems with a Reduce function, so as to meet the online requirements of big data voice classification.
The method comprises the following specific steps:
(1) The client submits a voice classification job to the JobTracker of the Hadoop platform, and the JobTracker copies the voice feature data to the local distributed file system;
(2) The voice classification tasks are initialized and placed in a task queue, and the JobTracker distributes the tasks to the corresponding nodes, namely the TaskTrackers, according to the processing capacity of each node;
(3) Each TaskTracker, according to its assigned tasks, fits the relation between the voice features to be classified and the voice feature library with a support vector machine to obtain the corresponding category of the voice;
(4) The corresponding category of the voice is stored as Key/Value pairs on the local disk;
(5) Intermediate voice classification results with the same Key are merged and handed to Reduce for processing to obtain the voice classification result, which is written to the distributed file system;
(6) The JobTracker clears the task state, and the user obtains the voice classification result from the distributed file system.
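The Map/Reduce flow of the steps above can be sketched in plain Python. The nearest-centroid rule below is an illustrative stand-in for the support-vector-machine fitting of step (3), and all names and numbers (`CENTROIDS`, the F0 values, utterance ids) are hypothetical, not from the patent; a real deployment would run on Hadoop rather than simulate the shuffle in memory.

```python
from collections import defaultdict

# Toy simulation of the Map/Reduce voice classification flow.
# Assumed mean fundamental frequency per class (illustrative values).
CENTROIDS = {"healthy": 120.0, "dysarthric": 95.0}

def map_task(chunk):
    """Each 'TaskTracker' classifies its chunk and emits (key, value) pairs."""
    for utt_id, f0 in chunk:
        label = min(CENTROIDS, key=lambda c: abs(CENTROIDS[c] - f0))
        yield label, utt_id

def reduce_task(pairs):
    """Merge intermediate results that share the same key (step 5)."""
    merged = defaultdict(list)
    for key, value in pairs:
        merged[key].append(value)
    return dict(merged)

data = [("u1", 118.0), ("u2", 92.5), ("u3", 124.0), ("u4", 97.0)]
chunks = [data[:2], data[2:]]  # job subdivision across two nodes (step 2)
intermediate = [kv for c in chunks for kv in map_task(c)]  # step 3-4
result = reduce_task(intermediate)                         # step 5
print(result)  # {'healthy': ['u1', 'u3'], 'dysarthric': ['u2', 'u4']}
```

The two chunks play the role of tasks distributed to separate TaskTrackers; grouping by key before `reduce_task` mirrors the merge of identical Key/Value intermediate results.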
The method for establishing and analyzing a motor dysarthria voice library under a big data background has the following beneficial effects:
(1) The invention studies the voice characteristics of patients whose motor dysarthria is caused by nervous system diseases; by virtue of an open network platform, it can cover large-scale groups for measurement and collection of related information, establishes voice libraries for Mandarin, dialects, healthy speakers and patients, and on this basis builds word libraries suitable for diagnosing the condition of patients with motor dysarthria.
(2) As the voice library continuously expands, the invention ultimately establishes a rich data resource center organized by Mandarin, dialect, medical history, illness state and other information, provides a network self-service diagnosis channel for patients with nervous system diseases, can assist doctors in clinical diagnosis and treatment, and provides a rich and accurate data platform for quantifying the state of nervous system diseases.
(3) On the basis of the voice library, the invention subdivides the big data voice classification problem with a Map function on the Hadoop platform and solves the sub-problems in parallel and distributed fashion on multiple nodes to obtain the corresponding voice classification results; finally the voice classification results of the sub-problems are combined with a Reduce function, so as to meet the online requirements of big data voice classification.
Drawings
Fig. 1: example of the phonetic annotation of "bao" in an embodiment of the invention.
Fig. 2: formant data of the voice "bao" in an embodiment of the invention.
Fig. 3: basic framework of the Hadoop platform in an embodiment of the invention.
Fig. 4: flow of big data voice classification based on the Hadoop platform.
Detailed Description
The invention is further described with reference to fig. 1 to 4:
the voice library consists of an unvoiced sound library, a voiced sound library, a tone library, a voice synthesis program and a Chinese-pinyin conversion program.
1. Establishing an unvoiced sound library:
according to the characteristics of unvoiced sound, in order to improve the quality of synthesized voice. The unvoiced sound library is established by adopting a direct sampling method. The unvoiced sound part in front of the voiced sound section in various pinyin combinations is sampled to form an unvoiced sound library. Since the unvoiced sound in 1 syllable actually occupies only a small part, the unvoiced sound library composed of unvoiced sounds extracted from more than 400 unvoiced syllables actually occupies a small storage space.
2. Establishment of a voiced sound library:
the voiced sound is synthesized by the VTFR of the corresponding voiced sound called by the voiced sound synthesizing program. The voiced sound library is actually formed by various voiced sound VTFR, the VTFR extracting program is adopted to sequentially extract the VTFR of various voiced sounds, and the VTFR of various voiced sounds and the voiced sound synthesizing program are stored in 1 data packet, so that the voiced sound library is formed. The actual VTFR extracted is only 1 curve, and the space occupied by the voiced sound library thus constructed is very small.
The establishment of the speech corpus mainly comprises the following five processes: designing the pronunciation text; recording the voice; parameter analysis of the voice files; establishing the database management system; and data analysis with big data technology.
1. Designing pronunciation text;
1.1 selection of pronunciation text:
how to select the corpus is the key of corpus construction work. In order to ensure orderly and effective construction work and quality of the corpus, selection principles of the corpus are firstly researched and formulated before the corpus is constructed. The selection principle of the voice corpus comprises the following steps: 1. the single word in the corpus is required to contain all the initial and final phenomena as much as possible, so that the sound system characteristics of the dialect voice can be better and more conveniently reflected; 2. the vocabulary in the corpus is based on the Chinese investigation common table, so that the vocabulary can be conveniently compared with Mandarin Chinese; 3. sentences in the corpus are mainly selected from the spoken language corpus-! So the method is more in line with the real situation faced by voice recognition; 4. sentences in the corpus are complete in content and semantics, so that prosodic information of one sentence can be reflected as far as possible; 5. the triphones are not classified, so that the problem of sparse training data can be effectively solved.
1.2 preparation of pronunciation text:
The preparation of the pronunciation text is one of the key links in establishing a speech database. The pronunciation material is determined according to the pronunciation text selection principles and comprises four parts. The first is the single-word part: common words covering the initials and finals listed in the survey word list serve as the main recording corpus of the voice library. The second is the vocabulary part: based on, but not limited to, a four-thousand-word vocabulary, related words are recorded according to prior conclusions about the relevant phonological system, aiming to comprehensively reflect the voice characteristics, including sound quality and suprasegmental features, with example words added to reflect some highly characteristic speech phenomena. The third is the sentence material part: the amount of corpus is determined according to different speakers' mastery of the language, and the selected corpus should be representative while keeping its scope as wide as possible. The fourth is the natural dialogue part: with daily life as the topic, in the form of question-answering and free conversation, about half an hour of speech material is recorded per speaker, covering vocabulary of daily spoken language that differs from ordinary speech, with the speaker required to speak in dialect.
2. Recording voice;
2.1 determination of speaker:
the selection principle of the speaker is to select the native speaker with clear mouth and teeth, moderate speech speed, proficiency in using local language and willingness to actively cooperate with investigation, and also to ensure that the language environment of the speaker is relatively stable and to have a certain cultural degree.
2.2 voice acquisition:
The speaking mode during recording directly determines the purpose of the voice library. Owing to the specificity of the collected corpus, two modes are adopted for different research purposes: one is reading aloud with a prompt text, where the prompt is written Chinese material that the speaker converts into his or her own native language and reads aloud; the other is natural speech, where the speaker uses cues to tell folk stories, describe local life, hum local folk songs, and so on.
3. Parameter analysis of voice files:
after the pronunciation text is recorded, the voice data needs to be analyzed and processed to obtain different characteristics of the voice signals, which is the key of the voice corpus design and is the necessary basis for the later voice processing. The invention focuses on researching voice information, so that the basic attribute of the voice signal waveform needs to be marked, and meanwhile, related acoustic parameters are extracted.
3.1 information labeling of voice library:
Speech annotation is performed hierarchically in the Praat software with reference to SAMPA-C, the Chinese segmental labeling system. Annotation of the speech library comprises text annotation and syllable annotation; the syllable "bao" is used as an example here, as shown in fig. 1.
The first part is text annotation: Chinese characters plus pinyin form a character-to-sound transcription, recording the speech information in Chinese characters so that it can be used by a recognition system and can also provide material for linguistic study. The text annotation must mark the basic textual information and paralinguistic phenomena, which can be represented by general paralinguistic symbols.
The second part is syllable annotation, using standard Mandarin syllable labels with tone marks. In the tone marks, 0 denotes the neutral tone, 1 the first (yin-ping) tone, 2 the second (yang-ping) tone, 3 the third (shang) tone, and 4 the fourth (qu) tone.
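The digit-based tone scheme above can be sketched as a simple lookup. This is an illustrative sketch only; the function and label names are assumptions, not part of the patent's system.

```python
# Sketch of the tone-label scheme described above. The digit meanings
# follow the text (0 neutral, 1 yin-ping, 2 yang-ping, 3 shang, 4 qu);
# everything else (names, formatting) is illustrative.
TONE_LABELS = {
    0: "neutral tone (qingsheng)",
    1: "first tone / yin-ping (high level)",
    2: "second tone / yang-ping (rising)",
    3: "third tone / shang (dipping)",
    4: "fourth tone / qu (falling)",
}

def label_syllable(pinyin: str, tone: int) -> str:
    """Attach a tone label to a toneless pinyin syllable, e.g. ('bao', 4)."""
    if tone not in TONE_LABELS:
        raise ValueError(f"unknown tone digit: {tone}")
    return f"{pinyin}{tone} [{TONE_LABELS[tone]}]"

print(label_syllable("bao", 4))  # bao4 [fourth tone / qu (falling)]
```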
3.2 extraction of acoustic parameters:
For the recorded speech signal, the acoustic parameters of each segment must also be extracted. In practice, the recording is first segmented and its silent sections removed, to ensure that the analyzed objects are all single words; next, the start and end points of the speech signal are located in the waveform data and the vowel range is marked; finally, the corresponding fundamental-frequency and formant data are obtained with an autocorrelation algorithm, taking the syllable "bao" as an example, as shown in fig. 2.
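The autocorrelation step named above can be illustrated with a minimal single-frame F0 estimator. This is a simplified sketch of the general autocorrelation method, not the patent's actual implementation; the function name and search bounds are assumptions.

```python
import numpy as np

def estimate_f0_autocorr(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate the fundamental frequency of one voiced frame by
    locating the autocorrelation peak in a plausible pitch-lag range."""
    frame = frame - frame.mean()
    # One-sided autocorrelation (lag 0 .. len-1).
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)                     # shortest period searched
    lag_max = min(int(sr / fmin), len(ac) - 1)   # longest period searched
    if lag_max <= lag_min:
        return 0.0
    peak = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sr / peak

# Synthetic 200 Hz vowel-like frame, 40 ms at 16 kHz.
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
frame = np.sin(2 * np.pi * 200 * t)
print(estimate_f0_autocorr(frame, sr))  # 200.0
```

A production pitch tracker (e.g. the one in Praat) additionally interpolates the peak, applies voicing decisions, and tracks across frames; this sketch only shows the core idea.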
4. Establishment of a database management system:
4.1 selection of databases
For the choice of database: the speech library must store a large amount of waveform data, characterized by high volume and variable length, while its requirements on transaction processing and recovery, security, network support, and the like are relatively low. Therefore an SQL database management system, which is comparatively easy to implement, can be chosen.
4.2 creation of database management System
The database management system of the speech corpus stores four kinds of material: first, speaker attributes, such as age, gender, education, command of Chinese, and use of the native language; second, pronunciation text material, i.e., the texts pronounced by the speakers together with corresponding transcriptions such as Mandarin International Phonetic Alphabet notation; third, the actual speech data, mainly the raw parameters of the recorded waveforms; fourth, acoustic analysis parameters, i.e., the acoustic parameters extracted from the processed speech waveforms.
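The four-table layout described above can be sketched with an in-memory SQLite schema. The table and column names below are illustrative assumptions, not taken from the patent.

```python
import sqlite3

# Minimal sketch of the four kinds of stored material; names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE speaker (            -- 1. speaker attribute material
    speaker_id INTEGER PRIMARY KEY,
    age INTEGER, gender TEXT, education TEXT, native_language TEXT
);
CREATE TABLE pronunciation_text ( -- 2. pronunciation texts + transcriptions
    text_id INTEGER PRIMARY KEY,
    hanzi TEXT, pinyin TEXT, ipa TEXT
);
CREATE TABLE recording (          -- 3. raw speech waveform data
    recording_id INTEGER PRIMARY KEY,
    speaker_id INTEGER REFERENCES speaker(speaker_id),
    text_id INTEGER REFERENCES pronunciation_text(text_id),
    waveform BLOB, sample_rate INTEGER
);
CREATE TABLE acoustic_params (    -- 4. extracted acoustic parameters
    recording_id INTEGER REFERENCES recording(recording_id),
    f0_mean REAL, f1 REAL, f2 REAL, f3 REAL
);
""")
conn.execute("INSERT INTO speaker VALUES (1, 63, 'F', 'primary', 'Mandarin')")
print(conn.execute("SELECT COUNT(*) FROM speaker").fetchone()[0])  # 1
```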
5. Data analysis for big data technology
Big data refers to data sets so large that they far exceed the capabilities of traditional database software tools for acquisition, storage, management, and analysis; they are characterized by massive scale, rapid flow, diverse types, and low value density. The strategic significance of big data technology lies not in holding huge amounts of information but in the specialized processing of meaningful data. In other words, if big data is compared to an industry, the key to profitability is improving the "processing capability" of the data and realizing its "value-added" through processing. In the construction of the speech library, the important value of big data technology is that targeted analysis of the data makes it possible to evaluate the quality of the speech elements in the library, so that its construction becomes more complete.
Sharing the speech library through a network platform facilitates testing with different populations, yields more data samples, and enriches the library. In the future, speech libraries for patients with specific mobility dysarthria can be built for different regions and dialects, providing richer and more reliable data samples for the subsequent automatic recognition, classification, and grading of the disease.
As shown in fig. 3, a voice classification mechanism based on the Hadoop platform is provided: a large number of voice files are collected, a voice database is constructed, and effective features for voice classification are extracted; then, on the Hadoop platform, the big-data voice classification problem is subdivided with the Map function, and the sub-problems are solved in a multi-node parallel and distributed manner to obtain the corresponding classification results; finally, the Reduce function merges the sub-problem results so as to meet the online requirements of big-data voice classification.
As shown in fig. 4, the big-data voice classification flow on the Hadoop platform comprises the following steps:
(1) The Client submits a voice classification job to the JobTracker of the Hadoop platform, and the JobTracker copies the voice feature data to the local distributed file system;
(2) The voice classification tasks are initialized and placed in the task queue, and the JobTracker assigns tasks to the corresponding nodes (TaskTrackers) according to the processing capacity of each node;
(3) Each TaskTracker, according to its assigned task, fits the relation between the voice features to be classified and the voice feature library with a support vector machine, obtaining the category of the voice;
(4) The category of the voice is stored as a Key/Value pair on the local disk;
(5) Intermediate results with the same Key are merged and handed to Reduce for processing, yielding the voice classification result, which is written into the distributed file system;
(6) The JobTracker clears the task state, and the user obtains the voice classification result from the distributed file system.
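The Map and Reduce phases of the flow above can be sketched as a toy in-process simulation. The SVM step is replaced here by a placeholder threshold classifier, and all names are illustrative assumptions; a real deployment would run on Hadoop with a trained model.

```python
from collections import defaultdict

def classify_frame(features):
    """Placeholder for the SVM of step (3): thresholds one feature."""
    return "dysarthric" if features[0] > 0.5 else "typical"

def map_phase(voice_records):
    """Map: emit a (category, 1) Key/Value pair per record (steps 3-4)."""
    for rec_id, features in voice_records:
        yield classify_frame(features), 1

def reduce_phase(pairs):
    """Reduce: merge pairs that share the same Key (step 5)."""
    merged = defaultdict(int)
    for key, value in pairs:
        merged[key] += value
    return dict(merged)

records = [("a", [0.9]), ("b", [0.2]), ("c", [0.7])]
print(reduce_phase(map_phase(records)))  # {'dysarthric': 2, 'typical': 1}
```

On a real cluster the Map calls run in parallel on the TaskTrackers and the merged Key/Value pairs are written to the distributed file system; the data flow, however, is the same as in this sketch.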
The invention has been described above by way of example with reference to the accompanying drawings. Clearly, its implementation is not limited to the manner described: various improvements that apply the inventive concept and technical solution, or direct applications of them to other situations without improvement, all fall within the scope of the invention.

Claims (4)

1. A method for establishing and analyzing a mobility dysarthria voice library under a big data background comprises the following steps:
step 1, designing a pronunciation text;
step 2, recording voice;
step 3, labeling the voice file;
step 4, analyzing acoustic parameters of the voice file;
in step 4, the acoustic parameter analysis of the voice file includes speech annotation of the voice library; the basic annotation includes segmenting and aligning the initial and final of each syllable and labeling the tones of the initials and finals, comprising two parts:
the first part is text annotation: Chinese characters plus pinyin form a character-to-sound transcription, recording the speech information in Chinese characters so that it can be used by a recognition system and can also provide material for linguistic study; the text annotation must mark basic textual information and paralinguistic phenomena, which in the basic annotation can be represented by general paralinguistic symbols;
the second part is syllable annotation, using standard Mandarin syllable labels with tone marks; in the tone marks, 0 denotes the neutral tone, 1 the first (yin-ping) tone, 2 the second (yang-ping) tone, 3 the third (shang) tone, and 4 the fourth (qu) tone;
the acoustic parameter analysis of the voice file in the step 4 further comprises extraction of acoustic parameters;
firstly, the recorded speech is segmented and its silent sections removed, to ensure that the analyzed objects are single words, phrases, sentences, and dialogues; next, the start and end points of the speech signal are located in the waveform data and the speech is marked; finally, the corresponding fundamental-frequency and formant acoustic analysis parameters are obtained with an autocorrelation algorithm;
step 5, establishing a database management system;
the establishment of the database management system in step 5 includes selecting a database, choosing an SQL database management system that is easy to implement; in step 5, the database management system stores four kinds of material: first, speaker attribute material; second, the pronunciation text material of the patient together with its corresponding pronunciation, Mandarin International Phonetic Alphabet notation, and other text material; third, the actual speech data, storing the raw parameters of the recorded waveforms; fourth, acoustic analysis parameter data, i.e., the acoustic parameters extracted from the processed speech waveforms;
step 6, data analysis of big data technology;
the data analysis of the big data technology in the step 6 is based on a voice classification mechanism of a Hadoop platform, and specifically comprises the following sub-steps:
step 61, collecting a plurality of patient voice files, segmenting and labeling the speech segments, constructing a voice database, and analyzing the extracted acoustic parameters to obtain effective features for voice classification;
step 62, subdividing the big data voice classification problem by using a Map function based on the Hadoop platform, and carrying out voice classification solving on the sub-problem in a parallel and distributed manner by using multiple nodes to obtain a corresponding voice classification result;
and 63, finally, combining the voice classification results of the sub-problems by using a Reduce function so as to adapt to the online requirements of large data voice classification.
2. The method for establishing and analyzing the mobility dysarthria voice library in the big data background according to claim 1, wherein:
the design of the pronunciation text in the step 1 comprises the selection of the pronunciation text, and the selection principle of the corpus of pronunciation text comprises one or more of the following:
a. the single words in the corpus should contain all initial and final phenomena as far as possible, so as to reflect the phonological characteristics of different patients' speech better and more conveniently;
b. the vocabulary in the corpus is based on the common Chinese survey word table, to facilitate comparison with Mandarin;
c. sentences in the corpus are mainly obtained through dialogues with the patient on several related topics, so that they better match the real conditions of speech recognition;
d. sentences in the corpus are complete in content and semantics, so that the prosodic information of a sentence can be reflected as far as possible;
e. triphones are not classified, which can effectively alleviate the problem of sparse training data.
3. The method for establishing and analyzing the mobility dysarthria voice library in the big data background according to claim 2, wherein:
the design of the pronunciation text in the step 1 further comprises the preparation of pronunciation text, and the preparation principle of pronunciation text comprises one or more of the following:
a. single-word part: common words covering the initials and finals listed in the survey word list serve as the corpus used for the main recordings of the voice library;
b. vocabulary part: based on at least one four-thousand-word vocabulary, related words are recorded according to prior conclusions about the relevant sound system, aiming to reflect the phonetic characteristics comprehensively, including both segmental and suprasegmental features, with example words added to capture particularly characteristic phonetic phenomena;
c. sentence-material part: the amount of corpus is determined by each speaker's degree of language mastery, and the selected corpus should be representative while keeping its scope as wide as possible;
d. natural-dialogue part: with daily life as the topic, question-and-answer and free conversation are used to record 20-40 minutes of speech from each speaker, covering daily spoken vocabulary that differs from read speech, and the speaker is required to speak in dialect.
4. The method for establishing and analyzing the mobility dysarthria voice library in the big data background according to claim 3, wherein the method comprises the following steps:
the voice recording in step 2 includes determining the speakers; the selection principle is to choose speakers with clear articulation, a moderate speaking rate, fluent command of the local language, and a willingness to cooperate actively with the survey, ensuring that the speaker's language environment is stable and that the speaker has a certain level of education;
or/and, the voice recording further includes voice collection through a voice collector, adopting two modes: one is reading aloud with a prompt text, where the prompt is written Chinese material that the speaker converts into his or her own native language and reads aloud; the other is natural speech, where the speaker uses prompts to tell folk stories, describe local life, and hum local folk songs.
CN202011546906.5A 2020-05-12 2020-12-24 Method for establishing and analyzing mobility dysarthria voice library in big data background Active CN112599119B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010395558X 2020-05-12
CN202010395558 2020-05-12

Publications (2)

Publication Number Publication Date
CN112599119A CN112599119A (en) 2021-04-02
CN112599119B true CN112599119B (en) 2023-12-15

Family

ID=75200795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011546906.5A Active CN112599119B (en) 2020-05-12 2020-12-24 Method for establishing and analyzing mobility dysarthria voice library in big data background

Country Status (1)

Country Link
CN (1) CN112599119B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450777A (en) * 2021-05-28 2021-09-28 华东师范大学 End-to-end sound barrier voice recognition method based on comparison learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6067520A (en) * 1995-12-29 2000-05-23 Lee And Li System and method of recognizing continuous mandarin speech utilizing chinese hidden markou models
CN102799684A (en) * 2012-07-27 2012-11-28 成都索贝数码科技股份有限公司 Video-audio file catalogue labeling, metadata storage indexing and searching method
CN103405217A (en) * 2013-07-08 2013-11-27 上海昭鸣投资管理有限责任公司 System and method for multi-dimensional measurement of dysarthria based on real-time articulation modeling technology
CN105740397A (en) * 2016-01-28 2016-07-06 广州市讯飞樽鸿信息技术有限公司 Big data parallel operation-based voice mail business data analysis method
CN106128450A (en) * 2016-08-31 2016-11-16 西北师范大学 The bilingual method across language voice conversion and system thereof hidden in a kind of Chinese
CN110111780A (en) * 2018-01-31 2019-08-09 阿里巴巴集团控股有限公司 Data processing method and server




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant