CN111583914A - Big data voice classification method based on Hadoop platform - Google Patents
- Publication number: CN111583914A
- Application number: CN202010395559.4A
- Authority
- CN
- China
- Legal status: Granted (the status is an assumption and not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G10L15/08: Speech recognition; speech classification or search
- G06F16/65: Information retrieval of audio data; clustering; classification
- G10L15/34: Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
- G10L25/66: Speech or voice analysis techniques specially adapted for extracting parameters related to health condition
- G16H50/20: ICT specially adapted for computer-aided medical diagnosis, e.g. based on medical expert systems
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a big data voice classification method based on the Hadoop platform, comprising the following steps: step 1, constructing a voice library; step 2, on the basis of the voice library and the Hadoop platform, partitioning the big data voice classification problem with a Map function and solving the sub-problems in parallel across distributed nodes to obtain the corresponding voice classification results; and step 3, finally merging the sub-problem classification results with a Reduce function, so as to meet the online requirements of big data voice classification.
Description
Technical Field
The invention relates to a big data voice classification method based on a Hadoop platform.
Background
(1) Current research on motor dysarthria:
Motor dysarthria refers to a group of speech disorders caused by impaired muscular control following damage to the central or peripheral nervous system. It typically manifests as slowed, weakened, imprecise, and uncoordinated movement of the speech musculature, and may also affect respiration, resonance, laryngeal phonation, articulation, and rhythm; clinically it is usually referred to simply as dysarthria. Common causes include traumatic brain injury, cerebral palsy, amyotrophic lateral sclerosis, multiple sclerosis, stroke, Parkinson's disease, and spinocerebellar ataxia. By neuroanatomy and speech acoustics, dysarthria can be classified into flaccid, spastic, ataxic, hyperkinetic, and mixed types. Among the communication disorders associated with brain damage, the incidence of dysarthria is as high as 54%. At present, clinical examination of voice, resonance, prosody, and related aspects can reflect the acoustic characteristics of dysarthric speech from both subjective and objective angles, which helps provide targeted treatment and a comprehensive, scientific account of its acoustic-pathological mechanism.
Few domestic or foreign studies have reported the overall incidence of motor dysarthria. Miller et al. studied 125 Parkinson's disease patients and found that 69.6% had mean speech intelligibility below that of normal controls, with 51.2% at least one standard deviation lower, indicating a high incidence of dysarthria in Parkinson's patients. Bogousslavsky et al. screened 1000 first-stroke patients and found speech impairment in as many as 46%, of whom 12.4% were diagnosed with dysarthria. Hartelius et al. likewise found a 51% prevalence of dysarthria among multiple sclerosis patients. All of this indicates that the incidence of dysarthria is high. At present there is no unified assessment method for dysarthria in China and no dedicated assessment standard for motor dysarthria; most clinicians use a dysarthria assessment method (or a modified version of it) together with the dysarthria examination form of the China Rehabilitation Research Center, and the degree and type of dysarthria are examined, scored, recorded, and evaluated by clinicians or rehabilitation physicians.
(2) Current research on domestic speech corpora:
With the development of information technology and computer science, speech technology has made interaction between machines and natural human language possible, and research on both speech synthesis and speech recognition necessarily depends on the construction of a high-quality back-end speech corpus. Foreign speech corpora are relatively mature; research on Chinese speech corpora has advanced rapidly over the last decade, and corpora have been researched and built for different languages and cultural contexts. The construction of speech corpora for dysarthria, however, is still at an exploratory stage.
Domestic evaluation research on articulatory speech function has focused mainly on subjective assessment, and only a few researchers distinguish the concepts of articulation and voice. Huang Zhaying et al. proposed a "Chinese word list for testing articulation ability" containing 50 words: by evaluating the articulated pronunciation of the 50 test words, a speech rehabilitation teacher can comprehensively assess articulation ability across 21 initials and 4 tones, while place-of-articulation contrast ability is assessed through 18 place contrasts and 37 minimal pairs. Chen Sanding et al. evaluated the initials, finals, and tones of Mandarin spoken by 50 deaf children, revealed the developmental pattern of their articulation, and further proposed rehabilitation education principles of early, sequential, fault-tolerant, and consolidating training. In her doctoral work at a university in East China, Zhang Jing studied the main error tendencies of hearing-impaired children in consonant articulation, analyzed their causes, and proposed a corresponding consonant phoneme treatment framework for hearing-impaired children.
(3) Current research on big data in the medical field:
A popular definition of big data is: data that exceeds the capture, storage, processing, and analysis capabilities of typical database software tools. Big data differs from traditional notions such as ultra-large-scale data and mass data, and has four basic characteristics: volume, variety, velocity (timeliness), and value. In its guidance opinions on actively advancing the "Internet+" initiative, the State Council calls for vigorously developing new modes of online-offline interaction with the Internet as the carrier and for accelerating the development of new Internet services such as medical health. Kayyali B et al. studied the impact of big data on the U.S. medical industry and concluded that its value to the industry will become increasingly significant over time. At present, big data in the medical field comes mainly from pharmaceutical enterprises, clinical diagnosis data, patient medical data, health management, and social network data. Drug development, for example, is a data-intensive process: even for small and medium-sized enterprises, drug development data runs above the terabyte level. Hospital data likewise grows very quickly every day: a single dual-source CT examination produces some 3000 images per patient, roughly 1.5 GB of image data; a standard pathology examination image is about 5 GB; adding clinical and electronic medical record data, the volume grows rapidly day by day. Research methods based on the analysis of massive big data have prompted reflection on scientific methodology: new findings can be obtained by directly analyzing and mining mass data, without direct contact with the research object, which may well give rise to a new mode of scientific research.
The establishment of a voice corpus is a complicated undertaking, and its later refinement raises problems that must be addressed, for example making full use of the existing rules of tone sandhi between words and reflecting the actual occurrence of tone change and the neutral tone as far as possible. Shortcomings of the corpus can be offset in the preprocessing stage by raising the utilization rate of the existing material. For these reasons, the voice library should be an open database, so that it can be extended and modified at any time to complete it. Because speech conditions differ, building any specific voice corpus will meet various difficulties; the issues discussed here are only one approach to establishing such a corpus, which will hopefully provide data support for speech research and play a role in the further development and refinement of voice corpora.
In addition, large data volume is undoubtedly a great advantage of network big data analysis, but guaranteeing the quality of massive data, and implementing its cleaning, management, and analysis, remain major technical difficulties of this research. Massive network big data is multi-source and heterogeneous, interactive, time-sensitive, bursty, and noisy, so it carries huge value but also heavy noise and low value density. This poses a significant challenge for ensuring data quality in network big data analysis research.
Disclosure of Invention
The invention provides a big data voice classification method based on the Hadoop platform, addressing the technical problem that, while sheer data volume is a major advantage of network big data analysis, guaranteeing the quality of massive data and implementing its cleaning, management, and analysis remain major technical difficulties.
In order to solve the technical problems, the invention adopts the following scheme:
a big data voice classification method based on a Hadoop platform comprises the following steps:
and 3, finally, combining the voice classification results of the sub-problems by using a Reduce function so as to adapt to the online requirement of the voice classification of the big data.
Preferably, step 2 comprises the following steps: (1) the Client submits a voice classification job to the JobTracker of the Hadoop platform, and the JobTracker copies the voice feature data to the local distributed file system;
(2) the voice classification job is initialized and placed in a task queue, and the JobTracker assigns tasks to the corresponding nodes (TaskTrackers) according to the processing capacity of each node;
(3) each TaskTracker, for its assigned tasks, uses a support vector machine to fit the relation between the voice features to be classified and the voice feature library, obtaining the category of each voice sample;
(4) the category of each voice sample is emitted as a Key/Value pair and stored on local disk;
(5) intermediate classification results with the same Key are merged and handed to Reduce for processing to obtain the voice classification result, which is written to the distributed file system;
(6) the JobTracker clears the task state, and the user retrieves the voice classification result from the distributed file system.
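The Map/shuffle/Reduce flow of steps (1) to (6) can be sketched on a single machine as follows. This is an illustrative sketch, not the patent's implementation: the mapper stands in for the per-TaskTracker SVM classifier (which could be, e.g., scikit-learn's SVC) with a simple nearest-reference rule, and `FEATURE_LIBRARY` and the sample names are invented for the example.

```python
from collections import defaultdict

# Toy "voice feature library": category -> reference feature vector.
# In the patent each TaskTracker fits an SVM against such a library;
# here a nearest-reference rule stands in for the SVM (illustrative only).
FEATURE_LIBRARY = {
    "healthy": (0.2, 0.1),
    "dysarthric": (0.8, 0.9),
}

def map_classify(sample_id, features):
    """Map phase: classify one voice sample, emit a (category, sample_id) pair."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    category = min(FEATURE_LIBRARY, key=lambda c: dist(features, FEATURE_LIBRARY[c]))
    return category, sample_id

def shuffle(pairs):
    """Group intermediate Key/Value pairs by Key (the merge of step 5)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_merge(groups):
    """Reduce phase: combine per-key values into the final classification result."""
    return {category: sorted(ids) for category, ids in groups.items()}

samples = {"s1": (0.1, 0.2), "s2": (0.9, 0.8), "s3": (0.25, 0.05)}
intermediate = [map_classify(sid, feats) for sid, feats in samples.items()]
result = reduce_merge(shuffle(intermediate))  # category -> list of sample ids
```

On a real Hadoop cluster the shuffle and the parallel execution of the map tasks are handled by the framework; only `map_classify` and `reduce_merge` would be supplied by the application.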
Preferably, the step 1 of constructing the voice library comprises the following steps: step 11, designing a pronunciation text; step 12, recording voice; step 13, marking the voice file; step 14, analyzing acoustic parameters of the voice file; and step 15, establishing a database management system.
Preferably, the design of the pronunciation text in step 11 includes selecting the pronunciation text, and the selection principles of the pronunciation-text corpus include one or more of the following:
a. the single characters in the corpus should cover all phonological phenomena as far as possible, so as to better reflect the phonological characteristics of different patients' speech;
b. the vocabulary in the corpus is based on the standard Chinese survey word list, making comparison with standard Mandarin convenient;
c. sentences in the corpus are obtained mainly through dialogue with the patient on several related topics, which better matches the real conditions faced by speech recognition; "several related topics" include daily-life topics or medical-history topics, such as questions about the time of first onset and the medical history;
d. sentences in the corpus are complete in content and semantics, so that the prosodic information of a whole sentence is reflected as much as possible;
e. triphones are selected without classification, which effectively alleviates the problem of sparse training data.
Preferably, the design of the pronunciation text in step 11 further includes compiling the pronunciation text, and the compilation principles include one or more of the following:
a. single-character part: commonly used characters covering the initials, finals, and tones listed in the survey word list serve as the main recording corpus of the voice library;
b. vocabulary part: based on a word list of at least four thousand items, related words are recorded according to prior conclusions about the relevant phonological system, striving to reflect the speech characteristics fully, including both segmental and suprasegmental features; for some highly distinctive speech phenomena, example words may be added to capture them. "Recording related words according to conclusions about the relevant phonological system" refers to a general vocabulary summarized from the sounds, combination rules, rhythm, and intonation used within the same language. Distinctive speech phenomena refer to cases that dialect speakers easily misread, such as difficulty distinguishing dental from retroflex sibilants, or not distinguishing f from h;
c. sentence material part: the amount of material is determined by each speaker's language proficiency, and the selected material should be representative while covering as wide a range as possible; "representative" here means general sentences that characterize dysarthric speech;
d. natural conversation part: 20 to 40 minutes of speech material per speaker is recorded in question-answer and free-conversation form, covering everyday spoken words that differ from standard Mandarin, with the speaker required to speak in dialect.
Preferably, the voice recording of step 12 includes selection of the speakers; the selection principle is to choose native speakers with clear articulation and a moderate speech rate (controlled at about 150 characters per minute) who are proficient in the local language, willing to cooperate actively with the investigation, have a relatively stable language environment, and have a certain level of education;
or/and, the voice recording further includes voice acquisition through a voice acquisition device, in two modes: one is reading with a prompt text, where the prompt is Chinese text material which the speaker converts into his or her own native dialect and reads aloud; the other is natural speech, where, prompted by topics, the speaker tells folk stories, describes local life, and hums local folk songs.
Preferably, the acoustic parameter analysis of the voice files in step 14 includes speech annotation of the voice library; basic annotation consists of segmenting and aligning the initial and final of each syllable and labelling them, and comprises two parts:
the first part is character annotation: Chinese characters plus pinyin transcribe the pronunciation of each character, and recording the speech information in Chinese characters both serves the recognition system and provides material for linguistic research; character annotation must mark the basic character information and any paralinguistic phenomena, and paralinguistic phenomena in the basic annotation can be represented with general paralinguistic symbols;
the second part is syllable annotation: Mandarin syllables carry standard Mandarin syllable labels, and syllable annotation is tone annotation; in tone annotation, 0 denotes the neutral tone, 1 the first tone (yin-ping), 2 the second tone (yang-ping), 3 the third tone (shang), and 4 the fourth tone (qu, falling).
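The tone digits of the annotation scheme above can be applied mechanically to toned pinyin strings. The sketch below is illustrative: the initial list is truncated (a real annotator needs all 21 Mandarin initials), and the function name and output format are invented for the example.

```python
# Tone digits as defined in the annotation scheme of step 14.
TONES = {0: "neutral", 1: "yin-ping", 2: "yang-ping", 3: "shang", 4: "qu"}

# Mandarin initials, two-letter ones first so "zh" matches before "z".
# Truncated for illustration; the full inventory has 21 initials.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "z", "c", "s", "r", "y", "w"]

def annotate(syllable):
    """Split a toned pinyin syllable like 'bao3' into initial, final, and tone."""
    tone = int(syllable[-1])                 # trailing digit is the tone label
    body = syllable[:-1]
    initial = next((i for i in INITIALS if body.startswith(i)), "")  # "" = zero initial
    final = body[len(initial):]
    return {"initial": initial, "final": final,
            "tone": tone, "tone_name": TONES[tone]}
```

For example, `annotate("bao3")` yields initial `b`, final `ao`, and the third (shang) tone, matching the segmentation-and-alignment step described above.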
Preferably, the acoustic parameter analysis of the voice files in step 14 further includes extraction of acoustic parameters:
first, the recorded speech is segmented and silent sections are removed, so that the objects of analysis are single characters, words, sentences, and conversations; then, the start and end points of the speech signal are determined in the waveform data and the speech is labelled; finally, the corresponding fundamental frequency and formant acoustic parameters are obtained with an autocorrelation algorithm.
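The pipeline of this step, segmenting, dropping silence, then estimating the fundamental frequency from the autocorrelation peak, can be sketched with NumPy. Frame size, thresholds, and the F0 search range below are illustrative choices, not values taken from the patent, and the signal is synthetic.

```python
import numpy as np

def frames(signal, size=800):
    """Cut the signal into non-overlapping analysis frames."""
    n = len(signal) // size
    return signal[: n * size].reshape(n, size)

def drop_silence(frms, rel_threshold=0.1):
    """Keep only frames whose energy exceeds a fraction of the loudest frame."""
    energy = (frms ** 2).sum(axis=1)
    return frms[energy > rel_threshold * energy.max()]

def f0_autocorr(frame, sr=16000, fmin=75.0, fmax=400.0):
    """Estimate F0 as the autocorrelation peak within a plausible lag range."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Synthetic test signal: silence, a 200 Hz "voiced" tone, silence.
sr = 16000
t = np.arange(2 * sr) / sr
voiced = 0.5 * np.sin(2 * np.pi * 200.0 * t)
silence = np.zeros(sr // 2)
signal = np.concatenate([silence, voiced, silence])

active = drop_silence(frames(signal))   # silent frames removed
f0 = f0_autocorr(active[0])             # fundamental frequency of first voiced frame
```

Formant extraction, also mentioned in this step, would typically require LPC analysis on top of this and is omitted from the sketch.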
Preferably, establishing the database management system in step 15 includes selecting a database; an SQL database management system that is easy to implement is selected;
or/and, in step 15 the database management system needs to store four kinds of material: first, speaker attribute material; second, pronunciation text material, recording the patient's pronunciations together with the corresponding text material such as Mandarin International Phonetic Alphabet transcriptions; third, actual voice data material, storing the raw parameters of the recorded speech waveforms; and fourth, acoustic analysis parameter data, namely the acoustic parameters extracted from the processed speech waveforms.
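The four kinds of material map naturally onto four tables in a lightweight SQL DBMS such as SQLite. The schema below is a sketch under that assumption; every table and column name is invented for illustration and does not come from the patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1. speaker attribute material
CREATE TABLE speaker (
    speaker_id   INTEGER PRIMARY KEY,
    name         TEXT,
    dialect      TEXT,
    diagnosis    TEXT              -- e.g. 'Parkinson', 'healthy control'
);
-- 2. pronunciation text material (characters, pinyin, IPA for each prompt)
CREATE TABLE prompt (
    prompt_id    INTEGER PRIMARY KEY,
    hanzi        TEXT,
    pinyin       TEXT,
    ipa          TEXT
);
-- 3. actual voice data material (raw waveform parameters)
CREATE TABLE recording (
    recording_id INTEGER PRIMARY KEY,
    speaker_id   INTEGER REFERENCES speaker(speaker_id),
    prompt_id    INTEGER REFERENCES prompt(prompt_id),
    wav_path     TEXT,
    sample_rate  INTEGER
);
-- 4. acoustic analysis parameter data (extracted from the waveform)
CREATE TABLE acoustic_param (
    recording_id INTEGER REFERENCES recording(recording_id),
    f0_hz        REAL,
    formant1_hz  REAL,
    formant2_hz  REAL
);
""")
conn.execute("INSERT INTO speaker VALUES (1, 'S01', 'Mandarin', 'healthy control')")
conn.execute("INSERT INTO prompt VALUES (1, '包', 'bao1', 'pau')")
conn.execute("INSERT INTO recording VALUES (1, 1, 1, 's01_bao.wav', 16000)")
conn.execute("INSERT INTO acoustic_param VALUES (1, 210.5, 700.0, 1100.0)")
row = conn.execute("""
    SELECT s.diagnosis, p.pinyin, a.f0_hz
    FROM acoustic_param a
    JOIN recording r ON r.recording_id = a.recording_id
    JOIN speaker   s ON s.speaker_id   = r.speaker_id
    JOIN prompt    p ON p.prompt_id    = r.prompt_id
""").fetchone()
```

Keeping the four materials in separate joined tables lets one recording be linked to its speaker attributes, its prompt text, and its extracted parameters without duplication.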
The big data voice classification method based on the Hadoop platform has the following beneficial effects:
(1) The big data voice classification method based on the Hadoop platform solves the technical problems of long processing time and poor real-time performance in big data voice classification. By adopting a Hadoop-based classification mechanism and exploiting the advantages of cloud computing, it targets an ideal classification result, overcomes the defects of current voice classification mechanisms, greatly shortens classification time, lets classification speed keep pace with the online requirements of big data voice classification, and achieves an overall effect clearly superior to other current voice classification mechanisms.
(2) The invention aims to study the speech characteristics of patients with motor dysarthria caused by nervous system diseases. Relying on the advantages of an open network platform, it can measure large-scale populations and collect related information, establish voice libraries of Mandarin, dialects, healthy speech, and patient speech, and on this basis build a word library suited to diagnosing the condition of motor dysarthria patients.
(3) As the voice library is continuously expanded, a rich data resource center is ultimately established from information such as Mandarin, dialect, different medical histories, and different disease conditions, providing a channel for autonomous online diagnosis for patients with nervous system diseases, assisting doctors in clinical diagnosis and treatment, and supplying a rich and accurate data platform for quantifying the condition of nervous system diseases.
(4) On the basis of the voice library and the Hadoop platform, a Map function partitions the big data voice classification problem, and the sub-problems are solved in parallel across distributed nodes to obtain the corresponding classification results; finally, a Reduce function merges the sub-problem results so as to meet the online requirements of big data voice classification.
Drawings
FIG. 1: an example of the speech annotation of "bao" in an embodiment of the invention.
FIG. 2: formant data of the "bao" speech in an embodiment of the invention.
FIG. 3: the basic framework of the Hadoop platform in an embodiment of the invention.
FIG. 4: the big data voice classification process based on the Hadoop platform.
Detailed Description
The invention is further illustrated below with reference to fig. 1 to 4:
the voice library is composed of an unvoiced sound library, a voiced sound library, a tone library, a voice synthesis program and a Chinese-pinyin conversion program.
1. Establishing an unvoiced sound library:
according to the characteristics of unvoiced sound, the quality of synthesized speech is improved. The unvoiced sound library is established by adopting a direct sampling method. That is, the unvoiced parts in front of the voiced speech segments in various pinyin combinations are sampled to form an unvoiced speech library. Since the unvoiced sounds in 1 syllable actually occupy only a small part, the unvoiced sound library constituted by unvoiced sounds extracted from 400 unvoiced syllables actually occupies a small storage space.
2. Establishing a voiced sound library:
Voiced sounds are synthesized by a voiced-sound synthesis program that calls the VTFR of each voiced sound. The voiced sound library therefore consists of the VTFRs of the various voiced sounds: a VTFR extraction program extracts them in turn, and the VTFRs are stored together with the synthesis program in a single data packet to form the library. Each extracted VTFR is only a single curve, so the resulting library occupies very little space.
The establishment of the voice corpus comprises five main processes: designing the pronunciation text; recording the voice; analyzing the parameters of the voice files; establishing a database management system; and data analysis with big data technology.
1. Designing a pronunciation text;
1.1 selection of pronunciation text:
How to select the corpus is the key to corpus construction. To keep the database-building work orderly and effective and to guarantee corpus quality, selection principles are researched and formulated before construction begins. They are: first, the single characters in the corpus should cover all phonological phenomena as far as possible, so that the phonetic system of the dialect speech is reflected well and conveniently; second, the vocabulary is based on the standard Chinese-survey word table, which makes comparison with Mandarin convenient; third, sentences are drawn mainly from spoken-language corpora, so that they better match the real conditions faced by speech recognition; fourth, sentences are complete in content and semantics, so that the prosodic information of a whole sentence is reflected as much as possible; and fifth, triphones are selected without classification, which effectively alleviates the problem of sparse training data.
1.2, preparation of pronunciation texts:
The compilation of the pronunciation text is one of the key links in building a voice database. When determining the pronunciation material, the text consists of four parts. The first is the single-word part: the initials, finals, and some common characters of each tone listed in the survey word list serve as the main recording material of the voice library. The second is the vocabulary part: based on, but not limited to, a four-thousand-word list, the relevant words are recorded according to existing conclusions about the sound system in question; they should comprehensively reflect the phonetic characteristics, including segmental and suprasegmental quality, and example words may be added to capture particularly distinctive phonetic phenomena. The third is the sentence-material part: the amount of material is determined by each speaker's command of the language, and the material chosen should be representative while covering as wide a range as possible. The fourth is the natural-conversation part: on topics of daily life, about half an hour of speech is recorded per speaker in question-and-answer and free-conversation form; it covers everyday spoken words that differ from standard Mandarin, and the speaker is required to speak in dialect.
2. Recording voice;
2.1 determination of speaker:
The speaker is selected on the principle of choosing a native speaker with clear articulation, a moderate speaking rate, fluent command of the local language, and a willingness to cooperate actively with the investigation, while ensuring that the speaker's language environment is stable and that the speaker has a certain level of education.
2.2 voice collection:
The speaking mode used during recording directly determines what the voice library can be used for. Because of the particularity of the collected corpus, two modes are adopted according to the research purpose: one is reading aloud from a prompt text, where the prompt is written Chinese material that the speaker converts into his or her own native variety and reads aloud; the other is natural speech, where, guided by prompts, the speaker tells folk stories, describes local living conditions, hums local folk songs, and so on.
3. Parameter analysis for the speech file:
After the pronunciation text has been recorded, the voice data must be analyzed to obtain the different features of the speech signal; this is key to the design of the voice corpus and a necessary basis for later speech processing. Since the invention focuses on voice information, the basic attributes of the speech waveform must be labeled and the relevant acoustic parameters extracted at the same time.
3.1 information annotation of the voice library:
Speech is labeled with the Praat software, using hierarchical annotation with reference to SAMPA-C, the Chinese segmental labeling system. The labels in the voice library comprise a character label and a syllable label; the voice "bao" is taken as an example, as shown in fig. 1.
The first part is the character label: Chinese character + pinyin transcribes the character's pronunciation, and recording the voice information in Chinese characters serves both the recognition system and linguistic research. The character label must mark the basic character information and any paralinguistic phenomena; paralinguistic phenomena in the basic label can be represented by general-purpose paralinguistic symbols.
The second part is the syllable label: Mandarin syllables use standard Mandarin syllable labels, and the syllable label carries the tone. In the tone notation, 0 denotes the light (neutral) tone, 1 the yin-ping tone, 2 the yang-ping tone, 3 the rising tone, and 4 the falling tone.
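The tone-digit convention above can be captured in a short sketch (the helper name and the `base + digit` label format are illustrative assumptions, not specified by the patent):

```python
# Tone digits appended to pinyin syllable labels, as described above:
# 0 = light (neutral) tone, 1 = yin-ping, 2 = yang-ping, 3 = rising tone, 4 = falling tone.
TONE_NAMES = {
    0: "light tone",
    1: "yin-ping",
    2: "yang-ping",
    3: "rising tone",
    4: "falling tone",
}

def split_syllable_label(label: str):
    """Split a tonal syllable label such as 'bao4' into its base and tone digit."""
    base, tone = label[:-1], int(label[-1])
    if tone not in TONE_NAMES:
        raise ValueError(f"unexpected tone digit in label {label!r}")
    return base, tone
```

For example, the label "bao4" from fig. 1 would split into the base "bao" and tone 4 (falling tone).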
3.2 extraction of acoustic parameters:
For the recorded speech signals, the acoustic parameters of each segment must be extracted. In practice, the recording is first segmented and silent sections are removed, so that each analyzed object is a single word; next, the start and end points of the speech signal are located in the waveform data and the vowel range is marked; finally, the corresponding fundamental frequency and formant data are obtained with an autocorrelation algorithm, taking the voice "bao" as an example, as shown in fig. 2.
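A minimal sketch of autocorrelation-based fundamental-frequency estimation, the kind of algorithm referred to above (the function name, the search range of 75-500 Hz, and the brute-force lag search are assumptions for illustration; the patent does not fix these details):

```python
import math

def estimate_f0(signal, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame.

    Picks the lag within the plausible pitch range [fmin, fmax] that
    maximises the autocorrelation of the frame with itself."""
    n = len(signal)
    lag_min = int(sample_rate / fmax)          # shortest period considered
    lag_max = min(int(sample_rate / fmin), n - 1)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# Synthetic test frame: a 200 Hz sine sampled at 16 kHz.
sr = 16000
frame = [math.sin(2 * math.pi * 200 * t / sr) for t in range(1024)]
f0 = estimate_f0(frame, sr)
```

On the synthetic frame the strongest autocorrelation peak falls at the 80-sample period, so the estimate lands at 200 Hz; real speech would additionally need the segmentation and silence removal described above before framing.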
4. Establishing a database management system:
4.1 database selection
As for the choice of database: the voice database must store a large amount of speech waveform data, which is characterized by large volume and variable length, while its requirements for transaction processing and recovery, security, network support, and so on are comparatively low. An easily implemented SQL database management system can therefore be chosen.
4.2 creation of database management System
The database management system of the voice corpus must store four kinds of material: first, speaker-attribute material, such as the speaker's age, gender, education, command of Mandarin, and use of the native language; second, pronunciation-text material, recording and storing the speaker's pronunciations together with text material such as the dialect pronunciations and the Mandarin International Phonetic Alphabet transcriptions corresponding to them; third, the actual voice data material, mainly the raw parameters of the recorded speech waveforms; and fourth, the acoustic-analysis parameter data, i.e. the acoustic parameters extracted from the processed speech waveforms.
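The four kinds of material map naturally onto four tables. The sketch below uses SQLite purely for illustration; all table and column names are assumptions, since the patent specifies only the categories of material, not a schema:

```python
import sqlite3

# One table per kind of material: speaker attributes, pronunciation texts,
# raw waveform recordings, and extracted acoustic parameters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE speaker (
    speaker_id INTEGER PRIMARY KEY,
    age INTEGER, gender TEXT, education TEXT,
    mandarin_level TEXT, native_language TEXT
);
CREATE TABLE pronunciation_text (
    text_id INTEGER PRIMARY KEY,
    hanzi TEXT, pinyin TEXT, ipa TEXT        -- dialect / Mandarin transcriptions
);
CREATE TABLE recording (
    recording_id INTEGER PRIMARY KEY,
    speaker_id INTEGER REFERENCES speaker(speaker_id),
    text_id INTEGER REFERENCES pronunciation_text(text_id),
    waveform BLOB                            -- raw samples of the speech waveform
);
CREATE TABLE acoustic_parameters (
    recording_id INTEGER REFERENCES recording(recording_id),
    f0_hz REAL, formant1_hz REAL, formant2_hz REAL
);
""")
```

Variable-length waveform data is held in a BLOB column, matching the observation above that the data has large volume and unfixed length but modest transaction-processing requirements.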
5. Data analysis for big data technology
Big data denotes data sets so large that their acquisition, storage, management, and analysis greatly exceed the capabilities of traditional database software tools; they are characterized by large scale, rapid circulation, diverse types, and low value density. The strategic significance of big data technology lies not in possessing huge amounts of information but in the specialized processing of the meaningful data it contains. In other words, if big data is compared to an industry, profitability in that industry depends on improving the "processing capability" of the data and achieving "value-adding" through that processing. In corpus construction, the key value of big data technology is that targeted analysis and research of the data makes it possible to evaluate the quality of the speech elements in the corpus, so that the corpus becomes more complete.
Sharing the corpus through a network platform facilitates tests with different populations and yields more data samples, enriching the voice library. In the future, more targeted corpora of motor dysarthria patients can be established for different regions and dialects, providing richer and more reliable data samples for the subsequent automatic recognition of disease classification and grading.
As shown in fig. 3, a speech classification mechanism based on the Hadoop platform is proposed. It comprises collecting a large number of voice samples, constructing a voice database, and extracting features effective for voice classification; then, on the Hadoop platform, a Map function subdivides the big data voice classification problem, and the sub-problems are solved in a multi-node, parallel, and distributed manner to obtain the corresponding voice classification results; finally, a Reduce function combines the voice classification results of the sub-problems, so as to meet the online requirements of big data voice classification.
As shown in fig. 4, the big data speech classification process based on the Hadoop platform includes the following specific steps:
(1) the Client submits a voice classification task to the Job Tracker of the Hadoop platform, and the Job Tracker copies the voice feature data into the local distributed file system;
(2) the voice classification task is initialized and placed in a task queue, and the Job Tracker distributes tasks to the corresponding nodes (Task Trackers) according to the processing capacity of the different nodes;
(3) for its assigned tasks, each Task Tracker uses a support vector machine to fit the relation between the voice features to be classified and the voice feature library, obtaining the corresponding class of each voice;
(4) the corresponding class of each voice is stored as a Key/Value pair on local disk;
(5) intermediate results with identical Keys are merged and handed to Reduce for processing to obtain the voice classification result, which is written into the distributed file system;
(6) the Job Tracker clears the task state, and the user obtains the voice classification result from the distributed file system.
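The Map/Reduce flow of steps (1)-(6) can be illustrated with a toy single-process sketch. Everything here is an assumption for illustration: the "nodes" are plain function calls rather than Task Trackers, and a nearest-centroid rule stands in for the support vector machine of step (3):

```python
from collections import defaultdict

# Assumed per-class reference features standing in for the voice feature library.
FEATURE_LIBRARY = {
    "mandarin": [1.0, 0.0],
    "dialect":  [0.0, 1.0],
}

def map_task(sample_id, features):
    """One 'Task Tracker': classify a feature vector, emit a (class, sample_id)
    Key/Value pair, as in steps (3)-(4)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    label = min(FEATURE_LIBRARY, key=lambda c: dist(features, FEATURE_LIBRARY[c]))
    return label, sample_id

def reduce_task(pairs):
    """Merge intermediate results that share the same Key, as in step (5)."""
    merged = defaultdict(list)
    for label, sample_id in pairs:
        merged[label].append(sample_id)
    return dict(merged)

samples = {"s1": [0.9, 0.1], "s2": [0.2, 0.8], "s3": [0.8, 0.3]}
intermediate = [map_task(sid, f) for sid, f in samples.items()]  # Map phase
result = reduce_task(intermediate)                               # Reduce phase
```

In a real deployment the Map calls would run in parallel on separate Task Trackers and the merged result would be written back to the distributed file system, which is what lets the scheme scale to big data volumes.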
The invention has been described above with reference to the accompanying drawings. Obviously, its implementation is not limited to the manner described: various improvements that apply the method concept and technical solution of the invention, or direct applications of that concept and solution to other fields without improvement, all fall within the protection scope of the invention.
Claims (9)
1. A big data voice classification method based on a Hadoop platform comprises the following steps:
step 1, constructing a voice library;
step 2, on the basis of the voice library, based on a Hadoop platform, subdividing the big data voice classification problem by adopting a Map function, and performing voice classification solution on the subproblems in a multi-node parallel and distributed manner to obtain corresponding voice classification results;
and 3, finally, combining the voice classification results of the sub-problems by using a Reduce function so as to adapt to the online requirement of the voice classification of the big data.
2. The Hadoop platform-based big data speech classification method according to claim 1, characterized in that: the step 2 comprises the following steps:
(1) the Client submits a voice classification task to a Job Tracker of the Hadoop platform, and the Job Tracker copies voice characteristic data to a local distributed file processing system;
(2) initializing voice classified tasks, putting the tasks into a task queue, and distributing the tasks to corresponding nodes, namely a TaskTracker, by a Job Tracker according to the processing capacity of different nodes;
(3) each Task Tracker adopts a support vector machine to fit the relation between the voice features to be classified and a voice feature library according to the distributed tasks to obtain the corresponding categories of the voice;
(4) taking the corresponding class of the voice as Key/Value, and storing the Key/Value into a local file disk;
(5) if the Key/Value of the voice classification intermediate result is the same, merging the intermediate result, delivering the merged result to Reduce for processing to obtain a voice classification result, and writing the result into a distributed file processing system;
(6) and the Job Tracker performs emptying processing on the task state, and the user obtains a voice classification result from the distributed file processing system.
3. The Hadoop platform-based big data speech classification method according to claim 1 or 2, characterized in that:
step 1, constructing a voice library comprises the following steps: step 11, designing a pronunciation text; step 12, recording voice; step 13, marking the voice file; step 14, analyzing acoustic parameters of the voice file; and step 15, establishing a database management system.
4. The Hadoop platform-based big data speech classification method according to claim 2 or 3, characterized in that:
the designing of the pronunciation text in the step 11 includes selecting the pronunciation text, and the selection principle of the corpus of the pronunciation text includes one or more of the following:
a. the single characters in the corpus are required to contain all the phonological phenomena as much as possible, so that the phonetic system characteristics of the voices of different patients can be reflected better and more conveniently;
b. the vocabularies in the corpus are based on the common Chinese survey table, so that the vocabularies can be conveniently compared with the common Chinese speech;
c. sentences in the corpus are mainly obtained by carrying out dialogue with the patient according to a plurality of related topics, so that the method is more suitable for the real situation faced by speech recognition;
d. sentences in the corpus are complete in content and semanteme, so that prosodic information of one sentence can be reflected as much as possible;
e. the three phones are not classified and selected, so that the problem of sparse training data can be effectively solved.
5. The Hadoop platform-based big data speech classification method according to claim 4, characterized in that:
the designing of the pronunciation text in the step 11 further includes compiling the pronunciation text, and the compiling principle of the pronunciation text includes one or more of the following:
a. a single-word part: taking the initial consonants, the simple or compound vowels and some commonly used characters of the tone listed in the survey word list as the language materials used for the main recording of the voice library;
b. vocabulary part: based on a four thousand word list, but not limited to the four thousand word list, the related words are recorded according to the original conclusion about the related sound system, the voice characteristics including the characteristics of tone quality and super-sound quality can be comprehensively reflected, and example words can be added to reflect the characteristics of the voice phenomenon with great particularity;
c. sentence material part: determining the number of the linguistic data according to the language mastering degree of different speakers, wherein the linguistic data is selected to have certain representativeness while the range of the linguistic data is ensured to be as wide as possible;
d. and a natural conversation part: the method is characterized in that the method is used for recording 20-40 minutes of voice materials of a speaker in the forms of answering questions and freely talking, relates to words different from common Chinese in daily spoken language and requires the speaker to speak in a dialect.
6. The Hadoop platform-based big data speech classification method according to claim 5, characterized in that:
the voice recording in the step 12 comprises the determination of a speaker, and the selection principle of the speaker is to select a native speaker who has clear mouth and teeth, moderate speed, proficient use of local language and is willing to actively cooperate with investigation, and to ensure that the language environment of the speaker is stable and has cultural degree;
or/and, the voice recording further comprises voice acquisition through a voice acquisition device, and the voice acquisition adopts two modes: one is the reading with prompt text, the prompt is the text material of Chinese, the speaker converts it into own native language and reads aloud; the other is natural voice, and the speaker tells the folk story, the folk life condition and humming of local folk songs by using prompts.
7. The Hadoop platform based big data speech classification method according to any one of claims 1-6, characterized in that:
the acoustic parameter analysis on the voice file in step 14 includes voice labeling of a voice library, where the basic voice labeling includes segmentation and alignment of initials and finals of each syllable, and labeling of initials and finals, and includes two parts:
the first part is character marking, Chinese character + pinyin is character pronunciation transcription, and the voice information is recorded by Chinese characters so as to be provided for an identification system and also provide materials for the research of linguistics; the character label must mark basic character information and sublingual phenomenon, and the sublingual phenomenon in the basic label can be represented by a general sublingual symbol;
the second part is syllable label, the standard mandarin syllable label is adopted in the mandarin syllable label, and the syllable label is tone label; in the tone notation, 0 indicates a light tone, 1 indicates a yin-ping, 2 indicates a yang-ping, 3 indicates a rising tone, and 4 indicates a falling tone.
8. The Hadoop platform-based big data speech classification method according to claim 5, characterized in that: the analysis of the acoustic parameters of the voice file in step 14 further comprises the extraction of the acoustic parameters;
firstly, segmenting recorded voice and eliminating mute sections to ensure that analyzed objects are single words, phrases, sentences and conversations; then, judging the start and end sections of the voice signal in the voice waveform data, and labeling the voice; and finally, obtaining corresponding fundamental frequency and formant acoustic analysis parameter data according to an autocorrelation algorithm.
9. The Hadoop platform based big data speech classification method according to any one of claims 1-8, characterized in that: the establishment of the database management system in the step 15 comprises the selection of a database, and an easily realized sql database management system is selected;
or/and in the step 15, four materials need to be stored in the establishment of the database management system, wherein the four materials are pronunciation person attribute materials; secondly, pronunciation text materials are recorded and stored, wherein the pronunciation materials of the patient and corresponding pronunciations and text materials such as mandarin international phonetic symbols are recorded and stored; thirdly, actual voice data material is used for storing original parameters of the recorded voice waveform graph; and fourthly, storing acoustic analysis parameter data, namely the acoustic parameters extracted from the processed voice waveform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010395559.4A CN111583914B (en) | 2020-05-12 | 2020-05-12 | Big data voice classification method based on Hadoop platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111583914A true CN111583914A (en) | 2020-08-25 |
CN111583914B CN111583914B (en) | 2023-03-28 |
Family
ID=72112583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010395559.4A Active CN111583914B (en) | 2020-05-12 | 2020-05-12 | Big data voice classification method based on Hadoop platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111583914B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6067520A (en) * | 1995-12-29 | 2000-05-23 | Lee And Li | System and method of recognizing continuous mandarin speech utilizing chinese hidden Markov models |
CN104538036A (en) * | 2015-01-20 | 2015-04-22 | 浙江大学 | Speaker recognition method based on semantic cell mixing model |
CN105261246A (en) * | 2015-12-02 | 2016-01-20 | 武汉慧人信息科技有限公司 | Spoken English error correcting system based on big data mining technology |
CN106128450A (en) * | 2016-08-31 | 2016-11-16 | 西北师范大学 | The bilingual method across language voice conversion and system thereof hidden in a kind of Chinese |
Non-Patent Citations (1)
Title |
---|
CHEN Xiaoying; CHEN Chen; HUA Kan; YU Hongzhi: "Research on the Design of Speech Corpora" *
Also Published As
Publication number | Publication date |
---|---|
CN111583914B (en) | 2023-03-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bigi | SPPAS-multi-lingual approaches to the automatic annotation of speech | |
Hua | Phonological development in specific contexts: Studies of Chinese-speaking children | |
Feraru et al. | Cross-language acoustic emotion recognition: An overview and some tendencies | |
Van Heuven | Making sense of strange sounds:(Mutual) intelligibility of related language varieties. A review | |
WO2020134647A1 (en) | Early-stage ad speech auxiliary screening system aiming at mandarin chinese | |
McCrocklin et al. | Revisiting popular speech recognition software for ESL speech | |
Duchateau et al. | Developing a reading tutor: Design and evaluation of dedicated speech recognition and synthesis modules | |
Myles et al. | Using information technology to support empirical SLA research. | |
JP7110055B2 (en) | Speech synthesis system and speech synthesizer | |
Ambrazaitis | Nuclear intonation in Swedish | |
Ali et al. | Development and analysis of speech emotion corpus using prosodic features for cross linguistics | |
Liu et al. | AI recognition method of pronunciation errors in oral English speech with the help of big data for personalized learning | |
Zealouk et al. | Voice pathology assessment based on automatic speech recognition using Amazigh digits | |
CN112599119B (en) | Method for establishing and analyzing mobility dysarthria voice library in big data background | |
CN111583914B (en) | Big data voice classification method based on Hadoop platform | |
Alsulaiman | Arabic fluency assessment: Procedures for assessing stuttering in arabic preschool children | |
Hasibuan et al. | An In-Depth Analysis Of Syllable Formation And Variations In Linguistic Phonology | |
Pilar | Phonological Idiosyncrasy of Kawayan Dialect of Southern Negros, Philippines | |
Li et al. | English sentence pronunciation evaluation using rhythm and intonation | |
Qin | On spoken English phoneme evaluation method based on sphinx-4 computer system | |
Ekpenyong et al. | Tone modelling in Ibibio speech synthesis | |
Lai et al. | Intonation and voice quality of Northern Appalachian English: a first look | |
Jiang | How does modification affect the processing of formulaic language? Evidence from L1 and L2 speakers of Chinese | |
Di Benedetto et al. | Lexical Access Model for Italian--Modeling human speech processing: identification of words in running speech toward lexical access based on the detection of landmarks and other acoustic cues to features | |
Shriberg et al. | The relationship of filled-pause F0 to prosodic context |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||