CN111583914B - Big data voice classification method based on Hadoop platform - Google Patents

Big data voice classification method based on Hadoop platform

Info

Publication number
CN111583914B
CN111583914B
Authority
CN
China
Prior art keywords
voice
big data
data
classification
hadoop platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010395559.4A
Other languages
Chinese (zh)
Other versions
CN111583914A (en)
Inventor
杜炜
马春
谷宗运
陈鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Traditional Chinese Medicine AHUTCM
Original Assignee
Anhui University of Traditional Chinese Medicine AHUTCM
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Traditional Chinese Medicine AHUTCM filed Critical Anhui University of Traditional Chinese Medicine AHUTCM
Priority to CN202010395559.4A
Publication of CN111583914A
Application granted
Publication of CN111583914B
Legal status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval of audio data; Database structures therefor; File system structures therefor
    • G06F16/65 Clustering; Classification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques for extracting parameters related to health condition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], i.e. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a big data voice classification method based on a Hadoop platform, which comprises the following steps: step 1, constructing a voice library; step 2, on the basis of the voice library, using a Map function on the Hadoop platform to subdivide the big data voice classification problem into sub-problems, which are solved in parallel on multiple distributed nodes to obtain the corresponding voice classification results; and step 3, combining the sub-problem classification results with a Reduce function, so that the method meets the online requirements of big data voice classification.

Description

Big data voice classification method based on Hadoop platform
Technical Field
The invention relates to a big data voice classification method based on a Hadoop platform.
Background
(1) Current research on motor dysarthria:
Motor dysarthria refers to a group of speech disorders caused by impaired muscular control resulting from damage to the central or peripheral nervous system. It typically manifests as slowed, weakened, imprecise and uncoordinated movement of the speech musculature, and may also affect respiration, resonance, laryngeal phonation, articulation and prosody; clinically it is commonly referred to simply as dysarthria. Common causes include brain trauma, cerebral palsy, amyotrophic lateral sclerosis, multiple sclerosis, stroke, Parkinson's disease and spinocerebellar ataxia. By neuroanatomical and speech-acoustic criteria, dysarthria can be classified into flaccid, spastic, ataxic, hyperkinetic and mixed types. Among the communication disorders associated with brain damage, dysarthria has an incidence of up to 54%. Clinically, examination of voice, resonance and prosody can reflect the speech-acoustic characteristics of dysarthria from both subjective and objective perspectives, which helps provide targeted treatment and clarify its speech-acoustic pathological mechanism comprehensively and scientifically.
Few domestic or foreign studies have reported the overall incidence of motor dysarthria. In a study of 125 Parkinson's disease patients by Miller et al., 69.6% of patients had mean speech intelligibility below that of the normal control group, and 51.2% were more than one standard deviation lower, indicating a high incidence of dysarthria in Parkinson's patients. Bogousslavsky et al. screened 1000 first-stroke patients and found speech disorders in up to 46%, of whom 12.4% were diagnosed with dysarthria. Hartelius et al. likewise found a 51% prevalence of dysarthria in multiple sclerosis patients. Together these figures indicate that dysarthria is common. At present there is no unified assessment method for dysarthria in China, and motor dysarthria has no dedicated assessment standard; a modified dysarthria assessment method and the dysarthria examination table of the China Rehabilitation Research Center are mostly used, with the degree and type of dysarthria examined, scored, recorded and evaluated by clinicians or rehabilitation physicians.
(2) Current research on domestic speech corpora:
With the development of information technology and computer science, speech technology has made interaction between machines and natural human language possible, and both speech synthesis and speech recognition research necessarily depend on the construction of a high-quality back-end speech corpus. Foreign speech corpora are relatively mature, research on Chinese speech corpora has advanced rapidly over the last decade, and corpora have been built for different languages and cultural contexts. The construction of speech corpora for dysarthria, however, is still in the exploratory stage.
Domestic assessment of articulation has focused mainly on subjective evaluation, and only a few researchers distinguish the concept of articulation from that of voice. Huang Zhaying et al. proposed a Chinese word list for testing articulation ability: the list contains 50 words, and by evaluating the pronunciation of these 50 test words a speech rehabilitation therapist can comprehensively assess the articulation of 21 initials and 4 tones, while place-of-articulation ability is assessed through 18 place contrasts and 37 minimal pairs. Chen Sanding et al. evaluated the initials, finals and tones of Mandarin in 50 deaf children, revealed the developmental pattern of articulation in Mandarin-speaking deaf children, and further proposed rehabilitation principles of early intervention, sequenced instruction, error tolerance and consolidation. Dr. Zhang Jing of East China Normal University studied the main error tendencies of hearing-impaired children in consonant articulation, analyzed their causes, and proposed a corresponding consonant-phoneme treatment framework for hearing-impaired children.
(3) Current research on big data in the medical field:
A popular definition of big data is: data that exceeds the capture, storage, processing and analysis capabilities of typical database software tools. Big data differs from traditional notions such as very-large-scale data and mass data, and has four basic characteristics: volume, variety, velocity and value. In its Guiding Opinions on Actively Promoting the "Internet Plus" Initiative, the State Council calls for vigorously developing new modes of online-offline interaction carried by the Internet and for accelerating Internet-based medical and health services. Kayyali B. et al. studied the impact of big data on the U.S. medical industry and concluded that its value to the industry will grow over time. Medical big data currently comes mainly from pharmaceutical enterprises, clinical diagnosis data, patient medical data, health management and social network data. For example, drug development is data-intensive: even for small and medium-sized enterprises, drug development data runs above the terabyte level. Hospital data also grows rapidly every day: a single dual-source CT examination produces about 3000 images of a patient, roughly 1.5 GB of image data; a standard pathological examination image is about 5 GB; and medical-visit and electronic-medical-record data add to this daily growth. Research methods based on massive data analysis have prompted reflection on scientific methodology: without direct contact with the research object, new findings can be obtained by directly analyzing and mining massive data, which may give rise to a new mode of scientific research.
Establishing a speech corpus is a complex undertaking, and a corpus needs continued refinement after it is built: for example, existing inter-word tone-sandhi rules should be fully used so that sandhi and the neutral tone are reflected as faithfully as possible. Where the corpus is deficient, the utilization of the existing material can be improved in the preprocessing stage. For these reasons, the voice library should be an open database that can be extended and revised at any time. Because speech conditions differ, building any specific speech corpus will encounter its own difficulties; the issues discussed here are only an exploration of corpus construction, which will hopefully provide data support for speech research, since a well-built speech corpus plays an important role in better language development.
In addition, the sheer volume of data is undoubtedly a great advantage of network big data analysis, but guaranteeing the quality of this mass data, and cleaning, managing and analyzing it, are also major technical difficulties for this research. Massive network big data is multi-source and heterogeneous, interactive, time-sensitive, bursty and noisy, so it combines enormous value with heavy noise and low value density. This poses a significant challenge to ensuring data quality in network big data analysis.
Disclosure of Invention
The invention provides a big data voice classification method based on a Hadoop platform, addressing the technical problem that, while large data volume is a great advantage of network big data analysis, guaranteeing data quality and cleaning, managing and analyzing the data remain major technical difficulties.
In order to solve the technical problems, the invention adopts the following scheme:
a big data voice classification method based on a Hadoop platform comprises the following steps:
step 1, constructing a voice library;
step 2, on the basis of the voice library, using a Map function on the Hadoop platform to subdivide the big data voice classification problem into sub-problems, which are solved in parallel on multiple distributed nodes to obtain the corresponding voice classification results;
and step 3, finally, combining the sub-problem classification results with a Reduce function, so that the method meets the online requirements of big data voice classification.
Preferably, step 2 comprises the following steps: (1) The Client submits a voice classification task to the Job Tracker of the Hadoop platform, and the Job Tracker copies the voice feature data to the distributed file system;
(2) The voice classification task is initialized and placed in a task queue, and the Job Tracker distributes tasks to the corresponding nodes, namely the Task Trackers, according to the processing capacity of each node;
(3) Each Task Tracker, according to the tasks assigned to it, uses a support vector machine to fit the relation between the voice features to be classified and the voice feature library, obtaining the category of each voice sample;
(4) The category of each voice sample is stored as a Key/Value pair on the local disk;
(5) Intermediate classification results with identical Keys are merged and handed to Reduce for processing to obtain the voice classification result, which is written to the distributed file system;
(6) The Job Tracker clears the task state, and the user obtains the voice classification result from the distributed file system.
Preferably, the step 1 of constructing the voice library includes the following steps: step 11, designing a pronunciation text; step 12, recording voice; step 13, marking the voice file; step 14, analyzing acoustic parameters of the voice file; and step 15, establishing a database management system.
Preferably, the design of the pronunciation text in step 11 includes selection of the pronunciation text, and the selection principles of the pronunciation-text corpus include one or more of the following:
a. the single characters in the corpus should cover as many phonological phenomena as possible, so as to better and more conveniently reflect the phonological characteristics of different patients' speech;
b. the vocabulary in the corpus is based on the common Chinese survey word list, for convenient comparison with standard Mandarin;
c. sentences in the corpus are obtained mainly through dialogue with the patient on several related topics, which better matches the real conditions faced by speech recognition; "several related topics" include daily-life topics or medical-history topics, such as asking about the time of first onset and the medical history;
d. sentences in the corpus are complete in content and semantics, so that the prosodic information of a whole sentence can be reflected as far as possible;
e. triphones are selected without classification, which effectively alleviates the problem of sparse training data.
Preferably, the design of the pronunciation text in step 11 further includes compilation of the pronunciation text, and the compilation principles include one or more of the following:
a. single-character part: the initials, finals and tones listed in the survey word list, together with some commonly used characters, serve as the main recording material of the voice library;
b. vocabulary part: based on, but not limited to, a four-thousand-word list, related words are recorded according to established conclusions about the relevant sound system, striving to reflect its phonetic characteristics fully, including segmental and suprasegmental features; for particularly distinctive speech phenomena, example words may be added to capture them. "Recording related words according to conclusions about the relevant sound system" refers to a general vocabulary summarized from the sounds, combination rules, rhythm and intonation used within the same language. Distinctive speech phenomena are cases the dialect speaker easily misreads, such as difficulty distinguishing dental from retroflex sibilants, or failure to distinguish f from h;
c. sentence material part: the amount of material is determined by each speaker's command of the language, and the selected material should be representative while covering as wide a range as possible; "representative" here refers to utterances that typify dysarthric speech;
d. natural conversation part: with daily life as the topic, 20-40 minutes of the speaker's voice are recorded in question-answering and free-conversation form, covering everyday spoken words that differ from standard Mandarin, with the speaker required to speak in dialect.
Preferably, the voice recording of step 12 includes selection of the speaker: a native speaker with clear articulation and a moderate speech rate (controlled at 120-150 characters per minute) who uses the local language proficiently and is willing to cooperate actively with the investigation is selected, and it is further ensured that the speaker's language environment is relatively stable and that the speaker has a certain level of education;
or/and, the voice recording further includes voice acquisition through a voice acquisition device, in two modes: in the first, reading with a prompt text, the prompt is written Chinese material which the speaker converts into his or her own native language and reads aloud; in the second, natural speech, the speaker uses prompts to tell folk stories, describe local life, and hum local folk songs.
Preferably, the acoustic parameter analysis of the voice file in step 14 includes phonetic annotation of the voice library; the basic annotation includes segmentation and alignment of the initial and final of each syllable and their labelling, and comprises two parts:
the first part is character annotation: Chinese characters plus pinyin transcribe the pronunciation of each character, and the voice information is recorded in Chinese characters both for the recognition system and as material for linguistic research; the character annotation must mark basic character information and paralinguistic phenomena, and paralinguistic phenomena in the basic annotation can be represented by general paralinguistic symbols;
the second part is syllable annotation: standard Mandarin syllable notation is used, and the syllable annotation carries the tone label; in the tone notation, 0 denotes the neutral tone, 1 the first tone (yin ping), 2 the second tone (yang ping), 3 the third tone (shang sheng), and 4 the fourth tone (qu sheng).
Preferably, the acoustic parameter analysis of the voice file in step 14 further includes extraction of acoustic parameters:
first, the recorded voice is segmented and silent sections are removed, ensuring that the analyzed objects are single characters, phrases, sentences and conversations; then, the onset and offset of the voice signal are located in the waveform data and the voice is labelled; finally, the corresponding fundamental frequency and formant acoustic parameters are obtained with an autocorrelation algorithm.
Preferably, the establishment of the database management system in step 15 includes selection of a database, choosing an easily implemented SQL database management system;
or/and, in step 15 the database management system needs to store four kinds of material: first, speaker attribute material; second, pronunciation text material, recording and storing the patient's pronunciation material together with the corresponding pronunciations and text materials such as International Phonetic Alphabet transcriptions of the Mandarin forms; third, actual voice data material, storing the raw parameters of the recorded voice waveforms; and fourth, acoustic analysis parameter data, i.e. the acoustic parameters extracted from the processed voice waveforms.
The big data voice classification method based on the Hadoop platform has the following beneficial effects:
(1) The method solves the technical problems of long classification time and poor real-time performance in big data voice classification. By adopting a Hadoop-based big data voice classification mechanism and exploiting the advantages of cloud computing, it obtains good classification results, overcomes the shortcomings of current voice classification mechanisms, greatly shortens classification time, classifies fast enough to meet the online requirements of big data voice classification, and achieves an overall classification effect clearly superior to other current voice classification mechanisms.
(2) The invention aims to study the speech characteristics of patients with motor dysarthria caused by nervous system diseases. Relying on the advantages of an open network platform, it can measure large-scale groups and collect related information, establish voice libraries of Mandarin, dialects, healthy speakers and patients, and, on this basis, build a word library suitable for diagnosing the condition of patients with motor dysarthria.
(3) As the voice library keeps expanding, the invention ultimately establishes rich data resource centers organized by Mandarin, dialects, medical history and disease condition, provides a network self-diagnosis channel for patients with nervous system diseases, can also assist doctors in clinical diagnosis and treatment, and provides a rich and accurate data platform for quantifying the condition of nervous system diseases.
(4) On the basis of the voice library, the Hadoop platform's Map function subdivides the big data voice classification problem into sub-problems that are solved in parallel on multiple distributed nodes to obtain the corresponding classification results; finally the Reduce function combines the sub-problem results, meeting the online requirements of big data voice classification.
Drawings
FIG. 1: example of the phonetic annotation of "bao" in an embodiment of the invention.
FIG. 2: formant data of the speech "bao" in an embodiment of the invention.
FIG. 3: the basic framework of the Hadoop platform in an embodiment of the invention.
FIG. 4: the big data voice classification process based on the Hadoop platform.
Detailed Description
The invention is further described below with reference to fig. 1 to 4:
the voice library is composed of an unvoiced sound library, a voiced sound library, a tone library, a voice synthesis program and a Chinese-pinyin conversion program.
1. Establishing an unvoiced sound library:
according to the characteristics of unvoiced sound, the quality of synthesized speech is improved. The unvoiced sound library is established by adopting a direct sampling method. Namely, the unvoiced part in front of the voiced segment in each pinyin combination is sampled to form an unvoiced library. Since the unvoiced sounds actually occupy only a small part of 1 syllable, the unvoiced sound library consisting of unvoiced sounds extracted from 400 monotone syllables actually occupies a small storage space.
2. Establishing a voiced sound library:
voiced sounds are synthesized by a voiced synthesis program calling VTFR synthesis for voiced sounds. The voiced sound library is actually composed of VTFRs of various voiced sounds, VTFRs of various voiced sounds are sequentially extracted by adopting a VTFR extracting program, and the VTFRs of various voiced sounds and a voiced sound synthesizing program are stored in 1 data packet, so that the voiced sound library is formed. The actually extracted VTFR is only 1 curve, and the space occupied by the voiced sound library formed by the curve is very small.
The establishment of the voice corpus mainly comprises the following five processes: designing the pronunciation text; recording the voice; analyzing the parameters of the voice files; establishing the database management system; and data analysis with big data technology.
1. Designing a pronunciation text;
1.1 Selecting pronunciation texts:
how to select corpora is the key of corpus database construction. In order to ensure the order and effectiveness of the database building work and the quality of the corpus, a selection principle of the corpus is firstly researched and formulated before the corpus is built. The selection principle of the speech corpus comprises the following steps: 1. the single characters in the corpus are required to contain all the phonological phenomena as much as possible, so that the phonetic system characteristics of the dialect speech can be reflected better and more conveniently; 2. the vocabularies in the corpus are based on the common Chinese survey table, so that the vocabularies can be conveniently compared with the common Chinese speech; 3. the sentences in the corpus are mainly selected from the spoken language corpus | so the method is more suitable for the real situation faced by speech recognition; 4. sentences in the corpus are complete in content and semanteme, so that prosodic information of one sentence can be reflected as much as possible; 5. three phones are not classified and selected, so that the problem of training data sparseness can be effectively solved.
1.2 And (3) compiling pronunciation texts:
the formulation of pronunciation texts is one of the key links for establishing a voice database. When determining the pronunciation material, the selection principle of the pronunciation text comprises five parts: one is a single word portion. Taking the initial consonants, the simple or compound vowels and some commonly used characters of the tone listed in the survey word list as the language materials used for the main recording of the voice library; the second is the vocabulary part. Based on a four thousand word list but not limited to the four thousand word list, the related words are recorded according to the original conclusion about related sound system, the voice characteristics including tone quality and super-sound quality characteristics are strived to be comprehensively reflected, and example words can be added to reflect the characteristics of some distinctive voice phenomena; thirdly, the sentence material part determines the number of the linguistic data according to the language mastering degree of different speakers, and the linguistic data are selected to have certain representativeness while the range of the linguistic data is ensured to be as wide as possible; and fourthly, a natural conversation part, which is a subject of daily life, records voice materials of a speaker for about half an hour in a form of answering questions and freely talking, relates to words in daily spoken language which are different from the common Chinese language, and requires the speaker to speak in a dialect.
2. Recording voice;
2.1 Determination of speaker:
the speaker is selected on the principle of choosing a native speaker with clear articulation and a moderate speech rate who uses the local language proficiently and is willing to cooperate actively with the investigation, while ensuring that the speaker's language environment is stable and that the speaker has a certain level of education.
2.2 Voice collection:
the speaking mode used during recording directly determines what the voice library can be used for. Owing to the particularity of the collected material, two modes are used according to the research purpose: in the first, reading with a prompt text, the prompt is written Chinese material which the speaker converts into his or her own native language and reads aloud; in the second, natural speech, the speaker uses prompts to tell folk stories, describe local life, hum local folk songs, and so on.
3. Parameter analysis for voice files:
after the pronunciation text has been recorded, the voice data must be analyzed to obtain the different features of the voice signal; this is central to the design of the speech corpus and a necessary basis for subsequent speech processing. Since the invention focuses on voice information, the basic attributes of the voice signal waveform are annotated and the related acoustic parameters are extracted.
3.1 And (3) information annotation of a voice library:
the speech annotation uses Praat software to perform hierarchical annotation with reference to the chinese sound segment annotation system SAMPA-C. The labels of the voice library comprise a text label and a sound-mixing note label, wherein the voice "bao" is taken as an example, and is shown in fig. 1.
The first part is character annotation: Chinese characters plus pinyin transcribe the pronunciation of each character, and the voice information is recorded in Chinese characters both for the recognition system and as material for linguistic research. The character annotation must mark basic character information and paralinguistic phenomena, and paralinguistic phenomena in the basic annotation can be represented by general paralinguistic symbols.
The second part is syllable annotation: standard Mandarin syllable notation is used, and the syllable annotation carries the tone label. In the tone notation, 0 denotes the neutral tone, 1 the first tone (yin ping), 2 the second tone (yang ping), 3 the third tone (shang sheng), and 4 the fourth tone (qu sheng).
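Two small sketches illustrate this annotation scheme. The first pulls (xmin, xmax, text) interval triples out of a long-format Praat TextGrid such as the annotation tiers of FIG. 1; it is a hypothetical, assumption-laden parser (a dedicated TextGrid library would be more robust), not part of the patent.

```python
# Hypothetical reader for interval tiers in a long-format Praat TextGrid.
import re

def read_intervals(path):
    """Return (xmin, xmax, text) triples for every annotated interval."""
    with open(path, encoding="utf-8") as f:
        content = f.read()
    # Each interval prints xmin, xmax and text on consecutive lines.
    pattern = re.compile(
        r'xmin = ([\d.]+)\s*\n\s*xmax = ([\d.]+)\s*\n\s*text = "(.*?)"')
    return [(float(a), float(b), t) for a, b, t in pattern.findall(content)]
```

The second illustrates the tone-numbering convention above by splitting an annotated syllable into its base and tone label; the "bao4" example follows FIG. 1, and the helper function is hypothetical.

```python
# Hypothetical helper illustrating the tone-numbering convention above.
TONE_LABELS = {
    0: "neutral tone",
    1: "first tone (yin ping)",
    2: "second tone (yang ping)",
    3: "third tone (shang sheng)",
    4: "fourth tone (qu sheng)",
}

def split_syllable_label(label):
    """Split an annotated syllable such as 'bao4' into base and tone."""
    base = label.rstrip("0123456789")
    tone = label[len(base):]
    return base, TONE_LABELS[int(tone or 0)]

print(split_syllable_label("bao4"))  # ('bao', 'fourth tone (qu sheng)')
```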
3.2 Extraction of acoustic parameters:
for the recorded voice signals, the acoustic parameters of each speech segment must be extracted. In practice, the recorded voice is first segmented and silent sections are removed, ensuring that the analyzed objects are single characters; then the onset and offset of the voice signal are located in the waveform data and the vowel range is marked; finally the corresponding fundamental frequency and formant data are obtained with an autocorrelation algorithm, taking the speech "bao" as an example, as shown in FIG. 2.
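A minimal autocorrelation pitch estimator in the spirit of this step is sketched below. The framing and the F0 search range are assumptions for illustration; the patent does not fix particular values, and formant extraction (e.g. by linear prediction) is omitted.

```python
# Sketch of fundamental-frequency estimation by autocorrelation for one
# voiced frame. The 60-400 Hz search range is an illustrative assumption.
import numpy as np

def estimate_f0(frame, sr, f0_min=60.0, f0_max=400.0):
    """Estimate F0 of a voiced frame from the peak of its autocorrelation."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo = int(sr / f0_max)                    # shortest plausible period
    lag_hi = min(int(sr / f0_min), len(ac) - 1)  # longest plausible period
    lag = lag_lo + int(np.argmax(ac[lag_lo:lag_hi]))
    return sr / lag

# Quick check on a synthetic 200 Hz tone (30 ms at 16 kHz):
sr = 16000
t = np.arange(int(0.03 * sr)) / sr
print(round(estimate_f0(np.sin(2 * np.pi * 200.0 * t), sr)))  # ~200
```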
4. Establishing a database management system:
4.1 Database selection
As for the choice of database: because a voice database must store a large amount of voice waveform data, it is characterized by large data volume and variable record length, with relatively low requirements on transaction processing and recovery, security, and network support. An easily implemented SQL database management system can therefore be chosen.
4.2 Establishment of database management system
The database management system of a speech corpus needs to store four kinds of material: first, speaker attribute material, such as the speaker's age, sex, education, command of Mandarin and use of the native language; second, pronunciation text material, recording and storing the speaker's pronunciations together with the corresponding dialect pronunciations and text materials such as International Phonetic Alphabet transcriptions of the Mandarin forms; third, actual voice data material, mainly storing the raw parameters of the recorded voice waveforms; and fourth, acoustic analysis parameter data, i.e. the acoustic parameters extracted from the processed voice waveforms.
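As a sketch of these four stores, the schema below uses SQLite, one easily implemented SQL DBMS consistent with the selection above; all table and column names are illustrative assumptions rather than a schema given in the patent.

```python
# Hypothetical SQLite schema for the four kinds of material described above.
import sqlite3

conn = sqlite3.connect("speech_corpus.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS speaker (           -- 1. speaker attributes
    speaker_id INTEGER PRIMARY KEY,
    age INTEGER, sex TEXT, education TEXT,
    mandarin_level TEXT, native_language TEXT);

CREATE TABLE IF NOT EXISTS prompt_text (       -- 2. pronunciation text material
    text_id INTEGER PRIMARY KEY,
    hanzi TEXT, pinyin TEXT, ipa TEXT);

CREATE TABLE IF NOT EXISTS recording (         -- 3. actual voice data
    recording_id INTEGER PRIMARY KEY,
    speaker_id INTEGER REFERENCES speaker(speaker_id),
    text_id    INTEGER REFERENCES prompt_text(text_id),
    wav_path TEXT, sample_rate INTEGER);

CREATE TABLE IF NOT EXISTS acoustic_params (   -- 4. extracted acoustic parameters
    recording_id INTEGER REFERENCES recording(recording_id),
    f0_mean REAL, f1 REAL, f2 REAL, f3 REAL);
""")
conn.commit()
conn.close()
```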
5. Data analysis for big data technology
Big data is a data set so large that it far exceeds the acquisition, storage, management and analysis capabilities of traditional database software tools; it is characterized by large scale, rapid circulation, diverse types and low value density. The strategic significance of big data technology lies not in possessing huge amounts of information but in the professional processing of meaningful data. In other words, if big data is compared to an industry, the key to profitability is improving the processing capacity of the data and realizing its value increment through processing. In word-library construction, the core value of big data technology is that targeted analysis of the data makes it possible to evaluate the quality of the speech elements in the library, so that the library becomes more complete.
Sharing the word library through a network platform facilitates testing different populations and obtaining more data samples, enriching the voice library. In the future, targeted word libraries for patients with motor dysarthria can be established by region and dialect, providing richer and more reliable data samples for the subsequent automatic recognition of disease classification and grading.
As shown in FIG. 3, a speech classification mechanism based on the Hadoop platform is proposed: a large amount of speech is collected, a speech database is constructed, and effective features for speech classification are extracted; then, on the Hadoop platform, the Map function subdivides the big data voice classification problem into sub-problems that are solved in parallel on multiple distributed nodes to obtain the corresponding classification results; finally, the Reduce function combines the sub-problem results, meeting the online requirements of big data voice classification.
As shown in FIG. 4, the big data speech classification process based on the Hadoop platform comprises the following specific steps:
(1) The Client submits a voice classification task to the Job Tracker of the Hadoop platform, and the Job Tracker copies the voice feature data to the distributed file system;
(2) The voice classification task is initialized and placed in a task queue, and the Job Tracker distributes tasks to the corresponding nodes, namely the Task Trackers, according to the processing capacity of each node;
(3) Each Task Tracker, according to the tasks assigned to it, uses a support vector machine to fit the relation between the voice features to be classified and the voice feature library, obtaining the category of each voice sample;
(4) The category of each voice sample is stored as a Key/Value pair on the local disk;
(5) Intermediate classification results with identical Keys are merged and handed to Reduce for processing to obtain the voice classification result, which is written to the distributed file system;
(6) The Job Tracker clears the task state, and the user obtains the voice classification result from the distributed file system.
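A minimal sketch of steps (1)-(6) can be written with Hadoop Streaming, which lets the Map and Reduce stages run as ordinary scripts on each node. The mapper below plays the role of a Task Tracker task: it classifies each incoming feature vector with a support vector machine assumed to have been fitted offline against the voice feature library (e.g. with scikit-learn) and emits the class as the Key, so that Hadoop's shuffle groups identical Keys before the reducer merges them, as in step (5). The file names, the pickled model and the tab-separated feature format are illustrative assumptions, not details fixed by the patent.

```python
# svm_mapper.py -- runs on each node; one "utt_id<TAB>f1<TAB>f2..." line in,
# one "class<TAB>utt_id" line out (Key = voice class).
import sys
import pickle
import numpy as np

with open("svm_model.pkl", "rb") as f:   # model shipped to every node
    model = pickle.load(f)               # e.g. a fitted sklearn.svm.SVC

for line in sys.stdin:
    utt_id, *feats = line.rstrip("\n").split("\t")
    x = np.asarray(feats, dtype=float).reshape(1, -1)
    label = model.predict(x)[0]          # category of this voice sample
    print(f"{label}\t{utt_id}")
```

```python
# merge_reducer.py -- Hadoop delivers lines sorted by Key, so all utterances
# of one class arrive together and can be merged into a single result line.
import sys
from itertools import groupby

rows = (line.rstrip("\n").split("\t") for line in sys.stdin)
for label, group in groupby(rows, key=lambda kv: kv[0]):
    utts = [utt_id for _, utt_id in group]
    print(f"{label}\t{len(utts)}\t{','.join(utts)}")
```

A hypothetical launch with the standard streaming jar would then be: hadoop jar hadoop-streaming.jar -files svm_mapper.py,merge_reducer.py,svm_model.pkl -input /speech/features -output /speech/classes -mapper svm_mapper.py -reducer merge_reducer.py, after which the user reads the classification result from the output directory, as in step (6).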
The invention has been described above with reference to the accompanying drawings. Its implementation is clearly not limited to the manner described; various modifications of the inventive method concept and technical solution, and direct applications of the concept and solution to other fields without modification, all fall within the scope of protection of the invention.

Claims (9)

1. A big data voice classification method based on a Hadoop platform comprises the following steps:
step 1, constructing a voice library;
step 2, on the basis of the voice library, using a Map function on the Hadoop platform to subdivide the big data voice classification problem into sub-problems, which are solved in parallel on multiple distributed nodes to obtain the corresponding voice classification results;
and step 3, finally, combining the sub-problem classification results with the Reduce function, so that the method meets the online requirements of big data voice classification.
2. The Hadoop platform-based big data speech classification method according to claim 1, characterized in that: the step 2 comprises the following steps:
(1) The Client submits a voice classification task to the Job Tracker of the Hadoop platform, and the Job Tracker copies the voice feature data to the distributed file system;
(2) The voice classification task is initialized and placed in a task queue, and the Job Tracker distributes tasks to the corresponding nodes, namely the Task Trackers, according to the processing capacity of each node;
(3) Each Task Tracker, according to the tasks assigned to it, uses a support vector machine to fit the relation between the voice features to be classified and the voice feature library, obtaining the category of each voice sample;
(4) The category of each voice sample is stored as a Key/Value pair on the local disk;
(5) Intermediate classification results with identical Keys are merged and handed to Reduce for processing to obtain the voice classification result, which is written to the distributed file system;
(6) The Job Tracker clears the task state, and the user obtains the voice classification result from the distributed file system.
3. The Hadoop platform based big data voice classification method according to claim 2, characterized in that:
step 1, constructing a voice library comprises the following steps: step 11, designing a pronunciation text; step 12, recording voice; step 13, marking the voice file; step 14, analyzing acoustic parameters of the voice file; and step 15, establishing a database management system.
4. The Hadoop platform-based big data speech classification method according to claim 3, characterized in that:
the design of the pronunciation text in step 11 includes selection of the pronunciation text, and the selection principles of the pronunciation-text corpus include one or more of the following:
a. the single characters in the corpus should cover as many phonological phenomena as possible, so as to better and more conveniently reflect the phonological characteristics of different patients' speech;
b. the vocabulary in the corpus is based on the common Chinese survey word list, for convenient comparison with standard Mandarin;
c. sentences in the corpus are obtained mainly through dialogue with the patient on several related topics, which better matches the real conditions faced by speech recognition;
d. sentences in the corpus are complete in content and semantics, so that the prosodic information of a whole sentence can be reflected as far as possible;
e. triphones are selected without classification, which effectively alleviates the problem of sparse training data.
5. The Hadoop platform-based big data speech classification method according to claim 4, characterized in that:
the design of the pronunciation text in step 11 further includes compilation of the pronunciation text, and the compilation principles include one or more of the following:
a. single-character part: the initials, finals and tones listed in the survey word list, together with some commonly used characters, serve as the main recording material of the voice library;
b. vocabulary part: based on, but not limited to, a four-thousand-word list, related words are recorded according to established conclusions about the relevant sound system, comprehensively reflecting the phonetic characteristics, including segmental and suprasegmental features, with example words added for particularly distinctive speech phenomena;
c. sentence material part: the amount of material is determined by each speaker's command of the language, and the selected material should be representative while covering as wide a range as possible;
d. natural conversation part: with daily life as the topic, 20-40 minutes of the speaker's voice are recorded in question-answering and free-conversation form, covering everyday spoken words that differ from standard Mandarin, with the speaker required to speak in dialect.
6. The Hadoop platform based big data voice classification method according to claim 5, characterized in that:
the voice recording in step 12 includes selection of the speaker, on the principle of choosing a native speaker with clear articulation and a moderate speech rate who uses the local language proficiently and is willing to cooperate actively with the investigation, while ensuring that the speaker's language environment is stable and that the speaker has a certain level of education;
or/and, the voice recording further includes voice acquisition through a voice acquisition device, in two modes: in the first, reading with a prompt text, the prompt is written Chinese material which the speaker converts into his or her own native language and reads aloud; in the second, natural speech, the speaker uses prompts to tell folk stories, describe local life, and hum local folk songs.
7. The Hadoop platform-based big data speech classification method according to claim 6, characterized in that:
the acoustic parameter analysis of the voice file in step 14 includes phonetic annotation of the voice library; the basic annotation includes segmentation and alignment of the initial and final of each syllable and their labelling, and comprises two parts:
the first part is character annotation: Chinese characters plus pinyin transcribe the pronunciation of each character, and the voice information is recorded in Chinese characters both for the recognition system and as material for linguistic research; the character annotation must mark basic character information and paralinguistic phenomena, and paralinguistic phenomena in the basic annotation can be represented by general paralinguistic symbols;
the second part is syllable annotation: standard Mandarin syllable notation is used, and the syllable annotation carries the tone label; in the tone notation, 0 denotes the neutral tone, 1 the first tone (yin ping), 2 the second tone (yang ping), 3 the third tone (shang sheng), and 4 the fourth tone (qu sheng).
8. The Hadoop platform-based big data speech classification method according to claim 7, characterized in that: the acoustic parameter analysis of the voice file in step 14 further includes extraction of acoustic parameters;
first, the recorded voice is segmented and silent sections are removed, ensuring that the analyzed objects are single characters, phrases, sentences and conversations; then, the onset and offset of the voice signal are located in the waveform data and the voice is labelled; finally, the corresponding fundamental frequency and formant acoustic parameters are obtained with an autocorrelation algorithm.
9. The Hadoop platform based big data voice classification method according to claim 8, characterized in that: the establishment of the database management system in step 15 includes selection of a database, choosing an easily implemented SQL database management system;
in step 15, the database management system needs to store four kinds of material: first, speaker attribute material; second, pronunciation text material, recording and storing the patient's pronunciation material together with the corresponding pronunciations and text materials such as International Phonetic Alphabet transcriptions of the Mandarin forms; third, actual voice data material, storing the raw parameters of the recorded voice waveforms; and fourth, acoustic analysis parameter data, i.e. the acoustic parameters extracted from the processed voice waveforms.
CN202010395559.4A 2020-05-12 2020-05-12 Big data voice classification method based on Hadoop platform Active CN111583914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010395559.4A CN111583914B (en) 2020-05-12 2020-05-12 Big data voice classification method based on Hadoop platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010395559.4A CN111583914B (en) 2020-05-12 2020-05-12 Big data voice classification method based on Hadoop platform

Publications (2)

Publication Number Publication Date
CN111583914A CN111583914A (en) 2020-08-25
CN111583914B (en) 2023-03-28

Family

ID=72112583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010395559.4A Active CN111583914B (en) 2020-05-12 2020-05-12 Big data voice classification method based on Hadoop platform

Country Status (1)

Country Link
CN (1) CN111583914B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6067520A (en) * 1995-12-29 2000-05-23 Lee And Li System and method of recognizing continuous mandarin speech utilizing chinese hidden markou models
CN104538036A (en) * 2015-01-20 2015-04-22 浙江大学 Speaker recognition method based on semantic cell mixing model
CN105261246A (en) * 2015-12-02 2016-01-20 武汉慧人信息科技有限公司 Spoken English error correcting system based on big data mining technology
CN106128450A (en) * 2016-08-31 2016-11-16 西北师范大学 The bilingual method across language voice conversion and system thereof hidden in a kind of Chinese

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈小莹; 陈晨; 华侃; 于洪志. 语音语料库的设计研究 [Design and research of speech corpora]. 科技信息 [Science & Technology Information], 2008, (36), full text. *

Also Published As

Publication number Publication date
CN111583914A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
Swain et al. Databases, features and classifiers for speech emotion recognition: a review
Hua Phonological development in specific contexts: Studies of Chinese-speaking children
Feraru et al. Cross-language acoustic emotion recognition: An overview and some tendencies
Pao et al. Mandarin emotional speech recognition based on SVM and NN
Jessen Speaker classification in forensic phonetics and acoustics
WO2020134647A1 (en) Early-stage ad speech auxiliary screening system aiming at mandarin chinese
French et al. Forensic speech science
Duchateau et al. Developing a reading tutor: Design and evaluation of dedicated speech recognition and synthesis modules
Kandali et al. Vocal emotion recognition in five native languages of Assam using new wavelet features
Myles et al. Using information technology to support empirical SLA research.
Ali et al. Development and analysis of speech emotion corpus using prosodic features for cross linguistics
Shattuck-Hufnagel Toward an (even) more comprehensive model of speech production planning
CN112599119B (en) Method for establishing and analyzing mobility dysarthria voice library in big data background
Zealouk et al. Voice pathology assessment based on automatic speech recognition using Amazigh digits
Green et al. Range in the Use and Realization of BIN in African American English
CN111583914B (en) Big data voice classification method based on Hadoop platform
da Silva Junior et al. Speech Rhythm of English as L2: an investigation of prosodic variables on the production of Brazilian Portuguese speakers
Alsulaiman Arabic fluency assessment: Procedures for assessing stuttering in arabic preschool children
Pilar Phonological Idiosyncrasy of Kawayan Dialect of Southern Negros, Philippines
Hasibuan et al. An In-Depth Analysis Of Syllable Formation And Variations In Linguistic Phonology
Wagner et al. Polish Rhythmic Database―New Resources for Speech Timing and Rhythm Analysis
Lai et al. Intonation and voice quality of Northern Appalachian English: a first look
Jiang How does modification affect the processing of formulaic language? Evidence from L1 and L2 speakers of Chinese
Shriberg et al. The relationship of filled-pause F0 to prosodic context
Jose et al. Phonological idiosyncrasies of the Southern Sorsogon dialect in

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant