CN117219116B

CN117219116B - Modern Chinese language voice analysis method, system and storage medium

Info

Publication number: CN117219116B
Application number: CN202311342763.XA
Authority: CN
Inventors: 林春雨; 龚明袖
Original assignee: Shenzhen Municipal Yuan Software Co ltd; Guangdong Polytechnic Normal University
Current assignee: Shenzhen Municipal Yuan Software Co ltd; Guangdong Polytechnic Normal University
Priority date: 2023-10-17
Filing date: 2023-10-17
Publication date: 2024-03-29
Anticipated expiration: 2043-10-17
Also published as: CN117219116A

Abstract

The invention discloses a modern Chinese language voice analysis method, system and storage medium. The method comprises the following steps: S1, acquiring modern Chinese voice data, which comprises Mandarin voice data and Chinese dialect voice data; S2, preprocessing the acquired modern Chinese voice data, including removing noise, equalizing audio quality, extracting acoustic features, and identifying and annotating the voice data with International Phonetic Alphabet (IPA) symbols; S3, parsing the preprocessed voice data according to a voice disassembly rule to obtain the initials, finals and tones of the voice data. The invention parses modern Chinese voice data accurately and clearly, which benefits the analysis and linguistic study of modern Chinese speech.

Description

Modern Chinese language voice analysis method, system and storage medium

Technical Field

The invention relates to the field of voice analysis, in particular to a modern Chinese voice analysis method, a system and a storage medium.

Background

Modern Chinese speech analysis plays a key role in speech recognition. By analyzing and understanding the characteristics, phonemes and sound-change rules of Chinese speech, more accurate and reliable Chinese speech recognition systems can be developed. Chinese speech analysis is also an important part of linguistic research: it makes it possible to study phonetic features such as the sound system, the rules of tone change, and the system of initials and finals, which is valuable for understanding the phonetic structure of Chinese, its phonetic evolution, and comparison with other languages. However, phoneme segmentation is often unclear in modern Chinese speech analysis, and no effective solution to this problem has yet been proposed.

Disclosure of Invention

The embodiment of the invention provides a modern Chinese voice analysis method, a system and a storage medium, which analyze the preprocessed voice data by constructing a voice disassembly rule so as to accurately obtain initials, finals and tones of the voice data and improve the modern Chinese voice analysis efficiency.

According to an aspect of an embodiment of the present invention, there is provided a modern Chinese language voice analysis method, comprising the following steps:

S1, acquiring modern Chinese voice data;

S2, preprocessing the acquired modern Chinese voice data, including noise removal, audio quality equalization, acoustic feature extraction, and International Phonetic Alphabet (IPA) identification and annotation;

S3, parsing the preprocessed voice data according to a voice disassembly rule to obtain the initials, finals and tones of the voice data, specifically comprising:

S31, parsing the preprocessed voice data P according to the voice disassembly rule to obtain a tone I and a combination P1 of the initial and the final;

I = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher(P).group("intonation");

wherein I is the tone; the digits 0-9 and the circled numerals ①-⑩ are tone marks in different formats; shengyun is the name of the group that captures the combination of the initial and the final; intonation is the name of the group that captures the tone; P is the preprocessed voice data; matcher(P) takes the voice data as input, and group("intonation") outputs the tone;

S32, splitting the combination P1 of the initial and the final according to the voice disassembly rule to obtain an initial C and a final V.

As an optional implementation manner, splitting the combination P1 of the initial and the final in step S32 according to the voice disassembly rule to obtain the initial C and the final V specifically comprises:

S321, judging whether P1 has a zero initial; if so, parsing P1 according to a first initial disassembly rule and a first final disassembly rule to obtain the initial C and the final V; if not, executing step S322;

wherein C is the initial, null is the name of the group that captures the zero initial, 0 and the two other zero-initial marks are the symbols for the zero initial, vowel is the name of the group that captures the final, and P1 is the combination of the initial and the final;

wherein V is the final, null is the name of the group that captures the zero initial, 0 and the two other zero-initial marks are the symbols for the zero initial, vowel is the name of the group that captures the final, and P1 is the combination of the initial and the final;

S322, if P1 does not have a zero initial, parsing P1 according to a second initial disassembly rule and a second final disassembly rule to obtain a pending initial C1 and a pending final V1;

C1 = (;

wherein C1 is the pending initial, consonant is the name of the group that captures the initial, the IPA consonants are all the consonants in the IPA consonant table, vowel is the name of the group that captures the final, and P1 is the combination of the initial and the final;

V1 = (;

wherein V1 is the pending final, consonant is the name of the group that captures the initial, the IPA consonants are all the consonants in the IPA consonant table, vowel is the name of the group that captures the final, and P1 is the combination of the initial and the final;

S323, judging whether the pending final V1 is empty; if V1 is not empty, confirming that the pending initial C1 is the required initial C and the pending final V1 is the required final V, and ending the parsing; otherwise executing step S324;

S324, V1 being empty, judging whether a syllabic-consonant final is present; if so, parsing the pending initial C1 according to a third initial disassembly rule and a third final disassembly rule to obtain the initial C and the final V; if not, executing step S325;

C = (;

wherein C is the initial, consonant is the name of the group that captures the initial, the IPA consonants are all the consonants in the IPA consonant table, vowel is the name of the group that captures the final, the syllabic-consonant finals are all the consonants in the syllabic-consonant final table, and C1 is the pending initial;

V = (;

wherein V is the final, consonant is the name of the group that captures the initial, the IPA consonants are all the consonants in the IPA consonant table, vowel is the name of the group that captures the final, the syllabic-consonant finals are all the consonants in the syllabic-consonant final table, and C1 is the pending initial;

S325, there being no syllabic-consonant final, the initial C is the zero initial and the final V takes the value of the pending initial C1.

As an alternative embodiment, the modern Chinese voice data in step S1 includes Mandarin voice data and Chinese dialect voice data.

As an alternative implementation manner, the IPA identification and annotation of the voice data in step S2 can be performed manually.

As an optional implementation manner, the voice disassembly rule in step S3 includes a tone disassembly rule, an initial disassembly rule, and a final disassembly rule.

As an optional implementation manner, the initial disassembly rule includes a first initial disassembly rule, a second initial disassembly rule and a third initial disassembly rule.

As an optional implementation manner, the final disassembly rule includes a first final disassembly rule, a second final disassembly rule and a third final disassembly rule.

According to yet another aspect of an embodiment of the present invention, there is also provided a modern Chinese language voice analysis system, including:

an acquisition unit, configured to acquire voice data;

a preprocessing unit, configured to preprocess the voice data, wherein the preprocessing comprises noise removal, audio quality equalization, acoustic feature extraction, and IPA identification and annotation;

a creating unit, configured to create a voice disassembly rule comprising a tone disassembly rule, an initial disassembly rule and a final disassembly rule;

and a parsing unit, configured to parse the preprocessed voice data according to the voice disassembly rule to obtain the initials, finals and tones of the voice data.

According to a further aspect of embodiments of the present invention, there is also provided a computer-readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the above-described modern Chinese language voice analysis method when run.

The invention has the beneficial effects that:

1. The present invention uses International Phonetic Alphabet (IPA) symbols, a notation system for representing the sounds of human speech, to identify and annotate modern Chinese voice data. The IPA is not tied to any particular language and can accurately represent speech sounds and how they are articulated. It contains a wide range of symbols describing phonemes, consonants, vowels, stress and so on. Pinyin is mainly used to mark the pronunciation of Chinese characters, especially in Mandarin, whereas the IPA is a more general notation for describing the pronunciation of any language. Compared with pinyin, the IPA is more precise and specific and can represent finer pronunciation differences; it plays an irreplaceable role in studying phonetic features of Chinese such as its sound system, tone rules and system of initials and finals, in understanding its phonetic structure and evolution, and in comparative research with other languages;

2. Following the basic disassembly principle that consonants are initials, vowels and the consonants after them are finals, and numerals are tones, the invention constructs voice disassembly rules comprising a tone disassembly rule, initial disassembly rules and final disassembly rules, and decomposes the preprocessed voice data layer by layer: the IPA consonant table is used to judge whether the final is empty, and the syllabic-consonant final table is used to judge whether a syllabic-consonant final is present. The required initials, finals and tones are thereby obtained, improving the efficiency of modern Chinese voice analysis.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:

FIG. 1 is a flow chart of a modern Chinese language speech analysis method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of an alternative modern Chinese language speech analysis system according to an embodiment of the present invention;

FIG. 3 is a flowchart of an alternative way of splitting the combination P1 of initials and finals according to the voice disassembly rule, according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

As shown in fig. 1, the embodiment of the invention provides a modern Chinese language voice analysis method, which comprises the following steps:

S1, acquiring modern Chinese voice data;

Modern Chinese voice data includes Mandarin voice data and Chinese dialect voice data. Mandarin, the common language of the modern Han Chinese, plays an important role in communication in modern China; dialects are branches of Chinese used by people in particular regions.

The invention acquires modern Chinese voice data, including Mandarin voice data and Chinese dialect voice data.

S2, preprocessing the acquired voice data;

Preprocessing the acquired voice data includes removing noise, equalizing audio quality and extracting acoustic features, and further includes identifying and annotating the voice data with IPA symbols; the IPA identification and annotation can be done by manual annotation.

Because the Chinese sound system differs greatly from those of other languages (English, etc.), no commonly available software tool can currently annotate Chinese speech with IPA symbols directly, so this annotation is at present done manually; machine annotation is a direction for further research and development following the invention.

International Phonetic Alphabet (IPA): a phonetic notation system based on Latin letters, devised by the International Phonetic Association in the late 19th century as a standard representation of the sounds of spoken language, including vowels, consonants and tones. The IPA follows the principle of "one sound, one symbol; one symbol, one sound". The present invention uses IPA symbols, a notation system for representing the sounds of human speech, to identify and annotate modern Chinese voice data. The IPA is not tied to any particular language and can accurately represent speech sounds and how they are articulated. It contains a wide range of symbols describing phonemes, consonants, vowels, stress and so on. Pinyin is mainly used to mark the pronunciation of Chinese characters, especially in Mandarin, whereas the IPA is a more general notation for describing the pronunciation of any language. Compared with pinyin, the IPA is more precise and specific and can represent finer pronunciation differences; it plays an irreplaceable role in studying phonetic features of Chinese such as its sound system, tone rules and system of initials and finals, in understanding its phonetic structure and evolution, and in comparative research with other languages.

Consonant: a consonant is a type of speech sound; in contrast to a vowel, the airflow is completely or partially obstructed at some point in the articulatory organs during articulation. The articulation of a consonant can be divided into three phases: forming the obstruction, holding it, and releasing it. Different consonants arise from differences in these three phases.

Vowel: a vowel is a type of phoneme, as opposed to a consonant. A vowel is a sound produced with the airflow passing through the mouth without obstruction. When a vowel is produced, air from the lungs passes through the glottis and strikes the vocal cords so that they vibrate evenly; the vibrating airflow then passes unobstructed through the oral and nasal cavities, and different sounds are produced by adjusting the tongue and lips.

Tone: a tone is a pitch pattern (high, low, rising or falling) attached to a syllable. The linguist Zhao Yuanren (Yuen Ren Chao), inspired by the musical scale, devised the five-level tone notation, in which 1 denotes the lowest pitch and 5 the highest. Chao's five-level notation was adopted by the International Phonetic Association and became part of the IPA standard; it remains the most widely used tone notation. Determining the tone value of a tone requires instrumental recording and analysis. The same syllable with different tones can express different meanings: in Modern Standard Chinese, 妈 (mā, "mother"), 麻 (má, "hemp"), 马 (mǎ, "horse") and 骂 (mà, "scold") consist of the same consonant and vowel and are distinguished by tone alone. When the tones of Chinese syllables are annotated in the IPA, the numeric tone-value notation is preferred.

Tone notation currently takes several forms: contour symbols (arrows) and numeric notation (multi-digit tone values, or single-digit tone numbers); some numeric marks are written as superscripts or as the circled numerals ①-⑨. For ease of machine recognition, the tones in the manual IPA annotation of step S2 are written with the numeric characters "0-9" or "①-⑩".

For example, Mandarin has 4 tones: yinping (dark level), yangping (light level), shangsheng (rising, with 上 read shǎng) and qusheng (departing). They are marked with the tone numbers 1/2/3/4. A tone value is a numeric index of the actual pitch contour and may have one or several digits: tone 1 has the tone value 55, tone 2 has 35, tone 3 has 214, and tone 4 has 51. Labeled with the 4 tone values, for example:

First tone: [fan55] 番 (fān)

Second tone: 茄 (qié)

Third tone: 炒 (chǎo)

Fourth tone: [tan51] 蛋 (dàn)

Labeled with the 4 tone numbers, for example:

First tone: [fan1] 番 (fān)

Second tone: 茄 (qié)

Third tone: 炒 (chǎo)

Fourth tone: [tan4] 蛋 (dàn)
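The correspondence between Mandarin tone numbers and tone values given above can be sketched as a small lookup table. This is only an illustration: the function name `tone_number_to_value` and the assumption that the tone number is the last character of the annotation are hypothetical and not taken from the patent.

```python
# Mandarin tone numbers mapped to five-level tone values, as stated in the text:
# tone 1 -> 55, tone 2 -> 35, tone 3 -> 214, tone 4 -> 51.
MANDARIN_TONE_VALUES = {"1": "55", "2": "35", "3": "214", "4": "51"}


def tone_number_to_value(syllable: str) -> str:
    """Rewrite a tone-number annotation such as 'fan1' as a tone-value one.

    Assumption (hypothetical): the tone number is the final character.
    """
    body, number = syllable[:-1], syllable[-1:]
    if number not in MANDARIN_TONE_VALUES:
        raise ValueError(f"no Mandarin tone number at the end of {syllable!r}")
    return body + MANDARIN_TONE_VALUES[number]


if __name__ == "__main__":
    print(tone_number_to_value("fan1"))  # fan55
    print(tone_number_to_value("tan4"))  # tan51
```

Either form, tone number or tone value, ends in digits, which is what the tone disassembly rule relies on.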

For example, the Cantonese dialect has 9 tones: yin level, yin rising, yin departing, yin entering (also called upper yin entering), lower yin entering (also called middle entering), yang level, yang rising, yang departing and yang entering. If annotated with tone numbers, the 9 tones of Cantonese can be marked with the digits 1-9, for example:

If annotated with tone values, they can be marked as:

Therefore, both tone numbers (1-9 or ①-⑨) and tone values (from one digit to several digits) are in use.

S3, parsing the preprocessed voice data according to a voice disassembly rule to obtain the initials, finals and tones of the voice data. The basic disassembly principle is that consonants are initials, vowels and the consonants after them are finals, and numerals are tones. Specifically:

In the present invention, the preprocessed voice data P is the IPA transcription of a speech sound; for example, the IPA data P of a Mandarin syllable is "fan55". Splitting P yields the tone I and the combination P1 of initial and final (consonant + vowel). For a Mandarin sentence such as "你好吗?" ("How are you?"), the sentence is first divided into its three characters; for the first character, 你 ("you"), P is "ni3" or "ni214", which is then split into initial, final and tone according to the voice disassembly rule.

The voice disassembly rules comprise an initial disassembly rule, a final disassembly rule and a tone disassembly rule, wherein the initial disassembly rule comprises a first initial disassembly rule, a second initial disassembly rule and a third initial disassembly rule, and the final disassembly rule comprises a first final disassembly rule, a second final disassembly rule and a third final disassembly rule; wherein:

The tone disassembly rule is:

I = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher(P).group("intonation")

wherein I is the tone; the digits 0-9 and the circled numerals ①-⑩ are tone marks in different formats; shengyun is the name of the group that captures the combination of the initial and the final; intonation is the name of the group that captures the tone; P is the preprocessed voice data; matcher(P) takes the voice data as input, and group("intonation") outputs the tone.

After the tone I has been split from the IPA data P according to the tone disassembly rule, the combination P1 of the initial and the final is obtained as:

P1 = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher(P).group("shengyun")

wherein P1 is the combination of the initial and the final; shengyun is the name of the group that captures the combination of the initial and the final; the digits 0-9 and the circled numerals ①-⑩ are tone marks in different formats; intonation is the name of the group that captures the tone; P is the preprocessed voice data; matcher(P) takes the voice data as input, and group("shengyun") outputs the combination of the initial and the final.
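As a minimal sketch (not the patent's own implementation), the tone disassembly rule can be tried out with Python's `re` module. The rule's `.matcher(P).group(...)` notation resembles Java's regex API; the Python port below is an assumption made for illustration. It uses `(?P<name>...)` for named groups and drops the `|` separators inside the character classes, since inside a class they would match a literal bar. The names `TONE_CHARS`, `TONE_RULE` and `split_tone` are hypothetical.

```python
import re

# Tone marks: ASCII digits or circled numerals, as listed in the rule above.
TONE_CHARS = "0-9①②③④⑤⑥⑦⑧⑨⑩"

# shengyun = everything before the tone marks; intonation = trailing tone marks.
TONE_RULE = re.compile(
    rf"(?P<shengyun>[^{TONE_CHARS}]*)(?P<intonation>[{TONE_CHARS}]*)"
)


def split_tone(p: str) -> tuple[str, str]:
    """Split preprocessed IPA data P into (P1, I)."""
    match = TONE_RULE.fullmatch(p)
    if match is None:
        # Reached when something other than tone marks follows the tone marks.
        raise ValueError(f"unexpected annotation format: {p!r}")
    return match.group("shengyun"), match.group("intonation")


if __name__ == "__main__":
    print(split_tone("fan55"))  # ('fan', '55')
    print(split_tone("ni214"))  # ('ni', '214')
```

Using `fullmatch` rather than a search makes malformed input fail loudly instead of silently returning a partial split; that is a design choice of this sketch, not something the source specifies.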

S32, splitting the combination P1 of the initial and the final according to the voice disassembly rule to obtain the initial C and the final V, as shown in FIG. 3, specifically:

S321, first judging whether a zero-initial symbol is present in the IPA data (by convention the zero initial is marked with 0 or one of two other symbols, i.e. one of three marks); if so, parsing P1 according to the first initial disassembly rule and the first final disassembly rule to obtain the initial C and the final V respectively.

The first initial disassembly rule is:

wherein C is the initial, null is the name of the group that captures the zero initial, 0 and the two other zero-initial marks are the symbols for the zero initial, vowel is the name of the group that captures the final, and P1 is the combination of the initial and the final;

The first final disassembly rule is:

If P1 is judged to have a zero initial, the combination P1 of the initial and the final is parsed according to the first initial disassembly rule and the first final disassembly rule to obtain the initial C and the final V, and the parsing ends; if not, step S322 is executed.

S322, after it is judged in step S321 that P1 contains no zero-initial symbol, the initial and the final are split in the normal way: P1 is parsed according to the second initial disassembly rule and the second final disassembly rule to obtain a pending initial C1 and a pending final V1;

The second initial disassembly rule is:

C1 = (;

The second final disassembly rule is:

V1 = (;

The IPA consonant table is shown in Table 1 below. Since only consonants can form initials, once the initial has been found (split off) according to Table 1, what remains is the final; if nothing remains (the pending final V1 is empty), the final is empty;

Table 1: IPA consonant table

S323, judging whether the pending final V1 obtained in step S322 is empty; if V1 is not empty, confirming that the previously obtained pending initial C1 and pending final V1 are the required initial C and final V, and ending the parsing; otherwise executing step S324;

S324, if V1 is empty, judging whether a syllabic-consonant final is present; if so, parsing the pending initial C1 according to the third initial disassembly rule and the third final disassembly rule to obtain the initial C and the final V, and ending the parsing; otherwise executing step S325;

If the final is empty, it is then judged whether a syllabic-consonant final is present. The syllabic-consonant final table is shown in Table 2 below; the syllabic consonant is looked up in Table 2, and the pending initial C1 is parsed according to the third initial disassembly rule and the third final disassembly rule to obtain the required initial C and final V.

Table 2: Syllabic-consonant final table

The third initial disassembly rule is:

C = (;

The third final disassembly rule is:

V = (;

S325, if there is no syllabic-consonant final, confirming that the initial C is the zero initial, output as 0 or one of the other zero-initial marks, and that the final V = C1;

V = C1 = (.

wherein V is the final, C1 is the pending initial, consonant is the name of the group that captures the initial, the IPA consonants are all the consonants in the IPA consonant table, vowel is the name of the group that captures the final, and P1 is the combination of the initial and the final;
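The decision flow of steps S321 to S325 can be illustrated with the following sketch. It is written under several loudly stated assumptions, because the rule formulas, the zero-initial marks other than "0", Table 1 and Table 2 are not reproduced in this text: the marks "∅" and "Ø", the consonant set, the syllabic-consonant list and all three patterns are hypothetical stand-ins, as is the function name `split_initial_final`. The input is P1, i.e. the annotation with the tone already removed.

```python
import re

# Hypothetical stand-ins (only "0" is attested in the text as a zero-initial mark).
ZERO_INITIAL_MARKS = "0∅Ø"
IPA_CONSONANT_CHARS = "pbtdkgmnŋfvszʂʐɕxhlʰ"   # incomplete stand-in for Table 1
SYLLABIC_FINALS = "|".join(["m", "n", "ŋ"])     # incomplete stand-in for Table 2

FIRST_RULE = re.compile(
    rf"(?P<null>[{ZERO_INITIAL_MARKS}])(?P<vowel>.*)", re.DOTALL)
SECOND_RULE = re.compile(
    rf"(?P<consonant>[{IPA_CONSONANT_CHARS}]*)(?P<vowel>.*)", re.DOTALL)
THIRD_RULE = re.compile(
    rf"(?P<consonant>[{IPA_CONSONANT_CHARS}]*?)(?P<vowel>{SYLLABIC_FINALS})")


def split_initial_final(p1: str) -> tuple[str, str]:
    """Return (C, V) for an initial-final combination P1."""
    # S321: an explicit zero-initial mark at the start of P1.
    match = FIRST_RULE.fullmatch(p1)
    if match:
        return match.group("null"), match.group("vowel")
    # S322: leading consonants are the pending initial, the rest the pending final.
    match = SECOND_RULE.fullmatch(p1)
    c1, v1 = match.group("consonant"), match.group("vowel")
    # S323: a non-empty pending final settles the split.
    if v1:
        return c1, v1
    # S324: empty final; look for a syllabic-consonant final at the end of C1.
    match = THIRD_RULE.fullmatch(c1)
    if match:
        return match.group("consonant"), match.group("vowel")
    # S325: no syllabic-consonant final; zero initial, and the final is C1.
    return ZERO_INITIAL_MARKS[0], c1


if __name__ == "__main__":
    print(split_initial_final("fan"))  # ('f', 'an')
    print(split_initial_final("hm"))   # ('h', 'm')
```

Whether the initial should be reported as an empty string or as a zero-initial mark when step S324 finds a bare syllabic consonant is not stated in the text; this sketch returns an empty string in that case.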

In the following, the Mandarin syllable 番 (P = fan55) is taken as a parsing example:

P = fan55

According to the tone disassembly rule in the voice disassembly rules, the following are obtained:

I = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher("fan55").group("intonation") = 55

P1 = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher("fan55").group("shengyun") = fan

The combination P1 of the initial and the final is then split. It is judged that P1 contains no zero-initial symbol (0 or either of the two other marks), so P1 is parsed according to the second initial disassembly rule and the second final disassembly rule to obtain the pending initial C1 and the pending final V1:

Pending initial C1 = (

Pending final V1 = (

Since the pending final obtained, an, is not empty, the previously obtained pending initial C1 and pending final V1 are confirmed as the required initial C and final V:

The initial C is f, the final V is an, and the tone I is 55; the parsing ends.

According to another aspect of the embodiment of the present invention, there is further provided a modern Chinese language voice analysis system, as shown in FIG. 2, which includes:

an obtaining unit 101, configured to obtain modern Chinese voice data, including Mandarin voice data and dialect voice data;

a preprocessing unit 102, configured to preprocess the voice data, including noise removal, audio quality equalization and acoustic feature extraction, and further including IPA identification and annotation of the voice data, where the IPA identification and annotation may be performed manually;

a creating unit 103, configured to create voice disassembly rules including a tone disassembly rule, an initial disassembly rule and a final disassembly rule, wherein the initial disassembly rule comprises a first initial disassembly rule, a second initial disassembly rule and a third initial disassembly rule, and the final disassembly rule comprises a first final disassembly rule, a second final disassembly rule and a third final disassembly rule;

and a parsing unit 104, configured to parse the preprocessed voice data according to the voice disassembly rules to obtain the initials, finals and tones of the voice data.

As an alternative embodiment, the manner in which the obtaining unit 101 obtains the modern Chinese voice data may specifically be:

The acquired modern Chinese voice data can be audio or video, and voice or video can be captured on terminals such as a computer, a tablet or a mobile phone. The unit can provide a user-friendly interface, and the survey content can be customized to user requirements; audio and video can be recorded entry by entry. Language researchers can conveniently enter voice data and related information and add descriptive metadata such as speaker information, language and dialect. All collected survey content is stored on the network side, which reduces dependence on a personal computer and allows the survey to be accessed and queried from anywhere after login. The acquisition interface is simple and gives clear guidance, reducing interference with the speaker. Data can be uploaded to the server in real time, or uploaded separately after recording. Voice remarks are played automatically during recording to guide the survey. Multiple acquisition and recording modes facilitate the recording work.

As an optional implementation manner, the manner in which the preprocessing unit 102 preprocesses the voice data, including noise removal, audio quality equalization, acoustic feature extraction, and IPA identification and annotation of the voice data, may specifically be:

The system can process the uploaded voice data, including preprocessing and conversion of the voice signal: removing noise, equalizing audio quality, extracting acoustic features, and so on. The preprocessing also includes IPA identification and annotation of the voice data, which can be performed manually. The system can include a built-in register of professional proofreading annotators that stores their basic information: name, gender, year and month of birth, education level, profession, photo, employer and contact phone number. IPA annotation can also be done by several people at once, with each person's annotations independent of the others'; whether annotators can see each other's annotations is configurable. Functions such as importing similar annotations as a reference can be provided, which reduces repeated work during annotation. The system can also be applied in teaching: the same recording can be distributed to several students for annotation, all students' annotations can be displayed together afterwards, and, after the voice data has been split, the survey items whose annotations differ are highlighted so that the reasons for the differences can be investigated quickly.

As an optional implementation manner, the creating unit 103 may specifically be configured to create a speech sound resolution rule including a tone resolution rule, an initial resolution rule, and a final resolution rule, where the method includes:

According to the basic disassembly principle, consonants are initials, a vowel together with any consonants that follow it forms the final, and numbers are tones. The voice disassembly rules created by the system comprise a tone disassembly rule, initial disassembly rules and final disassembly rules, wherein the initial disassembly rules comprise a first, a second and a third initial disassembly rule, and the final disassembly rules comprise a first, a second and a third final disassembly rule; all of these are stored in the system for later use.
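The basic disassembly principle can be illustrated with a single regular expression. The sketch below is only an assumption-laden example: the vowel set `VOWELS`, the pattern `SYLLABLE`, the group names and the function `split_basic` are hypothetical and are not the rules stored by the system, which would need a complete vowel inventory and the tone formats it supports.

```python
import re

# Assumed, deliberately small vowel inventory used only for illustration.
VOWELS = "aeiouyəɛɔɤɯ"

# Leading consonants -> initial; first vowel and everything up to the
# digits -> final; trailing digits -> tone.
SYLLABLE = re.compile(
    rf"^(?P<initial>[^{VOWELS}\d]*)"
    rf"(?P<final>[{VOWELS}]\D*)"
    rf"(?P<tone>\d*)$"
)


def split_basic(syllable):
    """Split a marked syllable into (initial, final, tone)."""
    m = SYLLABLE.match(syllable)
    if m is None:
        raise ValueError(f"cannot split {syllable!r} with the basic principle")
    return m.group("initial"), m.group("final"), m.group("tone")


print(split_basic("pa55"))
print(split_basic("kuaŋ35"))
```

Syllables without any vowel, and syllables carrying an explicit zero-initial mark, do not fit this single pattern, which is why separate first, second and third rules are needed.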

As an optional implementation, the manner in which the parsing unit 104 parses the preprocessed voice data according to the voice disassembly rules to obtain the initials, finals and tones of the voice data may specifically be as follows:

The preprocessed voice data P is parsed according to the tone disassembly rule to obtain a tone I and a combination P1 of the initial and the final. The combination P1 is then split according to the voice disassembly rules to obtain an initial C and a final V, as follows. It is judged whether P1 carries a zero-initial mark (one of three marks, e.g., 0). If so, P1 is parsed according to the first initial disassembly rule and the first final disassembly rule to obtain the initial C and the final V, and the parsing ends. If there is no zero-initial mark, the initial and the final are split normally: P1 is parsed according to the second initial disassembly rule to obtain an undetermined initial C1, and according to the second final disassembly rule to obtain an undetermined final V1. It is then judged whether V1 is an empty final. If V1 is not empty, the previously obtained C1 and V1 are confirmed as the required initial C and final V, and the parsing ends. If V1 is empty, it is judged whether both an initial and a final are present in C1; if so, C1 is parsed according to the third initial disassembly rule and the third final disassembly rule to obtain the initial C and the final V, and the parsing ends. If not, the initial C is confirmed to be the zero initial and is output as one of the zero-initial marks (e.g., 0), and the final V = C1.
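The decision flow above can be sketched as follows. Everything concrete in this sketch is an assumption made for illustration: the vowel set, the zero-initial marks in `ZERO_MARKS`, the use of m, n and ŋ as consonants that can serve as a final, and all four patterns are hypothetical stand-ins, since the actual first, second and third disassembly rules are not reproduced in this text.

```python
import re

VOWELS = "aeiouyəɛɔɤɯ"            # assumed vowel inventory
ZERO_MARKS = ("0", "∅", "Ø")      # assumed zero-initial marks

# Tone rule (assumed): trailing digits are the tone, the rest is P1.
TONE_RE = re.compile(r"^(?P<initial_final>.+?)(?P<tone>\d*)$")
# Second rules (assumed): leading non-vowels -> C1, remainder -> V1.
SECOND_RE = re.compile(rf"^(?P<c1>[^{VOWELS}]*)(?P<v1>.*)$")
# Third rules (assumed): an initial followed by a final consonant m, n or ŋ.
THIRD_RE = re.compile(r"^(?P<c>.+)(?P<v>[mnŋ])$")


def parse_syllable(p):
    """Return (initial C, final V, tone I) for one marked syllable."""
    m = TONE_RE.match(p)
    if m is None:
        raise ValueError(f"cannot parse {p!r}")
    p1, tone = m.group("initial_final"), m.group("tone")

    # First rules: an explicit zero-initial mark is present.
    for mark in ZERO_MARKS:
        if p1.startswith(mark):
            return mark, p1[len(mark):], tone

    # Second rules: normal split into C1 and V1.
    m2 = SECOND_RE.match(p1)
    c1, v1 = m2.group("c1"), m2.group("v1")
    if v1:
        return c1, v1, tone

    # V1 is empty: try to split C1 itself with the third rules.
    m3 = THIRD_RE.match(c1)
    if m3:
        return m3.group("c"), m3.group("v"), tone

    # Otherwise the initial is the zero initial and V = C1.
    return ZERO_MARKS[0], c1, tone


for s in ("pa55", "0an51", "hm21", "ŋ13"):
    print(s, parse_syllable(s))
```

The four sample strings exercise the four exits of the flow: a normal split, an explicit zero-initial mark, a vowel-less syllable that still splits into an initial and a final, and a vowel-less syllable that is treated as a zero initial with V = C1.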

According to a further aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium comprising a stored program, wherein the program, when run, performs the steps of any of the method embodiments described above.

Optionally, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the following steps:

S1, acquiring modern Chinese voice data;

S2, preprocessing the acquired modern Chinese voice data;

S3, analyzing the preprocessed voice data according to the voice disassembly rules to obtain the initials, finals and tones of the voice data.

Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be performed by a program instructing the relevant hardware of a terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disk, and the like.

The numbering of the foregoing embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments.

The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention.

In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis; for any portion not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, units or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and refinements without departing from the principles of the present invention, and such modifications and refinements are also to be regarded as falling within the scope of protection of the present invention.

Claims

1. A modern Chinese language voice analysis method, characterized in that it comprises the following steps:

S1, acquiring modern Chinese voice data;

the voice disassembly rules comprise a tone disassembly rule, initial disassembly rules and final disassembly rules; the initial disassembly rules comprise a first initial disassembly rule, a second initial disassembly rule and a third initial disassembly rule; the final disassembly rules comprise a first final disassembly rule, a second final disassembly rule and a third final disassembly rule;

the tone disassembly rule is as follows:

after the tone I is split off according to the tone disassembly rule, the combination P1 of the initial and the final is obtained:

wherein P1 is the combination of the initial and the final; shangyun is the name given to the combination of the initial and the final; the numbers 0-9, (1), (2), (3), (4), (5), (6), (7), (8), (9) and the modes are tones in different formats; interaction is the name given to the tone; P is the preprocessed voice data; mat (P) is the input voice data; and group ("shangyun") is the output combination of the initial and the final;

S32, splitting the combination P1 of the initial and the final according to the voice disassembly rules to obtain an initial C and a final V, specifically comprising:

the first initial disassembly rule is as follows:

the first final disassembly rule is as follows:

.group("vowel")；

the second initial disassembly rule is as follows:

c1 = (;

the second final disassembly rule is as follows:

v1= (;

the third initial disassembly rule is as follows:

c= (;

the third final disassembly rule is as follows:

v= (;

2. The modern Chinese language voice analysis method of claim 1, wherein the modern Chinese voice data in step S1 comprises Mandarin voice data and Chinese dialect voice data.

3. The modern Chinese language voice analysis method of claim 1, wherein the identifying and marking of the voice data with international phonetic symbols in step S2 can be performed by manual marking.

4. A modern Chinese language voice analysis system, characterized in that it is adapted to perform the method of any one of claims 1 to 3, comprising:

an acquisition unit configured to acquire voice data;

5. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when run, performs the method of any one of claims 1 to 3.