CN117612553A - Modern voice recording, analyzing and displaying system - Google Patents

Modern voice recording, analyzing and displaying system

Info

Publication number
CN117612553A
CN117612553A (application CN202311342385.5A)
Authority
CN
China
Prior art keywords
voice
voice data
consonant
data
analysis
Prior art date
Legal status
Granted
Application number
CN202311342385.5A
Other languages
Chinese (zh)
Other versions
CN117612553B (en)
Inventor
林春雨
龚明袖
Current Assignee
Shenzhen Municipal Yuan Software Co ltd
Guangdong Polytechnic Normal University
Original Assignee
Shenzhen Municipal Yuan Software Co ltd
Guangdong Polytechnic Normal University
Priority date
Filing date
Publication date
Application filed by Shenzhen Municipal Yuan Software Co ltd, Guangdong Polytechnic Normal University filed Critical Shenzhen Municipal Yuan Software Co ltd
Priority to CN202311342385.5A
Publication of CN117612553A
Application granted
Publication of CN117612553B
Status: Active


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 — characterised by the analysis technique
    • G10L25/48 — specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a modern voice recording, analyzing and displaying system, which comprises: a recording unit for collecting and recording various modern voice data; an analysis unit for splitting the recorded voice data into initial, final and tone and carrying out multi-level comparison analysis; a storage unit for storing the modern voice data, the voice disassembly rules and a plurality of voice basic data tables; and a display unit for displaying the voice data survey and analysis results on a map according to geographic information. The invention improves the efficiency of modern voice recording, analysis and display and provides powerful support for social-scientific research on speech.

Description

Modern voice recording, analyzing and displaying system
Technical Field
The invention relates to the field of voice analysis, in particular to a modern voice recording, analyzing and displaying system.
Background
Traditional collection and analysis of language speech relied on paper field notes, later supplemented by voice recorders or MP3 recordings. Such recordings are long, inconvenient to listen back to, and make it difficult to locate a particular entry within several hours of audio. More recently, computer software has been used to collect voice data in recorded form and to split and analyse the speech collected for each survey item, but such software is typically single-machine and cannot split and analyse the voice data accurately. Analysis of Chinese speech allows the phonological system, sound-change rules and initial-and-final system of Chinese to be explored, and is therefore very valuable for understanding phonetic structure and phonetic evolution and for comparative research with other languages. However, existing products often split speech phonemes inaccurately or incorrectly, and offer very limited support for analysis and research in teaching and scientific activities. No effective solution to these problems has yet been proposed.
Disclosure of Invention
The embodiment of the invention provides a modern voice recording, analysis and display system. A recording unit collects and records various modern voice data in audio or video form; an analysis unit uses the voice disassembly rules it creates to split the recorded modern voice data into initial, final and tone, compares them at multiple levels with the voice data stored in the system database and with the Chinese middle-ancient voice data, and outputs the analysis results; and a display unit displays the voice data survey and analysis results on a map according to geographic information. This improves the efficiency of modern voice recording, analysis and display and provides powerful support for analysis and research in teaching and scientific activities.
According to one aspect of an embodiment of the present invention, there is provided a modern voice recording, analysis and display system comprising:
the acquisition and recording unit is used for acquiring and recording various modern voice data;
the analysis unit is used for carrying out multi-level comparison analysis on the collected and recorded modern voice data through splitting sound, rhyme and tone and outputting an analysis result;
and the display unit is used for displaying the survey and analysis results of the modern voice data on the map according to the geographic information.
As an alternative embodiment, the modern voice recording, analysis and display system further comprises:
a storage unit for storing the modern voice data, including but not limited to the recordings, the voice disassembly rules and the voice basic data tables.
As an alternative embodiment, the analysis unit comprises:
the preprocessing subunit is used for preprocessing the collected and recorded modern voice data, wherein the preprocessing comprises noise removal, audio quality equalization, acoustic feature extraction and international phonetic symbol recognition and marking of the voice data;
the creating subunit is used for creating a voice dismantling rule, wherein the voice dismantling rule comprises an initial dismantling rule, a final dismantling rule and a tone dismantling rule;
the disassembly subunit is used for analyzing and disassembling the preprocessed voice data according to the created voice disassembly rule to obtain initials, finals and tones of the voice data;
and the comparison subunit is used for carrying out multi-level comparison analysis on the disassembled voice data and voice data stored in the system and Chinese ancient voice data, and outputting an analysis result.
As an optional implementation manner, the tone resolution rule created by the creation subunit is:
I = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher(P).group("intonation")
where I is the tone; the numerals 0-9 and ①-⑩ are tone marks in different formats; shengyun names the combination of initial and final; intonation names the tone; P is the preprocessed modern voice data; matcher(P) takes the input voice data and group("intonation") returns the extracted tone.
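For machine implementation, such a rule can be read as a java.util.regex pattern with named capture groups. A minimal Java sketch follows; the pattern is taken from the rule above, while the class name and surrounding scaffolding are illustrative assumptions rather than part of the invention:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToneSplitExample {
    // Tone disassembly rule from the description: "shengyun" captures the initial+final
    // combination, "intonation" captures the tone digits or circled digits.
    private static final Pattern TONE_RULE = Pattern.compile(
            "(?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)");

    public static void main(String[] args) {
        String p = "fan55";                                   // preprocessed voice data P
        Matcher m = TONE_RULE.matcher(p);
        if (m.matches()) {
            System.out.println("tone I = " + m.group("intonation"));  // 55
            System.out.println("P1     = " + m.group("shengyun"));    // fan
        }
    }
}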
The initial-consonant disassembly rules comprise a first, a second and a third initial-consonant disassembly rule, wherein:
First initial-consonant disassembly rule:
where C is the initial consonant, null names the zero initial, 0 and the other agreed symbols mark a zero initial, vowel names the final, and P1 is the combination of initial and final;
Second initial-consonant disassembly rule:
C1 = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>(.*)).matcher(P1).group("consonant")
where C1 is the pending initial consonant, consonant names the initial, the bracketed class stands for all consonants in the international phonetic symbol consonant table, vowel names the final, and P1 is the combination of initial and final;
Third initial-consonant disassembly rule:
C = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>([consonants of the phonation rhyme table])*).matcher(C1).group("consonant")
where C is the initial consonant, consonant names the initial, the first bracketed class stands for all consonants in the international phonetic symbol consonant table, vowel names the final, the second bracketed class stands for the consonants of the phonation rhyme table, and C1 is the pending initial;
The final disassembly rules comprise a first, a second and a third final disassembly rule, wherein:
First final disassembly rule:
where V is the final, null names the zero initial, 0 and the other agreed symbols mark a zero initial, vowel names the final, and P1 is the combination of initial and final;
Second final disassembly rule:
V1 = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>(.*)).matcher(P1).group("vowel")
where V1 is the pending final, consonant names the initial, the bracketed class stands for all consonants in the international phonetic symbol consonant table, vowel names the final, and P1 is the combination of initial and final;
Third final disassembly rule:
V = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>([consonants of the phonation rhyme table])*).matcher(C1).group("vowel")
where V is the final, consonant names the initial, the first bracketed class stands for all consonants in the international phonetic symbol consonant table, vowel names the final, the second bracketed class stands for the consonants of the phonation rhyme table, and C1 is the pending initial.
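The initial/final disassembly rules are only partially legible in the published text, so the following Java sketch shows only the general shape of the second-level split they describe. The consonant inventory is a small illustrative subset standing in for the full international phonetic symbol consonant table (Table 1), and all names are assumptions:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InitialFinalSplitExample {
    // Illustrative subset only; the real rule enumerates the whole consonant table.
    private static final String IPA_CONSONANTS = "pbtdkgmnŋfvszʃʒhlrjw";

    // Second initial/final disassembly rule (sketch): leading consonants form the pending
    // initial C1, the remainder forms the pending final V1.
    private static final Pattern SPLIT_RULE = Pattern.compile(
            "(?<consonant>[" + IPA_CONSONANTS + "]*)(?<vowel>.*)");

    public static void main(String[] args) {
        String p1 = "fan";                                    // combination of initial and final
        Matcher m = SPLIT_RULE.matcher(p1);
        if (m.matches()) {
            System.out.println("pending initial C1 = " + m.group("consonant")); // f
            System.out.println("pending final V1   = " + m.group("vowel"));     // an
        }
    }
}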
As an alternative embodiment, the voice basic data tables include, but are not limited to, a word table, a grammar table, a phonetic table, a homophone table, a lineage division table, a Middle Chinese initial table, a Middle Chinese final table, an international phonetic symbol vowel table, an international phonetic symbol consonant table, and a phonation rhyme table.
As an alternative embodiment, the modern voice data recorded by the recording unit include audio data and video data, to which basic information about the speaker and descriptive metadata about the recorded language are added.
As an optional implementation manner, the multi-level comparison analysis of the disassembled voice data with the voice data stored in the database and the Chinese mid-ancient voice data includes: comparing and analyzing the disassembled voice data with voice data of different languages and dialects stored in a database; comparing and analyzing the disassembled single-point acquisition recording voice data with the ancient voice stored in the database; comparing and analyzing the disassembled multi-point recorded voice data with the ancient voice stored in the database; and comparing and analyzing the disassembled multi-point acquisition and recording voice data.
As an optional implementation manner, the display unit displays the voice data survey and analysis results on the map according to geographic information, including but not limited to: displaying the voice data acquisition points on the map according to geographic information; displaying the various places where a certain sound occurs on the map according to geographic information; displaying the presence and coexistence of languages at specific geographic positions on the map according to geographic information; and displaying content customised by modern voice researchers on the map according to geographic information. The display further comprises setting, in the map, a marker whose colour matches a given language, placing a number on the marker to represent the number of language varieties at that geographic position, colouring the marker with the colour of the most widely spoken language, and showing the names of the specific language varieties when the mouse is placed over the number on the marker.
As an alternative implementation mode, the international phonetic symbols are identified and marked on the voice data either entirely manually, or by retrieving matching phonetic symbols from the system voice basic data tables according to the voice data, importing them as marks, and then checking and confirming them manually.
As an alternative implementation manner, the modern voice recording, analyzing and displaying system further comprises a management unit for protecting the security and privacy of system data, and ensuring the security and privacy protection of voice data and information, including data encryption, access control and authority management.
The invention has the beneficial effects that:
1. The recording unit of the invention can record audio and video at the same time and can be used by multiple remote users simultaneously; pictures can be imported into entries as survey references; an entry can carry an explanatory voice note recorded by the project implementer, so that remote surveys break through regional limitations;
2. The analysis unit of the invention makes labelling of international phonetic symbols user-friendly: initials, finals and tones can each be labelled, phonetic symbols of nearby survey points can be selected from the system for pre-matching and then corrected, multi-user labelling is supported, the project leader can reserve, remotely modify and approve labels, and multiple users may be allowed to view and modify survey results as co-researchers; multi-level comparison and research are then carried out and the analysis and comparison results are finally output;
3. The storage unit stores lookup tables that provide the basic multi-language knowledge data of the profession. The creation of queryable basic data tables helps researchers, scholars and cross-cultural communicators to better understand the language usage norms, idioms, polite expressions and the like of the target culture. It also benefits language education: teachers can query data such as language habits, common errors and grammar rules for lesson preparation and the development of teaching materials, and students can expand their vocabulary, improve their grammar and increase the accuracy of their language expression by querying the relevant data.
Drawings
FIG. 1 is a schematic diagram of a modern voice recording, analysis and display system according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of a table of results comparing modern voice data with Middle Chinese sounds according to one embodiment of the present invention;
FIG. 3 is a schematic diagram showing the presence of language and specific geographic locations in accordance with one embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, an embodiment of the present invention provides a modern voice recording, analyzing and displaying system, which includes:
the acquisition and recording unit 101 is used for acquiring and recording various modern voice data;
the analysis unit 102 is used for carrying out multi-level comparison analysis on the modern voice data obtained through recording through splitting sound, rhyme and tone, and outputting an analysis result;
A storage unit 103 for storing the collected modern voice data and a plurality of voice basic data tables; the voice basic data table can be searched according to the requirement and relevant information of the voice basic data table can be called.
And the display unit 104 is used for displaying the voice data material on the map according to the geographic information.
As a specific example of an implementation, the recording unit 101 is configured to collect and record a plurality of modern voice data.
The various modern voice data recorded by the system recording unit 101 include Mandarin voice data, dialect voice data and minority-language voice data. Mandarin plays an important role in communication in modern Chinese society. Dialects (in particular the Chinese dialects) are branches of the Chinese language used by people in particular areas, while the common language (Putonghua) developed on the basis of the northern dialects; dialects and the common language depend on each other and together promote social development and progress. Regarding the classification of dialects, the Chinese linguistic community currently divides the modern Chinese dialects into ten main groups: Mandarin (Guanhua), Jin, Wu, Hui, Gan, Xiang, Min, Hakka, Yue and Ping. Minority languages are usually independent of the main languages such as Mandarin and the dialects; they are used mainly within minority populations and have their own phonetic, lexical and grammatical systems. These languages differ significantly from the main languages in basic vocabulary, syntactic structure, pronunciation and phonetic features. With the continuing interaction, exchange and integration among ethnic groups, many language contacts have arisen between the minority languages and Chinese, which makes them well worth in-depth study.
The system provided by the invention can be used as a new tool for modern voice fieldwork. Several people can carry out survey work at the same time, and voice or video can be collected and recorded from terminals such as computers, Pads and mobile phones. The recording unit provides a user-friendly interface, the survey content can be customised to the user's needs, and audio and video are organised entry by entry in a directory. Language researchers can conveniently enter voice data and related information and add descriptive metadata such as speaker information, language, dialect and place of residence.
Field-survey recording with the system can be carried out at a plurality of survey (recording) points, and the basic data to be filled in for a survey include the following:
Survey (recording) point data: the location from top down, province - prefecture - county - street office or township - residential or administrative village; the population of the county; the ethnic groups in the county and their populations; whether minority languages exist and, if so, their kinds, distribution, population and usage; which Chinese dialects exist in the county, how many "accents" there are, whether there are dialect islands, and how their distribution, population and usage are changing; whether there is music or local opera sung in the dialect and, if so, its categories and usage.
Survey staff data: name, gender, year and month of birth, cultural degree, profession, photo, work unit, contact phone.
Speaker data: the requirements for the speaker are prompted on the system entry web page. Standard speaker conditions: aged 60 or above; born and raised locally; a simple family language environment (parents and spouse are local people); no extended residence elsewhere (more than one year); able to speak the authentic local dialect (recognised by the community and by themselves); with a primary or junior middle school education (a speaker with a college or university education is generally not preferred). Specifically, the data filled in for the speaker include: name, photograph, work unit, home address, phone, e-mail, gender, ethnicity, year of birth (Gregorian calendar), place of birth from province level down to natural village, education level, occupation, which languages they can speak (including Mandarin and foreign languages), which they now mainly speak, where their father is from and what he speaks, where their mother is from and what she speaks, and where their spouse is from and what the spouse speaks.
The content to be collected from the speaker includes word sounds, vocabulary, grammar and connected texts. A word-sound item is a single character such as "ladder"; a vocabulary item is a word such as "sun" (a picture of the sun can be uploaded at the same time as an illustration); a grammar item is a sentence, such as the affirmative sentence "Xiao Ming caught a big fish yesterday." or the negative sentence "I did not catch a fish."; a connected text is a short passage such as the fable "The North Wind and the Sun": the North Wind and the Sun each felt that his own power was greater; they argued for a long time and finally decided to have a contest. In the morning the North Wind and the Sun were together and saw a rider wearing a new cloak. The North Wind said: "That rider seems to like his new cloak, but I can easily blow the cloak off his body." The Sun said: "I do not think you can. Let us try and see who can get the cloak off him. You go first!" The North Wind blew hard at the rider; the rider's hat flew off and he ran after it; leaves fell from the trees; the animals were frightened; a boat moored in the harbour sank. The North Wind blew with all his might, but no matter what, the rider only wrapped the cloak more tightly around himself. Then the Sun said loudly: "Now it is my turn!" The Sun gave out soft light; under the sunshine the flowers bloomed, the bees and butterflies flew about, the birds began to sing, the animals dozed in the warm sunlight, and people came out into the street to meet and chat. The rider became very hot in the sun, went to the river, took off his clothes and jumped in to swim. Thus the warm, gentle Sun overcame the fierce storm and got the cloak off the rider.
From the standpoint of structural elements and of research, a language basically consists of word sounds, vocabulary, grammar and connected texts, and the items to be surveyed can essentially be entered under these four classes. Survey data for more than 3,000 survey points have already been collected in the system of the present invention, and the survey lists commonly used in the field are provided for recall, for example the Chinese dialect survey character list, the fizeau survey word list and the Chinese dialect vocabulary survey list, which are stored in the system database.
The system acquisition unit 101 implements the following functions: standard survey entries for characters, words, sentences and texts can be imported via Excel; the system comes with commonly used characters, words, sentences and survey items; a word list imported by a researcher can be matched against the system's "six elements of character sound": she (rhyme group), hu (mouth shape), deng (division), diao (tone), yun (final) and sheng (initial); the vocabulary entries built into the system come with pictures, and imported words can be matched with the system's existing pictures; several questionnaires (characters, words, sentences and texts) can be set up for the same project; a character can be added for a particular survey point during recording - the added character does not affect the characters set for the project and is valid only for the current survey point, whereas a character changed in the project settings is valid for all survey points; a voice remark can be recorded for each survey item and is played automatically during recording, giving prompts to the recording staff or the speaker and conveying hints that are hard to express in writing, so that the researcher can still provide guidance without being on site; multi-point, multi-person surveys can be carried out simultaneously without mutual interference, which greatly improves efficiency; voice or video can be collected from terminals such as computers, Pads and mobile phones, which is flexible and convenient; all survey content is stored on the network side, reducing dependence on a personal computer, so that the survey can be logged into and queried anywhere; the simple acquisition interface gives clear guidance and reduces interference with the speaker; data can be uploaded to the server in real time or uploaded separately after recording; and multiple recording modes facilitate the recording work.
As a specific example of implementation, the analysis unit 102 is configured to split the voice, the rhyme and the tone of the modern voice data obtained by recording, perform multi-level comparison analysis, and output an analysis result, and specifically includes:
the preprocessing subunit is used for preprocessing the modern voice data obtained by recording, and the preprocessing comprises noise removal, audio quality equalization, acoustic feature extraction and international phonetic symbol recognition and marking of the voice data.
The system is capable of processing the uploaded voice data, including preprocessing and conversion of the voice signal. This includes steps such as removing noise, equalising the audio quality and extracting acoustic features. Preprocessing also includes identifying and marking the voice data with international phonetic symbols, the phonetic system generally applicable to all human languages. The International Phonetic Alphabet (IPA), formerly also called the "universal phonetic alphabet" in Chinese, is a system of phonetic notation based on the Latin alphabet and designed by the International Phonetic Association as a standardised way of transcribing spoken sounds. The present invention uses international phonetic symbols, a symbol system for representing the elements of human speech, to identify and mark modern Chinese voice data. The IPA is not tied to any specific language and can accurately represent all kinds of phonemes and their manners of articulation; it contains a wide range of symbols describing consonants, vowels, stress and so on. Pinyin is used mainly to mark the pronunciation of Chinese characters, especially in Mandarin, whereas the international phonetic symbols are a more general notation for describing the pronunciation of any language. Compared with pinyin, the international phonetic symbols are more accurate and specific and can represent finer pronunciation differences, and they are indispensable for exploring the phonological system, tonal rules and initial-and-final system of Chinese, for understanding its phonetic structure and evolution, and for comparative research with other languages. For a tonal language like Chinese, a syllable (for Chinese, a character can be understood as one complete syllable) comprises consonants, vowels and a tone; we also customarily speak of the initial (essentially a consonant), the final (both vowels and certain consonants can act as finals) and the tone. For example, the international phonetic transcription of the Cantonese syllable in "guava" is "[fan55]", which comprises a consonant, vowels and a tone. The analysis of modern speech consists of splitting, analysing and studying the initials, finals and tones.
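As a rough illustration of the preprocessing chain just described, the interface below sketches the four steps in order. All type and method names are hypothetical assumptions; the actual components of the system are not disclosed at this level of detail:

/** Hypothetical sketch of the preprocessing steps described above. */
public interface VoicePreprocessor {
    byte[] removeNoise(byte[] rawAudio);                 // noise removal
    byte[] equalizeQuality(byte[] audio);                // audio quality equalisation
    double[] extractAcousticFeatures(byte[] audio);      // acoustic feature extraction
    String transcribeToIpa(double[] features);           // IPA recognition and marking, e.g. "fan55"

    default String preprocess(byte[] rawAudio) {
        byte[] cleaned = equalizeQuality(removeNoise(rawAudio));
        return transcribeToIpa(extractAcousticFeatures(cleaned));
    }
}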
For the recognition and labeling of international phonetic symbols of modern voice data, the system can be divided into two modes, wherein the first mode adopts manual labeling; the second mode is based on a basic database stored by the system to semi-automatically mark initials, finals and tones, and can help a user to mark a large amount of voice data quickly and improve the marking accuracy and consistency.
The first mode is manual labelling: an information register is provided in the system for professional proofreading and labelling personnel, storing their basic information (name, gender, year and month of birth, education level, profession, photo, work unit and contact phone). The phonetic-symbol marking supports simultaneous labelling by several people, with the labelled content kept independent; a flag can be set to control whether the other party's labels are visible; and labels from similar survey points can be imported as a reference. These functions, first, reduce repeated work during labelling and, second, can be applied in teaching: for example, the recording of one survey (recording) point can be distributed to several students for labelling, and after labelling the marks of all the students can be displayed at the same time, survey items with differing marks are highlighted, and the reasons for the differences can be reviewed quickly.
The second mode is semi-automatic labelling of initials, finals and tones. Because the Chinese dialects are so varied and their phonological systems differ greatly from those of other languages (such as English), no generally available software tool can currently produce international phonetic transcriptions of Chinese speech directly, so this work is at present done manually. The system therefore adopts semi-automatic labelling of initials, finals and tones. If a character template exists for the recording, the character information of the survey content at that recording point is used as an index to search the existing data of the system's basic database, and the international phonetic symbols already stored there are imported. If there is no character template, existing software can first be used to convert the recorded speech into characters; once the textual content has been confirmed, the characters are used as an index to search the existing data of the basic database and the international phonetic symbols are imported. After import the symbols are checked and revised; the phonetic symbols of a survey point of the same dialect type are imported first and then checked and revised, which reduces repeated input and improves efficiency. For example, if a professional transcriber labels a 15-minute recording entirely by hand, at least a week is needed, whereas if the system labels the international phonetic symbols semi-automatically and the transcriber then corrects them, only one to two days are needed, which greatly improves efficiency while also improving the accuracy and consistency of the labels.
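A minimal sketch of the semi-automatic labelling idea, assuming the base data can be indexed by survey character; the class, method and sample entries are illustrative only, and any suggestion still requires the manual checking described above:

import java.util.Map;
import java.util.Optional;

public class SemiAutomaticIpaLabeler {
    // Hypothetical in-memory stand-in for the system's base database:
    // survey character -> previously confirmed IPA transcription from a similar survey point.
    private final Map<String, String> ipaByCharacter;

    public SemiAutomaticIpaLabeler(Map<String, String> ipaByCharacter) {
        this.ipaByCharacter = ipaByCharacter;
    }

    /** Returns a pre-filled IPA label for the character if the base data has one;
     *  otherwise the entry is left for fully manual labelling. */
    public Optional<String> suggest(String surveyCharacter) {
        return Optional.ofNullable(ipaByCharacter.get(surveyCharacter));
    }

    public static void main(String[] args) {
        SemiAutomaticIpaLabeler labeler =
                new SemiAutomaticIpaLabeler(Map.of("番", "fan55", "蛋", "tan51"));
        System.out.println(labeler.suggest("番").orElse("<manual labelling required>"));
    }
}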
The creating subunit is used for creating a voice dismantling rule, wherein the voice dismantling rule comprises an initial dismantling rule, a final dismantling rule and a tone dismantling rule;
the voice dismantling rules comprise a tone dismantling rule, an initial dismantling rule and a final dismantling rule, wherein the initial dismantling rule comprises a first initial dismantling rule, a second initial dismantling rule and a third initial dismantling rule, and the final dismantling rule comprises a first final dismantling rule, a second final dismantling rule and a third final dismantling rule.
The disassembly subunit is used for analysing and disassembling the preprocessed voice data according to the voice disassembly rules to obtain the initials, finals and tones of the voice data, where the basic rule is that consonants are initials, vowels together with the consonants following them are finals, and numbers are tones.
At present there are several kinds of tone notation: contour symbols (rising and falling strokes), numeric notation (multi-digit numbers indicating tone values and single-digit numbers indicating tone numbers), and numbers written as superscripts or as the circled digits ①-⑨. For ease of machine recognition, the tone part of the international phonetic transcription is by default represented in the present invention with the numeric characters 0-9 or the circled digits ①-⑩.
For example, Mandarin has 4 tones: yin ping, yang ping, shang (the character for this tone being read in its third-tone reading, like the shang of "appreciation") and qu. They are therefore marked with the tone numbers 1/2/3/4; the tone value of tone 1 (the tone value is a numerical index of the specific pitch contour, ranging from one to several digits) is 55, the tone value of tone 2 is 35, that of tone 3 is 214 and that of tone 4 is 51. Labelled with the 4 tone values, for example:
First tone: [fan55] fan
Second tone: eggplant
Third tone: frying
Fourth tone: [tan51] egg
Labelled with the 4 tone numbers, for example:
First tone: [fan1] guava
Second tone: eggplant
Third tone: frying
Fourth tone: [tan4] egg
For example, the Cantonese dialect has 9 tones: yin ping, yin shang, yin qu, upper yin ru (also called yin ru), lower yin ru (also called middle ru), yang ping, yang shang, yang qu and yang ru. If the 9 tones of Cantonese are noted by tone number, they can be marked with the digits 1-9; if noted by tone value, the corresponding multi-digit values are used. Hence both the tone numbers 1-9 (or ①-⑨) and the tone values (from one to several digits) are in use.
The disassembling subunit of the system analyzes the preprocessed voice data according to a voice disassembling rule to obtain initials, finals and tones of the voice data, and specifically comprises the following steps:
Step S1, analyzing the preprocessed voice data P according to a voice disassembly rule to obtain a tone I and a combination P1 of initials and finals;
In the present invention, the preprocessed voice data P are the international phonetic transcription of a syllable, for example the Mandarin syllable fan (P = fan55); P has to be split into the tone I and the initial+final combination P1. Take Mandarin as another example: the sentence "ni hao ma?" ("How are you?") is split into three characters; for the first character, "ni", P is ni3 (labelled with a tone number) or ni214 (labelled with a tone value), and it is then split according to the voice disassembly rules, where:
The tone disassembly rule is:
I = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher(P).group("intonation")
where I is the tone; the numerals 0-9 and ①-⑩ are tone marks in different formats; shengyun names the combination of initial and final; intonation names the tone; P is the preprocessed voice data; matcher(P) takes the input voice data and group("intonation") returns the extracted tone;
After the tone I has been split out of the international phonetic transcription P according to the tone disassembly rule, the combination P1 of initial and final is extracted as:
P1=(?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher(P).group("shengyun")
where P1 is the combination of initial and final; shengyun names that combination; the numerals 0-9 and ①-⑩ are tone marks in different formats; intonation names the tone; P is the preprocessed voice data; matcher(P) takes the input voice data and group("shengyun") returns the combination of initial and final.
Step S2, splitting the combination P1 of the initials and the finals according to a voice splitting rule to obtain an initial consonant C and a final sound V, wherein the method specifically comprises the following steps:
step S21, judging whether the P1 is a zero initial consonant, if so, analyzing the preprocessed voice data according to a first initial consonant disassembly rule and a first final sound disassembly rule to obtain an initial consonant C and a final sound V, and if not, executing the step S22;
First it is judged whether a zero-initial symbol is present in the international phonetic transcription (for example, the agreed marks for a zero initial are 0 and similar symbols). If a zero initial is present, P1 is analysed according to the first initial-consonant disassembly rule and the first final disassembly rule to obtain the initial C and the final V respectively.
The first initial-consonant disassembly rule is:
where C is the initial consonant, null names the zero initial, 0 and the other agreed symbols mark a zero initial, vowel names the final, and P1 is the combination of initial and final;
The first final disassembly rule is:
where V is the final, null names the zero initial, 0 and the other agreed symbols mark a zero initial, vowel names the final, and P1 is the combination of initial and final;
step S22, in step S21, it is determined that there is no zero-initial symbol (e.g., 0,One of the three), splitting the initials and finals normally, and analyzing the P1 according to a second initial dismantling rule to obtain an undetermined initial C1; analyzing the P1 according to the second vowel dismantling rule to obtain a vowel V1 to be determined;
The second initial-consonant disassembly rule:
C1 = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>(.*)).matcher(P1).group("consonant")
where C1 is the pending initial, consonant names the initial, the bracketed class stands for all consonants in the international phonetic symbol consonant table, vowel names the final, and P1 is the combination of initial and final;
The second final disassembly rule:
V1 = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>(.*)).matcher(P1).group("vowel")
where V1 is the pending final, consonant names the initial, the bracketed class stands for all consonants in the international phonetic symbol consonant table, vowel names the final, and P1 is the combination of initial and final.
Step S23: it is judged whether the pending final V1 obtained in step S22 is empty. If V1 is not empty, the previously obtained pending initial C1 and pending final V1 are confirmed as the required initial C and final V and the analysis ends; otherwise step S24 is executed;
Table 1 below is the international phonetic symbol consonant table. Since only consonants can constitute initials, whatever remains after the initial has been found (split off) according to Table 1 is the final; if nothing remains (the value of the pending final V1 is null), the final is empty.
Table 1: international phonetic symbol consonant table
Step S24: if V1 is empty, it is judged whether C1 contains a consonant that can itself act as a final (a phonation rhyme). If it does, the pending initial C1 is analysed according to the third initial-consonant disassembly rule and the third final disassembly rule to obtain the initial C and the final V; if it does not, step S25 is executed;
When no ordinary final is present, it is therefore checked whether a phonation-rhyme consonant can be found according to the phonation rhyme table in Table 2 below; if so, the pending initial C1 is analysed according to the third initial-consonant disassembly rule and the third final disassembly rule to obtain the required initial C and final V.
Table 2: phonation rhyme table
The third initial-consonant disassembly rule:
C = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>([consonants of the phonation rhyme table])*).matcher(C1).group("consonant")
where C is the initial, consonant names the initial, the first bracketed class stands for all consonants in the international phonetic symbol consonant table, vowel names the final, the second bracketed class stands for the consonants of the phonation rhyme table, and C1 is the pending initial;
The third final disassembly rule:
V = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>([consonants of the phonation rhyme table])*).matcher(C1).group("vowel")
where V is the final, consonant names the initial, the first bracketed class stands for all consonants in the international phonetic symbol consonant table, vowel names the final, the second bracketed class stands for the consonants of the phonation rhyme table, and C1 is the pending initial;
step S25, if the vowel is not made, confirming that the initial consonant C is zero initial consonant, outputting 0,One of them; vowel v=c1;
v= (
.group("consonant")。
Wherein V is a final, C1 is a to-be-determined initial, confonant is a name for the initial, international phonetic symbols consonant is all consonants in the international phonetic symbol consonant table, vowell is a name for the final, and P1 is a combination of the initial and the final.
In a sense this amounts to correcting errors in the earlier international phonetic transcription.
In the following, the Mandarin syllable fan (P = fan55) is taken as an example of the parsing:
P=fan55
according to the tone dismantling rule in the voice dismantling rule, obtaining:
I = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher("fan55").group("intonation") = 55
P1 = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher("fan55").group("shengyun") = fan
To split the combination P1 of initial and final, it is first judged that P1 does not begin with a zero-initial symbol (0 or one of the other agreed symbols); P1 is then analysed according to the second initial-consonant disassembly rule and the second final disassembly rule to obtain the pending initial C1 and the pending final V1:
Pending initial C1 = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>(.*)).matcher("fan").group("consonant") = f
Pending final V1 = (?<consonant>([consonants of the international phonetic symbol consonant table])*)(?<vowel>(.*)).matcher("fan").group("vowel") = an
Since the pending final an is not empty, the previously obtained pending initial C1 and pending final V1 are confirmed as the required initial C and final V:
the initial consonant C is f, the final sound V is an, the tone I is 55, and the analysis is finished.
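Putting steps S1 to S25 together for the worked example above, a self-contained Java sketch might look as follows. The consonant and phonation-rhyme inventories are small illustrative subsets, the zero-initial and phonation-rhyme handling is simplified, and nothing here should be read as the system's actual implementation:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FanDisassemblyExample {
    // Illustrative subsets only; the real system uses the full international phonetic symbol
    // consonant table (Table 1), the phonation rhyme table (Table 2) and the agreed
    // zero-initial symbols, which are not reproduced in full here.
    private static final String IPA_CONSONANTS   = "pbtdkgmnŋfvszʃʒhlrjw";
    private static final String RHYME_CONSONANTS = "mnŋ";   // consonants that can act as finals
    private static final String ZERO_INITIALS    = "0";     // zero-initial mark given in the text

    private static final Pattern TONE_RULE = Pattern.compile(
            "(?<shengyun>[^0-9①②③④⑤⑥⑦⑧⑨⑩]*)(?<intonation>[0-9①②③④⑤⑥⑦⑧⑨⑩]*)");
    private static final Pattern CV_RULE = Pattern.compile(
            "(?<consonant>[" + IPA_CONSONANTS + "]*)(?<vowel>.*)");

    public static void main(String[] args) {
        String p = "fan55";                                  // preprocessed voice data P

        // Step S1: split off the tone I, keeping the initial+final combination P1.
        Matcher tone = TONE_RULE.matcher(p);
        if (!tone.matches()) {
            return;
        }
        String i  = tone.group("intonation");                // 55
        String p1 = tone.group("shengyun");                  // fan

        String c;
        String v;
        if (!p1.isEmpty() && ZERO_INITIALS.indexOf(p1.charAt(0)) >= 0) {
            // Step S21: an explicit zero-initial symbol is present; the rest is the final.
            c = String.valueOf(p1.charAt(0));
            v = p1.substring(1);
        } else {
            // Step S22: split into pending initial C1 and pending final V1.
            Matcher cv = CV_RULE.matcher(p1);
            if (!cv.matches()) {
                return;
            }
            String c1 = cv.group("consonant");               // f
            String v1 = cv.group("vowel");                   // an
            if (!v1.isEmpty()) {
                // Step S23: V1 is not empty, so C1 and V1 are the required initial and final.
                c = c1;
                v = v1;
            } else {
                // Steps S24/S25 (simplified): if the tail of C1 can itself act as a final
                // (phonation rhyme), split it off; otherwise all of C1 is a final with zero initial.
                int cut = c1.length();
                while (cut > 0 && RHYME_CONSONANTS.indexOf(c1.charAt(cut - 1)) >= 0) {
                    cut--;
                }
                if (cut > 0 && cut < c1.length()) {
                    c = c1.substring(0, cut);                // Step S24
                    v = c1.substring(cut);
                } else {
                    c = "0";                                 // Step S25: zero initial
                    v = c1;
                }
            }
        }
        // Expected output for fan55: initial C = f, final V = an, tone I = 55
        System.out.println("initial C = " + c + ", final V = " + v + ", tone I = " + i);
    }
}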
For the recorded Chinese material, the vocabulary, grammar and connected-text items can be split character by character and are split after labelling; the minority-language word lists can likewise be split. After splitting, professional staff screen and confirm the initials, finals and tones in the system, and the correct initials, finals and tones obtained from the split are put into one-to-one correspondence and added to the initial-final-tone database, which makes it convenient to recall them and to compile statistics on their total numbers and types.
And the comparison subunit is used for carrying out multi-layer comparison analysis on the disassembled voice data and the Chinese mid-ancient voice data stored in the system, and outputting a comparison analysis result.
The corpus obtained from the survey marks and the corpus embedded in the database - that is, the voice data already stored in the database, the Chinese middle-ancient sounds and so on - are compared and analysed from multiple angles and at multiple levels, including comparison of initial-final-tone features, phonetic variants, sound-correspondence rules and tone-sandhi rules between different languages, dialects or speakers. The corpus obtained from single-point survey marks is compared with the Chinese middle-ancient sounds, the corpus obtained from multi-point survey marks is compared with the Chinese middle-ancient sounds, and the various comparison results are output. The comparison results include tables, charts, statistics and the like: taking the comparison of different languages as an example, a table can show how the initials, finals and tones of the same character are realised in the different languages, and a chart can show the degree of similarity of the initials, finals and tones of different languages. The system supports outputting and displaying the analysis results in many forms, such as reports, charts and phonetic-symbol proofreading views in which different people mark the same survey point at the same time and can see each other's marks or only their own (commonly used for error correction and in teaching after marking).
Take the comparison with the Middle Chinese sounds as an example. The Middle Chinese sounds refer to a period in the history of the development of Chinese covering roughly the sixth to the thirteenth century AD. In this period the pronunciation of Chinese changed considerably; it was recorded with Chinese characters themselves (specifically by the fanqie method), and the phonological position of each character was summarised by means of characters. The recorded information includes she (rhyme group), hu (mouth shape), deng (division), diao (tone), yun (rhyme), sheng (initial), the five sound classes, voiced/voiceless, group, series, ancient rhyme class and so on. Phonetic letters and international phonetic symbols did not yet exist, so pronunciation differences were expressed entirely with characters; hence the "six elements of character sound": she, hu, deng, diao, yun and sheng. From these, the various attributes of a character's Middle Chinese phonology - she, hu, deng, diao, yun, sheng, the five sound classes, voiced/voiceless, group, series, ancient rhyme class and so on - are determined. Some of this information does not correspond uniquely: some characters have more than one phonological position (more than one set of the six elements). Where the position cannot be determined by the system matching method, it is supplied by the researcher. For example, the character for "ladder" has the modern pronunciation tʰi55; it may be marked manually or semi-automatically by the system, and (where a recording already exists) the imported value can be retrieved from the homophone table (see Table 3 below).
Further, data are added to each base table in the system, for subsequent use, only after being confirmed. In the same way as the homophone table, the survey data of each survey point can generate homophone tables of different types that summarise and store the initials and finals of different dialects, and when system data are recalled, similar dialects and geographic information are given priority. Output of the various tables, such as the phonological system table, the tone comparison table and the homophone table, is supported as required.
Table 3: homophone word list
The other attribute information for the Middle Chinese sound of the character for "ladder" is obtained as follows: among the six elements of character sound entered in advance by the researcher, its Middle Chinese initial is tou (透), and from this the system matches the information on the five sound classes, voiced/voiceless, group and series in the Middle Chinese initial table (Table 4 below) of the basic tables.
Table 4: chinese ancient initial consonant table
Similarly, the system matches the Middle Chinese final information according to the final element among the six elements of character sound of the character for "ladder" in the Middle Chinese final table (Table 5 below) of the basic tables, obtains the comparison and analysis result for the Middle Chinese sounds shown in the first row of FIG. 2 (FIG. 2 shows the comparison and analysis results for the Middle Chinese sounds of 10 characters), and outputs the result for research and teaching.
The situation of Chinese characters is very complex: over the successive dynasties and since the founding of New China, traditional and simplified forms and numerous variant shapes have arisen, so characters that look the same may belong to different initials and finals for reasons of form, semantics and so on. For this reason the researcher supplies the most accurate six elements of pronunciation personally, while the other phonological position attributes, once determined, are obtained by the system matching method.
Table 5: chinese ancient final table (part)
As a specific example of implementation, the storage unit 103 is configured to store modern voice data including a record, a voice disassembling rule, and a plurality of voice base data tables; the voice basic data table can be searched according to the requirement and relevant information of the voice basic data table can be called.
The content stored in the storage unit comprises recorded modern voice data and a plurality of voice basic data tables; the method also comprises the step of creating a voice disassembly rule created by the subunit, wherein the voice disassembly rule comprises an initial disassembly rule, a final disassembly rule and a tone disassembly rule for call preparation.
The voice basic data tables in the storage unit store the commonly used survey word tables, grammar tables and connected-text tables, as well as tables such as the initial table, final table, homophone table, phonological structure table, initial-and-final system, final classification table, Middle Chinese initial table, Middle Chinese final table, international phonetic symbol consonant and vowel tables, the phonation rhyme table and related combination tables of initials, finals and tones. The system provides lookup tables of the basic linguistic knowledge of the profession, such as the initial table and the final table, and the creation of these queryable tables helps researchers, scholars and cross-cultural communicators to better understand the Chinese phonetic information, language usage norms, idioms and polite expressions of the target culture and to learn the relevant phonetic and linguistic knowledge in the most convenient way. By querying, using and comparing the language knowledge base data, the system can provide important support for socio-scientific research related to linguistics: for example, researchers can query social media or questionnaire data to learn about the language usage patterns, attitudes and values of specific groups and so study social phenomena, group behaviour and cultural change in depth. The query and use of the language knowledge base data provided by the system also benefit language education: teachers can query data such as language habits, common errors and grammar rules for lesson preparation and the development of teaching materials, and students can learn about a language, expand their vocabulary, improve their grammar and increase the accuracy of their expression by querying the relevant data. For the translation and localisation industries the language knowledge base query provided by the system of the invention is likewise important: translators can query terms, idiomatic expressions and cultural background in a specific field to ensure the accuracy of a translation and to meet the habits and cultural requirements of the target audience. In the field of language technology, the system can provide data support for developments such as machine translation and natural language processing: researchers and developers can query corpora, language rules, language variants and other data for training and for improving the performance and effectiveness of language technology systems.
The system also provides a retrieval function: retrieving the data of a given survey item across multiple survey points; retrieving the survey points at which a given sound occurs; and retrieving the multiple sounds recorded at a given survey point.
As a specific example of implementation, the display unit 104 is configured to display the results of the investigation and analysis of the voice data on the map according to geographic information.
The various survey analysis results are presented on a map, for example the distribution of survey points or the points at which a certain sound occurs. The voice data presented with geographic information on the map include both the direct presentation of all survey items and the presentation of content chosen by the researchers themselves. For example, a researcher can choose to display the correspondence rule for characters with Middle Chinese fully voiced initials, e.g. to see on the generated map whether they are read in a yin tone class or a yang tone class at each point, or what the specific tone value is.
As shown in fig. 3, the presence (and coexistence) of languages at specific geographic locations can be displayed on the map. The system sets the colour of each marker to match a fixed language, and the number on the marker represents the number of language varieties at that geographic position; several places may speak several dialects at once. For example, 1 means that only one variety is spoken there; 2 means that two varieties are spoken, and the marker is coloured with the most widely spoken one. Placing the mouse on the number in the map shows which varieties they are: for example, placing the mouse on the marker numbered 2 in fig. 3 displays the local varieties Hoklo and Hakka, and the marker is shown in the colour the system assigns to Hoklo, indicating that Hoklo speakers outnumber Hakka speakers in that area.
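A small, hypothetical data-structure sketch of the marker behaviour just described (badge number, dominant-variety colour, hover tooltip); the class name and the sample varieties are illustrative assumptions:

import java.util.LinkedHashMap;
import java.util.Map;

public class SurveyPointMarker {
    private final Map<String, Integer> speakersByVariety = new LinkedHashMap<>();
    private final Map<String, String> colourByVariety;      // fixed variety -> colour mapping

    public SurveyPointMarker(Map<String, String> colourByVariety) {
        this.colourByVariety = colourByVariety;
    }

    public void addVariety(String variety, int speakers) {
        speakersByVariety.put(variety, speakers);
    }

    public int badgeNumber() {                               // e.g. 2 -> two varieties spoken here
        return speakersByVariety.size();
    }

    public String markerColour() {                           // colour of the most widely spoken variety
        return speakersByVariety.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(e -> colourByVariety.getOrDefault(e.getKey(), "grey"))
                .orElse("grey");
    }

    public String tooltip() {                                // shown when the mouse is over the badge
        return String.join(", ", speakersByVariety.keySet());
    }

    public static void main(String[] args) {
        SurveyPointMarker marker = new SurveyPointMarker(Map.of("Hoklo", "red", "Hakka", "blue"));
        marker.addVariety("Hoklo", 80_000);
        marker.addVariety("Hakka", 30_000);
        System.out.println(marker.badgeNumber() + " " + marker.markerColour() + " " + marker.tooltip());
    }
}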
As a specific example of implementation, the system further includes a rights management unit for the security and privacy protection of system data, ensuring the security and privacy of voice data and information, including data encryption, access control and rights management. Marking permissions for professional proofreading and labelling personnel: A. the phonetic symbols of different survey points are marked by different people, assigned per survey point; B. different people mark the same survey point at the same time and can either see each other's marks or only their own. If the same point carries several sets of marks (others invited to correct errors, several people marking, or marks imported from nearby points on the platform), they can be compared repeatedly and one selected as the final result.
The system of the invention provides a multi-user, multi-role design with a standard RBAC structure. The system roles include administrator, researcher, teacher, student, project initiator, project participant, investigator, language survey inspector, general learner and so on. Application scenario: users can either register themselves or be invited to participate; different users can hold several roles, and different roles have different functions and data permissions. For example, user A may be the person responsible for project a and only a participant in project b. The system provides a variety of basic data prepared by the platform according to professional practice, including commonly used survey characters, words, sentences and texts; the classification of language families and dialects; the Middle Chinese initial and final tables; the international phonetic symbol vowel, consonant and rhyme tables; and commonly used initials, finals and tones. Application scenario: when carrying out a project survey, the basic data can be cited directly and modified as needed to form the required result quickly. The system also provides commonly used small functions: splitting polyphonic characters - when a past project is imported and several phonetic transcriptions of one polyphonic character were recorded in a single row of the data list, they can be split automatically into separate rows; splitting initials and finals - when a past project is imported and the original data recorded initial, final and tone together, they can be split automatically into initial, final and tone; matching the Middle Chinese sounds - the relevant Middle Chinese items are matched automatically; special surveys - for the survey of a specific sentence in a certain region, large-scale survey work can be carried out through an applet.
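As a rough illustration of the standard RBAC structure mentioned above, the sketch below maps a few assumed roles to assumed permissions; the actual roles and permissions of the system are those listed in the description, and all names here are illustrative only:

import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class ProjectAuthorization {
    enum Role { ADMINISTRATOR, PROJECT_OWNER, PROJECT_PARTICIPANT, RESEARCHER, TEACHER, STUDENT }
    enum Permission { EDIT_QUESTIONNAIRE, RECORD_AUDIO, MARK_PHONETIC_SYMBOLS, APPROVE_MARKS, VIEW_RESULTS }

    // Illustrative role -> permission mapping; the real system stores this per project and per user.
    private static final Map<Role, Set<Permission>> GRANTS = Map.of(
            Role.PROJECT_OWNER, EnumSet.allOf(Permission.class),
            Role.PROJECT_PARTICIPANT, EnumSet.of(Permission.RECORD_AUDIO,
                    Permission.MARK_PHONETIC_SYMBOLS, Permission.VIEW_RESULTS),
            Role.STUDENT, EnumSet.of(Permission.MARK_PHONETIC_SYMBOLS, Permission.VIEW_RESULTS));

    public static boolean isAllowed(Role role, Permission permission) {
        return GRANTS.getOrDefault(role, Set.of()).contains(permission);
    }

    public static void main(String[] args) {
        // The same user can hold different roles in different projects (owner of project a,
        // participant in project b), so the check is made per project and per role.
        System.out.println(isAllowed(Role.PROJECT_PARTICIPANT, Permission.APPROVE_MARKS)); // false
        System.out.println(isAllowed(Role.PROJECT_OWNER, Permission.APPROVE_MARKS));       // true
    }
}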
As one embodiment to which the system of the present invention is applied, a complete project survey includes:
1. New project: including the project name and project description;
2. Survey content setting: characters, words, sentences and notes can each be set; multiple questionnaires are supported for the same project (for example a complete table and a simplified table, with the complete table used for areas with a large population and the simplified table for areas with a small population, or a complete or simplified questionnaire taken as needed); the questionnaire content can be imported from the system's basic data tables, imported from an existing Excel table, or added, deleted and modified manually; according to the six word-sound elements set by the user (rhyme group, medial class, division, tone, final and initial), the system can automatically match information such as the five tones, clear and turbid (voicing), broad and thin, initial group and series, and ancient rhymes; vocabulary items can be matched automatically with pictures already in the system; besides text remarks, each survey item supports voice remarks, which can be played back automatically during audio/video recording;
3. Inviting project participants: a project can invite multiple people to participate, and each participant's permissions can be set independently (which points they survey, whether they may record, whether they take part in marking, whether they may modify the questionnaire content, and so on); the invitee only needs to receive the project link and click to confirm (users not yet registered with the system are prompted to register first) in order to join the project;
4. Survey point management: survey points can be added in advance by the project manager and then allocated to different surveyors, or added freely by the surveyors themselves; when adding survey points, the language distribution map can be consulted so that points are placed reasonably;
5. Audio/video recording: the survey terminal can be a desktop or notebook computer, an Android tablet, or an Android mobile phone; no software needs to be installed, and audio and video surveys can be carried out as long as a browser can connect to the internet (the Google Chrome browser is recommended);
6. Phonetic symbol marking: multiple people can mark at the same time, and their marked contents are independent of one another; a flag can be set to control whether the other parties' marks are visible; marks from similar survey points can be imported as a reference; these marking functions, first, reduce repeated work in the marking process and, second, are useful in teaching: for example, the recordings of the same survey point can be distributed to several students for marking, and after marking, all the students' marks can be displayed at the same time with the survey items whose marks differ highlighted, so that the reasons for the differences can be reviewed and explained quickly, improving the students' practical skills;
7. Project output: viewing the initial consonant table, final table, homophone table, phonetic system structure table, initial and final systems, rhyme class inference table, ancient and modern sound comparison, voice map, and the like (a minimal illustrative sketch of homophone-table grouping is given after this list).
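As referenced in item 7 above, one typical output is a homophone table, which amounts to grouping all surveyed characters whose disassembled initial, final and tone coincide. The following is a hedged, minimal sketch of that grouping; the Syllable record, its fields and the sample data are assumptions made for this example.

import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Minimal sketch of building a homophone table: characters sharing the same
// disassembled initial, final and tone fall into the same group.
public class HomophoneTableSketch {

    record Syllable(String character, String initial, String finalPart, String tone) {
        String key() {
            return initial + "-" + finalPart + "-" + tone;
        }
    }

    static Map<String, List<String>> homophoneTable(List<Syllable> syllables) {
        return syllables.stream().collect(Collectors.groupingBy(
                Syllable::key,
                TreeMap::new,
                Collectors.mapping(Syllable::character, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Syllable> data = List.of(
                new Syllable("天", "tʰ", "iɛn", "55"),
                new Syllable("添", "tʰ", "iɛn", "55"),
                new Syllable("田", "tʰ", "iɛn", "21"));
        // "天" and "添" share one homophone group; "田" differs only by tone.
        homophoneTable(data).forEach((k, v) -> System.out.println(k + " : " + v));
    }
}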
Alternatively, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program instructing a terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
If implemented in the form of software functional units and sold or used as independent products, the integrated units in the above embodiments may be stored in the above-mentioned computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and adaptations may be made by those skilled in the art without departing from the principles of the present invention; such modifications and adaptations are also intended to fall within the scope of the present invention.

Claims (10)

1. A modern voice recording, analysis and display system, comprising:
an acquisition and recording unit for acquiring and recording various types of modern voice data;
an analysis unit for carrying out multi-level comparison analysis on the collected and recorded modern voice data by splitting it into initials, finals and tones, and outputting an analysis result;
and a display unit for displaying the survey and analysis results of the modern voice data on a map according to geographic information.
2. The modern voice recording, analysis and display system of claim 1, further comprising:
a storage unit for storing the modern voice data (including but not limited to recordings), the voice disassembly rules and the voice base data tables.
3. The modern voice recording, analysis and display system of claim 2, wherein the analysis unit comprises:
a preprocessing subunit for preprocessing the collected and recorded modern voice data, wherein the preprocessing comprises noise removal, audio quality equalization, acoustic feature extraction, and international phonetic symbol recognition and marking of the voice data;
a creating subunit for creating voice disassembly rules, wherein the voice disassembly rules comprise an initial consonant disassembly rule, a final disassembly rule and a tone disassembly rule;
a disassembly subunit for analyzing and disassembling the preprocessed voice data according to the created voice disassembly rules to obtain the initials, finals and tones of the voice data;
and a comparison subunit for carrying out multi-level comparison analysis of the disassembled voice data against the voice data stored in the system and Chinese ancient voice data, and outputting an analysis result.
4. The modern voice recording, analysis and display system according to claim 3, wherein the tone disassembly rule created by the creating subunit is:
I = (?<shengyun>([^0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*)(?<intonation>([0-9|①|②|③|④|⑤|⑥|⑦|⑧|⑨|⑩])*).matcher(P).group("intonation")
wherein I is the tone; the numerals 0-9 and the circled numerals ① to ⑩ are tones written in different formats; shengyun is the name of the group matching the combination of initial and final; intonation is the name of the group matching the tone; P is the preprocessed modern voice data taken as input by matcher(P); and group("intonation") outputs the tone;
the initial consonant disassembly rules comprise a first, a second and a third initial consonant disassembly rule, wherein:
the first initial consonant disassembly rule:
wherein C is the initial consonant, null is the name of the zero-initial group, 0 and the other listed symbols are the zero-initial marks, vowel is the name of the final group, and P1 is the combination of initial and final;
the second initial consonant disassembly rule:
C1 = (…).group("consonant");
wherein C1 is the initial consonant to be determined, consonant is the name of the initial group, the international phonetic symbol consonants are all consonants in the international phonetic symbol consonant table, vowel is the name of the final group, and P1 is the combination of initial and final;
the third initial consonant disassembly rule:
C = (…).matcher(C1).group("consonant");
wherein C is the initial consonant, consonant is the name of the initial group, the international phonetic symbol consonants are all consonants in the international phonetic symbol consonant table, vowel is the name of the final group, the vowels are all vowels in the vowel table, and C1 is the initial consonant to be determined;
the final disassembly rules comprise a first, a second and a third final disassembly rule, wherein:
the first final disassembly rule:
wherein V is the final, null is the name of the zero-initial group, 0 and the other listed symbols are the zero-initial marks, vowel is the name of the final group, and P1 is the combination of initial and final;
the second final disassembly rule:
V1 = (…).group("vowel");
wherein V1 is the final to be determined, consonant is the name of the initial group, the international phonetic symbol consonants are all consonants in the international phonetic symbol consonant table, vowel is the name of the final group, and P1 is the combination of initial and final;
the third final disassembly rule:
V = (…).matcher(C1).group("vowel");
wherein V is the final, consonant is the name of the initial group, the international phonetic symbol consonants are all consonants in the international phonetic symbol consonant table, vowel is the name of the final group, the vowels are all vowels in the vowel table, and C1 is the initial consonant to be determined.
5. The modern voice recording, analysis and display system of claim 2, wherein the voice base data tables include, but are not limited to, word tables, grammar tables, phonetic symbol tables, phonetic system division tables, Middle Chinese sound tables, international phonetic symbol consonant tables and phonation tables.
6. The modern voice recording, analysis and display system of claim 1, wherein the modern voice data recorded by the acquisition and recording unit comprises audio and video data, and the audio and video data is annotated with basic information about the recorded speaker and descriptive metadata about the recorded language.
7. The modern voice recording, analysis and display system according to claim 3, wherein the multi-level comparison analysis of the disassembled voice data against the voice data stored in the database and the Chinese ancient voice data comprises: comparing and analyzing the disassembled voice data against voice data of different languages and dialects stored in the database; comparing and analyzing the disassembled single-point recorded voice data against the ancient sounds stored in the database; comparing and analyzing the disassembled multi-point recorded voice data against the ancient sounds stored in the database; and comparing and analyzing the disassembled multi-point recorded voice data against one another.
8. The modern voice recording, analysis and display system according to claim 1, wherein the display of the voice data survey and analysis results on the map according to geographic information includes, but is not limited to, displaying the voice data recording points on the map according to geographic information, and displaying on the map, according to geographic information, the plurality of different places where a certain sound appears.
9. The modern voice recording, analysis and display system according to claim 3, wherein the international phonetic symbol recognition and marking of the voice data includes manual recognition and marking, and retrieving phonetic symbols from the system's voice base data tables according to the voice data, importing them for marking, and verifying them manually afterwards.
10. The modern voice recording, analysis and display system according to claim 1, further comprising a management unit for the security and privacy protection of system data, ensuring the security and privacy of voice data and information through data encryption, access control and rights management.
CN202311342385.5A 2023-10-17 2023-10-17 Modern voice recording, analyzing and displaying system Active CN117612553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311342385.5A CN117612553B (en) 2023-10-17 2023-10-17 Modern voice recording, analyzing and displaying system

Publications (2)

Publication Number Publication Date
CN117612553A 2024-02-27
CN117612553B 2024-05-03

Family

ID=89954976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311342385.5A Active CN117612553B (en) 2023-10-17 2023-10-17 Modern voice recording, analyzing and displaying system

Country Status (1)

Country Link
CN (1) CN117612553B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014141054A1 (en) * 2013-03-11 2014-09-18 Video Dubber Ltd. Method, apparatus and system for regenerating voice intonation in automatically dubbed videos
CN109523992A (en) * 2018-11-28 2019-03-26 鲁东大学 Tibetan dialect speech processing system
US20200169591A1 (en) * 2019-02-01 2020-05-28 Ben Avi Ingel Systems and methods for artificial dubbing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
吴道勤: "An automatic collation system for dialect phonological systems (the 'FYCL system')", Social Science Journal of Xiangtan University, no. 04, 30 August 1990 (1990-08-30) *
郑玉玲, 沈米遐, 徐昂: "A statistical analysis software system for quantitative linguistics", Journal of Chinese Information Processing, no. 02, 30 June 1996 (1996-06-30) *
韩夏; 李龙; 潘悟云: "A computer-aided fieldwork survey and processing system", Journal of Tsinghua University (Science and Technology), no. 06, 15 June 2013 (2013-06-15) *

Also Published As

Publication number Publication date
CN117612553B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
Sampson et al. Corpus linguistics: Readings in a widening discipline
Dash Corpus linguistics: An introduction
US20070112554A1 (en) System of interactive dictionary
Dash et al. History, features, and typology of language corpora
CN102272755A (en) Method for semantic processing of natural language using graphical interlingua
Barasa Ateso Grammar: a descriptive account of an Eastern Nilotic Language
Evans et al. Searching for meaning in the Library of Babel: field semantics and problems of digital archiving
Mitrofanova Raising EFL students’ awareness of English intonation functioning
Chelliah Fieldwork for language description
Choksi Script as constellation among Munda speakers: the case of Santali
ZAGOOD et al. A contrastive study of relativization in English and Arabic with reference to translation pedagogy
Altosaar et al. A speech corpus for modeling language acquisition: CAREGIVER
CN109473007B (en) English natural spelling teaching method and system combining phonemes with sound side
CN117612553B (en) Modern voice recording, analyzing and displaying system
Arokoyo The Owé Bilingual Dictionary
Gawne A guide to the Syuba (Kagate) language documentation corpus
KR20190113218A (en) Foreign language learning method and system using user's native language pattern analysis
CN111243351B (en) Foreign language spoken language training system based on word segmentation technology, client and server
Ahmad People centered HMI’s for deaf and functionally illiterate users
Xiao et al. On the Construction of Knowledge-Communication-Oriented Computer Spoken Corpus for Endangered Languages
Mngomezulu Determining an AAC core vocabulary for Zulu-speaking preschool children
Medrano Toward a Khipu Transcription" Insistence": a Corpus-Based Study of the Textos Andinos
Miličević Creation and Some Ideas for Classroom Use of an Electronic Corpus of the Dialect of Bunjevci
Soclo et al. Mobile game-based learning system for a local language
Stutzman et al. Compiling dictionaries for minority and endangered languages

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant