CN111508522A - Statement analysis processing method and system - Google Patents

Statement analysis processing method and system

Info

Publication number
CN111508522A
CN111508522A (application CN201910094372.8A)
Authority
CN
China
Prior art keywords
sentence
chunk
exercise
word
prosodic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910094372.8A
Other languages
Chinese (zh)
Inventor
夏海荣
张少飞
于佳玉
刘悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hujiang Education Technology Shanghai Co ltd
Original Assignee
Hujiang Education Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hujiang Education Technology Shanghai Co ltd filed Critical Hujiang Education Technology Shanghai Co ltd

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 - Electrically-operated educational appliances
    • G09B5/04 - Electrically-operated educational appliances with audible presentation of the material to be studied

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a sentence analysis and processing method and system. The method comprises: performing prosodic hierarchy analysis on a practice sentence to determine the chunk time boundary of each prosodic chunk in the sentence; setting intonation marks for the practice sentence; setting stress (re-reading) marks for the practice sentence; and taking the practice sentence, together with the determined chunk time boundaries, intonation marks, and stress marks, as a standard prosodic-hierarchy sentence. With this method, prosodic hierarchy analysis is performed on the text of an input sentence, converting the linear word sequence of a whole sentence into a prosodic hierarchy, so that the user can learn how to analyze the prosodic structure of a text and apply it in pronunciation. In this way, the user can master the use of intonation and stress when reading the sentence aloud.

Description

Statement analysis processing method and system
Technical Field
The present application relates to the field of data analysis technologies, and in particular, to a method and a system for analyzing and processing sentences.
Background
Reading aloud is an important method in language learning: it improves the accuracy and fluency of the learner's pronunciation and the learner's comprehension of sentences and even passages, thereby reinforcing correct use of prosodic features such as stress and intonation.
When reading aloud, a learner may make the following errors or show the following inadequacies: mispronounced or imprecise words (vowels, consonants, syllable boundaries, accent, liaison, etc.); disfluency within and between words (inappropriate durations and pauses); omitted or misused prosodic variation, such as lacking control of pitch or lacking the intonation changes tied to grammar and semantics (e.g., a rise or fall at the end of a sentence); and inability to correctly understand a sentence and control the rhythm of speech output by phrasing (Phrasing).
Currently, traditional schemes practice reading aloud in three ways:
the first method is as follows: talking dictionary
This may be a standalone electronic dictionary device, desktop software, or software running on a mobile device (including WeChat applets, web pages, etc.). After a user looks up a word, the talking dictionary provides the usual definition of the word along with playable audio of its pronunciation (recorded human speech or computer-synthesized speech). The learner learns the pronunciation by playing the audio and may imitate it orally. The talking dictionary may also provide a number of example sentences for the word, likewise accompanied by playable audio.
The second mode: talking book
This may be an independently distributed audio file (mp3, etc.), a companion optical disc for a book, an early cassette tape, or a program on a content platform such as a podcast, Himalaya FM, or a WeChat official account. Learners usually use audiobooks by "listening", and may also imitate on their own.
The third mode: pronunciation evaluation software
This includes software running on desktop systems, software running on mobile devices (mobile applications, WeChat applets, web programs, etc.), and other smart devices running an operating system (smart televisions, smart speakers, etc.). Such software typically provides demonstration audio and compares the learner's spoken speech with the demonstration speech to produce an overall score; it usually also provides scores along dimensions including pronunciation accuracy, completeness, and fluency.
Although these schemes can guide the user in reading aloud, the first and second modes cannot evaluate the user's reading level, so the learner gets no immediate feedback.
The third mode can score the learner's reading practice, but it only provides sentence-level scores and cannot give the learner targeted training on structural segments; moreover, it only provides recorded demonstration audio and offers no teaching function, which limits the user's mastery of reading skills.
Disclosure of Invention
The invention provides a sentence analysis processing method and system, to solve the problem in the prior art that, because the user's reading data is analyzed and evaluated only at the whole-sentence level, the user cannot carry out targeted training.
The specific technical scheme is as follows:
a method of statement analysis processing, the method comprising:
performing prosodic hierarchy analysis on the exercise sentences to determine chunk time boundaries of each prosodic chunk in each sentence, wherein the prosodic chunk comprises at least one word, and the time boundaries represent pause positions of the sentences;
setting intonation marks for the exercise sentences according to the determined chunk time boundary;
setting a rereading mark for the exercise sentence according to the determined chunk time boundary;
and taking the determined chunk time boundary, the intonation marks and the practice sentences with the re-reading marks as standard prosody level sentences.
Optionally, performing prosody hierarchy analysis on the exercise sentences to determine chunk time boundaries of each prosody chunk in each sentence, includes:
performing prosodic hierarchy analysis on the practice sentences to determine word time boundaries corresponding to all words in the practice sentences;
determining the chunk time boundaries for each prosodic chunk based on the word time boundaries for each word.
Optionally, determining the chunk time boundary of each prosodic chunk according to the word time boundary of each word includes:
determining a sentence layer in the practice sentence according to the word time boundary of each word;
determining an intonation phrase layer in the sentence layer;
determining a prosodic phrase layer in the intonation phrase layer;
determining the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
Optionally, setting intonation marks for the exercise sentences according to the determined chunk time boundary, including:
acquiring data in the exercise sentence and acquiring a tone labeling set, wherein the data comprises each line of text and voice corresponding to each line of text, and the labeling set comprises each tone;
and setting tone marks for each word based on the data and the label set in the exercise sentence and according to the determined word time boundary.
Optionally, setting a rereading mark for the exercise sentence according to the determined chunk time boundary, including:
acquiring data in the exercise sentence and acquiring a rereading label set;
and based on the data in the exercise sentence and the obtained re-reading labeling set, and according to the determined word time boundary, re-reading labeling is carried out on each word.
Optionally, after taking the determined chunk time boundary, the intonation flag, and the re-reading marked exercise sentence as a standard prosody level sentence, the method further includes:
acquiring an exercise sentence of the user based on the standard prosody level sentence;
determining, based on a prosody hierarchy, that there is an erroneous prosody chunk in the exercise sentence;
and outputting prompt information prompting the user to repeatedly practice the erroneous prosodic chunk.
Optionally, after outputting the prompt information prompting repeated practice of a prosodic chunk, the method further includes:
detecting whether a rhythm chunk currently trained by a user passes evaluation;
if not, prompting the user to continue training the current rhythm chunk;
if yes, switching from the current prosodic chunk to the next erroneous prosodic chunk, so that the user practices the next chunk.
A system of statement analysis processing, the system comprising:
the analysis module is used for carrying out prosody level analysis on the exercise sentences, determining the chunk time boundary of each prosody chunk in each sentence, and setting intonation marks for the exercise sentences according to the determined chunk time boundary; setting a re-reading mark for the exercise sentence according to the determined chunk time boundary, wherein the prosodic chunk comprises at least one word, and the time boundary represents the pause position of the sentence;
and the processing module is used for taking the determined chunk time boundary, the intonation marks and the practice sentences with the re-reading marks as standard prosody level sentences.
Optionally, the analysis module is specifically configured to perform prosody hierarchy analysis on the exercise sentence, and determine a word time boundary corresponding to each word in the exercise sentence; determining the chunk time boundaries for each prosodic chunk based on the word time boundaries for each word.
Optionally, the analysis module is specifically configured to determine a sentence layer in the practice sentence according to the word time boundary of each word; determine an intonation phrase layer in the sentence layer; determine a prosodic phrase layer in the intonation phrase layer; and determine the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
Optionally, the analysis module is specifically configured to obtain data in the exercise sentence and obtain a tone labeling set, where the data includes each line of text and a voice corresponding to each line of text, and the labeling set includes each tone; and setting tone marks for each word based on the data and the label set in the exercise sentence and according to the determined word time boundary.
Optionally, the analysis module is specifically configured to obtain data in the exercise sentence and obtain a rereading annotation set; and based on the data in the exercise sentence and the obtained re-reading labeling set, and according to the determined word time boundary, re-reading labeling is carried out on each word.
Optionally, the processing module is further configured to obtain an exercise sentence of the user based on the standard prosody level sentence; determine, based on the prosody hierarchy, the erroneous prosodic chunks in the exercise sentence; and output prompt information prompting the user to repeatedly practice the erroneous prosodic chunks.
Optionally, the processing module is further configured to detect whether the prosodic chunk currently trained by the user passes evaluation; if not, prompt the user to continue training the current prosodic chunk; if yes, switch from the current prosodic chunk to the next erroneous prosodic chunk, so that the user practices the next chunk.
With the method provided by the invention, prosodic hierarchy analysis is performed on the text of an input sentence, converting the linear word sequence of a whole sentence into a prosodic hierarchy, so that the user can learn and master how to analyze the prosodic structure of a text and apply it in pronunciation. In this way, the user can master the use of intonation and stress when reading the sentence aloud.
In addition, the user's sentences can be decomposed and analyzed by prosodic chunk, and the user's errors in each prosodic chunk can be determined, so that the user can practice each prosodic chunk, or even a single word, individually, which improves the focus and efficiency of read-aloud learning.
Drawings
FIG. 1 is a flowchart of a method for analyzing and processing a statement according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a prosodic hierarchy in an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a statement analysis processing system according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments and the specific technical features therein are merely illustrative of the technical solutions of the present invention, not restrictive, and that the embodiments and their specific technical features may be combined with each other where no conflict arises.
First, the terms to which the present invention relates are explained:
the sentence: sequences of words and punctuation, organized according to grammatical rules and semantic requirements, express specific meanings, usually ending with punctuation.
And (3) voice: human vocal organs (vocal cords, vocal tract, tongue, mouth, lips, teeth) naturally convert a specific sentence into sound in the form of a sequence of phonemes under the coordination of the brain.
Rhythm: for the expression needs in human natural language, specific phones/syllables are assigned different prosodic parameters: duration (Duration), Pitch (Pitch), Energy (Energy), and Pause (Pause) to produce a "twitch-down, twitch-down" effect. Humans can perceive whether prosodic parameters match text.
Intonation: the trend of the pitch trajectory over the pronunciation of a sentence or a segment of a sentence. In general, declarative sentences and wh-questions use a falling intonation, while yes-no questions use a rising intonation.
Semantic stress (re-reading): unlike the stress marked within English words, speakers often make the prosody of certain words more prominent than that of the surrounding words according to the semantic and expressive requirements of the sentence, for example by raised pitch, increased energy (volume), lengthened duration, or an added pause.
Syntactic structure: the result of parsing a sentence of natural-language text, according to linguistic criteria, into a syntax tree describing components such as subject, predicate, object, attributive, adverbial, and complement. The syntax tree is usually represented as a nested structure, e.g., S = NP + VP, meaning that a sentence S is composed of a noun phrase (as subject) plus a verb phrase (as predicate).
Prosodic structure: the reorganization, by a speaker in the course of speaking, of a text sequence into interconnected chunks according to communicative needs. A correct and appropriate prosodic structure reduces the communication cost for speaker and listener, and it affects the prosodic features of the text when read aloud. This chunk structure is also nested (hierarchical), but it is much shallower than the syntactic structure, generally only 2-3 layers. For example: S = IP1 + IP2, IP1 = PP1 + PP2 indicates that a sentence S is composed of two intonation phrases IP1 and IP2, where IP1 is composed of a prosodic phrase PP1 and a prosodic phrase PP2.
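The three-level structure described above can be sketched as a small nested data structure. The following is a minimal illustration, assuming simple dict-based nodes (all function names and the example words are hypothetical, not from the patent):

```python
# A minimal sketch of the 3-level prosodic hierarchy (S > IP > PP)
# described above. Names and structure are illustrative assumptions.

def build_sentence(ips):
    """A sentence (S) is a list of intonation phrases (IP)."""
    return {"type": "S", "children": ips}

def build_ip(pps):
    """An intonation phrase (IP) is a list of prosodic phrases (PP)."""
    return {"type": "IP", "children": pps}

def build_pp(words):
    """A prosodic phrase (PP) is a flat list of words."""
    return {"type": "PP", "words": words}

# S = IP1 + IP2, where IP1 = PP1 + PP2, as in the example in the text
pp1 = build_pp(["This", "is"])
pp2 = build_pp(["a", "serious", "issue"])
ip1 = build_ip([pp1, pp2])
ip2 = build_ip([build_pp(["and", "something", "we", "will",
                          "discuss", "with", "Moscow"])])
s = build_sentence([ip1, ip2])

def words_of(node):
    """Flatten the hierarchy back into the linear word sequence."""
    if node["type"] == "PP":
        return list(node["words"])
    return [w for child in node["children"] for w in words_of(child)]

print(words_of(s))
```

Flattening the tree recovers the original linear word sequence, which is exactly the conversion the patent describes in reverse.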
Fig. 1 shows a statement analysis processing method according to an embodiment of the present invention, where the method includes:
s1, performing prosody hierarchy analysis on the exercise sentences to determine the chunk time boundary of each prosody chunk in each sentence;
first, the prosodic hierarchy needs to be analyzed, and in the embodiment of the present invention, the prosodic hierarchy can be divided into 3 layers: sentence layer (S), intonation phrase layer (IP), prosodic phrase layer (PP). One S may be composed of one to several IPs, one IP may also be composed of one to several PPs, and the marks between the chunks are chunk time boundaries.
Specifically, for S = [w0, w1, w2, ..., wi, ..., wn], a hierarchy-division result may be: S = [[w0, w1, w2], [[w3, w4], [wi, ..., wn]]], where the sentence S contains two IPs: IP1 = [w0, w1, w2] and IP2 = [w3, ..., wi, ..., wn], and IP2 contains two PPs, namely PP1 = [w3, w4] and PP2 = [wi, ..., wn].
Before performing prosodic hierarchy analysis on a practice sentence, the word time boundaries of the individual words in the sentence must first be determined. In the embodiment of the present invention, the word time boundary types are: IP_Boundary, PP_Boundary, None_Boundary.
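Given per-word boundary labels of these three types, the chunk time boundaries follow from grouping words up to each PP or IP boundary. Below is a minimal sketch, assuming each word carries the label at its right edge; the grouping rule, the label spellings, and the example labeling are assumptions for illustration:

```python
# Hedged sketch: derive prosodic chunks from per-word boundary labels.
# A chunk ends wherever a PP_Boundary or IP_Boundary occurs; words with
# None_Boundary stay attached to the chunk in progress.

def chunk_by_boundaries(words, labels):
    chunks, current = [], []
    for word, label in zip(words, labels):
        current.append(word)
        if label in ("PP_Boundary", "IP_Boundary"):
            chunks.append(current)
            current = []
    if current:                      # flush a trailing chunk with no final label
        chunks.append(current)
    return chunks

words  = ["This", "is", "a", "serious", "issue", "and",
          "something", "we", "will", "discuss", "with", "Moscow"]
labels = ["None_Boundary", "None_Boundary", "None_Boundary", "None_Boundary",
          "IP_Boundary", "None_Boundary", "PP_Boundary", "None_Boundary",
          "None_Boundary", "None_Boundary", "None_Boundary", "IP_Boundary"]

print(chunk_by_boundaries(words, labels))
```

The output is the list of prosodic chunks whose edges are the chunk time boundaries used in the rest of the method.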
For example, take the sentence: This is a serious issue and something we will discuss with Moscow. The word time boundaries for each word are shown in Table 1:
[Table 1 content rendered as images in the original; it lists the word time boundary label for each word of the example sentence.]
TABLE 1
The time boundaries of the individual prosodic chunks in the sentence can then be determined from the word time boundaries. For example, for the sentence: This is a serious issue and something we will discuss with Moscow. When the learner reads this sentence of 12 words and 17 syllables, he should not rush through the pronunciation in one breath, but should analyze and plan the prosodic structure appropriately, in keeping with the characteristics of the sentence.
Thus, after prosody hierarchy analysis, the prosody analysis results are shown in fig. 2, where the complete exercise sentence is divided into corresponding hierarchies in fig. 2.
It should be noted that, in the embodiment of the present invention, the prosodic hierarchy analysis may be computed with machine-learning models such as conditional random fields, hidden Markov models, or recurrent neural networks.
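As one concrete instance of such a sequence model, the toy hidden-Markov-style decoder below labels each word's right edge using Viterbi search. All probabilities and the single "punct" feature are invented for illustration; a real system would learn these from data, as the text notes:

```python
# Illustrative sketch of sequence labeling for boundary prediction with a
# tiny hand-specified HMM and Viterbi decoding. All numbers are assumed.

STATES = ["None_Boundary", "PP_Boundary", "IP_Boundary"]

trans = {  # assumed transition probabilities between boundary labels
    "None_Boundary": {"None_Boundary": 0.7, "PP_Boundary": 0.2, "IP_Boundary": 0.1},
    "PP_Boundary":   {"None_Boundary": 0.8, "PP_Boundary": 0.1, "IP_Boundary": 0.1},
    "IP_Boundary":   {"None_Boundary": 0.8, "PP_Boundary": 0.1, "IP_Boundary": 0.1},
}
start = {"None_Boundary": 0.8, "PP_Boundary": 0.1, "IP_Boundary": 0.1}

def emit(state, feature):
    """Toy emission model: punctuation strongly suggests an IP boundary."""
    if feature == "punct":
        return {"IP_Boundary": 0.8, "PP_Boundary": 0.15, "None_Boundary": 0.05}[state]
    return {"IP_Boundary": 0.1, "PP_Boundary": 0.2, "None_Boundary": 0.7}[state]

def viterbi(features):
    V = [{s: start[s] * emit(s, features[0]) for s in STATES}]
    back = []
    for f in features[1:]:
        row, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: V[-1][p] * trans[p][s])
            row[s] = V[-1][best_prev] * trans[best_prev][s] * emit(s, f)
            ptr[s] = best_prev
        V.append(row)
        back.append(ptr)
    last = max(STATES, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):       # follow back-pointers to recover the path
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["word", "word", "punct"]))
```

With these assumed parameters, the decoder places an IP boundary at the punctuation and no boundary elsewhere, matching the intuition behind the labels in Table 1.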
Through prosodic hierarchy analysis, the prosodic levels in the speech can be annotated; these annotations can be used to evaluate the user's speech practice and serve as a scoring basis for subsequent practice.
S2, according to the determined chunk time boundary, tone marks are set for the training sentences;
first, in the embodiment of the present invention, the intonation types, the applicable cases, and the intonation trends are shown in table 2:
[Table 2 content rendered as an image in the original; it lists each intonation type, the cases where it applies, and its pitch trend.]
TABLE 2
Based on the contents of Table 2, the intonation label set is I = {None, Low, High, Low_Low, Low_High, High_Low, High_High};
the training data set is D = {D0, D1, ..., Di, ..., Dk}, where Di = (Si, Ti), Si = [w0, w1, ..., wi, ..., wn], and Ti = [ti0, ti1, ..., tij, ..., tin], with each tij ∈ I.
Further, in the embodiment of the present invention, the intonation is labeled based on an unsupervised clustering algorithm, and the specific steps are as follows:
1. specifying standard documents to determine record formats, decision bases, arbitration schemes, and the like;
2. preparing data to be marked, including each line of text and corresponding voice;
3. run the prosodic-hierarchy labeling process to determine the prosodic hierarchy boundaries;
4. calculating a word time boundary of each word in the speech by using a forced alignment algorithm;
5. extracting the acoustic features in the sentence, generating data Ai = [ai0, ai1, ..., aij, ..., ain];
6. for the set of all aij across all Ai, performing unsupervised clustering similar to K-Means:
6-1) for None-Boundary, skip;
6-2) for PP _ Boundary, the clustering target is 2 types;
6-3) for IP _ Boundary, the clustering target is 4 types;
based on the method, model training is established, and the steps of establishing a machine learning model are as follows:
1. processing the large-scale data set according to the labeling method;
2. extracting text features and constructing pairs between text feature representations and tone types;
3. a model of the relationship between the text feature representation and the tone type is trained using a learning algorithm.
The steps of using the model to distinguish intonation are as follows:
a. initializing classification calculation and loading the learning model;
b. extracting text feature representation;
c. the text feature representation is input to a classification algorithm and an output target pitch type is generated.
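Steps a-c can be sketched as follows, with a rule-based stand-in playing the role of the loaded model. The feature names and classification rules are assumptions for illustration, not the patent's trained classifier:

```python
# Hedged sketch of the inference steps a-c above: load a "model",
# extract text features, and classify the intonation of a boundary word.

TONES = ["None", "Low", "High", "Low_Low", "Low_High", "High_Low", "High_High"]

def extract_features(word, boundary, sentence_final, is_question):
    """Step b: a toy text-feature representation for one word."""
    return {"boundary": boundary,
            "sentence_final": sentence_final,
            "is_question": is_question}

def classify_tone(features):
    """Step c: stand-in for the learned classifier (assumed rules)."""
    if features["boundary"] == "None_Boundary":
        return "None"                # non-boundary words carry no tone mark
    if features["sentence_final"]:
        return "High" if features["is_question"] else "Low"
    return "Low_High"                # assumed default for internal boundaries

feats = extract_features("Moscow", "IP_Boundary",
                         sentence_final=True, is_question=False)
print(classify_tone(feats))          # falling tone at the end of a statement
```

This mirrors the definition of intonation given earlier: a falling tone for a sentence-final boundary in a declarative sentence, a rising tone for a yes-no question.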
The intonation output results are shown in table 3:
[Table 3 content rendered as images in the original; it lists the predicted intonation mark for each word of the example sentence.]
TABLE 3
S3, setting a rereading mark for the exercise sentence according to the determined chunk time boundary;
and S4, taking the determined chunk time boundary, the intonation marks and the practice sentences with the repeated reading marks as standard prosody level sentences.
After the intonation analysis of the practice sentence is completed, the sentence must also be analyzed for stress, that is, the stress (re-reading) mark of each word in the speech is annotated. The stress types are shown in Table 4:
type of rereading Is suitable for Rereading situation
None Common to the null word, or weakened real word Weak reading
Normal Common to real words Is normal
Emphasized The real word is required for highlighting the semantic meaning Rereading
TABLE 4
Based on the contents of Table 4, the stress label set is E = {None, Normal, Emphasized}. Each word in the training data is given one label from the set E; the training data set is D = {D0, D1, ..., Di, ..., Dk}, where Di = (Si, Ai, Ti) represents one document (sentence) in the training set;
Si = [wi0, wi1, ..., wij, ..., win] is the word (Token) sequence of the document (sentence); Ai = [ai0, ai1, ..., aij, ..., ain] is the sequence of acoustic features (Acoustic features) corresponding to each word (Token); and Ti = [ti0, ti1, ..., tij, ..., tin], with each tij ∈ E, is the label sequence corresponding to each word (Token).
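The training-set structure Di = (Si, Ai, Ti) can be captured in a small container type; the field names below are illustrative assumptions:

```python
# Sketch of one training document Di = (Si, Ai, Ti): the word sequence,
# per-word acoustic features, and per-word stress labels from
# E = {None, Normal, Emphasized}. Field names are assumptions.

from dataclasses import dataclass

E = ("None", "Normal", "Emphasized")

@dataclass
class Document:
    words: list      # Si: word (Token) sequence
    acoustics: list  # Ai: acoustic feature vector per word
    labels: list     # Ti: stress label per word, each in E

    def __post_init__(self):
        # the three sequences must be aligned token-for-token
        assert len(self.words) == len(self.acoustics) == len(self.labels)
        assert all(t in E for t in self.labels)

doc = Document(words=["This", "is", "a", "serious", "issue"],
               acoustics=[[0.2], [0.1], [0.1], [0.9], [0.6]],
               labels=["Normal", "None", "None", "Emphasized", "Normal"])
print(len(doc.words))
```

The alignment check in `__post_init__` enforces the one-label-per-token requirement stated above.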
Further, stress labeling is carried out using unsupervised clustering; the labeling method is as follows:
1. specifying standard documents to determine record formats, decision bases, arbitration schemes, and the like;
2. preparing data to be marked, including each line of text and corresponding voice;
3. run the prosodic-hierarchy labeling process to determine the prosodic hierarchy boundaries;
4. calculating a word time boundary of each word in the speech by using a forced alignment algorithm;
5. extracting the acoustic features in the sentence, generating data Ai = [ai0, ai1, ..., aij, ..., ain];
6. performing unsupervised clustering similar to K-Means on the set of all aij across all Ai, with 3 target clusters.
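After clustering into 3 groups, the clusters still need names. A plausible convention, assumed here rather than stated in the patent, is to order the cluster centroids by mean energy and map them to None / Normal / Emphasized:

```python
# Sketch of step 6 for stress: assign each word's energy feature to the
# nearest of 3 cluster centroids, then name the centroids by magnitude
# (weakest -> None, strongest -> Emphasized). All values are assumed.

def label_by_energy(energies, centroids):
    order = sorted(range(3), key=lambda i: centroids[i])
    names = {order[0]: "None", order[1]: "Normal", order[2]: "Emphasized"}
    out = []
    for e in energies:
        idx = min(range(3), key=lambda i: abs(e - centroids[i]))
        out.append(names[idx])
    return out

energies  = [0.15, 0.5, 0.95, 0.1, 0.55]   # assumed per-word energy values
centroids = [0.1, 0.5, 0.9]                # e.g. from a K-Means run
print(label_by_energy(energies, centroids))
```

This turns the unsupervised clusters into the stress label set E used for the training data.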
Based on the above labeling method, a machine learning model is built; the steps are as follows:
1. processing the large-scale data set according to the labeling method;
2. extracting text features and constructing pairs between text feature representations and stress types;
3. training, with a learning algorithm, a model of the relationship between the text feature representation and the stress type.
The steps of using the model to predict stress are as follows:
a. initialize the classification computation and load the learned model;
b. extract the text feature representation;
c. input the text feature representation to the classification algorithm and generate the output target stress type.
The stress output results are shown in Table 4:
[Table content rendered as images in the original; it lists the predicted stress mark for each word of the example sentence.]
TABLE 4
After the prosodic hierarchy analysis is completed, a practice sentence from the user is acquired on the basis of the standard prosodic-hierarchy sentence; the erroneous prosodic chunks in the practice sentence are determined based on the prosodic hierarchy, and prompt information is output prompting the user to practice those chunks. In brief, a sentence contains several prosodic chunks; the system analyzes each chunk to determine whether the user's practice sentence contains errors, and if so, it prompts the user and indicates the position of the error.
When errors exist, the system enters a repeated-practice stage for the erroneous prosodic chunks and detects whether the chunk the user is currently practicing passes evaluation (the evaluation is based on the method above). If the evaluation fails, the system prompts the user to continue practicing the current chunk; if it passes, the system switches from the current chunk to the next erroneous chunk for practice.
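The practice loop described in the two paragraphs above can be sketched as follows; `evaluate` is a stand-in for the chunk-level scoring, and the scripted pass/fail schedule exists only for the demo:

```python
# Sketch of the chunk-by-chunk practice loop: keep the learner on each
# erroneous prosodic chunk until it passes evaluation, then move on to
# the next erroneous chunk. The evaluation function is a placeholder.

def practice_loop(error_chunks, evaluate, max_attempts=10):
    log = []
    for chunk in error_chunks:
        for attempt in range(1, max_attempts + 1):
            if evaluate(chunk, attempt):
                log.append((chunk, attempt, "passed"))
                break                                  # switch to the next chunk
            log.append((chunk, attempt, "retry"))      # prompt: keep practicing
    return log

# Demo: the first chunk passes on the 2nd attempt, the second on the 1st.
passes_on = {"and something": 2, "we will discuss with Moscow": 1}
result = practice_loop(list(passes_on),
                       lambda chunk, attempt: attempt >= passes_on[chunk])
print(result)
```

Each log entry records which chunk was practiced, the attempt number, and whether the evaluation passed, mirroring the retry/advance behavior the text describes.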
By the method provided by the invention, prosody hierarchy analysis is carried out on the text of the input sentence, so that a linear word sequence of a whole sentence is converted into a prosody hierarchy structure, and a user can learn and master the method for carrying out prosody structure analysis on the text and use the method in pronunciation. By the method, the user can master the use of intonation and rereading when the sentence is read.
In addition, the prosodic chunks of the sentences of the user can be decomposed and analyzed, and errors of the user in each prosodic chunk of the sentences are determined, so that the user can do partial exercise aiming at each prosodic chunk and even aiming at a single word, and the pertinence and the learning efficiency of the reading-aloud learning are improved.
Corresponding to the method provided by the embodiment of the present invention, an embodiment of the present invention further provides a statement analysis processing system, and as shown in fig. 3, the present invention is a schematic structural diagram of a statement analysis processing system in the embodiment of the present invention, where the system includes:
the analysis module 301 is configured to perform prosody level analysis on the exercise sentences, determine chunk time boundaries of each prosodic chunk in each sentence, and set intonation marks for the exercise sentences according to the determined chunk time boundaries; setting a re-reading mark for the exercise sentence according to the determined chunk time boundary, wherein the prosodic chunk comprises at least one word, and the time boundary represents the pause position of the sentence;
a processing module 302, configured to use the determined chunk time boundary, the intonation flag, and the practice sentence with the re-reading flag as a standard prosody level sentence.
Further, in this embodiment of the present invention, the analysis module 301 is specifically configured to perform prosody level analysis on the practice sentence, and determine a word time boundary corresponding to each word in the practice sentence; determining the chunk time boundaries for each prosodic chunk based on the word time boundaries for each word.
Further, in the embodiment of the present invention, the analysis module 301 is specifically configured to determine a sentence level in the practice sentence according to the word time boundary of each word; determining a intonation phrase layer in the sentence layer; determining a prosodic phrase layer in the intonation phrase layer; determining the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
Further, in this embodiment of the present invention, the analysis module 301 is specifically configured to obtain data in the exercise sentence and obtain a tone labeling set, where the data includes each line of text and a voice corresponding to each line of text, and the labeling set includes each tone; and setting tone marks for each word based on the data and the label set in the exercise sentence and according to the determined word time boundary.
Further, in this embodiment of the present invention, the analysis module 301 is specifically configured to obtain data in the exercise sentence and obtain a rereading annotation set; and based on the data in the exercise sentence and the obtained re-reading labeling set, and according to the determined word time boundary, re-reading labeling is carried out on each word.
Further, in this embodiment of the present invention, the processing module 302 is further configured to obtain an exercise sentence of the user based on the standard prosody level sentence; determining, based on a prosody hierarchy, that there is an erroneous prosody chunk in the exercise sentence; and outputting prompt information of a rhythm chunk for prompting the user to repeatedly exercise.
Further, in this embodiment of the present invention, the processing module 302 is further configured to detect whether the prosody module currently trained by the user passes the evaluation; if not, prompting the user to continue training the current rhythm chunk; if yes, switching from the current rhythm module to the next rhythm module with errors so as to enable the user to practice the next rhythm module.
While the preferred embodiments of the present application have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the appended claims be interpreted as covering the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (14)

1. A method of statement analysis processing, the method comprising:
performing prosodic hierarchy analysis on the exercise sentences to determine chunk time boundaries of each prosodic chunk in each sentence, wherein the prosodic chunk comprises at least one word, and the time boundaries represent pause positions of the sentences;
setting intonation marks for the exercise sentences according to the determined chunk time boundary;
setting stress marks for the exercise sentence according to the determined chunk time boundary;
and taking the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosody-level sentence.
2. The method of claim 1, wherein performing prosodic hierarchy analysis on the exercise sentences to determine chunk time boundaries for each prosodic chunk in each sentence comprises:
performing prosodic hierarchy analysis on the exercise sentence to determine the word time boundaries corresponding to all words in the exercise sentence;
determining the chunk time boundaries for each prosodic chunk based on the word time boundaries for each word.
3. The method of claim 2, wherein determining the chunk time boundaries for each prosodic chunk based on the word time boundaries for each word comprises:
determining a sentence layer in the exercise sentence according to the word time boundary of each word;
determining an intonation phrase layer in the sentence layer;
determining a prosodic phrase layer in the intonation phrase layer;
determining the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
4. The method of claim 2, wherein setting intonation marks for the exercise sentence according to the determined chunk time boundary comprises:
acquiring data of the exercise sentence and acquiring an intonation label set, wherein the data comprises each line of text and the speech corresponding to each line of text, and the label set comprises each intonation label;
and setting an intonation mark for each word based on the data of the exercise sentence and the label set, according to the determined word time boundaries.
5. The method of claim 2, wherein setting stress marks for the exercise sentence according to the determined chunk time boundary comprises:
acquiring data of the exercise sentence and acquiring a stress label set;
and setting a stress mark for each word based on the data of the exercise sentence and the acquired stress label set, according to the determined word time boundaries.
6. The method of claim 1, wherein after taking the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosody-level sentence, the method further comprises:
acquiring the user's exercise sentence based on the standard prosody-level sentence;
determining, based on the prosody hierarchy, the erroneous prosodic chunks in the exercise sentence;
and outputting prompt information prompting the user to repeatedly practice the erroneous prosodic chunks.
7. The method of claim 6, wherein after outputting the prompt information prompting the user to repeatedly practice the erroneous prosodic chunks, the method further comprises:
detecting whether the prosodic chunk currently practiced by the user passes the evaluation;
if not, prompting the user to continue practicing the current prosodic chunk;
if so, switching from the current prosodic chunk to the next erroneous prosodic chunk, so that the user practices the next prosodic chunk.
8. A system for sentence analysis processing, the system comprising:
the analysis module is used for performing prosody hierarchy analysis on the exercise sentence, determining the chunk time boundary of each prosodic chunk in each sentence, setting intonation marks for the exercise sentence according to the determined chunk time boundary, and setting stress marks for the exercise sentence according to the determined chunk time boundary, wherein a prosodic chunk comprises at least one word and the time boundary represents a pause position of the sentence;
and the processing module is used for taking the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosody-level sentence.
9. The system of claim 8, wherein the analysis module is specifically configured to perform prosody level analysis on the exercise sentence to determine word time boundaries corresponding to each word in the exercise sentence; determining the chunk time boundaries for each prosodic chunk based on the word time boundaries for each word.
10. The system of claim 9, wherein the analysis module is specifically configured to determine a sentence layer in the exercise sentence according to the word time boundary of each word; determine an intonation phrase layer in the sentence layer; determine a prosodic phrase layer in the intonation phrase layer; and determine the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
11. The system of claim 9, wherein the analysis module is specifically configured to acquire data of the exercise sentence and an intonation label set, wherein the data includes each line of text and the speech corresponding to each line of text, and the label set includes each intonation label; and to set an intonation mark for each word based on the data of the exercise sentence and the label set, according to the determined word time boundaries.
12. The system of claim 9, wherein the analysis module is specifically configured to acquire data of the exercise sentence and a stress label set; and to set a stress mark for each word based on the data of the exercise sentence and the acquired stress label set, according to the determined word time boundaries.
13. The system of claim 8, wherein the processing module is further configured to acquire the user's exercise sentence based on the standard prosody-level sentence; determine, based on the prosody hierarchy, the erroneous prosodic chunks in the exercise sentence; and output prompt information prompting the user to repeatedly practice the erroneous prosodic chunks.
14. The system of claim 13, wherein the processing module is further configured to detect whether the prosodic chunk currently practiced by the user passes the evaluation; if not, prompt the user to continue practicing the current prosodic chunk; if so, switch from the current prosodic chunk to the next erroneous prosodic chunk so that the user practices the next prosodic chunk.
CN201910094372.8A 2019-01-30 2019-01-30 Statement analysis processing method and system Pending CN111508522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910094372.8A CN111508522A (en) 2019-01-30 2019-01-30 Statement analysis processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910094372.8A CN111508522A (en) 2019-01-30 2019-01-30 Statement analysis processing method and system

Publications (1)

Publication Number Publication Date
CN111508522A true CN111508522A (en) 2020-08-07

Family

ID=71868946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910094372.8A Pending CN111508522A (en) 2019-01-30 2019-01-30 Statement analysis processing method and system

Country Status (1)

Country Link
CN (1) CN111508522A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686018A (en) * 2020-12-23 2021-04-20 科大讯飞股份有限公司 Text segmentation method, device, equipment and storage medium
CN113327615A (en) * 2021-08-02 2021-08-31 北京世纪好未来教育科技有限公司 Voice evaluation method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1333501A (en) * 2001-07-20 2002-01-30 北京捷通华声语音技术有限公司 Dynamic Chinese speech synthesizing method
US20030149558A1 (en) * 2000-04-12 2003-08-07 Martin Holsapfel Method and device for determination of prosodic markers
CN101000764A (en) * 2006-12-18 2007-07-18 黑龙江大学 Speech synthetic text processing method based on rhythm structure
CN104464751A (en) * 2014-11-21 2015-03-25 科大讯飞股份有限公司 Method and device for detecting pronunciation rhythm problem



Similar Documents

Publication Publication Date Title
CN101551947A (en) Computer system for assisting spoken language learning
Gao et al. A study on robust detection of pronunciation erroneous tendency based on deep neural network.
Liscombe Prosody and speaker state: paralinguistics, pragmatics, and proficiency
Cahill et al. Natural language processing for writing and speaking
James et al. Developing resources for te reo Māori text to speech synthesis system
Demenko et al. The use of speech technology in foreign language pronunciation training
Liao et al. A prototype of an adaptive Chinese pronunciation training system
CN111508522A (en) Statement analysis processing method and system
Tseng ILAS Chinese spoken language resources
Dai An automatic pronunciation error detection and correction mechanism in English teaching based on an improved random forest model
Kantor et al. Reading companion: The technical and social design of an automated reading tutor
CN115440193A (en) Pronunciation evaluation scoring method based on deep learning
Fata Is my stress right or wrong? Studying the production of stress by non-native speaking teachers of English
Bang et al. An automatic feedback system for English speaking integrating pronunciation and prosody assessments
Ibejih et al. EDUSTT: In-domain speech recognition for Nigerian accented educational contents in English
Pellegrini et al. Extension of the lectra corpus: classroom lecture transcriptions in european portuguese
Li et al. English sentence pronunciation evaluation using rhythm and intonation
Liu et al. Speech disorders classification in phonetic exams with MFCC and DTW
Ling et al. A research on guangzhou dialect's negative transfer on british english pronunciation by speech analyzer software Praat and ear recognition method
CN114783412B (en) Spanish spoken language pronunciation training correction method and system
CN111508523A (en) Voice training prompting method and system
Duan et al. An English pronunciation and intonation evaluation method based on the DTW algorithm
TWI731493B (en) Multi-lingual speech recognition and theme-semanteme analysis method and device
US20210304628A1 (en) Systems and Methods for Automatic Video to Curriculum Generation
Xu et al. Application of Multimodal NLP Instruction Combined with Speech Recognition in Oral English Practice

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200807