CN111816157A - Music score intelligent video-singing method and system based on voice synthesis - Google Patents

Music score intelligent video-singing method and system based on voice synthesis

Info

Publication number
CN111816157A
CN111816157A (application CN202010590726.0A)
Authority
CN
China
Prior art keywords
music score
audio
abc
splicing
notes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010590726.0A
Other languages
Chinese (zh)
Other versions
CN111816157B (en)
Inventor
刘昆宏
吴清强
吴苏悦
张敬峥
詹旺平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202010590726.0A priority Critical patent/CN111816157B/en
Publication of CN111816157A publication Critical patent/CN111816157A/en
Application granted granted Critical
Publication of CN111816157B publication Critical patent/CN111816157B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — not used; see below
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/02 Methods for producing synthetic speech; Speech synthesisers

Abstract

The invention provides a music score intelligent video-singing method and system based on speech synthesis. The method comprises the following steps: step one, data preparation: inputting and parsing an abc music score to obtain the pitch and duration of each note in the score; step two, parameter training: generating note sequences of length at most 5 when the training data are made, i.e. dividing all notes of a complete abc music score into groups of 5 when the score is processed; step three, synthesized-audio splicing, specifically comprising the three sub-steps of score segmentation and recognition, segment splicing, and waveform alignment with blank-segment filling; step four, visually displaying the synthesized audio. The invention solves the technical problems of a large amount of computation during training, obvious splicing traces and splicing noise under direct concatenation, and so on; comparing the generated audio with the original data, the difference is difficult to distinguish by ear.

Description

Music score intelligent video-singing method and system based on voice synthesis
Technical Field
The invention belongs to the field of computers, and particularly relates to a music score intelligent video-singing (sight-singing) method and system based on speech synthesis.
Background
In recent years, with the continuous development of networks and the continuous iteration of mobile and desktop applications, online education has developed rapidly. Art subjects are special in that they emphasize communication and mutual feedback between teacher and student: a student must receive timely feedback to correctly recognize his or her own mistakes. In music study, for example, if every student simply reads a score and sings it alone, the teacher cannot attend to every student at the same time; music is best learned with multiple senses cooperating, the eyes reading, the ears listening, and the mouth singing, which also makes such stimulated-sense learning suit other subjects.
The invention studies the sight-singing teaching module of an online music-teaching platform. The hope is that score singing can shift from the traditional professional teacher to a computer that imitates a real human voice singing the score, so that students can flexibly study a variety of scores and the teacher or platform can greatly expand its database. The research problem of the invention is therefore to complete the score singing task with a computer.
Score singing is a problem similar to speech synthesis, and current speech-synthesis technology has already reached a marketable level. A music score can be regarded as a special language: all the computer needs to learn is the pronunciation of each note. A score places information such as the beat in its header, so during processing this information must be brought into every note of the string; the string then carries the duration and pitch of each note. In practice, however, technical problems arise: the training data can be too large to train on, and direct concatenation leaves obvious splicing traces and produces noise.
Disclosure of Invention
The invention provides a music score intelligent video-singing method and system based on speech synthesis, which translate a music score into a new language that fits the character-to-pronunciation mapping of an ordinary language. The invention solves the technical problems of a large amount of computation during training, obvious splicing traces and splicing noise under direct concatenation, and so on; comparing the generated audio with the original data, the difference is difficult to distinguish by ear.
In order to solve the above problems, the present invention provides a music score intelligent video-singing method based on speech synthesis, the method comprising:
step one, data preparation: inputting and parsing an abc music score to obtain the pitch and duration of each note in the score;
step two, parameter training: generating note sequences of length at most 5 when the training data are made, i.e. dividing all notes of a complete abc music score into groups of 5 when the score is processed;
step three, synthesized-audio splicing, specifically comprising the three sub-steps of score segmentation and recognition, segment splicing, and waveform alignment with blank-segment filling;
step four, visually displaying the synthesized audio.
In a second aspect, an embodiment of the present application provides a music score intelligent video-song system based on speech synthesis, where the system includes:
the data preparation module is used for inputting and analyzing the abc music score to obtain the pitch and duration information of each note in the specific abc music score;
the training parameter module, which generates note sequences of length at most 5 when the training data are produced, i.e. divides all notes of a complete abc music score into groups of 5 when the score is processed;
the synthesis audio splicing module comprises a music score segmentation identification sub-module, a segment splicing sub-module and a waveform alignment and blank segment filling sub-module;
and the visualization module is used for visually displaying the synthesized audio.
In a third aspect, the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method described in the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, implements a method as described in the embodiments of the present application.
Drawings
FIG. 1 is a flow chart of music score intelligent video-song based on speech synthesis according to the present invention;
fig. 2 is a diagram of the effect of synthesizing audio results based on a score according to the present invention.
Detailed Description
In order to make the objects, technical processes and technical innovation points of the present invention more clearly illustrated, the present invention is further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In order to achieve the above purpose, the invention provides a music score intelligent video-singing method based on voice synthesis. The main process is shown in fig. 1, and the method comprises the following steps:
step one, data preparation: inputting and parsing an abc music score to obtain the pitch and duration of each note in the score;
the data to be processed in the application is an abc file, the abc file comprises two parts of information, the first five parts of information are rhythm, tone and the like of a music score, the lower part of the information is notes, the music score is different from a language, the notation of the whole music score is specified in the head of the music score, and the language is that each word has an independent pronunciation rule, so when the abc music score is processed, a similar processing method needs to be considered, and the score information in the head is brought into each note.
step two, parameter training: generating note sequences of length at most 5 when the training data are made, i.e. dividing all notes of a complete abc music score into groups of 5 when the score is processed;
step three, synthesized-audio splicing, specifically comprising the three sub-steps of score segmentation and recognition, segment splicing, and waveform alignment with blank-segment filling;
step four, visually displaying the synthesized audio.
Preferably, the data-preparation step further comprises: each note is represented by three parts, duration, note name and pitch. The shortest duration unit is 1/8 beat, and note durations are 1/8, 1/4, 3/8, 1/2, 3/4 and 1 beat; the pitch range is f3 to f#5. Because real vocal data are difficult to collect, the vocaloid software is used to synthesize audio in place of a human voice. For accurate labelling, a lowered semitone (flat) is rewritten as the raised semitone (sharp) of the note below it, so the pitches of all black keys are represented uniformly as sharps. With this combined representation, the score header information is carried into every note, which solves the abc score translation problem.
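The labelling convention above can be sketched as follows; the label format, the sharp-only pitch names, and the MIDI-style note numbering are assumptions made for illustration:

```python
# Sketch of the note-labelling convention described above: every label is
# (pitch, duration), durations are multiples of the 1/8-beat unit, and
# all black-key pitches are written uniformly as sharps (never flats) so
# that each sound has exactly one label. Names and numbering are assumed.

SHARP_NAMES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]
DURATIONS = ("1/8", "1/4", "3/8", "1/2", "3/4", "1")   # allowed note lengths

def label(midi: int, duration: str) -> str:
    assert duration in DURATIONS, "duration outside the trained set"
    name = SHARP_NAMES[midi % 12] + str(midi // 12 - 1)  # black keys -> sharps
    return f"{name}_{duration}"

print(label(66, "1/4"))   # f#4_1/4  (F#/Gb is always labelled as a sharp)
print(label(53, "1/8"))   # f3_1/8   (bottom of the f3..f#5 range)
```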
Preferably, the training process further comprises: a script was used to generate 11000 samples in total; the data set is divided into 10000 samples as training data, with the remaining 1000 reserved for testing the model. The test method is manual listening, judging whether the duration and pronunciation of each sound are accurate. A checkpoint, together with the current encoder-decoder alignment, is generated every 1000 training steps, so that training can resume from the checkpoint after an interruption. The sharp sign # and the Arabic numerals are replaced with English letters; the bar lengths remain fixed, and if a pronunciation word is short, variable-length data are used; the 'r' sound is added directly to the synthesized audio during later synthesis.
The method adopts an 80-band mel-scale spectrogram as the target output of the decoder; the sample rate is 20000 Hz, the frame length 50 ms and the frame shift 12.5 ms. The optimizer is conventional Adam with a batch size of 32, and a delayed (decayed) learning rate effectively reduces overfitting of the model.
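A minimal numpy sketch of such a decoder target follows, using the stated parameters (20000 Hz sample rate, 50 ms frame, 12.5 ms shift, 80 mel bands). A real pipeline would more likely use librosa or torchaudio; the filterbank construction here is a common textbook form, not taken from the patent:

```python
import numpy as np

# Sketch of the decoder target described above: an 80-band mel spectrogram
# at 20000 Hz with a 50 ms frame and a 12.5 ms hop, built with numpy only.

SR, N_FFT, HOP, N_MELS = 20000, 1000, 250, 80   # 50 ms frame, 12.5 ms hop

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2)
    hz = 700.0 * (10.0 ** (mels / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(wav):
    n_frames = 1 + (len(wav) - N_FFT) // HOP
    frames = np.stack([wav[i * HOP:i * HOP + N_FFT] * np.hanning(N_FFT)
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return power @ mel_filterbank(N_MELS, N_FFT, SR).T   # (frames, 80)

wav = np.sin(2 * np.pi * 440 * np.arange(SR) / SR)       # 1 s test tone
print(mel_spectrogram(wav).shape)                        # (77, 80)
```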
Preferably, the synthetic audio splicing specifically includes:
performing segmentation and recognition on the music score: the abc file is split, taking 5 notes as a group, into a number of measures; the file is also split wherever the symbol r is met, with the remaining notes forming a measure; each measure is processed with the trained model to synthesize audio, generating one wav file per measure;
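The segmentation rule just described can be sketched as follows; the token names are illustrative, and 'r'-prefixed tokens stand for the rest symbol:

```python
# Sketch of the segmentation rule described above: walk through the note
# tokens, cut a measure whenever 5 notes have accumulated or an 'r' rest
# is met, and keep any remaining notes as a final measure. Each measure
# would then be fed to the trained model to produce one wav file.

def split_measures(tokens, group=5):
    measures, cur = [], []
    for tok in tokens:
        if tok.startswith("r"):          # rest symbol: close current measure
            if cur:
                measures.append(cur)
            cur = []
        else:
            cur.append(tok)
            if len(cur) == group:        # a full group of 5 notes
                measures.append(cur)
                cur = []
    if cur:                              # remaining notes form a measure
        measures.append(cur)
    return measures

print(split_measures(["C", "D", "E", "r2", "F", "G", "A", "B", "c", "d"]))
# [['C', 'D', 'E'], ['F', 'G', 'A', 'B', 'c'], ['d']]
```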
segment splicing: the audio synthesized for the segments is spliced together;
waveform alignment and blank-segment filling: for a continuously sung score, direct splicing produces a rough seam. Simple concatenation was first observed with audio software, and two basic problems must be handled. The first is trimming the head and tail: a short blank interval exists at the head and tail of each synthesized clip, so the blank positions must be cut; since the blank time is nearly the same for every clip, this can be done uniformly. The second problem is the 'pop' sound of a rough splice, produced mainly when one sound follows immediately after another (for example the four sounds 'Do', 'Re', 'Mi' and 'La'); after magnifying the waveform, the noise was found to be caused by a sudden jump in the waveform at the seam. To solve these problems, the application adopts the following scheme: 'r' in an abc score denotes an unvoiced blank segment; when synthesizing the audio, a blank-duration clip is added directly as a buffer wherever one is needed, and sox is used to join the measures. An r sound consists of two parts, a pronunciation length and r, and the 'r' audio matching the requirements of the abc score is selected for processing; each splice point is processed and output with a very short r sound that eliminates the noise without affecting the perceived duration. This method gives unexpectedly good results in eliminating the noise. Experimentally, as shown in fig. 2, after magnifying the waveform of a synthesized result, the blank r segment added at the splice can be seen, while it is not perceptible to the ear; the practical effect is the same as audio sung directly by a person.
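The trimming-and-buffering scheme can be sketched in numpy as follows. The silence threshold and the 10 ms buffer length are assumptions for illustration; the patent performs the actual joining with sox:

```python
import numpy as np

# Sketch of the splicing scheme described above: trim the near-constant
# blank time at the head and tail of each synthesized measure, then join
# measures with a very short silent 'r' buffer so the waveform does not
# jump at the seam (the stated cause of the click noise).

SR = 20000
R_BUFFER = np.zeros(int(0.01 * SR))          # 10 ms silent 'r' segment

def trim(wav, thresh=1e-3):
    """Remove leading/trailing samples below the silence threshold."""
    idx = np.flatnonzero(np.abs(wav) > thresh)
    return wav[idx[0]:idx[-1] + 1] if idx.size else wav

def splice(measures):
    parts = []
    for i, m in enumerate(measures):
        parts.append(trim(m))
        if i < len(measures) - 1:
            parts.append(R_BUFFER)           # buffer hides the waveform jump
    return np.concatenate(parts)

a = np.concatenate([np.zeros(100), np.ones(500), np.zeros(100)])
b = np.concatenate([np.zeros(100), np.ones(300), np.zeros(100)])
out = splice([a, b])
print(len(out))   # 500 + 200 + 300 = 1000
```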
As another aspect, the present application further provides a music score intelligent video-song system based on speech synthesis, the system including:
the data preparation module is used for inputting and analyzing the abc music score to obtain the pitch and duration information of each note in the specific abc music score;
the training parameter module, which generates note sequences of length at most 5 when the training data are produced, i.e. divides all notes of a complete abc music score into groups of 5 when the score is processed;
the synthesis audio splicing module comprises a music score segmentation identification sub-module, a segment splicing sub-module and a waveform alignment and blank segment filling sub-module;
and the visualization module is used for visually displaying the synthesized audio.
Preferably, the data preparation module further comprises: each note is represented by three parts, duration, note name and pitch; the shortest duration unit is 1/8 beat, and note durations are 1/8, 1/4, 3/8, 1/2, 3/4 and 1 beat; the pitch range is f3 to f#5; because real vocal data are difficult to collect, the vocaloid software is used to synthesize audio in place of a human voice; for accurate labelling, a lowered semitone (flat) is rewritten as the raised semitone (sharp) of the note below it, so the pitches of all black keys are represented uniformly as sharps.
Preferably, the training parameter module is further configured to:
the alignment condition of the check point and the current encoder and decoder can be generated every 1000 training steps, and the training can be recovered from the check point again after the training is interrupted in the midway;
the number # and Arabic numerals are replaced by English letters, the lengths of the bars are still fixed, if the pronunciation words are short, data with unfixed lengths are used, and the 'r' sound is directly added into the synthesized audio in the subsequent synthesis.
Preferably, the synthesized audio splicing module specifically includes:
the music score segmentation recognition sub-module is used for splitting an abc file, splitting the abc file into a group of 5 notes and a plurality of measures, splitting the abc file when the r symbol is met, using the rest notes as a measure, processing each measure by using a trained model to synthesize audio, and generating a wav file corresponding to each measure;
the segment splicing submodule is used for splicing the audio synthesized by segments;
the waveform alignment and blank section filling submodule is used for 'r' in an abc music score to represent an unvoiced blank section, when the audio is synthesized, the audio with blank duration is directly added to a part needing to be added to serve as a buffer zone, sox is used for synthesizing a measure, r sound is composed of two parts, namely pronunciation length and r, the corresponding 'r' audio is selected according to the requirements of the abc music score to be processed, and each section of spliced part is processed and output by using a very short r sound which can eliminate the noise and does not influence the duration feeling.
As another aspect, the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method as described in the embodiments of the present application when executing the computer program.
As another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the foregoing device in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction-execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits with logic gates implementing logic functions on data signals, an application-specific integrated circuit (ASIC) with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A music score intelligent video-singing method based on voice synthesis comprises the following steps:
step one, data preparation: inputting and parsing an abc music score to obtain the pitch and duration of each note in the score;
step two, parameter training: generating note sequences of length at most 5 when the training data are made, i.e. dividing all notes of a complete abc music score into groups of 5 when the score is processed;
step three, synthesized-audio splicing, specifically comprising the three sub-steps of score segmentation and recognition, segment splicing, and waveform alignment with blank-segment filling;
step four, visually displaying the synthesized audio.
2. The method of claim 1, wherein the data preparation step further comprises: each note is represented by three parts, duration, note name and pitch; the shortest duration unit is 1/8 beat, and note durations are 1/8, 1/4, 3/8, 1/2, 3/4 and 1 beat; the pitch range is f3 to f#5; because real vocal data are difficult to collect, the vocaloid software is used to synthesize audio in place of a human voice; for accurate labelling, a lowered semitone (flat) is rewritten as the raised semitone (sharp) of the note below it, so the pitches of all black keys are represented uniformly as sharps.
3. The method of claim 1, wherein the training process further comprises:
the alignment condition of the check point and the current encoder and decoder can be generated every 1000 training steps, and the training can be recovered from the check point again after the training is interrupted in the midway;
the number # and Arabic numerals are replaced by English letters, the lengths of the bars are still fixed, if the pronunciation words are short, data with unfixed lengths are used, and the 'r' sound is directly added into the synthesized audio in the subsequent synthesis.
4. The method of claim 1, wherein the synthetic audio concatenation specifically comprises:
performing segmentation and recognition on the music score: the abc file is split, taking 5 notes as a group, into a number of measures; the file is also split wherever the symbol r is met, with the remaining notes forming a measure; each measure is processed with the trained model to synthesize audio, generating one wav file per measure;
segment splicing: the audio synthesized for the segments is spliced together;
waveform alignment and blank-segment filling: 'r' in the abc music score denotes an unvoiced blank segment; when the audio is synthesized, a blank-duration clip is added directly as a buffer wherever one is needed, and sox is used to join the measures; an r sound consists of two parts, a pronunciation length and r, and the 'r' audio matching the requirements of the abc score is selected for processing; each splice point is processed and output with a very short r sound that eliminates the noise without affecting the perceived duration.
5. A music score intelligent video-song system based on speech synthesis, the system comprising:
the data preparation module is used for inputting and analyzing the abc music score to obtain the pitch and duration information of each note in the specific abc music score;
the training parameter module, which generates note sequences of length at most 5 when the training data are produced, i.e. divides all notes of a complete abc music score into groups of 5 when the score is processed;
the synthesis audio splicing module comprises a music score segmentation identification sub-module, a segment splicing sub-module and a waveform alignment and blank segment filling sub-module;
and the visualization module is used for visually displaying the synthesized audio.
6. The system of claim 5, wherein the data preparation module further comprises: each note is represented by three parts, duration, note name and pitch; the shortest duration unit is 1/8 beat, and note durations are 1/8, 1/4, 3/8, 1/2, 3/4 and 1 beat; the pitch range is f3 to f#5; because real vocal data are difficult to collect, the vocaloid software is used to synthesize audio in place of a human voice; for accurate labelling, a lowered semitone (flat) is rewritten as the raised semitone (sharp) of the note below it, so the pitches of all black keys are represented uniformly as sharps.
7. The system of claim 5, wherein the training parameter module is further configured to:
the alignment condition of the check point and the current encoder and decoder can be generated every 1000 training steps, and the training can be recovered from the check point again after the training is interrupted in the midway;
the number # and Arabic numerals are replaced by English letters, the lengths of the bars are still fixed, if the pronunciation words are short, data with unfixed lengths are used, and the 'r' sound is directly added into the synthesized audio in the subsequent synthesis.
8. The system of claim 5, wherein the synthesized audio splicing module specifically comprises:
the music score segmentation and recognition sub-module, which splits the abc file, taking 5 notes as a group, into a number of measures; the file is also split wherever the symbol r is met, with the remaining notes forming a measure; each measure is processed with the trained model to synthesize audio, generating one wav file per measure;
the segment splicing sub-module, which splices the audio synthesized for the segments;
the waveform alignment and blank-segment filling sub-module: 'r' in the abc music score denotes an unvoiced blank segment; when the audio is synthesized, a blank-duration clip is added directly as a buffer wherever one is needed, and sox is used to join the measures; an r sound consists of two parts, a pronunciation length and r, and the 'r' audio matching the requirements of the abc score is selected for processing; each splice point is processed and output with a very short r sound that eliminates the noise without affecting the perceived duration.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1-4.
CN202010590726.0A 2020-06-24 2020-06-24 Music score intelligent video-singing method and system based on voice synthesis Active CN111816157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010590726.0A CN111816157B (en) 2020-06-24 2020-06-24 Music score intelligent video-singing method and system based on voice synthesis


Publications (2)

Publication Number Publication Date
CN111816157A true CN111816157A (en) 2020-10-23
CN111816157B CN111816157B (en) 2023-01-31

Family

ID=72854997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010590726.0A Active CN111816157B (en) 2020-06-24 2020-06-24 Music score intelligent video-singing method and system based on voice synthesis

Country Status (1)

Country Link
CN (1) CN111816157B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111816148A (en) * 2020-06-24 2020-10-23 厦门大学 Virtual human voice and video singing method and system based on generation countermeasure network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN2101314U (en) * 1991-05-09 1992-04-08 北京电子专科学校 Numbered musical notation recorded automatically playing music apparatus
CN103902647A (en) * 2013-12-27 2014-07-02 上海斐讯数据通信技术有限公司 Music score identifying method used on intelligent equipment and intelligent equipment
CN104978884A (en) * 2015-07-18 2015-10-14 呼和浩特职业学院 Teaching system of preschool education profession student music theory and solfeggio learning
US20180308382A1 (en) * 2015-10-25 2018-10-25 Morel KOREN A system and method for computer-assisted instruction of a music language
CN110148394A (en) * 2019-04-26 2019-08-20 平安科技(深圳)有限公司 Song synthetic method, device, computer equipment and storage medium
JP2019219569A (en) * 2018-06-21 2019-12-26 カシオ計算機株式会社 Electronic music instrument, control method of electronic music instrument, and program
CN110738980A (en) * 2019-09-16 2020-01-31 平安科技(深圳)有限公司 Singing voice synthesis model training method and system and singing voice synthesis method


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111816148A (en) * 2020-06-24 2020-10-23 厦门大学 Virtual human voice and video singing method and system based on generation countermeasure network
CN111816148B (en) * 2020-06-24 2023-04-07 厦门大学 Virtual human voice and video singing method and system based on generation countermeasure network

Also Published As

Publication number Publication date
CN111816157B (en) 2023-01-31

Similar Documents

Publication Publication Date Title
US10347238B2 (en) Text-based insertion and replacement in audio narration
RU2690863C1 (en) System and method for computerized teaching of a musical language
CN102360543B (en) HMM-based bilingual (mandarin-english) TTS techniques
US6865533B2 (en) Text to speech
Svantesson et al. Tone production, tone perception and Kammu tonogenesis
US9196240B2 (en) Automated text to speech voice development
EP2337006A1 (en) Speech processing and learning
Munson et al. The phonetics of sex and gender
KR20190041105A (en) Learning system and method using sentence input and voice input of the learner
Heyne et al. Native language influence on brass instrument performance: An application of generalized additive mixed models (GAMMs) to midsagittal ultrasound images of the tongue
Yang et al. The perception of Mandarin Chinese tones and intonation by American learners
CN111816157B (en) Music score intelligent video-singing method and system based on voice synthesis
Zhang Current trends in research of Chinese sound acquisition
KR20070103095A (en) System for studying english using bandwidth of frequency and method using thereof
Moosmüller Vowels in Standard Austrian German
Ferris Techniques and challenges in speech synthesis
Pyshkin et al. Multimodal modeling of the mora-timed rhythm of Japanese and its application to computer-assisted pronunciation training
Murphy Controlling the voice quality dimension of prosody in synthetic speech using an acoustic glottal model
Sparvoli From phonological studies to teaching Mandarin tone: Some perspectives on the revision of the tonal inventory
TWI806703B (en) Auxiliary method and system for voice correction
KR102610871B1 (en) Speech Training System For Hearing Impaired Person
CN114758560B (en) Humming pitch evaluation method based on dynamic time warping
Hill et al. Unrestricted text-to-speech revisited: rhythm and intonation.
Hussein et al. Towards a computer-aided pronunciation training system for German learners of Mandarin-prosodic analysis
Gregová Comparative Phonetics and Phonology of the English and the Slovak Language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant