WO2014079258A1 - Voice recognition based on phonetic symbols

Info

Publication number: WO2014079258A1
Application number: PCT/CN2013/082905
Authority: WO
Inventors: 高剑青
Original assignee: Gao Jianqing
Priority date: 2012-11-20
Filing date: 2013-09-04
Publication date: 2014-05-30

Abstract

A voice recognition method based on phonetic symbols, in which rhythmic sounds (such as speech, singing, performances, and ambient sounds) are recognized directly as phonetic symbols (or musical scores). By relying on the automatic correction that a person performs when reading, the method achieves high recognition accuracy, broad adaptability, and fault tolerance.

Description

Voice recognition based on phonetic symbols

Technical field

Computers, speech recognition, music, acoustics.

Background art

At present, speech recognition converts speech into specific text. However, because of inaccurate pronunciation, accents, and homophones, the recognized text can still differ from what was actually pronounced, which makes speech recognition inaccurate.

Summary of the invention

Speech is recognized directly as phonetic symbols. Because the phonetic transcription corresponds exactly to the pronunciation, recognition accuracy is high. Even when the transcription contains errors caused by inaccurate pronunciation, the reader automatically corrects it while reading, in the same way that readers understand the unconventional spellings of Internet slang ("Martian text"); a misspelling can even take on a new meaning (e.g., "what" written as "god horse"). The method can also be used to recognize dialects and arbitrary languages.

For example, with 1, 2, 3, and 4 denoting the four tones of Chinese Pinyin (Yinping, Yangping, Shangsheng, and Qusheng):
a) Chinese: "Good morning"; phonetic symbols: Zao3Shang4Hao3; phonetic symbols with errors: Zhao2San1Hao4;
b) Pingtan dialect: "Good morning"; phonetic symbols: Za3Xuan4Ho3; phonetic symbols with errors: ZhanXian4Hou3;
c) English: "Good Morning"; phonetic symbols: GuMoning; phonetic symbols with errors: GudMolin.

Phonetic-symbol recognition can also be applied to other rhythmic sounds, such as music (melody and rhythm), musical instruments, and ambient sounds (knocking, footsteps). In addition to ordinary voice transcription, it can serve as a substitute for sign language, which is difficult for the average person to learn: the recognized content is displayed next to the speaker's vocal position, like a speech bubble in a comic, which makes the sound more visual. It can be used to train deaf-mute people (those who are physically able to speak but never learned because they could not hear sound) to speak.

Other uses include: training and scored evaluation of language, singing (combining musical notation with the phonetic symbols of the lyrics), and performance, by comparing the recognized sound with the correct phonetic symbols; compressing, reproducing, and altering sound (via speech synthesis); and recognizing rhythmic sounds (such as voice, singing, performance, and ambient sound) directly as phonetic symbols (or scores) for other purposes, such as voice control in any language, where a device is made to carry out particular instructions and actions.
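The description relies on a human reader to correct faulty phonetic symbols. As a rough software analogue only, the sketch below splits a tone-numbered string into syllables and picks the closest entry in a small lexicon by edit distance. Edit distance is a technique swapped in here for illustration and is not named in the source; the lexicon, the function names, and the assumption that every syllable starts with a capital letter are likewise hypothetical, chosen to fit the three examples above.

```python
import re

# Hypothetical mini-lexicon built from the three examples in the description.
LEXICON = {
    "Zao3Shang4Hao3": "Good morning (Chinese)",
    "Za3Xuan4Ho3": "Good morning (Pingtan dialect)",
    "GuMoning": "Good morning (English)",
}

# Assumption: a syllable is a capital letter, then lowercase letters,
# then an optional tone digit 1-4.
SYLLABLE = re.compile(r"([A-Z][a-z]*)([1-4]?)")


def split_syllables(text):
    """Split e.g. 'Zao3Shang4Hao3' into [('zao', 3), ('shang', 4), ('hao', 3)]."""
    return [(s.lower(), int(t) if t else None) for s, t in SYLLABLE.findall(text)]


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def best_match(recognized, lexicon=LEXICON):
    """Return the lexicon key closest to the (possibly faulty) recognized string."""
    return min(lexicon, key=lambda k: edit_distance(recognized.lower(), k.lower()))


if __name__ == "__main__":
    for faulty in ("Zhao2San1Hao4", "ZhanXian4Hou3", "GudMolin"):
        key = best_match(faulty)
        print(faulty, "->", key, "=", LEXICON[key])
```

With this toy lexicon, each faulty transcription from examples a) to c) resolves to its intended entry. A real system would need a far larger lexicon and probably a phonetically weighted distance.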

Claims

A voice recognition method based on phonetic symbols, characterized in that rhythmic sounds (such as voice, singing, performance, and ambient sound) are recognized directly as phonetic symbols (or musical scores), and the automatic correction that a person performs when reading is used to give the method high recognition accuracy, broad adaptability, and fault tolerance.
Including: recognizing rhythmic sounds (including but not limited to voice, singing, performance, and ambient sound) directly as phonetic symbols (or musical scores) and using them directly as input, including but not limited to musical score input and voice recognition input.
Including: using the automatic correction that a person performs when reading, so that phonetic symbols containing errors can still be understood, which provides fault tolerance.
Including: using letters and numbers (1, 2, 3, 4 for Yinping, Yangping, Shangsheng, Qusheng) to represent and input Chinese Pinyin.
Including: recognizing rhythmic sounds (such as voice, singing, and performance) directly as phonetic symbols or musical scores (possibly several kinds combined, e.g., for singing, musical notation together with the phonetic symbols of the lyrics) for training and scored evaluation by comparison with the correct phonetic symbols.
Including: recognizing rhythmic sounds (such as voice, singing, performance, and ambient sound) as phonetic symbols (or scores), and then compressing, reproducing, and altering the sound from that transcription (via speech synthesis).
Including: marking the recognized sounds (such as voice, singing, performance, and ambient sound) at the position where they are vocalized, so that the sound is presented more visually.
Including: recognizing rhythmic sounds (such as voice, singing, performance, and ambient sound) directly as phonetic symbols (or scores), which are then used for other purposes (including but not limited to voice control, in which phonetic recognition of any language makes a device carry out particular instructions and actions).
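
The training-and-scoring item above compares recognized phonetic symbols with the correct ones but does not say how a score is computed. The following is a minimal sketch under stated assumptions: syllables are compared position by position, and each syllable earns half a point for matching letters and half a point for a matching tone. The weighting, the function names, and the capital-letter syllable convention are all hypothetical and not taken from the source.

```python
import re

# Assumption: a syllable is a capital letter, then lowercase letters,
# then an optional tone digit 1-4 (e.g. 'Shang4').
SYLLABLE = re.compile(r"([A-Z][a-z]*)([1-4]?)")


def split_syllables(text):
    """Split a tone-numbered string into (letters, tone) pairs."""
    return [(s.lower(), t) for s, t in SYLLABLE.findall(text)]


def pronunciation_score(recognized, reference):
    """Score in [0, 1]: hypothetical half credit for letters, half for tone."""
    rec = split_syllables(recognized)
    ref = split_syllables(reference)
    total = max(len(rec), len(ref))
    if total == 0:
        return 0.0
    points = 0.0
    # zip() stops at the shorter list, so missing or extra syllables earn nothing.
    for (rec_letters, rec_tone), (ref_letters, ref_tone) in zip(rec, ref):
        if rec_letters == ref_letters:
            points += 0.5
        if rec_tone == ref_tone:
            points += 0.5
    return points / total


if __name__ == "__main__":
    reference = "Zao3Shang4Hao3"
    for attempt in ("Zao3Shang4Hao3", "Zao3Shang4Hao4", "Zhao2San1Hao4"):
        print(attempt, round(pronunciation_score(attempt, reference), 3))
```

Positional comparison is the simplest choice; aligning syllables first (for example, by edit distance over syllables) would handle inserted or dropped syllables more gracefully.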