WO2014079258A1 - Voice recognition based on phonetic symbols - Google Patents
Voice recognition based on phonetic symbols
- Publication number
- WO2014079258A1 (PCT/CN2013/082905)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- phonetic
- voice
- singing
- sound
- performance
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
Sound recognition based on phonetic symbols: rhythmic sounds (such as speech, singing, instrumental performance, and ambient sounds) are recognized directly as phonetic symbols (or musical scores). Because a human reader automatically corrects errors while reading, the recognition achieves high accuracy, broad adaptability, and fault tolerance.
Description
Computers, speech recognition, music, acoustics.
At present, speech recognition converts speech into specific written text. However, because of inaccurate pronunciation, accents, and homophones, written text does not correspond exactly to pronunciation, which makes speech recognition inaccurate.
Speech is recognized directly as phonetic text. Because phonetic text corresponds exactly to the pronunciation, recognition accuracy is high. Even when the pronunciation is inaccurate, the result resembles the error-containing text that people already use every day ("Martian script"): the reader automatically corrects it and recovers the intended meaning, and the result may even carry a novel flavor (for example, 什么, "what", written as 神马, literally "god horse"). Using phonetic symbols also makes it possible to recognize dialects and arbitrary languages.
For example, let the digits 1, 2, 3, and 4 denote the four tones of Hanyu Pinyin: yinping, yangping, shangsheng, and qusheng.
a) Chinese, 早上好 ("good morning"): phonetic symbols Zao3Shang4Hao3; phonetic symbols with errors Zhao2San1Hao4.
b) Pingtan dialect, "good morning": phonetic symbols Za3Xuan4Ho3; phonetic symbols with errors ZhanXian4Hou3.
c) English, "Good Morning": phonetic symbols GuMoning; phonetic symbols with errors GudMolin.
Phonetic recognition can also be applied to other rhythmic sounds, such as music (score/temperament), musical instruments, and ambient sounds (knocking, footsteps).
Beyond ordinary voice input, it can be applied to replace sign language: sign language is relatively difficult for most people to learn, whereas labeling the recognized sound content at the position of the speaker, as in a comic, makes the sound more visual and vivid. It can also be used to train deaf-mute people (who are physically able to speak but lost the ability because, never having heard sound, they could not learn) to speak through phonetic pronunciation-recognition training. Further applications are training, evaluation, and scoring for language, singing (score combined with lyric phonetic symbols), and instrumental performance, by comparing how close the result is to the correct phonetic symbols; compression, reproduction (speech synthesis), and alteration (speech synthesis) of sound; and recognizing rhythmic sounds (such as speech, singing, performance, and ambient sounds) directly as phonetic symbols (scores) and then using them for other purposes (for example, phonetic recognition of any language for voice control, so that the corresponding device executes certain commands and actions).
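The tone-numbered notation in the examples (Zao3Shang4Hao3 and its erroneous variants) and the idea of scoring by closeness to the correct phonetic symbols can be illustrated with a minimal Python sketch. The application does not specify a tokenization rule or a distance measure, so the regular expression, the function names `parse_syllables` and `closeness`, and the use of Levenshtein edit distance are all assumptions made here for illustration.

```python
import re

# Assumed token shape (not specified in the application): a capitalized
# letter run followed by an optional tone digit 1-4.
SYLLABLE = re.compile(r"([A-Z][a-z]*)([1-4]?)")


def parse_syllables(text):
    """Split e.g. 'Zao3Shang4Hao3' into (letters, tone) pairs.

    The tone is None when no digit follows the letters.
    """
    return [
        (m.group(1), int(m.group(2)) if m.group(2) else None)
        for m in SYLLABLE.finditer(text)
    ]


def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closeness(recognized, reference):
    """Assumed score in [0, 1]; 1.0 means the two strings are identical."""
    longest = max(len(recognized), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(recognized, reference) / longest


if __name__ == "__main__":
    print(parse_syllables("Zao3Shang4Hao3"))
    print(closeness("Zhao2San1Hao4", "Zao3Shang4Hao3"))
```

A character-level distance treats a wrong tone digit and a wrong letter as equally costly; a scoring scheme for training would probably weight them differently, which the source leaves open.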
Claims (8)
- Sound recognition based on phonetic symbols, characterized in that rhythmic sounds (such as speech, singing, instrumental performance, and ambient sounds) are recognized directly as phonetic symbols (musical scores), and the automatic correction that a person performs when reading is used so that the recognition has high accuracy, broad adaptability, and fault tolerance.
- Comprising: recognizing rhythmic sounds (including but not limited to speech, singing, performance, and ambient sounds) directly as phonetic symbols (scores) and using them directly as input content, including but not limited to score input and speech recognition input.
- Comprising: fault tolerance whereby, through the automatic correction that a person performs when reading, phonetic symbols containing errors can still be understood by the person.
- Comprising: representing and entering Hanyu Pinyin with letters and digits (1, 2, 3, and 4 denoting yinping, yangping, shangsheng, and qusheng).
- Comprising: training and scoring evaluation (comparing closeness to the correct phonetic symbols) by directly recognizing rhythmic sounds (such as speech, singing, and performance) as phonetic symbols (scores), including combinations of several kinds of phonetic symbols (for singing, score symbols combined with lyric phonetic symbols).
- Comprising: recognizing rhythmic sounds (such as speech, singing, performance, and ambient sounds) as phonetic symbols (scores) and then playing them back from the phonetic symbols (speech synthesis), thereby achieving compression, reproduction, and alteration of sound.
- Comprising: labeling the recognized sounds (such as speech, singing, performance, and ambient sounds) at the position where the sound is produced, making the sound more visual and vivid.
- Comprising: recognizing rhythmic sounds (such as speech, singing, performance, and ambient sounds) directly as phonetic symbols (scores) and then using them for other purposes (including but not limited to phonetic recognition of any language for voice control, so that the corresponding device executes certain commands and actions).
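The voice-control use in the last claim, combined with the tolerance for erroneous phonetic symbols, might look like the following Python sketch. The claims attribute error correction to the human reader and do not describe a machine matching step, so the fuzzy lookup here is a substitute technique chosen for illustration. The command table, the action names, the entries Kai1Deng1 and Guan1Deng1, and the 0.6 cutoff are hypothetical; only `difflib.get_close_matches` from the Python standard library is a real API.

```python
import difflib

# Hypothetical command table: reference phonetic strings mapped to device
# actions. Only the two greeting strings come from the description; the
# other entries and all action names are invented for this sketch.
COMMANDS = {
    "Zao3Shang4Hao3": "greet",
    "GuMoning": "greet",
    "Kai1Deng1": "light_on",
    "Guan1Deng1": "light_off",
}


def match_command(recognized, cutoff=0.6):
    """Return the action whose reference string is most similar to the
    recognized phonetic string, or None if nothing reaches the cutoff."""
    matches = difflib.get_close_matches(
        recognized, list(COMMANDS), n=1, cutoff=cutoff
    )
    return COMMANDS[matches[0]] if matches else None


if __name__ == "__main__":
    # Erroneous phonetic strings from the description still reach "greet".
    print(match_command("GudMolin"))
    print(match_command("Zhao2San1Hao4"))
```

Because the lookup compares phonetic strings rather than words of a particular language, the same table can hold entries for Mandarin, a dialect, and English side by side, which is the language independence the claim describes.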
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210469832.9 | 2012-11-20 | | |
CN201310000418.8 | 2013-01-04 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014079258A1 (en) | 2014-05-30 |
Family
ID=50775486
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2013/082905 WO2014079258A1 (en) | 2012-11-20 | 2013-09-04 | Voice recognition based on phonetic symbols |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2014079258A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7127397B2 (en) * | 2001-05-31 | 2006-10-24 | Qwest Communications International Inc. | Method of training a computer system via human voice input |
CN101257354A (en) * | 2008-04-15 | 2008-09-03 | 哈尔滨工程大学 | Underwater voice communication method of low code rate |
CN101286317A (en) * | 2008-05-30 | 2008-10-15 | 同济大学 | Speech recognition device, model training method and traffic information service platform |
CN102142247A (en) * | 2011-03-30 | 2011-08-03 | 东南大学 | Multifunctional electronic score |
CN102208186A (en) * | 2011-05-16 | 2011-10-05 | 南宁向明信息科技有限责任公司 | Chinese phonetic recognition method |
CN102568456A (en) * | 2011-12-23 | 2012-07-11 | 深圳市万兴软件有限公司 | Notation recording method and a notation recording device based on humming input |
CN103065621A (en) * | 2012-11-20 | 2013-04-24 | 高剑青 | Voice recognition based on phonetic symbols |
- 2013-09-04: WO application PCT/CN2013/082905 (WO2014079258A1, en); status: active, Application Filing
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13856407; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 13856407; Country of ref document: EP; Kind code of ref document: A1 |