WO2014079258A1 - Voice recognition based on phonetic symbols - Google Patents

Voice recognition based on phonetic symbols

Info

Publication number: WO2014079258A1
Authority: WO, WIPO (PCT)
Prior art keywords: phonetic, voice, singing, sound, performance
Application number: PCT/CN2013/082905
Other languages: French (fr), Chinese (zh)
Inventor: 高剑青 (Gao Jianqing)
Original assignee: Gao Jianqing
Priority date: 2012-11-20 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2013-09-04
Publication date: 2014-05-30
2013-09-04: Application filed by Gao Jianqing
2014-05-30: Publication of WO2014079258A1

Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/26 Speech to text systems
            • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
              • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A voice recognition method based on phonetic symbols: rhythmic sounds (such as speech, singing, instrumental performance, and ambient sounds) are recognized directly as phonetic symbols (or musical scores), and the automatic error correction that people perform while reading is relied on to give the recognition high accuracy, broad adaptability, and fault tolerance.

Description

Voice recognition based on phonetic symbols
Technical field
Computers, speech recognition, music, acoustics.
Background art
At present, speech recognition converts speech into specific written text. However, because of inaccurate pronunciation, accents, and homophones, written text and actual pronunciation still differ to some extent, which makes speech recognition inaccurate.
Summary of the invention
Speech is recognized directly as phonetic text. Because phonetic text corresponds exactly to the pronunciation, recognition accuracy is high. Even when the output contains errors caused by inaccurate pronunciation, much like the error-ridden writing people already use every day ("Martian script"), the reader automatically corrects it and recovers the intended meaning; the result may even carry a new nuance (for example, 什么 shénme, "what", written as 神马 shénmǎ, literally "god horse"). Phonetic symbols also make it possible to recognize dialects and arbitrary languages. In the following examples, the digits 1, 2, 3, and 4 denote the four tones of Hanyu Pinyin: yinping, yangping, shangsheng, and qusheng.
a) Chinese, "good morning" (早上好): phonetic Zao3Shang4Hao3; phonetic with errors Zhao2San1Hao4.
b) Pingtan dialect, "good morning": phonetic Za3Xuan4Ho3; phonetic with errors ZhanXian4Hou3.
c) English, "Good Morning": phonetic GuMoning; phonetic with errors GudMolin.
Phonetic recognition can also be applied to other rhythmic sounds, such as music (scores and tunes), musical instruments, and ambient sounds (knocking, footsteps).
Besides ordinary voice input, the method can be used: to replace sign language, which is relatively hard for most people to learn, by labeling the recognized sound content at the speaker's position, as in a comic, so that sound becomes more visual and vivid; to teach deaf-mute people to speak through pronunciation training based on phonetic recognition (people who are physically able to speak but lost that ability because they never heard sound and so could not learn); for training and scoring in language, singing (combining score notation with lyric phonetic symbols), and instrumental performance, where the score reflects how close the result is to the correct phonetic symbols; for sound compression, reproduction (speech synthesis), and voice changing (speech synthesis); and, more generally, for recognizing rhythmic sounds (such as speech, singing, performance, and ambient sounds) directly as phonetic symbols (scores) that are then put to other uses (for example, phonetic recognition of any language for voice control, making a corresponding device execute specific commands and actions).
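The tone-numbered notation used in the examples above (and in claim 4) is simple to process mechanically. The following is a minimal Python sketch, not taken from the patent, that splits such a string into (syllable, tone) pairs. The function name `parse_phonetic`, the `TONE_NAMES` table, and the rule that every syllable begins with a capital letter are assumptions inferred from the examples.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern (an assumption based on the examples): each syllable
# starts with an uppercase letter, continues with lowercase letters, and may
# end with a single tone digit 1-4.
SYLLABLE = re.compile(r"([A-Z][a-z]*)([1-4])?")

# Tone numbers as described in the text; a missing digit is reported as None.
TONE_NAMES = {1: "yinping", 2: "yangping", 3: "shangsheng", 4: "qusheng"}


def parse_phonetic(text: str) -> List[Tuple[str, Optional[int]]]:
    """Split a tone-numbered phonetic string into (syllable, tone) pairs."""
    pairs: List[Tuple[str, Optional[int]]] = []
    pos = 0
    while pos < len(text):
        match = SYLLABLE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at index {pos}")
        syllable, tone = match.groups()
        pairs.append((syllable, int(tone) if tone else None))
        pos = match.end()
    return pairs


if __name__ == "__main__":
    # The erroneous Pingtan example: the first syllable has no tone digit.
    for syllable, tone in parse_phonetic("ZhanXian4Hou3"):
        print(syllable, tone, TONE_NAMES.get(tone, "unmarked"))
```

Run on the erroneous Pingtan example, the script prints `Zhan None unmarked`, `Xian 4 qusheng`, and `Hou 3 shangsheng`. The sketch reports a missing tone instead of rejecting the input, which loosely mirrors the fault-tolerance idea; the English examples (GuMoning, GudMolin) parse too, but only split at capital letters.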

Claims (8)

  1. Voice recognition based on phonetic symbols, characterized in that rhythmic sounds (such as speech, singing, instrumental performance, and ambient sounds) are recognized directly as phonetic symbols (musical scores), and the automatic correction that people perform when reading is exploited to give the recognition high accuracy, broad adaptability, and fault tolerance.
  2. Comprising: recognizing rhythmic sounds (including but not limited to speech, singing, instrumental performance, and ambient sounds) directly as phonetic symbols (musical scores) and using them directly as input content, including but not limited to musical score input and speech recognition input.
  3. Comprising: fault tolerance that exploits the automatic correction people perform when reading, so that phonetic symbols containing errors can still be understood by a human reader.
  4. Comprising: representing and entering Hanyu Pinyin with letters and digits (1, 2, 3, and 4 denoting the yinping, yangping, shangsheng, and qusheng tones).
  5. Comprising: training and scoring, by comparing closeness to the correct phonetic symbols, through directly recognizing rhythmic sounds (such as speech, singing, and instrumental performance) as phonetic symbols (musical scores), including combinations of several kinds of phonetic symbols (for singing, score notation combined with lyric phonetic symbols).
  6. Comprising: compressing, reproducing, and altering sound by recognizing rhythmic sounds (such as speech, singing, instrumental performance, and ambient sounds) as phonetic symbols (musical scores) and then playing back the phonetic symbols (speech synthesis).
  7. Comprising: labeling recognized sounds (such as speech, singing, instrumental performance, and ambient sounds) at the position where they are produced, making sound more visual and vivid.
  8. Comprising: recognizing rhythmic sounds (such as speech, singing, instrumental performance, and ambient sounds) directly as phonetic symbols (musical scores) and then using them for other purposes (including but not limited to phonetic recognition of any language for voice control, causing a corresponding device to execute specific commands and actions).
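Claim 5 scores an attempt by its closeness to the correct phonetic symbols but does not say how closeness is measured. One plausible choice, assumed here and not taken from the patent, is the Levenshtein edit distance normalized to a 0 to 100 score; `edit_distance` and `closeness_score` are hypothetical names. Because both functions accept any sequence, the same sketch can compare phonetic strings character by character or compare sung notes given as lists.

```python
from typing import Sequence


def edit_distance(a: Sequence[object], b: Sequence[object]) -> int:
    """Levenshtein distance: the minimum number of insertions, deletions, and
    substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def closeness_score(reference: Sequence[object], attempt: Sequence[object]) -> float:
    """Hypothetical 0-100 score: 100 means the attempt matches the reference."""
    longest = max(len(reference), len(attempt))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(reference, attempt) / longest)


if __name__ == "__main__":
    # Character-level comparison of the correct and erroneous phonetic strings.
    print(round(closeness_score("Zao3Shang4Hao3", "Zhao2San1Hao4"), 1))
    # Note-level comparison for a sung melody (note labels are hypothetical).
    print(round(closeness_score(["C4", "D4", "E4"], ["C4", "D4", "F4"]), 1))
```

The melody comparison prints 66.7: one wrong note out of three. Whether a character-level or syllable-level distance better reflects a reader's ability to self-correct is a design choice the patent leaves open.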

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201210469832 2012-11-20
CN201210469832.9 2012-11-20
CN201310000418 2013-01-04
CN201310000418.8 2013-01-04

Publications (1)

Publication Number Publication Date
WO2014079258A1 (en) 2014-05-30

Family

ID=50775486

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/082905 WO2014079258A1 (en) 2012-11-20 2013-09-04 Voice recognition based on phonetic symbols

Country Status (1)

Country Link
WO (1) WO2014079258A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7127397B2 (en) * 2001-05-31 2006-10-24 Qwest Communications International Inc. Method of training a computer system via human voice input
CN101257354A (en) * 2008-04-15 2008-09-03 哈尔滨工程大学 Underwater voice communication method of low code rate
CN101286317A (en) * 2008-05-30 2008-10-15 同济大学 Speech recognition device, model training method and traffic information service platform
CN102142247A (en) * 2011-03-30 2011-08-03 东南大学 Multifunctional electronic score
CN102208186A (en) * 2011-05-16 2011-10-05 南宁向明信息科技有限责任公司 Chinese phonetic recognition method
CN102568456A (en) * 2011-12-23 2012-07-11 深圳市万兴软件有限公司 Notation recording method and a notation recording device based on humming input
CN103065621A (en) * 2012-11-20 2013-04-24 高剑青 Voice recognition based on phonetic symbols

Similar Documents

Publication Publication Date Title
Witt Automatic error detection in pronunciation training: Where we are and where we need to go
US6963841B2 (en) Speech training method with alternative proper pronunciation database
US7280964B2 (en) Method of recognizing spoken language with recognition of language color
RU2690863C1 (en) System and method for computerized teaching of a musical language
CN1938756A (en) Prosodic speech text codes and their use in computerized speech systems
KR19990044575A (en) Interactive language training apparatus
Daniels et al. The suitability of cloud-based speech recognition engines for language learning.
US20160321953A1 (en) Pronunciation learning support system utilizing three-dimensional multimedia and pronunciation learning support method thereof
WO2012033547A1 (en) System and method for teaching non-lexical speech effects
Ahsiah et al. Tajweed checking system to support recitation
KR20160122542A (en) Method and apparatus for measuring pronounciation similarity
KR20140071070A (en) Method and apparatus for learning pronunciation of foreign language using phonetic symbol
Liang Chinese learners' pronunciation problems and listening difficulties in English connected speech
JP7166580B2 (en) language learning methods
Liao et al. A prototype of an adaptive Chinese pronunciation training system
TWI574254B (en) Speech synthesis method and apparatus for electronic system
Jang Speech rhythm metrics for automatic scoring of English speech by Korean EFL learners
Hämäläinen et al. A multimodal educational game for 3-10-year-old children: collecting and automatically recognising european portuguese children’s speech
KR20160001332A (en) English connected speech learning system and method thereof
Kirkham et al. Comparison of Vocalists and Instrumentalists on Lexical Tone Perception and Production Tasks.
CN111508522A (en) Statement analysis processing method and system
Gupta et al. Towards Automatic Mispronunciation Detection in Singing.
CN103065621A (en) Voice recognition based on phonetic symbols
CN112951208B (en) Method and device for speech recognition
Kim et al. Automatic assessment of American English lexical stress using machine learning algorithms

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 13856407; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 13856407; Country of ref document: EP; Kind code of ref document: A1