WO2021223232A1 - A smart TV multilingual recognition system based on Gaia AI voice control - Google Patents

A smart TV multilingual recognition system based on Gaia AI voice control

Info

Publication number
WO2021223232A1
Authority
WO
WIPO (PCT)
Prior art keywords
language
control system
voice
recognition
storage module
Prior art date
Application number
PCT/CN2020/089239
Other languages
English (en)
French (fr)
Inventor
黄国桂
吴文弘
康许坤
Original Assignee
赣州市牧士电子有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 赣州市牧士电子有限公司
Priority to PCT/CN2020/089239 (WO2021223232A1, zh)
Priority to CN202010737633.6A (CN111800657B, zh)
Publication of WO2021223232A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N21/42206 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
    • H04N21/42222 Additional components integrated in the remote control device, e.g. timer, speaker, sensors for detecting position, direction or movement of the remote control, microphone or battery charging device
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/485 End-user interface for client configuration
    • H04N21/4856 End-user interface for client configuration for language selection, e.g. for the menu or subtitles
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • The present invention relates to the field of voice control, and in particular to a smart TV multilingual recognition system based on Gaia AI voice control.
  • Chinese Patent Publication No. CN109817213A discloses a method for speech recognition with adaptive language selection, which includes: extracting phoneme features representing pronunciation phoneme information from acquired speech data; inputting the phoneme features into a language discrimination model trained in advance on a multilingual corpus to obtain a language discrimination result for the speech data; and, according to that result, obtaining the speech recognition result of the speech data from the acoustic model of the corresponding language.
  • Existing voice-controlled TVs recognize speech relatively slowly, and recognition devices that support multiple languages have relatively low recognition speed and accuracy, which degrades the customer experience.
  • The technical problem to be solved by the present invention is to provide a smart TV multilingual recognition system based on Gaia AI voice control that preferentially selects the most suitable language and keywords for proofreading and recognition, so that different languages can be recognized quickly and accurately.
  • To this end, the present invention adopts the following technical solutions:
  • The present invention provides a smart TV multilingual recognition system based on Gaia AI voice control, which includes a remote control for receiving voice signals and a control system for recognizing and processing voice signals.
  • The control system is provided with a first language storage module and preferentially performs comparison in the language stored in that module; the control system extracts the information of the TV interface at the time of voice input, so as to preferentially compare against the keywords most likely to be used in that interface. Selecting the most probable language, and the most probable keywords for the current interface, allows speech to be proofread and recognized quickly and accurately.
  • In a preferred technical solution, the remote control receives a specific activation phrase and transmits it to the control system, and the control system compares the activation phrase against the supported languages and stores the recognized language in the first language storage module. This makes it easy to determine accurately which language the user speaks, which in turn speeds up recognition.
  • In a preferred technical solution, when the control system, during use, recognizes a language different from the one in the first language storage module, it replaces the language in the first language storage module with the new language.
  • When the user switches the control language, the language can thus be determined quickly, and it is recognized more quickly and accurately in subsequent use.
  • In a preferred technical solution, the control system is provided with standard language libraries for several languages; after receiving a voice command, the control system performs comparison and recognition against the standard language libraries, giving priority to the language in the first language storage module.
  • The TV supports multiple languages, but the language in the first language storage module is called first during use, which improves the efficiency of speech recognition.
  • In a preferred technical solution, the control system is also provided with a correction language library. The control system controls the TV according to the comparison with the standard language library and, after confirming that the recognition is correct, stores the operation instruction and the received voice command in the correction language library; during speech recognition the control system compares against the correction language library first. This lets the system adapt to recognition-accuracy problems caused by accent differences among users in different regions.
  • In a preferred technical solution, after the control system operates the TV, if the user performs no return operation within 5 seconds, the operation is considered correct and valid, the speech recognition is judged correct, and the entry is stored in the correction language library. This avoids storing wrongly recognized voice commands and thus avoids repeating the same misoperation later.
  • In a preferred technical solution, the control system extracts TV interface information and divides the standard language library into several language layers by interface; the control system identifies the interface the TV is currently showing and performs speech recognition comparison in the language layer corresponding to that interface first. Proofreading against the most suitable keywords for the interface improves the efficiency of speech recognition.
  • In a preferred technical solution, when the control system, using the extracted interface information and the language in the first language storage module, finds no suitable command in the corresponding language layer of the standard language library, it gives priority to the language layers that correspond to the same interface in other languages. This further improves the efficiency of speech recognition.
  • The beneficial effects of the present invention are as follows. The system includes a remote control for receiving voice signals and a control system for recognizing and processing voice signals. The control system is provided with a first language storage module and preferentially performs comparison in the language stored in that module; it also extracts the information of the TV interface at the time of voice input, so as to preferentially compare against the keywords most likely to be used in that interface. Directly selecting the language library indicated by the first language storage module for recognition and proofreading, while proofreading the relevant keywords according to the current interface, improves the speed and accuracy of speech recognition.
  • FIG. 1 is a schematic diagram of the principle of the smart TV multilingual recognition system based on Gaia AI voice control provided in a specific embodiment of the present invention.
  • When the voice-enabled smart TV is used for the first time, the user only needs to press the voice input button on the remote control 1 and speak the activation phrase, which the control system 2 checks against the several standard language libraries 22.
  • The activation phrase can be an uncommon phrase such as "voice wizard" or "voice assistant". In this way the system determines which language the user speaks and stores that language in the first language storage module 21.
  • In subsequent use, the user's voice is transmitted through the remote control 1, which receives voice signals, to the control system 2, which recognizes and processes them.
  • The control system 2 is provided with a first language storage module 21 and preferentially performs comparison in the language stored there; the control system 2 is also provided with standard language libraries 22 for several languages, and after receiving a voice command it performs comparison and recognition against the standard language libraries 22, giving priority to the language in the first language storage module 21.
  • For example, when the language stored in the first language storage module 21 is English, the control system 2 first processes a received voice command as English speech and proofreads it against the English standard language library 22. This speeds up the recognition of voice commands.
  • During use, when the control system 2 recognizes a language different from the one in the first language storage module 21, it replaces the language in the first language storage module 21 with the new language.
  • When the user switches the control language, the language can thus be determined quickly, and it is recognized more quickly and accurately in subsequent use.
  • When a guest from abroad visits the home, the guest may operate the TV in a language different from the one in the first language storage module 21.
  • In that case the control system 2 cannot recognize the command accurately when matching in the language stored in the first language storage module 21.
  • The control system 2 then gives priority to the keywords likely to appear in the current interface and calls the standard language libraries 22 of other languages for proofreading and recognition.
  • Once recognition succeeds, the control system 2 performs the related operation. If the user has not returned to redo the operation after 10 s, the operation is considered valid and the speech recognition correct. The control system 2 then replaces the language in the first language storage module 21 with the newly recognized language, so that this language is recognized more quickly and accurately in subsequent operations.
  • The control system 2 extracts the information of the TV interface at the time of voice input, so as to preferentially compare against the keywords most likely to be used in that interface. Selecting the most probable language, and the most probable keywords for the current interface, allows speech to be proofread and recognized quickly and accurately. For example, on the TV's initial interface the user will typically open a TV program or play a song, so the most likely keywords are "open" and "play".
  • During speech recognition, the control system 2 determines the interface the TV is currently showing and gives priority to the keywords for that interface, which improves recognition speed and accuracy.
  • The control system 2 is provided with standard language libraries 22 for several languages. After receiving a voice command, the control system 2 performs comparison and recognition against the standard language libraries 22, giving priority to the language in the first language storage module 21.
  • The TV supports multiple languages, but the language in the first language storage module 21 is called first during use, which improves the efficiency of speech recognition.
  • To improve the recognition rate when pronunciation is inaccurate, the control system 2 is also provided with a correction language library 23.
  • The control system 2 controls the TV according to the comparison with the standard language library 22 and, after confirming that the recognition is correct, stores the operation instruction and the received voice command in the correction language library 23; during speech recognition the control system 2 compares against the correction language library 23 first.
  • After the control system 2 operates the TV, if the user performs no return operation within 5 seconds, the operation is considered correct and valid, the speech recognition is judged correct, and the entry is stored in the correction language library 23. This avoids storing wrongly recognized voice commands and thus avoids repeating the same misoperation later.
  • When the user's pronunciation is inaccurate, the control system 2 first calls the standard language library 22 for the language in the first language storage module 21 and gives priority to the keywords that are most probable for the current TV interface. For example, on the video playback interface the user's most likely operations are "pause", "fast forward", "next episode", and "increase volume". Once the corresponding keyword is recognized, recognition is considered successful and the control system 2 performs the corresponding operation. If the user does not return to redo the operation within 5 s, the operation is considered correct and valid and the speech recognition is judged correct; the voice command and the corresponding operation instruction are then stored in the correction language library 23. When the control system 2 later receives a voice command, it checks the correction language library 23 first, so that commands are recognized quickly and accurately even when the user's pronunciation is not accurate enough.
  • The control system 2 is provided with a first language storage module 21 and preferentially performs comparison in the language stored there; the control system 2 is also provided with standard language libraries 22 for several languages. After receiving a voice command, the control system 2 performs comparison and recognition against the standard language libraries 22, giving priority to the language in the first language storage module 21. For example, when the language stored in the first language storage module 21 is English, the control system 2 first processes a received voice command as English speech and proofreads it against the English standard language library 22.
  • During use, when the control system 2 recognizes a language different from the one in the first language storage module 21, it replaces the language in the first language storage module 21 with the new language.
  • When the user switches the control language, the language can thus be determined quickly, and it is recognized more quickly and accurately in subsequent use.
  • The control system 2 extracts the TV interface information and divides the standard language library 22 into several language layers 221 by interface.
  • The control system 2 identifies the interface the TV is currently showing and performs speech recognition comparison in the language layer 221 corresponding to that interface first.
  • Proofreading against the most suitable keywords for the interface improves the efficiency of speech recognition.
  • For example, on the home page users generally use keywords such as "search", "open", and "play", so these words are listed as the first-level proofreading keywords for the home page interface.
  • On the song playback interface, the keywords users use most are ones such as "next song" and "increase the sound", so these words are listed as the first-level proofreading keywords for the song playback interface.
  • When the control system 2 receives a voice command, it also retrieves the interface the TV is currently showing, so that the language layer 221 corresponding to that interface is called first for recognition, which further improves recognition speed and accuracy.
  • When the control system 2, using the extracted interface information and the language in the first language storage module 21, finds no suitable command in the corresponding language layer 221 of the standard language library 22, it gives priority to the language layers 221 that correspond to the same interface in other languages. This further improves the efficiency of speech recognition.
  • When a voice operation is performed on the home page, the control system 2 first performs proofreading and recognition in the language layer 221 containing keywords such as "search", "open", and "play"; when no matching voice command is recognized in that layer, it gives priority to the language layers 221 containing the corresponding keywords in other languages.
  • When a suitable voice command is found, the related operation is performed; if the user does not return within 5 s, the recognition is deemed correct, the system concludes that the user has changed languages, and the language in the first language storage module 21 is replaced.

Abstract

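The present invention discloses a smart TV multilingual recognition system based on Gaia AI voice control, belonging to the field of voice control. The system comprises a remote control for receiving voice signals and a control system for recognizing and processing voice signals. The control system is provided with a first language storage module and preferentially performs comparison in the language stored in that module. The control system also extracts the information of the TV interface at the time of voice input, so as to preferentially compare against the keywords most likely to be used in that interface. Directly selecting the language library indicated by the first language storage module for recognition and proofreading, while proofreading the relevant keywords according to the current interface, improves the speed and accuracy of speech recognition.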

Description

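A smart TV multilingual recognition system based on Gaia AI voice control
Technical Field
The present invention relates to the field of voice control, and in particular to a smart TV multilingual recognition system based on Gaia AI voice control.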
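Background Art
At present, ordinary LCD smart TVs are operated mainly through buttons and a remote control. Today's smart TVs are powerful and offer frequently used functions such as software search and movie search, but typing and controlling functions with a remote control is slow, so voice control products have been developed to meet the demand for more efficient TV operation. With the rapid development of speech recognition technology, recognition accuracy has reached a level suitable for practical use, and speech has become one of the important interfaces for human-computer interaction, widely used in scenarios such as voice input, voice search, voice translation, and smart homes. At the same time, more and more users rely on speech recognition. These users may come from different countries and speak different languages, so a traditional single speech recognition model can hardly serve all of them, and corresponding acoustic models must be trained for users of different languages.
Chinese Patent Publication No. CN109817213A discloses a method for speech recognition with adaptive language selection, which includes: extracting phoneme features representing pronunciation phoneme information from acquired speech data; inputting the phoneme features into a language discrimination model trained in advance on a multilingual corpus to obtain a language discrimination result for the speech data; and, according to that result, obtaining the speech recognition result of the speech data from the acoustic model of the corresponding language.
Existing voice-controlled TVs recognize speech relatively slowly, and recognition devices that support multiple languages have relatively low recognition speed and accuracy, which degrades the customer experience.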
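Summary of the Invention
To overcome the defects of the prior art, the technical problem to be solved by the present invention is to provide a smart TV multilingual recognition system based on Gaia AI voice control that preferentially selects the most suitable language and keywords for proofreading and recognition, so that different languages can be recognized quickly and accurately.
To this end, the present invention adopts the following technical solutions:
The present invention provides a smart TV multilingual recognition system based on Gaia AI voice control, which includes a remote control for receiving voice signals and a control system for recognizing and processing voice signals. The control system is provided with a first language storage module and preferentially performs comparison in the language stored in that module; the control system extracts the information of the TV interface at the time of voice input, so as to preferentially compare against the keywords most likely to be used in that interface. Selecting the most probable language, and the most probable keywords for the current interface, allows speech to be proofread and recognized quickly and accurately.
In a preferred technical solution, the remote control receives a specific activation phrase and transmits it to the control system, and the control system compares the activation phrase against the supported languages and stores the recognized language in the first language storage module. This makes it easy to determine accurately which language the user speaks, which in turn speeds up recognition.
In a preferred technical solution, when the control system, during use, recognizes a language different from the one in the first language storage module, it replaces the language in the first language storage module with the new language. When the user switches the control language, the language can thus be determined quickly, and it is recognized more quickly and accurately in subsequent use.
In a preferred technical solution, the control system is provided with standard language libraries for several languages; after receiving a voice command, the control system performs comparison and recognition against the standard language libraries, giving priority to the language in the first language storage module. The TV supports multiple languages, but the language in the first language storage module is called first during use, which improves the efficiency of speech recognition.
In a preferred technical solution, the control system is also provided with a correction language library. The control system controls the TV according to the comparison with the standard language library and, after confirming that the recognition is correct, stores the operation instruction and the received voice command in the correction language library; during speech recognition the control system compares against the correction language library first. This lets the system adapt to recognition-accuracy problems caused by accent differences among users in different regions.
In a preferred technical solution, after the control system operates the TV, if the user performs no return operation within 5 seconds, the operation is considered correct and valid, the speech recognition is judged correct, and the entry is stored in the correction language library. This avoids storing wrongly recognized voice commands and thus avoids repeating the same misoperation later.
In a preferred technical solution, the control system extracts TV interface information and divides the standard language library into several language layers by interface; the control system identifies the interface the TV is currently showing and performs speech recognition comparison in the language layer corresponding to that interface first. Proofreading against the most suitable keywords for the interface improves the efficiency of speech recognition.
In a preferred technical solution, when the control system, using the extracted interface information and the language in the first language storage module, finds no suitable command in the corresponding language layer of the standard language library, it gives priority to the language layers that correspond to the same interface in other languages. This further improves the efficiency of speech recognition.
The beneficial effects of the present invention are as follows:
The present invention provides a smart TV multilingual recognition system based on Gaia AI voice control, which includes a remote control for receiving voice signals and a control system for recognizing and processing voice signals. The control system is provided with a first language storage module and preferentially performs comparison in the language stored in that module; the control system extracts the information of the TV interface at the time of voice input, so as to preferentially compare against the keywords most likely to be used in that interface. Directly selecting the language library indicated by the first language storage module for recognition and proofreading, while proofreading the relevant keywords according to the current interface, improves the speed and accuracy of speech recognition.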
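Brief Description of the Drawings
FIG. 1 is a schematic diagram of the principle of the smart TV multilingual recognition system based on Gaia AI voice control provided in a specific embodiment of the present invention.
In the figure:
1. remote control; 2. control system; 21. first language storage module; 22. standard language library; 23. correction language library; 221. language layer.
Detailed Description
The technical solutions of the present invention are further described below through specific embodiments with reference to the accompanying drawing.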
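Embodiment 1
As shown in FIG. 1, when the voice-enabled smart TV is used for the first time, the user only needs to press the voice input button on the remote control 1 and speak the activation phrase, which the control system 2 checks against the several standard language libraries 22. The activation phrase can be an uncommon phrase such as "voice wizard" or "voice assistant". In this way the system determines which language the user speaks and stores that language in the first language storage module 21. In subsequent use, the user's voice is transmitted through the remote control 1, which receives voice signals, to the control system 2, which recognizes and processes them. The control system 2 is provided with a first language storage module 21 and preferentially performs comparison in the language stored there; the control system 2 is also provided with standard language libraries 22 for several languages, and after receiving a voice command it performs comparison and recognition against the standard language libraries 22, giving priority to the language in the first language storage module 21.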
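The first-use step described above can be sketched as follows. This is only a minimal illustration, not the patent's implementation: it assumes the spoken activation phrase has already been transcribed to text, and the names `ACTIVATION_PHRASES`, `FirstLanguageStore`, `detect_language`, and `activate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical activation phrases per supported language (illustrative data only).
ACTIVATION_PHRASES = {
    "zh": {"语音精灵", "语音小助手"},
    "en": {"voice wizard", "voice assistant"},
}


@dataclass
class FirstLanguageStore:
    """Stands in for the first language storage module (21)."""
    language: Optional[str] = None


def detect_language(transcript: str) -> Optional[str]:
    """Return the language whose activation phrase matches the transcript."""
    normalized = transcript.strip().lower()
    for language, phrases in ACTIVATION_PHRASES.items():
        if normalized in phrases:
            return language
    return None


def activate(transcript: str, store: FirstLanguageStore) -> bool:
    """Store the detected language; report whether activation succeeded."""
    language = detect_language(transcript)
    if language is None:
        return False  # leave any previously stored language untouched
    store.language = language
    return True
```

A failed activation leaves the stored language unchanged in this sketch, so a misheard phrase does not erase an earlier choice; the description does not specify this behavior.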
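For example, when the language stored in the first language storage module 21 is English, the control system 2, after receiving a voice command, first processes the speech as English and proofreads it against the English standard language library 22, which speeds up the recognition of voice commands. During use, when the control system 2 recognizes a language different from the one in the first language storage module 21, it replaces the language in the first language storage module 21 with the new language. When the user switches the control language, the language can thus be determined quickly, and it is recognized more quickly and accurately in subsequent use. When a guest from abroad visits the home, the guest may operate the TV in a language different from the one in the first language storage module 21. The control system 2 then cannot recognize the command accurately when matching in the stored language, so it gives priority to the keywords likely to appear in the current interface and calls the standard language libraries 22 of other languages for proofreading and recognition. Once recognition succeeds, the control system 2 performs the related operation. If the user has not returned to redo the operation after 10 s, the operation is considered valid and the speech recognition correct; the control system 2 then replaces the language in the first language storage module 21 with the newly recognized language, so that recognition in the new language is faster and more accurate in subsequent operations.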
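The preferred-language-first comparison and the language replacement in this paragraph might look like the sketch below. It assumes transcribed text and exact phrase matching in place of acoustic comparison; `STANDARD_LIBRARIES`, `language_order`, `recognize`, and `update_preferred` are hypothetical names, and the `confirmed` flag stands for the no-return check described above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical standard language libraries (22): phrase -> operation, per language.
STANDARD_LIBRARIES: Dict[str, Dict[str, str]] = {
    "zh": {"打开": "OPEN", "播放": "PLAY"},
    "en": {"open": "OPEN", "play": "PLAY"},
    "fr": {"ouvrir": "OPEN", "lire": "PLAY"},
}


def language_order(preferred: str) -> List[str]:
    """Preferred language first, then the remaining supported languages."""
    others = [lang for lang in STANDARD_LIBRARIES if lang != preferred]
    return [preferred] + others


def recognize(transcript: str, preferred: str) -> Optional[Tuple[str, str]]:
    """Return (language, operation) from the first library that matches."""
    phrase = transcript.strip().lower()
    for language in language_order(preferred):
        operation = STANDARD_LIBRARIES.get(language, {}).get(phrase)
        if operation is not None:
            return language, operation
    return None


def update_preferred(preferred: str, matched_language: str, confirmed: bool) -> str:
    """Replace the stored language only after the operation was confirmed."""
    return matched_language if confirmed else preferred
```

For instance, `recognize("播放", "en")` tries the English library first, misses, and then matches in the Chinese library; `update_preferred("en", "zh", confirmed=True)` would then switch the stored language.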
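The control system 2 extracts the information of the TV interface at the time of voice input, so as to preferentially compare against the keywords most likely to be used in that interface. Selecting the most probable language, and the most probable keywords for the current interface, allows speech to be proofread and recognized quickly and accurately. For example, on the TV's initial interface the user will typically open a TV program or play a song, so the most likely keywords are "open" and "play". During speech recognition, the control system 2 determines the interface the TV is currently showing and gives priority to the keywords for that interface, which improves recognition speed and accuracy.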
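One way to picture interface-based keyword priority is a candidate list reordered by the current interface. The sketch below assumes transcribed text and substring matching; the interface names and the `INTERFACE_KEYWORDS`, `ALL_KEYWORDS`, `candidate_order`, and `match_keyword` identifiers are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List, Optional

# Hypothetical keywords most likely to be used on each interface.
INTERFACE_KEYWORDS: Dict[str, List[str]] = {
    "home": ["open", "play"],
    "video_playback": ["pause", "fast forward", "next episode", "increase volume"],
}

# Hypothetical full keyword list for one language.
ALL_KEYWORDS: List[str] = [
    "open", "play", "pause", "fast forward",
    "next episode", "increase volume", "search",
]


def candidate_order(interface: str) -> List[str]:
    """Keywords for the current interface first, then all remaining keywords."""
    prioritized = INTERFACE_KEYWORDS.get(interface, [])
    rest = [kw for kw in ALL_KEYWORDS if kw not in prioritized]
    return prioritized + rest


def match_keyword(transcript: str, interface: str) -> Optional[str]:
    """Return the first candidate keyword contained in the transcript."""
    text = transcript.strip().lower()
    for keyword in candidate_order(interface):
        if keyword in text:
            return keyword
    return None
```

The reordering does not change which keywords can match; it only changes which ones are tried first, which is where the claimed speed-up would come from.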
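Embodiment 2
As shown in FIG. 1, the control system 2 is provided with standard language libraries 22 for several languages. After receiving a voice command, the control system 2 performs comparison and recognition against the standard language libraries 22, giving priority to the language in the first language storage module 21. The TV supports multiple languages, but the language in the first language storage module 21 is called first during use, which improves the efficiency of speech recognition. To improve the recognition rate when pronunciation is inaccurate, the control system 2 is also provided with a correction language library 23. The control system 2 controls the TV according to the comparison with the standard language library 22 and, after confirming that the recognition is correct, stores the operation instruction and the received voice command in the correction language library 23; during speech recognition the control system 2 compares against the correction language library 23 first. This lets the system adapt to recognition-accuracy problems caused by accent differences among users in different regions. Further, after the control system 2 operates the TV, if the user performs no return operation within 5 seconds, the operation is considered correct and valid, the speech recognition is judged correct, and the entry is stored in the correction language library 23. This avoids storing wrongly recognized voice commands and thus avoids repeating the same misoperation later.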
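The lookup order in this embodiment, correction language library first and standard language library second, can be sketched as below. The patent does not say how an accented command is matched against the standard library, so this sketch swaps in approximate string matching from Python's `difflib` purely as a stand-in; `STANDARD_LIBRARY`, `resolve`, and the sample phrases are hypothetical.

```python
import difflib
from typing import Dict, Optional, Tuple

# Hypothetical standard language library (22) for one language: phrase -> operation.
STANDARD_LIBRARY: Dict[str, str] = {
    "pause": "PAUSE",
    "play": "PLAY",
    "next episode": "NEXT_EPISODE",
}


def resolve(raw_command: str,
            correction_library: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (operation, source); the correction library (23) is checked first."""
    # 1. Exact hit on a previously confirmed command.
    if raw_command in correction_library:
        return correction_library[raw_command], "correction"
    # 2. Otherwise fall back to an approximate match in the standard library
    #    (difflib is an assumption standing in for acoustic comparison).
    close = difflib.get_close_matches(
        raw_command, list(STANDARD_LIBRARY), n=1, cutoff=0.6
    )
    if close:
        return STANDARD_LIBRARY[close[0]], "standard"
    return None
```

Once a mispronounced command such as "pawse" has been confirmed and stored, later lookups hit the correction library directly and skip the slower approximate match.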
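When the user's pronunciation is inaccurate, the control system 2 first calls the standard language library 22 for the language in the first language storage module 21 and gives priority to the keywords that are most probable for the current TV interface. For example, on the video playback interface the user's most likely operations are "pause", "fast forward", "next episode", and "increase volume". Once the corresponding keyword is recognized, recognition is considered successful and the control system 2 performs the corresponding operation. If the user does not return to redo the operation within 5 s after the operation completes, the operation is considered correct and valid and the speech recognition is judged correct; the voice command and the corresponding operation instruction are then stored in the correction language library 23. When the control system 2 later receives a voice command, it checks the correction language library 23 first, so that commands are recognized quickly and accurately even when the user's pronunciation is not accurate enough.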
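The 5-second confirmation rule can be modeled as a small state machine that commits a pending command only if no return operation arrives inside the window. The sketch below passes timestamps in explicitly so it can be exercised without real delays; `CorrectionLearner`, `PendingCommand`, and the method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

CONFIRM_WINDOW_S = 5.0  # the 5-second window stated in the description


@dataclass
class PendingCommand:
    raw_command: str
    operation: str
    executed_at: float


@dataclass
class CorrectionLearner:
    """Fills the correction language library (23) after confirmation."""
    correction_library: Dict[str, str] = field(default_factory=dict)
    pending: Optional[PendingCommand] = None

    def on_executed(self, raw_command: str, operation: str, now: float) -> None:
        """Record an executed command that still awaits confirmation."""
        self.pending = PendingCommand(raw_command, operation, now)

    def on_return(self, now: float) -> None:
        """User pressed return: discard the command if inside the window."""
        if self.pending and now - self.pending.executed_at < CONFIRM_WINDOW_S:
            self.pending = None

    def tick(self, now: float) -> None:
        """Commit the pending command once the window has elapsed."""
        if self.pending and now - self.pending.executed_at >= CONFIRM_WINDOW_S:
            self.correction_library[self.pending.raw_command] = self.pending.operation
            self.pending = None
```

Treating a return press as a rejection is the only negative signal the description mentions; how a real system would distinguish an unrelated return press is left open here.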
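Embodiment 3
As shown in FIG. 1, during use the user's voice is transmitted through the remote control 1, which receives voice signals, to the control system 2, which recognizes and processes them. The control system 2 is provided with a first language storage module 21 and preferentially performs comparison in the language stored there; the control system 2 is also provided with standard language libraries 22 for several languages, and after receiving a voice command it performs comparison and recognition against the standard language libraries 22, giving priority to the language in the first language storage module 21. For example, when the language stored in the first language storage module 21 is English, the control system 2 first processes a received voice command as English speech and proofreads it against the English standard language library 22, which speeds up the recognition of voice commands. During use, when the control system 2 recognizes a language different from the one in the first language storage module 21, it replaces the language in the first language storage module 21 with the new language. When the user switches the control language, the language can thus be determined quickly, and it is recognized more quickly and accurately in subsequent use.
To further improve recognition speed and accuracy, the control system 2 extracts the TV interface information and divides the standard language library 22 into several language layers 221 by interface; the control system 2 identifies the interface the TV is currently showing and performs speech recognition comparison in the language layer 221 corresponding to that interface first. Proofreading against the most suitable keywords for the interface improves the efficiency of speech recognition. For example, on the home page users generally use keywords such as "search", "open", and "play", so these words are listed as the first-level proofreading keywords for the home page interface. On the song playback interface, the keywords users use most are ones such as "next song" and "increase the sound", so these words are listed as the first-level proofreading keywords for the song playback interface. When the control system 2 receives a voice command, it also retrieves the interface the TV is currently showing, so that the language layer 221 corresponding to that interface is called first for recognition, which further improves recognition speed and accuracy.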
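Dividing a standard language library into per-interface language layers could be represented as a mapping keyed by language and interface. This is a sketch under assumptions: the flat `COMMANDS` table, the interface names, and the `build_layers` and `lookup_first_layer` functions are hypothetical, and matching is done on transcribed text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Command = Tuple[str, str, str, str]  # (language, interface, phrase, operation)
Layers = Dict[Tuple[str, str], Dict[str, str]]

# Hypothetical flat standard language library (22).
COMMANDS = [
    ("en", "home", "search", "SEARCH"),
    ("en", "home", "open", "OPEN"),
    ("en", "home", "play", "PLAY"),
    ("en", "song_playback", "next song", "NEXT_SONG"),
    ("en", "song_playback", "increase the sound", "VOLUME_UP"),
]


def build_layers(commands: Iterable[Command]) -> Layers:
    """Group commands into language layers (221) keyed by (language, interface)."""
    layers: Layers = defaultdict(dict)
    for language, interface, phrase, operation in commands:
        layers[(language, interface)][phrase] = operation
    return dict(layers)


def lookup_first_layer(phrase: str, language: str, interface: str,
                       layers: Layers) -> Optional[str]:
    """Look only in the layer for the current language and interface."""
    return layers.get((language, interface), {}).get(phrase)
```

A miss in the first layer returns `None` here; what to try next is a separate policy decision and is deliberately left out of this lookup.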
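Further, when the control system 2, using the extracted interface information and the language in the first language storage module 21, finds no suitable command in the corresponding language layer 221 of the standard language library 22, it gives priority to the language layers 221 that correspond to the same interface in other languages. This further improves the efficiency of speech recognition. When a voice operation is performed on the home page, the control system 2 first performs proofreading and recognition in the language layer 221 containing keywords such as "search", "open", and "play"; when no matching voice command is recognized in that layer, it gives priority to the language layers 221 containing the corresponding keywords in other languages. When a suitable voice command is found, the related operation is performed; if the user does not return within 5 s, the recognition is deemed correct, the system concludes that the user has changed languages, and the language in the first language storage module 21 is replaced.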
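The fallback order in this paragraph, the stored language's layer for the current interface and then the same interface's layer in other languages, can be written as a short generator. As before this assumes transcribed text; `LAYERS`, `search_order`, and `recognize` are hypothetical, and the sketch stops after the same-interface layers because the description does not say what follows.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Layers = Dict[Tuple[str, str], Dict[str, str]]

# Hypothetical language layers (221), keyed by (language, interface).
LAYERS: Layers = {
    ("en", "home"): {"search": "SEARCH", "open": "OPEN", "play": "PLAY"},
    ("zh", "home"): {"搜索": "SEARCH", "打开": "OPEN", "播放": "PLAY"},
    ("zh", "song_playback"): {"下一首": "NEXT_SONG"},
}


def search_order(preferred: str, interface: str,
                 languages: List[str]) -> Iterator[Tuple[str, str]]:
    """Stored language's layer first, then other languages' same-interface layers."""
    yield (preferred, interface)
    for language in languages:
        if language != preferred:
            yield (language, interface)


def recognize(phrase: str, preferred: str, interface: str,
              languages: List[str],
              layers: Layers = LAYERS) -> Optional[Tuple[str, str]]:
    """Return (language, operation) from the first layer containing the phrase."""
    for key in search_order(preferred, interface, languages):
        operation = layers.get(key, {}).get(phrase)
        if operation is not None:
            return key[0], operation
    return None
```

When the returned language differs from the stored one and the user does not return within 5 s, the caller would replace the stored language, as the paragraph above describes.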
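The present invention has been described through preferred embodiments. Those skilled in the art will understand that various changes or equivalent substitutions may be made to these features and embodiments without departing from the spirit and scope of the present invention. The present invention is not limited by the specific embodiments disclosed herein; other embodiments falling within the claims of this application all fall within the scope of protection of the present invention.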

Claims (8)

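  1. A smart TV multilingual recognition system based on Gaia AI voice control, characterized in that:
    it comprises a remote control (1) for receiving voice signals and a control system (2) for recognizing and processing voice signals, the control system (2) being provided with a first language storage module (21), and the control system (2) preferentially performing comparison in the language stored in the first language storage module (21);
    the control system (2) extracts the information of the TV interface at the time of voice input, so as to preferentially compare against the keywords most likely to be used in that interface.
  2. The smart TV multilingual recognition system based on Gaia AI voice control according to claim 1, characterized in that:
    the remote control (1) receives a specific activation phrase and transmits it to the control system (2), and the control system (2) compares the activation phrase against the supported languages so as to store the recognized language in the first language storage module (21).
  3. The smart TV multilingual recognition system based on Gaia AI voice control according to claim 2, characterized in that:
    during use, when the control system (2) recognizes a language different from the one in the first language storage module (21), the control system (2) replaces the language in the first language storage module (21) with the new language.
  4. The smart TV multilingual recognition system based on Gaia AI voice control according to claim 1, characterized in that:
    the control system (2) is provided with standard language libraries (22) for several languages, and after receiving a voice command the control system (2) performs comparison and recognition against the standard language libraries (22), giving priority to the language in the first language storage module (21).
  5. The smart TV multilingual recognition system based on Gaia AI voice control according to claim 4, characterized in that:
    the control system (2) is further provided with a correction language library (23); the control system (2) controls the TV according to the comparison with the standard language library (22) and, after confirming that the recognition is correct, stores the operation instruction and the received voice command in the correction language library (23); during speech recognition the control system (2) compares against the correction language library (23) first.
  6. The smart TV multilingual recognition system based on Gaia AI voice control according to claim 5, characterized in that:
    after the control system (2) operates the TV, if the user performs no return operation within 5 seconds, the operation is considered correct and valid, so as to judge the speech recognition correct and complete the storage in the correction language library (23).
  7. The smart TV multilingual recognition system based on Gaia AI voice control according to claim 4 or 5, characterized in that:
    the control system (2) extracts TV interface information so as to divide the standard language library (22) into several language layers (221) by interface; the control system (2) identifies the interface the TV is currently showing and performs speech recognition comparison in the language layer (221) corresponding to that interface first.
  8. The smart TV multilingual recognition system based on Gaia AI voice control according to claim 7, characterized in that:
    when the control system (2), using the extracted interface information and the language in the first language storage module (21), finds no suitable command in the corresponding language layer (221) of the standard language library (22), it gives priority to the language layers (221) that correspond to the same interface in other languages.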
PCT/CN2020/089239 2020-05-08 2020-05-08 A smart TV multilingual recognition system based on Gaia AI voice control WO2021223232A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/089239 WO2021223232A1 (zh) 2020-05-08 2020-05-08 A smart TV multilingual recognition system based on Gaia AI voice control
CN202010737633.6A CN111800657B (zh) 2020-05-08 2020-07-28 A smart TV multilingual recognition system based on Gaia AI voice control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/089239 WO2021223232A1 (zh) 2020-05-08 2020-05-08 A smart TV multilingual recognition system based on Gaia AI voice control

Publications (1)

Publication Number Publication Date
WO2021223232A1 (zh) 2021-11-11

Family

ID=72827976

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/089239 WO2021223232A1 (zh) 2020-05-08 2020-05-08 A smart TV multilingual recognition system based on Gaia AI voice control

Country Status (2)

Country Link
CN (1) CN111800657B (zh)
WO (1) WO2021223232A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
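CN103260071A (zh) * 2012-08-29 2013-08-21 四川长虹电器股份有限公司 Set-top box that automatically selects menu language and audio language, and implementation method
US20170286049A1 (en) * 2014-08-27 2017-10-05 Samsung Electronics Co., Ltd. Apparatus and method for recognizing voice commands
CN108172212A (zh) * 2017-12-25 2018-06-15 横琴国际知识产权交易中心有限公司 Confidence-based spoken language identification method and system
CN109785832A (zh) * 2018-12-20 2019-05-21 安徽声讯信息技术有限公司 Intelligent speech recognition method for set-top boxes for the elderly, suited to heavy accents
CN110148399A (zh) * 2019-05-06 2019-08-20 北京猎户星空科技有限公司 Control method, apparatus, device, and medium for a smart device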

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
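CN102036033A (zh) * 2010-12-31 2011-04-27 Tcl集团股份有限公司 Method for voice remote control of a television, and voice remote control
CN103871437B (zh) * 2012-12-11 2017-08-22 比亚迪股份有限公司 In-vehicle multimedia device and voice control method thereof
KR101936640B1 (ko) * 2017-03-31 2019-01-09 엘지전자 주식회사 Home appliance and voice recognition module
CN110910872B (zh) * 2019-09-30 2023-06-02 华为终端有限公司 Voice interaction method and apparatus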

Also Published As

Publication number Publication date
CN111800657A (zh) 2020-10-20
CN111800657B (zh) 2022-12-02

Similar Documents

Publication Publication Date Title
US10056078B1 (en) Output of content based on speech-based searching and browsing requests
JP3333123B2 (ja) Method and system for buffering words recognized during speech recognition
US4829576A (en) Voice recognition system
DK179111B1 (en) INTELLIGENT AUTOMATED ASSISTANT IN A MEDIUM ENVIRONMENT
KR101537370B1 (ko) System for grasping speech content based on keyword extraction from recorded voice data, and indexing method and speech content grasping method using the system
US8954329B2 (en) Methods and apparatus for acoustic disambiguation by insertion of disambiguating textual information
US10339920B2 (en) Predicting pronunciation in speech recognition
JP4446312B2 (ja) Method and system for displaying a variable number of alternative words during speech recognition
US6430531B1 (en) Bilateral speech system
US5794189A (en) Continuous speech recognition
JP3662780B2 (ja) Dialogue system using natural language
US20120016671A1 (en) Tool and method for enhanced human machine collaboration for rapid and accurate transcriptions
US11093110B1 (en) Messaging feedback mechanism
JPH10133684A (ja) Method and system for selecting alternative words during speech recognition
CN109360563B (zh) Voice control method and device, storage medium, and air conditioner
JP2011209787A (ja) Information processing apparatus, information processing method, and program
JP2011209786A (ja) Information processing apparatus, information processing method, and program
JP2009047920A (ja) Apparatus and method for interacting with a user by voice
JP2021529337A (ja) Multi-party dialogue recording/output method using speech recognition technology, and apparatus therefor
JP2011504624A (ja) Automatic simultaneous interpretation system
US10783876B1 (en) Speech processing using contextual data
CN110781649A (zh) Subtitle editing method and device, computer storage medium, and electronic device
US11263852B2 (en) Method, electronic device, and computer readable storage medium for creating a vote
CA3185271A1 (en) Voice identification for optimizing voice search results
CN114155854B (zh) Voice data processing method and device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20934834; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 20934834; Country of ref document: EP; Kind code of ref document: A1