WO2021017978A1 - Smart television speech recognition method, system and readable storage medium - Google Patents


Info

Publication number
WO2021017978A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
voiceprint
smart
voice
dialect
Prior art date
Application number
PCT/CN2020/103545
Other languages
French (fr)
Chinese (zh)
Inventor
鲍舰
Original Assignee
深圳Tcl新技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳Tcl新技术有限公司
Publication of WO2021017978A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/441 Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
    • H04N21/4415 Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card using biometric characteristics of the user, e.g. by voice recognition or fingerprint scanning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Definitions

  • the present disclosure relates to the field of speech recognition technology, and in particular to a method, system and readable storage medium.
  • the application of voice recognition technology on smart TVs has become widespread. Users can select movies, play music, and even control various household appliances by speaking. For some vast countries, such as China, the pronunciation of various local dialects is very different.
  • Although the voice recognition technology on smart TVs can recognize local dialects, this requires the user to set the dialect on the TV in advance; the TV cannot recognize speech in whatever dialect the user happens to speak. In other words, the user's dialect must be preset in the smart TV before the TV can recognize it; otherwise the smart TV's voice AI cannot automatically recognize the local dialect the user speaks.
  • For a family, the TV is an appliance shared by the whole household.
  • The elderly may speak their native dialect, while children may speak only Mandarin because of their schooling.
  • the present disclosure proposes a smart TV automatic dialect matching technology, which enables the smart TV to automatically match the dialect spoken by the user without setting the dialect in advance to achieve automatic identification of the dialect.
  • A smart TV voice recognition method, used by a smart TV to recognize a user's dialect, includes the following steps:
  • The smart TV receives a voice command for a user interaction operation;
  • The voiceprint recognition module determines the dialect type used by the user from the voiceprint features of the user's voice command;
  • The voice recognition module converts the voice command directly into text according to the dialect type used by the user, thereby recognizing the user's voice command.
  • Before receiving the voice command, the smart TV creates a corresponding voiceprint profile for each user in advance;
  • the user selects and confirms the dialect type in the corresponding voiceprint profile.
  • the voiceprint recognition module determines the type of dialect used by the user according to the voiceprint characteristics of the voice command of the user's interactive operation, including:
  • the voiceprint recognition module performs voiceprint recognition on the voice instructions of the user's interactive operation, confirms the user according to the voiceprint profile, and determines the type of dialect used by the user.
  • When the voiceprint recognition module determines that the voiceprint feature of the voice command is not in any voiceprint profile that the smart TV created in advance, the smart TV creates a new voiceprint profile for the user with that voiceprint feature, and the user selects and confirms the dialect type in the new profile.
  • The voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart TV over a network.
  • The voice recognition module may be implemented by a voice recognition server connected to the smart TV over a network.
  • the present disclosure also provides a smart TV voice recognition system for smart TV to recognize the dialect of a user.
  • the smart TV voice recognition system includes a voice receiving module, a voiceprint recognition module, and a voice recognition module;
  • The voice receiving module is used by the smart TV to receive voice commands for user interaction operations;
  • The voiceprint recognition module is used to analyze the voiceprint features of the voice command received by the voice receiving module and to determine the dialect type used by the user;
  • The voice recognition module is configured to convert the user's voice directly into text, according to the dialect type corresponding to the voiceprint features identified by the voiceprint recognition module, so as to recognize the user's voice command.
  • the above system further includes a user voiceprint feature module, which is used to create a corresponding voiceprint profile for each smart TV user in advance, and include the dialect type corresponding to the user's voiceprint feature.
  • the voiceprint recognition module is configured to perform voiceprint recognition on the voice instructions of the user's interactive operation, confirm the user according to the voiceprint profile, and determine the type of dialect used by the user.
  • When the voiceprint recognition module determines that the voiceprint feature of the user's voice command does not match any user voiceprint feature in the user voiceprint feature module, the user voiceprint feature module creates a new voiceprint profile for the user with that voiceprint feature and, at the same time, determines the corresponding dialect type.
  • The voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart TV over a network;
  • The voice recognition module may be implemented by a voice recognition server connected to the smart TV over a network.
  • the present disclosure also provides a readable storage medium that stores a smart TV voice recognition program, and when the smart TV voice recognition program is executed by a processor, the steps of the smart TV voice recognition method are realized.
  • The present disclosure uses the voiceprint recognition module to record, in advance, the voiceprint features of each smart TV user together with the corresponding dialect type.
  • When a user operates the smart TV by voice, the voiceprint recognition module first identifies the user's voiceprint features and the dialect type preset for them, and then calls the voice recognition module to convert the dialect voice command directly into text. Throughout the process in which the smart TV recognizes the user's voice and gives recognition feedback, the user does not need to choose a dialect type.
  • For households that use a smart TV and speak multiple dialects, the smart TV can automatically identify the dialect spoken by the user and recognize the voice command directly using speech recognition for that dialect.
  • The present disclosure greatly reduces the number of times smart TV users must select a dialect and improves the experience of voice operation.
  • Fig. 1 is a flowchart of an embodiment of a smart TV voice recognition method of the present disclosure.
  • Fig. 2 is a schematic structural diagram of an embodiment of a smart TV voice recognition system of the present disclosure.
  • the process of the smart TV voice recognition method provided by the present disclosure is shown in FIG. 1.
  • the smart TV voice recognition method of the present disclosure includes the following implementation steps:
  • In step S100, the smart TV receives a voice command for the user's interaction operation.
  • Although the voice recognition function of existing smart TVs can recognize dialects, the TV's voice recognition technology cannot determine the user's dialect type directly during operation; the user has to choose the dialect they speak. In other words, the smart TV cannot identify each user's dialect by itself in order to perform speech recognition.
  • The method of the present disclosure can directly receive the user's voice commands in dialect during human-computer interaction through the smart TV's voice recognition.
  • The smart TV can create a voiceprint profile for each user in advance so that it can select the user's dialect automatically and recognize speech directly. Before the smart TV receives the user's voice command, the following steps may also be included:
  • The smart TV creates a corresponding voiceprint profile for each user in advance; the user selects and confirms the dialect type in the corresponding voiceprint profile.
  • The smart TV sets up voiceprint profiles for family members in advance according to their respective dialects, so that the subsequent voice recognition process can directly select the speech recognition scheme for the corresponding dialect. Therefore, the dialect type must also be selected when a user's voiceprint profile is created.
  • In step S200, the voiceprint recognition module determines the dialect type used by the user according to the voiceprint features of the user's voice command.
  • The voiceprint recognition module performs voiceprint recognition on the user's voice command, identifies the user against the voiceprint profiles established in the smart TV in the process above, and can therefore determine directly which dialect the user speaks. By contrast, a prior-art smart TV that receives a voice command in the user's dialect requires the user to select the dialect before the next step of voice recognition can proceed.
  • The method of the present disclosure can confirm the speech recognition scheme directly from the user's dialect, skipping the dialect selection step and improving the user's experience of voice recognition.
  • For example, with a prior-art TV, a Cantonese-speaking user says "I want to watch XX program" in dialect in front of the TV for the first time. The TV interface pops up recognition results for various dialects, such as Cantonese, Sichuan dialect and Hunan dialect, and the user must confirm Cantonese as the dialect type before the TV can perform subsequent voice recognition.
  • With the method of the present disclosure, when a Cantonese-speaking user says "I want to watch XX program" in dialect in front of the TV for the first time, the TV interface does not pop up a list of dialects for the user to choose from before the next step of speech recognition. Instead, after the voiceprint recognition module identifies the user, the user's dialect type is selected directly and the Cantonese speech recognition scheme is used.
  • When the voiceprint recognition module determines that the voiceprint feature of the user's voice command is not in any voiceprint profile that the smart TV created in advance, the smart TV creates a new voiceprint profile for the user with that voiceprint feature, and the user selects and confirms the dialect type in that profile.
  • The voiceprint recognition module can also be implemented by a voiceprint recognition server connected to the smart TV over a network.
  • Using a network-connected voiceprint recognition server allows the smart TV to store more user voiceprint feature information.
  • In step S300, the voice recognition module converts the voice command directly into text according to the dialect type used by the user, thereby recognizing the user's voice command.
  • The voice recognition module can also be implemented by a voice recognition server connected to the smart TV over a network.
  • Using a network-connected voice recognition server allows the smart TV to keep more voice recognition schemes available, and these can be expanded and updated as needed.
  • the embodiments of the present disclosure use voiceprint feature recognition technology to distinguish the smart TV users in the home, and directly perform voice recognition according to the user dialect set in advance, so as to realize automatic dialect voice matching in the smart TV voice recognition process.
  • the present disclosure also provides a smart TV voice recognition system.
  • The smart TV voice recognition system 60 includes a voice receiving module 61, a voiceprint recognition module 62 and a voice recognition module 63.
  • The voice receiving module 61 is used by the smart TV to receive voice commands for user interaction operations.
  • Family members have different accents and may even speak different dialects.
  • Although the voice recognition function of existing smart TVs can recognize dialects, the TV's voice recognition technology cannot determine the user's dialect type directly during operation; the user has to choose the dialect they speak. In other words, the smart TV cannot identify each user's dialect by itself in order to perform speech recognition.
  • The system of the present disclosure can directly receive the user's voice commands in dialect during human-computer interaction through the smart TV's voice recognition.
  • The smart TV can create a voiceprint profile for each user in advance.
  • The system 60 also includes a user voiceprint feature module 64, which is used to create a corresponding voiceprint profile for each smart TV user in advance; each profile contains the dialect type corresponding to the user's voiceprint features.
  • The smart TV sets up voiceprint profiles for family members in advance according to their respective dialects, so that the subsequent voice recognition process can directly select the speech recognition scheme for the corresponding dialect. Therefore, the dialect type must also be selected when a user's voiceprint profile is created.
  • The voiceprint recognition module 62 is used to analyze the voiceprint features of the voice command received by the voice receiving module 61 and to determine the dialect type used by the user.
  • The voiceprint recognition module 62 performs voiceprint recognition on the user's voice command, identifies the user against the voiceprint profiles established in the smart TV in the process above, and can therefore determine directly which dialect the user speaks. By contrast, a prior-art smart TV that receives a voice command in the user's dialect requires the user to select the dialect before the next step of voice recognition can proceed.
  • The system of the present disclosure can confirm the speech recognition scheme directly from the user's dialect, skipping the dialect selection step and improving the user's experience of voice recognition.
  • For example, with a prior-art TV, a Cantonese-speaking user says "I want to watch XX program" in dialect in front of the TV for the first time. The TV interface pops up recognition results for various dialects, such as Cantonese, Sichuan dialect and Hunan dialect, and the user must confirm Cantonese as the dialect type before the TV can perform subsequent voice recognition.
  • With the system of the present disclosure, when a Cantonese-speaking user says "I want to watch XX program" in dialect in front of the TV for the first time, the TV interface does not pop up a list of dialects for the user to choose from before the next step of speech recognition. Instead, after the voiceprint recognition module identifies the user, the user's dialect type is selected directly and the Cantonese speech recognition scheme is used.
  • When the voiceprint recognition module determines that the voiceprint feature of the user's voice command does not match any user voiceprint feature in the user voiceprint feature module, the user voiceprint feature module creates a new voiceprint profile for the user with that voiceprint feature and determines the corresponding dialect type.
  • The voiceprint recognition module can also be implemented by a voiceprint recognition server connected to the smart TV over a network.
  • Using a network-connected voiceprint recognition server allows the smart TV to store more user voiceprint feature information.
  • The voice recognition module 63 is configured to convert the user's voice directly into text, according to the dialect type corresponding to the voiceprint features identified by the voiceprint recognition module 62, so as to recognize the user's voice command.
  • The voice recognition module 63 can also be implemented by a voice recognition server connected to the smart TV over a network. Similarly, using a network-connected voice recognition server allows the smart TV to keep more voice recognition schemes available, and these can be expanded and updated as needed.
  • The present disclosure also provides a readable storage medium that stores a smart TV voice recognition program; when the program is executed by a processor, the steps of the smart TV voice recognition method are implemented.
  • the specific execution process of the program is the same as the above embodiment of the smart TV voice recognition method, and will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided in the present disclosure are a smart television speech recognition method, a system and a storage medium, which are used by a smart television to recognize the dialect of a user. A smart television receives a voice command of a user interactive operation; a voiceprint recognition module determines the dialect type used by the user according to voiceprint characteristics of the voice command; and, according to the dialect type used by the user, a voice recognition module directly converts the voice command into text so as to recognize the voice command of the user. In the present disclosure, the user operates the smart television by voice. During the entire process in which the smart television recognizes the voice of the user and gives recognition feedback, the user does not need to choose a dialect type. For families that use a smart television and speak several dialects, the dialect spoken by a user can be identified automatically and the voice command can be recognized directly using speech recognition technology for that dialect, which greatly reduces the number of times users must choose a dialect and improves the experience of voice operation.

Description

Smart TV voice recognition method, system and readable storage medium
Priority
The present disclosure claims priority to Chinese patent application No. 201910682661.X, entitled "Smart TV voice recognition method, system and readable storage medium", filed with the Chinese Patent Office on July 26, 2019, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of speech recognition technology, and in particular to a method, a system and a readable storage medium.
Background
Voice recognition technology is now widely used on smart TVs. Users can select movies, play music, and even control various household appliances by speaking. In some geographically large countries, such as China, the pronunciation of local dialects varies greatly. Although the voice recognition technology on smart TVs can recognize local dialects, this requires the user to set the dialect on the TV in advance; the TV cannot recognize speech in whatever dialect the user happens to speak. In other words, the user's dialect must first be preset in the smart TV before the TV can recognize it; otherwise the smart TV's voice AI cannot automatically recognize the local dialect the user speaks.
For a family, the TV is an appliance shared by the whole household. The elderly may speak their hometown dialect, while children, because of their schooling, may speak only Mandarin, so several dialects may coexist in one household. Presetting a dialect in the TV for every family member is not very practical, and even if several dialects are preset, having to set the dialect each time the TV is used is inconvenient and makes for a poor smart TV user experience.
The prior art includes some voice recognition techniques that address the need to preset a dialect, for example by judging from the smart TV's geographic location: the user's location is inferred from the IP address the smart TV uses to connect to the network, via the relationship between IP addresses and locations, and the smart TV's preferred dialect type is then chosen according to that location. The problem with determining the dialect by location is that, in cities with many migrants or a large non-local population, a location-based setting clearly does not solve the problem.
Therefore, the prior art still needs to be improved and developed.
Summary of the disclosure
In view of the above shortcomings of the prior art, the present disclosure proposes an automatic dialect matching technique for smart TVs, which enables the smart TV to match the dialect spoken by the user automatically, without the dialect being set in advance, and thereby recognize the dialect automatically.
The technical solutions adopted by the present disclosure to solve the technical problem are as follows:
A smart TV voice recognition method, used by a smart TV to recognize a user's dialect, includes the following steps:
The smart TV receives a voice command for a user interaction operation;
The voiceprint recognition module determines the dialect type used by the user from the voiceprint features of the user's voice command;
The voice recognition module converts the voice command directly into text according to the dialect type used by the user, thereby recognizing the user's voice command.
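The three steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the implementation described in the disclosure: the `VoiceCommand` type, the `PROFILES` and `RECOGNIZERS` tables, and the plain strings standing in for audio and voiceprint features are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class VoiceCommand:
    voiceprint: str  # hypothetical stand-in for extracted voiceprint features
    audio: str       # hypothetical stand-in for the recorded utterance


# Hypothetical profile table: voiceprint -> dialect type chosen by that user.
PROFILES: Dict[str, str] = {"vp-grandma": "cantonese", "vp-child": "mandarin"}

# Hypothetical per-dialect recognizers; each one just tags the text it "recognized".
RECOGNIZERS: Dict[str, Callable[[str], str]] = {
    "cantonese": lambda audio: f"[cantonese] {audio}",
    "mandarin": lambda audio: f"[mandarin] {audio}",
}


def determine_dialect(command: VoiceCommand) -> str:
    """Step 2: look up the dialect type from the speaker's voiceprint."""
    return PROFILES[command.voiceprint]


def recognize(command: VoiceCommand) -> str:
    """Steps 1-3: take a received command, pick the dialect, convert to text."""
    dialect = determine_dialect(command)
    return RECOGNIZERS[dialect](command.audio)
```

The point of the sketch is that the dialect is selected from the voiceprint lookup alone, so the caller of `recognize` never passes a dialect.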
In one embodiment, before the smart TV receives the voice command for the user interaction operation, the method further includes the following steps:
The smart TV creates a corresponding voiceprint profile for each user in advance;
The user selects and confirms the dialect type in the corresponding voiceprint profile.
In one embodiment, the voiceprint recognition module determining the dialect type used by the user from the voiceprint features of the user's voice command includes:
The voiceprint recognition module performs voiceprint recognition on the user's voice command, identifies the user according to the voiceprint profiles, and determines the dialect type used by that user.
In one embodiment, when the voiceprint recognition module determines that the voiceprint feature of the voice command is not in any voiceprint profile that the smart TV created in advance, the smart TV creates a new voiceprint profile for the user with that voiceprint feature, and the user selects and confirms the dialect type in the new profile.
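The profile lookup with a fallback for unknown speakers might look like the sketch below. The disclosure does not say how voiceprints are compared; cosine similarity over feature vectors with a fixed threshold is an assumption made here for illustration, and `VoiceprintProfiles`, `threshold` and `ask_user` are hypothetical names.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VoiceprintProfiles:
    """Hypothetical store of (voiceprint features, dialect type) profiles."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed similarity cut-off for "same speaker"
        self._profiles: List[Tuple[List[float], str]] = []

    def match(self, features: Sequence[float]) -> Optional[str]:
        """Return the dialect of the best-matching profile, or None if unknown."""
        best_score, best_dialect = -1.0, None
        for stored, dialect in self._profiles:
            score = cosine(features, stored)
            if score >= self.threshold and score > best_score:
                best_score, best_dialect = score, dialect
        return best_dialect

    def dialect_for(self, features: Sequence[float],
                    ask_user: Callable[[], str]) -> str:
        """Known speaker: reuse the stored dialect. Unknown: create a profile."""
        dialect = self.match(features)
        if dialect is None:
            dialect = ask_user()  # the user selects and confirms the dialect once
            self._profiles.append((list(features), dialect))
        return dialect
```

In this sketch the user is prompted only when no stored profile matches; later commands from the same speaker reuse the stored dialect type.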
In one embodiment, the voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart TV over a network.
In one embodiment, the voice recognition module may be implemented by a voice recognition server connected to the smart TV over a network.
The present disclosure also provides a smart TV voice recognition system, used by a smart TV to recognize a user's dialect. The smart TV voice recognition system includes a voice receiving module, a voiceprint recognition module and a voice recognition module;
The voice receiving module is used by the smart TV to receive voice commands for user interaction operations;
The voiceprint recognition module is used to analyze the voiceprint features of the voice command received by the voice receiving module and to determine the dialect type used by the user;
The voice recognition module is configured to convert the user's voice directly into text, according to the dialect type corresponding to the voiceprint features identified by the voiceprint recognition module, so as to recognize the user's voice command.
In one embodiment, the above system further includes a user voiceprint feature module, which is used to create a voiceprint profile in advance for each smart TV user, the profile including the dialect type corresponding to the user's voiceprint features.
In one embodiment, the voiceprint recognition module is used to perform voiceprint recognition on the user's interactive voice command, identify the user according to the voiceprint profile, and determine the type of dialect used by that user.
In one embodiment, when the voiceprint recognition module determines that the voiceprint features of the user's interactive voice command do not match any user voiceprint features held in the user voiceprint feature module, the user voiceprint feature module creates a new voiceprint profile for the user with those voiceprint features and, at the same time, determines the dialect type that user speaks.
In one embodiment, the voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart TV over a network;
In one embodiment, the speech recognition module may be implemented by a speech recognition server connected to the smart TV over a network.
The present disclosure also provides a readable storage medium that stores a smart TV speech recognition program. When the program is executed by a processor, it implements the steps of the smart TV speech recognition method described above.
Compared with the prior art, the present disclosure uses a voiceprint feature recognition module to record in advance the voiceprint features of the smart TV's users together with the dialect type each user speaks. When a user operates the smart TV through its voice control function, the voiceprint feature recognition module first recognizes the user's voiceprint features to determine who the user is and which dialect type was preset for that user, and then directly invokes the speech recognition module to convert the user's dialect voice command into text. Throughout the process in which the user operates the smart TV by voice and the smart TV recognizes the speech and returns feedback, the user does not need to select a dialect type. For households in which the smart TV is shared by speakers of several dialects, the smart TV can automatically identify the dialect the user speaks and recognize the user's interactive voice command directly with the speech recognition technology for that dialect. The present disclosure greatly reduces the number of times smart TV users have to select a dialect and improves the voice control experience.
Description of the drawings
Specific embodiments of the present disclosure are further described below with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of an embodiment of a smart TV speech recognition method of the present disclosure.
Fig. 2 is a schematic structural diagram of an embodiment of a smart TV speech recognition system of the present disclosure.
Detailed description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present disclosure and are not intended to limit it.
The flow of the smart TV speech recognition method provided by the present disclosure is shown in Fig. 1. The method includes the following steps:
Step S100: the smart TV receives the user's interactive voice command.
In a household that uses a smart TV, family members have different accents and may even speak different dialects. Although the speech recognition function of existing smart TVs can recognize dialects, when a user interacts with the TV in a dialect, the TV's speech recognition technology cannot directly determine which dialect is being spoken, and the user has to select the dialect manually. In other words, the smart TV cannot by itself identify each user's dialect and then perform speech recognition. In the method of the present disclosure, the smart TV can directly receive a voice command spoken in the user's dialect during speech-based human-computer interaction. As another embodiment, the smart TV may create a voiceprint profile for each user in advance, so that the user's dialect is selected automatically and recognition proceeds directly. In that case, the following steps may also be included before the smart TV receives the user's interactive voice command:
The smart TV creates a voiceprint profile for each user in advance; the user selects and confirms the dialect type in that profile.
The smart TV builds voiceprint profiles for all family members in advance, each according to the member's own dialect, so that during subsequent speech recognition it can directly select the speech recognition scheme for that dialect. Therefore, when a user's voiceprint profile is created, the dialect type the user speaks must be selected for it at the same time.
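As an illustration only, the profile-creation step above could be sketched as follows. The disclosure does not specify a data format, so the names `VoiceprintProfile` and `ProfileStore`, and the representation of voiceprint features as a list of floats, are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VoiceprintProfile:
    """One household member's voiceprint profile (hypothetical structure)."""
    user_id: str
    dialect: str  # dialect type confirmed by the user, e.g. "cantonese"
    features: List[float] = field(default_factory=list)  # placeholder voiceprint features


class ProfileStore:
    """In-memory stand-in for the profiles the smart TV keeps."""

    def __init__(self) -> None:
        self._profiles: Dict[str, VoiceprintProfile] = {}

    def create_profile(self, user_id: str, features: List[float], dialect: str) -> VoiceprintProfile:
        # A profile is only created together with a confirmed dialect type.
        if not dialect:
            raise ValueError("a dialect type must be confirmed when the profile is created")
        profile = VoiceprintProfile(user_id=user_id, dialect=dialect, features=list(features))
        self._profiles[user_id] = profile
        return profile

    def dialect_of(self, user_id: str) -> str:
        return self._profiles[user_id].dialect
```

Requiring the dialect at creation time mirrors the statement that the dialect type must be selected when the profile is established.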
Step S200: the voiceprint recognition module determines the type of dialect used by the user according to the voiceprint features of the user's interactive voice command.
Specifically, the voiceprint recognition module performs voiceprint recognition on the user's interactive voice command and identifies the user according to the voiceprint profiles established in the smart TV in the process described above, so that it can directly determine which dialect the user speaks. In the prior art, when a smart TV receives a voice command spoken in dialect, the user must select the dialect before speech recognition can proceed. The method of the present disclosure instead determines the speech recognition scheme directly from the user's dialect, which skips the dialect selection step and improves the user's experience of speech recognition. For example, suppose a user who speaks a dialect (Cantonese) says "I want to watch program XX" in that dialect in front of the TV for the first time. With the prior art, the TV interface pops up a list of candidate dialects (Cantonese, Sichuanese, Hunanese, and so on), and the user must confirm that the dialect is Cantonese before the TV can carry out the subsequent speech recognition. With the method of the present disclosure, when the same Cantonese-speaking user says "I want to watch program XX" in front of the TV for the first time, the TV interface does not pop up a list of dialects for the user to select and confirm before recognition can proceed. Instead, once the voiceprint recognition module has identified the user, the dialect type stored for that user is matched directly and the Cantonese speech recognition scheme is used. When the voiceprint recognition module determines that the voiceprint features of the user's interactive voice command are not in any of the voiceprint profiles that the smart TV created in advance for each user, the smart TV creates a new voiceprint profile for the user with those voiceprint features, and the user selects and confirms the dialect type in that profile.
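The lookup-with-fallback behaviour of step S200 can be sketched as follows. The disclosure does not say how voiceprints are compared; cosine similarity against a fixed threshold is an assumption chosen for illustration, as are the function names, the dictionary layout of a profile, and the `ask_user_for_dialect` callback that stands in for the user's on-screen confirmation.

```python
import math
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, object]  # {"user_id": str, "features": List[float], "dialect": str}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve_dialect(
    features: List[float],
    profiles: List[Profile],
    ask_user_for_dialect: Callable[[], str],
    threshold: float = 0.8,  # assumed value, not taken from the disclosure
) -> Tuple[Profile, bool]:
    """Return (profile, created). Creates a new profile when no stored one matches."""
    best = max(profiles, key=lambda p: cosine_similarity(features, p["features"]), default=None)
    if best is not None and cosine_similarity(features, best["features"]) >= threshold:
        return best, False  # known user: the stored dialect is used without prompting
    # Unknown voiceprint: create a profile and let the user confirm the dialect once.
    new_profile: Profile = {
        "user_id": f"user-{len(profiles) + 1}",
        "features": list(features),
        "dialect": ask_user_for_dialect(),
    }
    profiles.append(new_profile)
    return new_profile, True
```

In this sketch the user is prompted for a dialect only when no stored profile matches, which is the single point at which the described method still requires a manual choice.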
Of course, as another embodiment, the voiceprint recognition module may also be implemented by a voiceprint recognition server connected to the smart TV over a network. Using such a server allows the smart TV to store more user voiceprint feature information.
Step S300: the speech recognition module converts the user's interactive voice command directly into text according to the type of dialect used by the user, so as to recognize the user's voice command.
As with the voiceprint recognition module, the speech recognition module may also be implemented by a speech recognition server connected to the smart TV over a network. Likewise, using such a server allows the smart TV to retain more speech recognition schemes, which can be continuously extended and updated as needed.
The embodiments of the present disclosure use voiceprint feature recognition to distinguish between the users in a household that shares a smart TV and perform speech recognition directly according to each user's preset dialect, thereby matching dialect speech automatically during smart TV speech recognition.
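Selecting a recognition scheme by dialect type, as in step S300, amounts to a lookup from dialect to recognizer. The following is a minimal sketch under stated assumptions: `DialectSpeechRecognizer` and its methods are hypothetical names, and each registered recognizer is simply a callable that could wrap either an on-device engine or a request to a speech recognition server.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]  # audio in, recognized text out


class DialectSpeechRecognizer:
    """Routes audio to the recognition scheme registered for a dialect type."""

    def __init__(self) -> None:
        self._recognizers: Dict[str, Recognizer] = {}

    def register(self, dialect: str, recognizer: Recognizer) -> None:
        # New schemes can be added or replaced over time without touching callers.
        self._recognizers[dialect] = recognizer

    def transcribe(self, audio: bytes, dialect: str) -> str:
        try:
            recognizer = self._recognizers[dialect]
        except KeyError:
            raise LookupError(f"no recognition scheme registered for dialect {dialect!r}") from None
        return recognizer(audio)
```

Because schemes are registered rather than hard-coded, the set of supported dialects can grow as needed, which corresponds to the point about extending and updating recognition schemes.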
The present disclosure also provides a smart TV speech recognition system. As shown in Fig. 2, the schematic structural diagram of an embodiment of the system, the smart TV speech recognition system 60 includes a voice receiving module 61, a voiceprint recognition module 62, and a speech recognition module 63.
The voice receiving module 61 is used by the smart TV to receive the user's interactive voice command. In a household that uses a smart TV, family members have different accents and may even speak different dialects. Although the speech recognition function of existing smart TVs can recognize dialects, when a user interacts with the TV in a dialect, the TV's speech recognition technology cannot directly determine which dialect is being spoken, and the user has to select the dialect manually. In other words, the smart TV cannot by itself identify each user's dialect and then perform speech recognition. In the system of the present disclosure, the smart TV can directly receive a voice command spoken in the user's dialect during speech-based human-computer interaction. As another embodiment, the smart TV may create a voiceprint profile for each user in advance, so that the user's dialect is selected automatically and recognition proceeds directly. That is, the system 60 further includes a user voiceprint feature module 64, which is used to create a voiceprint profile in advance for each smart TV user, the profile including the dialect type corresponding to the user's voiceprint features.
The smart TV builds voiceprint profiles for all family members in advance, each according to the member's own dialect, so that during subsequent speech recognition it can directly select the speech recognition scheme for that dialect. Therefore, when a user's voiceprint profile is created, the dialect type the user speaks must be selected for it at the same time.
The voiceprint recognition module 62 is used to determine the voiceprint features of the user's interactive voice command received by the voice receiving module 61 and to determine the type of dialect used by the user.
Specifically, the voiceprint recognition module 62 performs voiceprint recognition on the user's interactive voice command and identifies the user according to the voiceprint profiles established in the smart TV in the process described above, so that it can directly determine which dialect the user speaks. In the prior art, when a smart TV receives a voice command spoken in dialect, the user must select the dialect before speech recognition can proceed. The system of the present disclosure instead determines the speech recognition scheme directly from the user's dialect, which skips the dialect selection step and improves the user's experience of speech recognition. For example, suppose a user who speaks a dialect (Cantonese) says "I want to watch program XX" in that dialect in front of the TV for the first time. With the prior art, the TV interface pops up a list of candidate dialects (Cantonese, Sichuanese, Hunanese, and so on), and the user must confirm that the dialect is Cantonese before the TV can carry out the subsequent speech recognition. With the system of the present disclosure, when the same Cantonese-speaking user says "I want to watch program XX" in front of the TV for the first time, the TV interface does not pop up a list of dialects for the user to select and confirm before recognition can proceed. Instead, once the voiceprint recognition module has identified the user, the dialect type stored for that user is matched directly and the Cantonese speech recognition scheme is used. When the voiceprint recognition module determines that the voiceprint features of the user's interactive voice command do not match any user voiceprint features held in the user voiceprint feature module, the user voiceprint feature module creates a new voiceprint profile for the user with those voiceprint features and, at the same time, determines the dialect type that user speaks.
Of course, as another embodiment, the voiceprint recognition module may also be implemented by a voiceprint recognition server connected to the smart TV over a network. Using such a server allows the smart TV to store more user voiceprint feature information.
The speech recognition module 63 is used to convert the user's speech directly into text, according to the dialect type corresponding to the voiceprint features of the user's interactive voice command identified by the voiceprint recognition module 62, so as to recognize the user's voice command.
As with the voiceprint recognition module, the speech recognition module 63 may also be implemented by a speech recognition server connected to the smart TV over a network. Likewise, using such a server allows the smart TV to retain more speech recognition schemes, which can be continuously extended and updated as needed.
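One way to picture how modules 61 to 64 cooperate is the sketch below. It is not the structure shown in Fig. 2, only an illustration: the class name `SmartTVVoiceSystem` and its fields are hypothetical, and each module is reduced to a plain callable or dictionary so that a local implementation and a server-backed one can be swapped freely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class SmartTVVoiceSystem:
    identify_user: Callable[[bytes], Optional[str]]  # role of voiceprint recognition module 62
    dialect_by_user: Dict[str, str]                  # data kept by user voiceprint feature module 64
    recognizers: Dict[str, Callable[[bytes], str]]   # schemes used by speech recognition module 63

    def handle(self, audio: bytes) -> str:
        """Process audio handed over by the voice receiving module (role of module 61)."""
        user = self.identify_user(audio)
        if user is None or user not in self.dialect_by_user:
            # In the described system a new profile would be created at this point.
            raise LookupError("unknown speaker: a voiceprint profile must be created first")
        dialect = self.dialect_by_user[user]
        return self.recognizers[dialect](audio)  # no dialect prompt is shown to the user
```

A caller supplies the three collaborators when constructing the system, so replacing `identify_user` with a function that queries a remote voiceprint server does not change `handle`.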
The present disclosure also provides a readable storage medium that stores a smart TV speech recognition program. When the program is executed by a processor, it implements the steps of the smart TV speech recognition method described above. The program is executed in the same way as in the method embodiments above, which is not repeated here.
It should be understood that the above are merely preferred embodiments of the present disclosure and are not intended to limit its technical solutions. Those of ordinary skill in the art may, within the spirit and principles of the present disclosure, make additions, deletions, substitutions, variations, or improvements based on the above description, and all technical solutions so modified shall fall within the scope of protection of the claims appended to the present disclosure.

Claims (13)

  1. A smart TV speech recognition method for a smart TV to recognize a user's dialect, comprising the following steps:
    the smart TV receives the user's interactive voice command;
    a voiceprint recognition module determines the type of dialect used by the user according to the voiceprint features of the user's interactive voice command;
    a speech recognition module converts the user's interactive voice command directly into text according to the type of dialect used by the user, so as to recognize the user's voice command.
  2. The smart TV speech recognition method according to claim 1, further comprising the following steps before the smart TV receives the user's interactive voice command:
    the smart TV creates a voiceprint profile for each user in advance;
    the user selects and confirms the dialect type in that voiceprint profile.
  3. The smart TV speech recognition method according to claim 2, wherein the voiceprint recognition module determining the type of dialect used by the user according to the voiceprint features of the user's interactive voice command comprises:
    the voiceprint recognition module performs voiceprint recognition on the user's interactive voice command, identifies the user according to the voiceprint profile, and determines the type of dialect used by that user.
  4. The smart TV speech recognition method according to claim 2, wherein, when the voiceprint recognition module determines that the voiceprint features of the user's interactive voice command are not in any of the voiceprint profiles that the smart TV created in advance for each user, the smart TV creates a new voiceprint profile for the user with those voiceprint features, and the user selects and confirms the dialect type in that profile.
  5. The smart TV speech recognition method according to any one of claims 1 to 4, wherein the voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart TV over a network.
  6. The smart TV speech recognition method according to any one of claims 1 to 4, wherein the speech recognition module may be implemented by a speech recognition server connected to the smart TV over a network.
  7. A smart TV speech recognition system for a smart TV to recognize a user's dialect, wherein the smart TV speech recognition system comprises a voice receiving module, a voiceprint recognition module, and a speech recognition module;
    the voice receiving module is used to receive the user's interactive voice command;
    the voiceprint recognition module is used to determine the voiceprint features of the user's interactive voice command received by the voice receiving module and to determine the type of dialect used by the user;
    the speech recognition module is used to convert the user's speech directly into text, according to the dialect type corresponding to the voiceprint features of the user's interactive voice command identified by the voiceprint recognition module, so as to recognize the user's voice command.
  8. The smart TV speech recognition system according to claim 7, further comprising a user voiceprint feature module, which is used to create a voiceprint profile in advance for each smart TV user, the profile including the dialect type corresponding to the user's voiceprint features.
  9. The smart TV speech recognition system according to claim 8, wherein the voiceprint recognition module is used to perform voiceprint recognition on the user's interactive voice command, identify the user according to the voiceprint profile, and determine the type of dialect used by that user.
  10. The smart TV speech recognition system according to claim 8, wherein, when the voiceprint recognition module determines that the voiceprint features of the user's interactive voice command do not match any user voiceprint features held in the user voiceprint feature module, the user voiceprint feature module creates a new voiceprint profile for the user with those voiceprint features and, at the same time, determines the dialect type that user speaks.
  11. The smart TV speech recognition system according to any one of claims 7 to 10, wherein the voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart TV over a network.
  12. The smart TV speech recognition system according to any one of claims 7 to 10, wherein the speech recognition module may be implemented by a speech recognition server connected to the smart TV over a network.
  13. A readable storage medium, wherein the readable storage medium stores a smart TV speech recognition program, and the program, when executed by a processor, implements the steps of the smart TV speech recognition method according to any one of claims 1 to 6.
PCT/CN2020/103545 2019-07-26 2020-07-22 Smart television speech recognition method, system and readable storage medium WO2021017978A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910682661.XA CN112312181A (en) 2019-07-26 2019-07-26 Smart television voice recognition method, system and readable storage medium
CN201910682661.X 2019-07-26

Publications (1)

Publication Number Publication Date
WO2021017978A1 true WO2021017978A1 (en) 2021-02-04

Family

ID=74229363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/103545 WO2021017978A1 (en) 2019-07-26 2020-07-22 Smart television speech recognition method, system and readable storage medium

Country Status (2)

Country Link
CN (1) CN112312181A (en)
WO (1) WO2021017978A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593340A (en) * 2013-10-28 2014-02-19 茵鲁维夫有限公司 Natural expression information processing method, natural expression information processing and responding method, equipment and system
CN104575504A (en) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up by voiceprint and voice identification
CN106504754A (en) * 2016-09-29 2017-03-15 浙江大学 A kind of real-time method for generating captions according to audio output
CN106847281A (en) * 2017-02-26 2017-06-13 上海新柏石智能科技股份有限公司 Intelligent household voice control system and method based on voice fuzzy identification technology
CN107809667A (en) * 2017-10-26 2018-03-16 深圳创维-Rgb电子有限公司 Television voice exchange method, interactive voice control device and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100032140A (en) * 2008-09-17 2010-03-25 주식회사 현대오토넷 Method of interactive voice recognition and apparatus for interactive voice recognition
CN102638605A (en) * 2011-02-14 2012-08-15 苏州巴米特信息科技有限公司 Speech system for recognizing dialect background mandarin
KR20140089863A (en) * 2013-01-07 2014-07-16 삼성전자주식회사 Display apparatus, Method for controlling display apparatus and Method for controlling display apparatus in Voice recognition system thereof
CN105872687A (en) * 2016-03-31 2016-08-17 乐视控股(北京)有限公司 Method and device for controlling intelligent equipment through voice
CN206117701U (en) * 2016-09-30 2017-04-19 无锡小天鹅股份有限公司 Domestic appliance and control system thereof
CN107170454B (en) * 2017-05-31 2022-04-05 Oppo广东移动通信有限公司 Speech recognition method and related product
CN107580237A (en) * 2017-09-05 2018-01-12 深圳Tcl新技术有限公司 Operating method, device, system and the storage medium of TV
CN108172223A (en) * 2017-12-14 2018-06-15 深圳市欧瑞博科技有限公司 Voice instruction recognition method, device and server and computer readable storage medium
CN109785832A (en) * 2018-12-20 2019-05-21 安徽声讯信息技术有限公司 A kind of old man's set-top box Intelligent voice recognition method suitable for accent again

Also Published As

Publication number Publication date
CN112312181A (en) 2021-02-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20847317

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20847317

Country of ref document: EP

Kind code of ref document: A1