WO2021017978A1

WO2021017978A1 - Smart television speech recognition method, system and readable storage medium

Info

Publication number: WO2021017978A1
Application number: PCT/CN2020/103545
Authority: WO
Inventors: 鲍舰
Original assignee: 深圳Tcl新技术有限公司
Priority date: 2019-07-26
Filing date: 2020-07-22
Publication date: 2021-02-04
Also published as: CN112312181A

Abstract

Provided in the present disclosure are a smart television speech recognition method, a system and a storage medium, which are used by a smart television to recognize the dialect of a user. A smart television receives a voice command of a user interactive operation; a voiceprint recognition module determines the dialect type used by the user according to voiceprint characteristics of the voice command; and, according to the dialect type used by the user, a voice recognition module directly converts the voice command into text so as to recognize the voice command of the user. In the present disclosure, the user operates the smart television by voice. During the entire process in which the smart television recognizes the voice of the user and gives recognition feedback, the user does not need to choose a dialect type. For families that use a smart television and speak several dialects, the dialect spoken by a user can be identified automatically and the voice command of the user interactive operation can be recognized directly using speech recognition technology for that dialect, which greatly reduces the number of times users must choose a dialect and improves the experience of voice operation.

Description

Smart TV voice recognition method, system and readable storage medium

Priority

This disclosure claims priority to Chinese patent application No. 201910682661.X, entitled "A smart TV voice recognition method, system and readable storage medium", filed with the Chinese Patent Office on July 26, 2019, the entire content of which is incorporated into this disclosure by reference.

Technical field

The present disclosure relates to the field of speech recognition technology, and in particular to a smart TV voice recognition method, system and readable storage medium.

Background

At present, voice recognition technology is widely applied on smart TVs. Users can select movies, play music, and even control various household appliances by speaking. In some large countries, such as China, local dialects differ greatly in pronunciation. Although the voice recognition technology on smart TVs can recognize local dialects, the user must first set the dialect to be used on the TV. Voice recognition cannot adapt on the fly to whichever dialect the user happens to speak. In other words, the user's dialect must be preset in the smart TV before the TV can recognize it; otherwise the smart TV's voice AI technology cannot automatically recognize what the user says in a local dialect.

For a family, the TV is an appliance shared by the whole household. The elderly may speak their native dialect, while the children may speak only Mandarin because of their schooling, so several dialects may be in use within one family. It is not realistic for each member to preset the corresponding dialect. Even if multiple dialects are preset in the TV, the user still has to select a dialect each time the TV is used, which is inconvenient and makes for a poor user experience.

Some voice recognition technologies in the prior art address this need to preset dialects. One example is judging by the geographic location of the smart TV: the user's geographic location is inferred from the IP address with which the smart TV is networked, using the relationship between IP addresses and geographic locations, and the preferred dialect type for the smart TV is then determined from that location. The problem with determining dialects by geographic location is that, in cities with many migrants or large non-local populations, a location-based setting obviously cannot solve the problem.

Therefore, the existing technology needs to be improved and developed.

Summary

In view of the above-mentioned shortcomings of the prior art, the present disclosure proposes a smart TV automatic dialect matching technology, which enables the smart TV to automatically match the dialect spoken by the user without setting the dialect in advance to achieve automatic identification of the dialect.

The technical solutions adopted by the present disclosure to solve the technical problems are as follows:

A smart TV voice recognition method for smart TV to recognize the dialect of a user, including the following steps:

Smart TV receives voice instructions for user interaction;

The voiceprint recognition module determines the type of dialect used by the user according to the voiceprint characteristics of the voice command of the user's interactive operation;

The voice recognition module directly converts the voice commands of the user's interactive operation into text according to the type of dialect used by the user to recognize the user's voice commands.

In an embodiment, before the smart TV receives the voice instruction for the user interaction operation, the following steps are further included:

Smart TV creates corresponding voiceprint profile for each user in advance;

The user selects and confirms the dialect type in the corresponding voiceprint profile.

In one embodiment, the voiceprint recognition module determines the type of dialect used by the user according to the voiceprint characteristics of the voice command of the user's interactive operation, including:

The voiceprint recognition module performs voiceprint recognition on the voice instructions of the user's interactive operation, confirms the user according to the voiceprint profile, and determines the type of dialect used by the user.

In one embodiment, when the voiceprint recognition module determines that the voiceprint feature of the voice command is not in any of the voiceprint profiles created by the smart TV in advance for each user, the smart TV newly creates a corresponding voiceprint profile for the user with that voiceprint feature, and the user selects and confirms the dialect type in the corresponding voiceprint profile.

In one embodiment, the voiceprint recognition module may be implemented by a voiceprint recognition server connected to a smart TV network.

In an embodiment, the voice recognition module may be implemented by a voice recognition server connected to a smart TV network.

The present disclosure also provides a smart TV voice recognition system for smart TV to recognize the dialect of a user. The smart TV voice recognition system includes a voice receiving module, a voiceprint recognition module, and a voice recognition module;

The voice receiving module is used for the smart TV to receive voice instructions for user interaction operations;

The voiceprint recognition module is used to determine the voiceprint characteristics of the voice instructions of the user interaction operation received by the voice receiving module and determine the type of dialect used by the user;

The voice recognition module is configured to directly convert the user's voice into text, according to the dialect type corresponding to the voiceprint feature of the voice command recognized by the voiceprint recognition module, so as to recognize the user's voice instruction.

In one embodiment, the above system further includes a user voiceprint feature module, which is used to create a corresponding voiceprint profile for each smart TV user in advance, each profile including the dialect type corresponding to the user's voiceprint feature.

In one embodiment, the voiceprint recognition module is configured to perform voiceprint recognition on the voice instructions of the user's interactive operation, confirm the user according to the voiceprint profile, and determine the type of dialect used by the user.

In one embodiment, when the voiceprint recognition module determines that the voiceprint feature of the voice command of the user interaction operation is not a user voiceprint feature in the user voiceprint feature module, the user voiceprint feature module newly creates a corresponding voiceprint profile for the user with that voiceprint feature and at the same time determines the corresponding dialect type.

In one embodiment, the voiceprint recognition module can be implemented by a voiceprint recognition server connected to a smart TV network.

The present disclosure also provides a readable storage medium that stores a smart TV voice recognition program, and when the smart TV voice recognition program is executed by a processor, the steps of the smart TV voice recognition method are realized.

Compared with the prior art, the present disclosure uses the voiceprint recognition module to file, in advance, the voiceprint features of each smart TV user together with the corresponding dialect type. When a user operates the smart TV through its voice operation function, the voiceprint recognition module first recognizes the user's voiceprint features to determine the user and the preset dialect type, and then calls the voice recognition module to convert the dialect voice command of the user's interactive operation directly into text. Throughout the process in which the user operates the smart TV by voice and the smart TV recognizes the user's voice and gives recognition feedback, the user does not need to select a dialect type. For families that use a smart TV and speak multiple dialects, the smart TV can automatically identify the dialect spoken by the user and directly recognize the voice command of the user's interactive operation using the voice recognition technology for that dialect. The present disclosure greatly reduces the number of times smart TV users must select a dialect and improves the user's experience of voice operation.

Description of the drawings

The specific embodiments of the present disclosure will be further described below in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of an embodiment of a smart TV voice recognition method of the present disclosure.

Fig. 2 is a schematic structural diagram of an embodiment of a smart TV voice recognition system of the present disclosure.

Detailed description

In order to make the objectives, technical solutions and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present disclosure and are not intended to limit it.

The process of the smart TV voice recognition method provided by the present disclosure is shown in FIG. 1. The smart TV voice recognition method of the present disclosure includes the following implementation steps:

In step S100, the smart TV receives a voice command of the user's interactive operation.

In families using smart TVs, family members have different accents and may even use different dialects. Although the voice recognition function of existing smart TVs can recognize dialects, when a user interacts with the smart TV in a dialect, the TV's voice recognition technology cannot directly determine the user's dialect type; instead, the user must select the dialect being used. In other words, the smart TV cannot directly recognize each user's dialect in order to perform speech recognition. With the method of the present disclosure, the smart TV can directly receive the user's dialect voice command during human-computer interaction based on the smart TV's voice recognition. As another embodiment, the smart TV can create a corresponding voiceprint profile for each user in advance, so that the user's dialect is selected automatically and recognized directly. In that case, before the smart TV receives the voice command of the user's interactive operation, the following steps may also be included:

The smart TV creates a corresponding voiceprint profile for each user in advance; the user selects and confirms the dialect type in the corresponding voiceprint profile.

The smart TV sets up a voiceprint profile for each family member in advance according to that member's dialect, which ensures that the subsequent voice recognition process can directly select the corresponding dialect voice recognition scheme. Therefore, when a user's voiceprint profile is established, the dialect type used must also be selected.
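The pre-created voiceprint profile can be pictured as a small record that pairs a user's voiceprint features with the confirmed dialect type. The following Python sketch is illustrative only: the names `VoiceprintProfile` and `ProfileStore`, the use of a short numeric vector as the voiceprint feature, and the dialect labels are all assumptions, since the disclosure does not specify how a profile is stored.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceprintProfile:
    """One family member's enrolled voiceprint and confirmed dialect (hypothetical layout)."""
    user_id: str
    embedding: tuple[float, ...]  # assumed fixed-length voiceprint feature vector
    dialect: str                  # assumed label, e.g. "cantonese" or "mandarin"


@dataclass
class ProfileStore:
    """In-memory stand-in for the profiles the smart TV keeps for its users."""
    profiles: dict[str, VoiceprintProfile] = field(default_factory=dict)

    def enroll(self, user_id, embedding, dialect):
        # The dialect type must be selected when the profile is established.
        if not dialect:
            raise ValueError("a dialect type must be confirmed at enrollment")
        profile = VoiceprintProfile(user_id, tuple(embedding), dialect)
        self.profiles[user_id] = profile
        return profile


store = ProfileStore()
store.enroll("grandmother", [0.9, 0.1, 0.0], "cantonese")
store.enroll("child", [0.1, 0.9, 0.2], "mandarin")
```

Rejecting an empty dialect at enrollment mirrors the requirement that the dialect type be chosen when the profile is created; a real system would presumably derive the feature vector from recorded speech rather than accept it directly.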

In step S200, the voiceprint recognition module determines the type of dialect used by the user according to the voiceprint feature of the voice command of the user's interactive operation.

Specifically, the voiceprint recognition module performs voiceprint recognition on the voice command of the user's interactive operation, confirms the user according to the user voiceprint profile established in the smart TV in the above process, and can thereby directly determine which dialect the user is using. This differs from a prior-art smart TV, which, on receiving an interactive voice command in the user's dialect, requires the user to select the dialect before the next step of voice recognition can proceed. The method of the present disclosure can directly confirm the voice recognition scheme based on the user's dialect and skip the dialect selection step, improving the user's experience of voice recognition technology. For example, suppose a dialect-speaking user (Cantonese) says "I want to watch XX program" in dialect in front of the TV for the first time. With the existing technology, the TV interface pops up the recognition results for various dialects, such as Cantonese, Sichuan dialect and Hunan dialect, and presents them to the user. The user must then confirm Cantonese as the dialect type before the TV can perform subsequent voice recognition operations. With the disclosed method, when the same user says "I want to watch XX program" in dialect in front of the TV for the first time, the TV interface does not pop up a list of dialects and wait for the user to choose and confirm a dialect type before the next step of speech recognition. Instead, after the user is confirmed by the voiceprint recognition module, the user's dialect type is selected directly and the Cantonese speech recognition scheme is used for recognition.

When the voiceprint recognition module determines that the voiceprint feature of the voice command of the user's interactive operation is not in any of the voiceprint profiles created by the smart TV in advance for each user, the smart TV creates a new corresponding voiceprint profile for the user with that voiceprint feature, and the user selects and confirms the dialect type in the corresponding voiceprint profile.
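The lookup in step S200, together with the fallback for an unrecognized speaker, might be sketched as follows. The disclosure does not say how voiceprints are compared, so this example assumes that each voiceprint is a numeric vector, that similarity is measured by cosine similarity, and that a score below a hypothetical `threshold` of 0.8 means the speaker has no profile. The `ask_dialect` callback stands in for the on-screen dialect selection.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_dialect(embedding, profiles, ask_dialect, threshold=0.8):
    """Return (user_id, dialect, is_new) for the speaker of one voice command.

    profiles maps user_id -> (enrolled_embedding, dialect). If no enrolled
    voiceprint is similar enough, a new profile is created and the user is
    asked once, through ask_dialect(), to confirm a dialect type.
    """
    best_id, best_score = None, -1.0
    for user_id, (enrolled, _dialect) in profiles.items():
        score = cosine_similarity(embedding, enrolled)
        if score > best_score:
            best_id, best_score = user_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, profiles[best_id][1], False
    # Unknown speaker: enroll on the spot. The generated id is a
    # simplification and is not guaranteed unique in general.
    new_id = f"user{len(profiles) + 1}"
    dialect = ask_dialect()
    profiles[new_id] = (list(embedding), dialect)
    return new_id, dialect, True


profiles = {"grandmother": ([0.9, 0.1, 0.0], "cantonese")}
known = identify_dialect([0.88, 0.12, 0.01], profiles, ask_dialect=lambda: "unused")
unknown = identify_dialect([0.0, 0.2, 0.95], profiles, ask_dialect=lambda: "sichuanese")
```

Only the unrecognized speaker triggers the dialect prompt; an enrolled speaker is mapped to the stored dialect without any selection step.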

As another implementation, the voiceprint recognition module can also be implemented by a voiceprint recognition server connected to a smart TV network. Using such a server allows the smart TV to store more user voiceprint feature information.

In step S300, the voice recognition module directly converts the voice command of the user's interactive operation into text according to the type of dialect used by the user to recognize the user's voice command.

Like the voiceprint recognition module, the voice recognition module can also be implemented by a voice recognition server connected to the smart TV network. Similarly, using such a server allows the smart TV to store more voice recognition schemes, which can also be continuously expanded and updated as needed.
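Step S300 can be pictured as a lookup from dialect type to a dialect-specific recognition scheme. The sketch below is a hedged illustration: `RECOGNIZERS`, `transcribe` and `register_recognizer` are hypothetical names, and the stub recognizers simply decode the input bytes as text instead of performing real speech recognition, which the disclosure leaves unspecified.

```python
from typing import Callable

# A recognizer takes raw audio and returns text (assumed interface).
Recognizer = Callable[[bytes], str]


def make_stub(dialect: str) -> Recognizer:
    """Build a placeholder recognizer; real dialect models would go here."""
    def recognize(audio: bytes) -> str:
        # Placeholder only: treats the "audio" bytes as UTF-8 text.
        return f"[{dialect}] {audio.decode('utf-8')}"
    return recognize


# Registry of available dialect recognition schemes.
RECOGNIZERS: dict[str, Recognizer] = {
    "cantonese": make_stub("cantonese"),
    "mandarin": make_stub("mandarin"),
}


def register_recognizer(dialect: str, recognizer: Recognizer) -> None:
    """Add or update a scheme, reflecting that schemes can be expanded as needed."""
    RECOGNIZERS[dialect] = recognizer


def transcribe(audio: bytes, dialect: str) -> str:
    """Convert a voice command to text with the scheme for the given dialect."""
    try:
        recognizer = RECOGNIZERS[dialect]
    except KeyError:
        raise LookupError(f"no recognizer registered for dialect {dialect!r}") from None
    return recognizer(audio)


text = transcribe(b"I want to watch XX program", "cantonese")
```

Keeping the schemes in a registry means a server-side deployment could add a new dialect by registering one more entry, without changing the dispatch logic.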

The embodiments of the present disclosure use voiceprint feature recognition technology to distinguish the smart TV users in the home, and directly perform voice recognition according to the user dialect set in advance, so as to realize automatic dialect voice matching in the smart TV voice recognition process.

The present disclosure also provides a smart TV voice recognition system. As shown in FIG. 2, which is a schematic structural diagram of an embodiment of the smart TV voice recognition system, the smart TV voice recognition system 60 includes a voice receiving module 61, a voiceprint recognition module 62 and a voice recognition module 63.

The voice receiving module 61 is used for the smart TV to receive voice instructions for user interaction operations. In families using smart TVs, family members have different accents and may even use different dialects. Although the voice recognition function of existing smart TVs can recognize dialects, when a user interacts with the smart TV in a dialect, the TV's voice recognition technology cannot directly determine the user's dialect type; instead, the user must select the dialect being used. In other words, the smart TV cannot directly recognize each user's dialect in order to perform speech recognition. With the system of the present disclosure, the smart TV can directly receive the user's dialect voice command during human-computer interaction based on the smart TV's voice recognition. As another embodiment, the smart TV can create a corresponding voiceprint profile for each user in advance, so that the user's dialect is selected automatically and recognized directly. That is, the system 60 also includes a user voiceprint feature module 64, which is used to create a corresponding voiceprint profile for each smart TV user in advance, each profile including the dialect type corresponding to the user's voiceprint feature.

The voiceprint recognition module 62 is used to determine the voiceprint characteristics of the voice instructions of the user interaction operation received by the voice receiving module 61 and determine the type of dialect used by the user.

Specifically, the voiceprint recognition module 62 performs voiceprint recognition on the voice command of the user's interactive operation, confirms the user according to the user voiceprint profile established in the smart TV in the above process, and can thereby directly determine which dialect the user is using. This differs from a prior-art smart TV, which, on receiving an interactive voice command in the user's dialect, requires the user to select the dialect before the next step of voice recognition can proceed. The disclosed system can directly confirm the voice recognition scheme based on the user's dialect, thereby skipping the dialect selection step and improving the user's experience of voice recognition technology. For example, suppose a dialect-speaking user (Cantonese) says "I want to watch XX program" in dialect in front of the TV for the first time. With the existing technology, the TV interface pops up the recognition results for various dialects, such as Cantonese, Sichuan dialect and Hunan dialect, and presents them to the user. The user must then confirm Cantonese as the dialect type before the TV can perform subsequent voice recognition operations. With the disclosed system, when the same user says "I want to watch XX program" in dialect in front of the TV for the first time, the TV interface does not pop up a list of dialects and wait for the user to choose and confirm a dialect type before the next step of speech recognition. Instead, after the user is confirmed by the voiceprint recognition module, the user's dialect type is selected directly and the Cantonese speech recognition scheme is used for recognition.

When the voiceprint recognition module determines that the voiceprint feature of the voice command of the user's interactive operation is not a user voiceprint feature in the user voiceprint feature module, the user voiceprint feature module creates a new corresponding voiceprint profile for the user with that voiceprint feature and determines the corresponding dialect type.

The voice recognition module 63 is configured to directly convert the user's voice into text, according to the dialect type corresponding to the voiceprint feature of the voice command recognized by the voiceprint recognition module 62, so as to recognize the user's voice instruction.

Like the voiceprint recognition module, the voice recognition module 63 can also be implemented by a voice recognition server connected to the smart TV network. Similarly, using such a server allows the smart TV to store more voice recognition schemes, which can also be continuously expanded and updated as needed.
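One way to picture how the modules of system 60 fit together, and why modules 62 and 63 can run either on the TV or on a networked server, is to have the system depend only on two small interfaces. This is a sketch under stated assumptions: the class and method names are hypothetical, and the "audio" is a toy byte string of the form `speaker:utterance` so that the example runs without any real signal processing.

```python
from typing import Protocol


class VoiceprintRecognizer(Protocol):
    """Role of module 62: map a voice command to the speaker's dialect type."""
    def dialect_for(self, audio: bytes) -> str: ...


class SpeechRecognizer(Protocol):
    """Role of module 63: convert a voice command to text for a given dialect."""
    def to_text(self, audio: bytes, dialect: str) -> str: ...


class TableVoiceprintRecognizer:
    """Toy local implementation: reads the speaker name from the fake audio."""
    def __init__(self, dialect_by_speaker: dict[str, str]) -> None:
        self._dialect_by_speaker = dialect_by_speaker

    def dialect_for(self, audio: bytes) -> str:
        speaker = audio.split(b":", 1)[0].decode("utf-8")
        return self._dialect_by_speaker[speaker]


class EchoSpeechRecognizer:
    """Toy implementation: returns the utterance part of the fake audio."""
    def to_text(self, audio: bytes, dialect: str) -> str:
        return audio.split(b":", 1)[1].decode("utf-8")


class SmartTVVoiceSystem:
    """Wires the receiving step (module 61) to modules 62 and 63."""
    def __init__(self, voiceprint: VoiceprintRecognizer, speech: SpeechRecognizer) -> None:
        self.voiceprint = voiceprint
        self.speech = speech

    def handle(self, audio: bytes) -> tuple[str, str]:
        dialect = self.voiceprint.dialect_for(audio)  # determine dialect from voiceprint
        text = self.speech.to_text(audio, dialect)    # dialect-specific conversion to text
        return dialect, text


system = SmartTVVoiceSystem(
    TableVoiceprintRecognizer({"grandmother": "cantonese", "child": "mandarin"}),
    EchoSpeechRecognizer(),
)
result = system.handle(b"grandmother:I want to watch XX program")
```

Because `SmartTVVoiceSystem` only calls `dialect_for` and `to_text`, a server-backed class with the same two methods could replace either toy implementation without changes to the system itself.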

The present disclosure also provides a readable storage medium that stores a smart TV voice recognition program, and when the smart TV voice recognition program is executed by a processor, the steps of the smart TV voice recognition method are implemented. The specific execution process of the program is the same as in the above embodiment of the smart TV voice recognition method and is not repeated here.

It should be understood that the above descriptions are only preferred embodiments of the present disclosure and are not intended to limit its technical solutions. Those of ordinary skill in the art may, within the spirit and principles of the present disclosure, make additions, deletions, substitutions, changes or improvements based on the above description, and all technical solutions resulting from such additions, deletions, substitutions, changes or improvements shall fall within the protection scope of the appended claims of the present disclosure.

Claims

1. A smart TV voice recognition method for a smart TV to recognize the dialect of a user, which includes the following steps:

The smart TV receives a voice command of the user's interactive operation;

The voiceprint recognition module determines the type of dialect used by the user according to the voiceprint characteristics of the voice command of the user's interactive operation;

The voice recognition module directly converts the voice command of the user's interactive operation into text according to the type of dialect used by the user, so as to recognize the user's voice command.

2. A smart TV voice recognition method according to claim 1, wherein before the smart TV receives the voice command of the user's interactive operation, the method further comprises the following steps:

The smart TV creates a corresponding voiceprint profile for each user in advance;

The user selects and confirms the dialect type in the corresponding voiceprint profile.

3. A smart TV voice recognition method according to claim 2, wherein the voiceprint recognition module determining the type of dialect used by the user according to the voiceprint characteristics of the voice command of the user's interactive operation includes:

The voiceprint recognition module performs voiceprint recognition on the voice command of the user's interactive operation, confirms the user according to the voiceprint profile, and determines the type of dialect used by the user.

4. A smart TV voice recognition method according to claim 2, wherein when the voiceprint recognition module determines that the voiceprint feature of the voice command of the user's interactive operation is not in any of the voiceprint profiles created by the smart TV in advance for each user, the smart TV newly creates a corresponding voiceprint profile for the user with that voiceprint feature, and the user selects and confirms the dialect type in the corresponding voiceprint profile.

5. A smart TV voice recognition method according to any one of claims 1 to 4, wherein the voiceprint recognition module can be implemented by a voiceprint recognition server connected to a smart TV network.

6. A smart TV voice recognition method according to any one of claims 1 to 4, wherein the voice recognition module can be implemented by a voice recognition server connected to a smart TV network.

7. A smart TV voice recognition system for a smart TV to recognize the dialect of a user, wherein the smart TV voice recognition system includes a voice receiving module, a voiceprint recognition module, and a voice recognition module;

The voice receiving module is used to receive voice instructions for user interaction operations;

The voiceprint recognition module is used to determine the voiceprint characteristics of the voice instructions of the user interaction operation received by the voice receiving module and determine the type of dialect used by the user;

The voice recognition module is configured to directly convert the user's voice into text, according to the dialect type corresponding to the voiceprint feature of the voice command recognized by the voiceprint recognition module, so as to recognize the user's voice instruction.

8. A smart TV voice recognition system according to claim 7, further comprising a user voiceprint feature module, which is used to create a corresponding voiceprint profile for each smart TV user in advance, each profile including the dialect type corresponding to the user's voiceprint feature.

9. A smart TV voice recognition system according to claim 8, wherein the voiceprint recognition module is configured to perform voiceprint recognition on the voice command of the user's interactive operation, confirm the user according to the voiceprint profile, and determine the type of dialect used by the user.

10. A smart TV voice recognition system according to claim 8, wherein when the voiceprint recognition module determines that the voiceprint feature of the voice command of the user's interactive operation is not a user voiceprint feature in the user voiceprint feature module, the user voiceprint feature module creates a new corresponding voiceprint profile for the user with that voiceprint feature and at the same time determines the corresponding dialect type.

11. A smart TV voice recognition system according to any one of claims 7 to 10, wherein the voiceprint recognition module can be implemented by a voiceprint recognition server connected to a smart TV network.

12. A smart TV voice recognition system according to any one of claims 7 to 10, wherein the voice recognition module can be implemented by a voice recognition server connected to a smart TV network.

13. A readable storage medium, wherein the readable storage medium stores a smart TV voice recognition program, and the smart TV voice recognition program, when executed by a processor, realizes the steps of the smart TV voice recognition method of any one of claims 1 to 6.