CN112312181A - Smart television voice recognition method, system and readable storage medium

Info

Publication number: CN112312181A
Application number: CN201910682661.XA
Authority: CN
Inventors: 鲍舰
Original assignee: Shenzhen TCL New Technology Co Ltd
Current assignee: Shenzhen TCL New Technology Co Ltd
Priority date: 2019-07-26
Filing date: 2019-07-26
Publication date: 2021-02-02
Also published as: WO2021017978A1

Abstract

The invention provides a smart television voice recognition method, system, and storage medium that enable a smart television to recognize the dialect a user speaks. The smart television receives a voice instruction of a user's interactive operation; the voiceprint recognition module determines the dialect type used by the user according to the voiceprint features of that voice instruction; and the voice recognition module directly converts the voice instruction into text according to the dialect type used by the user, so as to recognize the user's voice instruction. Throughout the process in which the user operates the smart television by voice and the smart television recognizes the voice and gives feedback, the user does not need to select a dialect type. For a family that uses the smart television and speaks several dialects, the dialect spoken by each user can be identified automatically and the voice instruction recognized directly with the voice recognition technology for that dialect, which greatly reduces the number of times users must select a dialect and improves the voice operation experience.

Description

Smart television voice recognition method, system and readable storage medium

Technical Field

The present invention relates to the field of speech recognition technologies, and in particular to a smart television voice recognition method, system, and readable storage medium.

Background

At present, voice recognition technology is widely used on smart televisions: a user can select a movie, play music, and even control various household appliances by speaking. In countries with a vast territory, such as China, the pronunciation of local dialects differs greatly from region to region. Although the voice recognition technology on a smart television can recognize local dialects, the precondition is that the dialect used by the user has been preset on the television; recognition cannot adapt to whichever dialect the user happens to speak. In other words, the smart television can recognize the user's dialect only after that dialect has been preset; otherwise its voice AI technology cannot automatically recognize the local dialect spoken by the user.

A television is an appliance shared by the whole family. An elderly family member may speak the dialect of his or her hometown, while a child educated at school may speak only Mandarin. Several dialects may therefore coexist in one family, and presetting a dialect on the television for each family member is not practical.

The prior art also includes voice recognition technologies that try to avoid presetting a dialect, for example by judging from the geographic position of the smart television: the user's geographic position is inferred from the IP address of the smart television's network connection, and the preferred dialect type of the smart television is then determined from that position.

Accordingly, the prior art still needs to be improved and developed.

Disclosure of Invention

In view of the above defects of the prior art, the invention provides an automatic dialect matching technology for a smart television, so that the dialect spoken by a user is automatically matched and recognized even though no dialect has been preset on the smart television.

The technical solution adopted by the invention to solve the technical problem is as follows:

A smart television voice recognition method, used for a smart television to recognize the dialect of a user, comprises the following steps:

the smart television receives a voice instruction of a user's interactive operation;

the voiceprint recognition module determines the dialect type used by the user according to the voiceprint features of the voice instruction;

the voice recognition module directly converts the voice instruction of the user's interactive operation into text according to the dialect type used by the user, so as to recognize the user's voice instruction.
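As a non-limiting illustration, the three steps above can be sketched as follows. The sketch is only an assumption about how the steps might be wired together: the names `VoiceCommand`, `determine_dialect`, and `recognize` are hypothetical, the voiceprint is reduced to a plain identifier, and the recognizers are toy functions rather than real dialect speech-to-text engines.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# All names in this sketch are hypothetical; the source does not define an API.


@dataclass
class VoiceCommand:
    audio: bytes          # voice instruction received by the television (step 1)
    voiceprint_id: str    # stand-in for voiceprint features extracted from the audio


def determine_dialect(command: VoiceCommand, profiles: Dict[str, str]) -> str:
    """Step 2: look up the dialect type stored for the speaker's voiceprint."""
    return profiles[command.voiceprint_id]


def recognize(command: VoiceCommand,
              profiles: Dict[str, str],
              recognizers: Dict[str, Callable[[bytes], str]]) -> str:
    """Steps 2 and 3: pick the dialect, then convert speech to text with it."""
    dialect = determine_dialect(command, profiles)
    return recognizers[dialect](command.audio)


# Toy recognizers that only tag the decoded audio with the dialect used.
recognizers = {
    "cantonese": lambda audio: f"[cantonese] {audio.decode()}",
    "mandarin": lambda audio: f"[mandarin] {audio.decode()}",
}
profiles = {"grandmother": "cantonese", "child": "mandarin"}

text = recognize(VoiceCommand(b"play XX", "grandmother"), profiles, recognizers)
print(text)  # [cantonese] play XX
```

The point of the sketch is that the user never passes a dialect argument; the dialect is derived entirely from the voiceprint lookup.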

As a further improved technical solution, the method further comprises the following steps:

the smart television creates a corresponding voiceprint feature file for each user in advance;

the user selects and confirms the dialect type in the corresponding voiceprint feature file.

As a further improved technical solution, when the voiceprint recognition module determines that the voiceprint features of the voice instruction do not match any voiceprint feature file that the smart television previously created for a user, the smart television creates a new voiceprint feature file for the user with those voiceprint features, and the user selects and confirms the dialect type in that file.

As a further improved technical solution, the voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart television over a network.

As a further improved technical solution, the voice recognition module may be implemented by a voice recognition server connected to the smart television over a network.
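As a non-limiting illustration, the handling of an unrecognized voiceprint might look like the sketch below. It assumes the voiceprint has already been reduced to an identifier and that the user's dialect choice arrives through a callback; `VoiceprintProfiles`, `dialect_for`, and `ask_user` are hypothetical names, not part of the source.

```python
from typing import Callable, Dict, List, Optional


class VoiceprintProfiles:
    """Maps a voiceprint identifier to the dialect type its user confirmed."""

    def __init__(self) -> None:
        self._dialect_by_voiceprint: Dict[str, str] = {}

    def lookup(self, voiceprint_id: str) -> Optional[str]:
        """Return the stored dialect, or None if no feature file exists."""
        return self._dialect_by_voiceprint.get(voiceprint_id)

    def dialect_for(self, voiceprint_id: str, ask_user: Callable[[], str]) -> str:
        """Return the stored dialect, creating a new feature file if needed."""
        dialect = self.lookup(voiceprint_id)
        if dialect is None:
            # Unknown voiceprint: the user selects and confirms a dialect once.
            dialect = ask_user()
            self._dialect_by_voiceprint[voiceprint_id] = dialect
        return dialect


prompts: List[str] = []


def ask_user() -> str:
    """Hypothetical stand-in for the on-screen dialect selection."""
    prompts.append("select dialect")
    return "cantonese"


profiles = VoiceprintProfiles()
first = profiles.dialect_for("vp-001", ask_user)   # creates the feature file
second = profiles.dialect_for("vp-001", ask_user)  # reuses it, no prompt
print(first, second, len(prompts))  # cantonese cantonese 1
```

In this sketch the selection prompt is shown once per new voiceprint, which corresponds to the reduced number of dialect selections described in the text.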

The invention also provides a smart television voice recognition system, used for a smart television to recognize the dialect of a user, which comprises a voice receiving module, a voiceprint recognition module, and a voice recognition module;

the voice receiving module is used for the smart television to receive a voice instruction of a user's interactive operation;

the voiceprint recognition module is used for identifying the voiceprint features of the voice instruction received by the voice receiving module and determining the dialect type used by the user;

the voice recognition module is used for directly converting the user's voice into text according to the dialect type corresponding to the voiceprint features recognized by the voiceprint recognition module, so as to recognize the user's voice instruction.

As a further improved technical solution, the system further includes a user voiceprint feature module, which is used for creating in advance a corresponding voiceprint feature file for each smart television user, the file including the dialect type corresponding to the user's voiceprint features.

As a further improved technical solution, when the voiceprint recognition module determines that the voiceprint features of the voice instruction of the user's interactive operation do not match any user voiceprint features in the user voiceprint feature module, the user voiceprint feature module creates a corresponding voiceprint feature file for the user with those voiceprint features and determines the corresponding dialect type.

As a further improved technical solution, the voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart television over a network; the voice recognition module may be implemented by a voice recognition server connected to the smart television over a network.

The invention also provides a readable storage medium storing a smart television voice recognition program; when the program is executed by a processor, the steps of the smart television voice recognition method described above are implemented.

Compared with the prior art, the invention uses the voiceprint recognition module to record in advance the voiceprint features of each smart television user together with the dialect type that user speaks. When a user operates the smart television through its voice operation function, the voiceprint recognition module first recognizes the user's voiceprint features to determine the user and the dialect type preset for that user, and the voice recognition module is then called to convert the voice instruction, spoken in that dialect, directly into text. Throughout the process in which the user operates the smart television by voice and the smart television recognizes the voice and gives feedback, the user does not need to select a dialect type. For a household that uses the smart television and speaks several dialects, the smart television can automatically identify the dialect spoken by the user and directly recognize the voice instruction with the voice recognition technology for that dialect. The invention greatly reduces the number of times smart television users must select a dialect and improves the voice operation experience.

Drawings

The embodiments of the invention will be further described with reference to the accompanying drawings, in which:

Fig. 1 is a flowchart of a speech recognition method for a smart television according to a preferred embodiment of the present invention.

Fig. 2 is a schematic structure diagram of a speech recognition system of a smart television according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The flow of a preferred embodiment of the speech recognition method for a smart television provided by the invention is shown in Fig. 1. The method comprises the following implementation steps:

Step S100, the smart television receives a voice instruction of user interactive operation.

In a family that uses a smart television, the family members have different accents and may even speak different dialects. Although the voice recognition function of an existing smart television can recognize dialects, when a user interacts with the television in a dialect, its voice recognition technology cannot directly determine which dialect the user is speaking; the user must first select the dialect before voice recognition can proceed. With the method of the invention, the smart television can directly receive a user's voice instruction spoken in a dialect during human-computer interaction by voice recognition. As another preferred embodiment, the smart television can create a corresponding voiceprint feature file for each user in advance, so that the user's dialect is selected automatically and recognition is performed directly. In that case, before the smart television receives the voice instruction of the user's interactive operation, the method further comprises the following steps:

the smart television creates a corresponding voiceprint feature file for each user in advance; and the user selects and confirms the dialect type in the corresponding voiceprint feature file.

The smart television establishes a voiceprint feature file in advance for each family member according to that member's dialect, so that in the subsequent voice recognition process the corresponding dialect recognition scheme can be selected directly. A dialect therefore also needs to be selected for each user's voiceprint feature file.

Step S200, the voiceprint recognition module determines the dialect type used by the user according to the voiceprint characteristics of the voice command of the user interactive operation.

Specifically, the voiceprint recognition module performs voiceprint recognition on the voice instruction of the user's interactive operation and identifies the user against the user voiceprint feature files established in the smart television in the preceding step, so that the dialect the user speaks can be determined directly. For example, under the prior art, when a Cantonese-speaking user says "I want to watch XX programs" in front of the television for the first time, the television interface pops up candidate dialect recognition results, such as Cantonese, Sichuan dialect, and Hunan dialect, and the user must confirm that the dialect is Cantonese before the television can carry out the subsequent voice recognition operation. With the method of the invention, when a Cantonese-speaking user says "I want to watch XX programs" in front of the television for the first time, the television interface does not pop up a list of dialects for the user to select and confirm before the next voice recognition step. Instead, the user is identified by the voiceprint recognition module, the dialect type matching that user is selected directly, and the Cantonese voice recognition scheme is used for recognition.

As another preferred embodiment, the voiceprint recognition module can also be implemented by a voiceprint recognition server connected to the smart television over a network; with such a server, the smart television can store more user voiceprint feature information.
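As a non-limiting illustration, matching an incoming voiceprint against stored feature files might be sketched as below. The source does not say how voiceprint features are represented or compared; this sketch assumes fixed-length feature vectors compared by cosine similarity against a threshold, and `match_profile`, the vectors, and the threshold value are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_profile(features: List[float],
                  profiles: Dict[str, Tuple[List[float], str]],
                  threshold: float = 0.8) -> Optional[str]:
    """Return the dialect of the best-matching feature file, or None.

    A result of None means no stored voiceprint is similar enough, in which
    case a new voiceprint feature file would be created for the speaker.
    """
    best_dialect, best_score = None, threshold
    for stored_features, dialect in profiles.values():
        score = cosine_similarity(features, stored_features)
        if score >= best_score:
            best_dialect, best_score = dialect, score
    return best_dialect


# Toy feature files: (assumed feature vector, confirmed dialect type).
profiles = {
    "user_a": ([0.9, 0.1, 0.2], "cantonese"),
    "user_b": ([0.1, 0.8, 0.3], "sichuan"),
}

known = match_profile([0.88, 0.12, 0.18], profiles)
unknown = match_profile([0.0, 0.0, 1.0], profiles)
print(known, unknown)  # cantonese None
```

A real system would obtain the feature vectors from a speaker-recognition model; the toy vectors here only demonstrate the lookup and the fallback for an unenrolled speaker.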

Step S300, the voice recognition module directly converts the voice command of the user interactive operation into characters according to the dialect type used by the user so as to recognize the voice command of the user.

Like the voiceprint recognition module, the voice recognition module can also be implemented by a voice recognition server connected to the smart television over a network. With such a server, the smart television can likewise store more voice recognition schemes, which can be continuously expanded and updated as required.

The method of the invention uses voiceprint feature recognition technology to distinguish the users in a family that uses the smart television and performs voice recognition directly according to each user's preset dialect, thereby achieving automatic dialect matching in the smart television's voice recognition process.
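As a non-limiting illustration, a server-side store of voice recognition schemes that can be expanded over time might be sketched as follows. The class `RecognitionServer` and its methods are hypothetical; network transport is omitted, and the registered recognizers are toy functions.

```python
from typing import Callable, Dict, List

# A recognition scheme converts audio bytes to text for one dialect.
Recognizer = Callable[[bytes], str]


class RecognitionServer:
    """Hypothetical stand-in for a network-connected voice recognition server."""

    def __init__(self) -> None:
        self._schemes: Dict[str, Recognizer] = {}

    def register(self, dialect: str, recognizer: Recognizer) -> None:
        """Add a new scheme, or replace an existing one with an updated version."""
        self._schemes[dialect] = recognizer

    def supported(self) -> List[str]:
        """List the dialects that currently have a recognition scheme."""
        return sorted(self._schemes)

    def transcribe(self, dialect: str, audio: bytes) -> str:
        """Convert audio to text using the scheme for the given dialect."""
        if dialect not in self._schemes:
            raise KeyError(f"no recognition scheme for dialect: {dialect}")
        return self._schemes[dialect](audio)


server = RecognitionServer()
server.register("cantonese", lambda audio: "cantonese:" + audio.decode())
before = server.supported()

# Later, the server is expanded with another dialect without touching the TV.
server.register("hunan", lambda audio: "hunan:" + audio.decode())
after = server.supported()

print(before, after)  # ['cantonese'] ['cantonese', 'hunan']
print(server.transcribe("hunan", b"play XX"))  # hunan:play XX
```

Keeping the schemes on the server side means new dialects can be added centrally, which is the expandability the text attributes to the server-based embodiment.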

Fig. 2 shows a schematic structure diagram of a preferred embodiment of the speech recognition system of the smart television, where the speech recognition system 60 of the smart television includes a speech receiving module 61, a voiceprint recognition module 62, and a speech recognition module 63.

The voice receiving module 61 is used for the smart television to receive a voice instruction of a user's interactive operation. In a family that uses a smart television, the family members have different accents and may even speak different dialects. Although the voice recognition function of an existing smart television can recognize dialects, when a user interacts with the television in a dialect, its voice recognition technology cannot directly determine which dialect the user is speaking; the user must first select the dialect before voice recognition can proceed. The system of the invention can directly receive a user's voice instruction spoken in a dialect during human-computer interaction by voice recognition. As another preferred embodiment, the smart television can create a corresponding voiceprint feature file for each user in advance, so that the user's dialect is selected automatically and recognition is performed directly; that is, the system 60 further includes a user voiceprint feature module 64, which is used for creating in advance a corresponding voiceprint feature file for each smart television user, the file including the dialect type corresponding to the user's voiceprint features.

The voiceprint recognition module 62 is configured to determine a voiceprint feature of the voice instruction of the user interaction operation received by the voice receiving module 61 and determine a dialect category used by the user.

Specifically, the voiceprint recognition module 62 performs voiceprint recognition on the voice instruction of the user's interactive operation and identifies the user against the user voiceprint feature files established in the smart television as described above, so that the dialect the user speaks can be determined directly. In the prior art, when the smart television receives a voice instruction spoken in a dialect, the user must select the dialect before the next voice recognition step can be performed. The system of the invention instead determines the voice recognition scheme directly from the user's dialect, thereby skipping the dialect selection process and improving the user's experience of the voice recognition technology. For example, under the prior art, when a Cantonese-speaking user says "I want to watch XX programs" in front of the television for the first time, the television interface pops up candidate dialect recognition results, such as Cantonese, Sichuan dialect, and Hunan dialect, and the user must confirm that the dialect is Cantonese before the television can carry out the subsequent voice recognition operation. With the system of the invention, when a Cantonese-speaking user says "I want to watch XX programs" in front of the television for the first time, the television interface does not pop up a list of dialects for the user to select and confirm before the next voice recognition step. Instead, the user is identified by the voiceprint recognition module, the dialect type matching that user is selected directly, and the Cantonese voice recognition scheme is used for recognition.

The voice recognition module 63 is configured to directly convert the user's voice into text according to the dialect type corresponding to the voiceprint features, recognized by the voiceprint recognition module 62, of the voice instruction of the user's interactive operation, so as to recognize the user's voice instruction.

Like the voiceprint recognition module, the voice recognition module 63 can also be implemented by a voice recognition server connected to the smart television over a network. With such a server, the smart television can likewise store more voice recognition schemes, which can be continuously expanded and updated as required.

The invention also provides a readable storage medium storing a smart television voice recognition program; when the program is executed by a processor, the steps of the smart television voice recognition method described above are implemented. The specific execution process of the program is the same as in the preferred embodiment of the smart television voice recognition method described above and is not repeated here.

It should be understood that the above embodiments are merely preferred examples of the present invention and are not restrictive. Those skilled in the art may make changes, substitutions, alterations, and modifications within the spirit and scope of the invention as described above, and all such changes, substitutions, alterations, and modifications that fall within the scope of the appended claims should be construed as being included in the present invention.
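As a non-limiting illustration, the composition of system 60 from modules 61 to 64 might be sketched as below. The class names mirror the module names in the text, but their methods, the `extract` callback, and the toy audio format are assumptions made only for this sketch.

```python
from typing import Callable, Dict, Optional


class UserVoiceprintFeatureModule:
    """Module 64: voiceprint feature files, each holding a dialect type."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def create_file(self, voiceprint: str, dialect: str) -> None:
        self.files[voiceprint] = dialect


class VoiceReceivingModule:
    """Module 61: receives the voice instruction (here, already-captured bytes)."""

    def receive(self, audio: bytes) -> bytes:
        return audio


class VoiceprintRecognitionModule:
    """Module 62: maps the speaker's voiceprint to a dialect type."""

    def __init__(self, store: UserVoiceprintFeatureModule,
                 extract: Callable[[bytes], str]) -> None:
        self.store = store
        self.extract = extract  # hypothetical voiceprint feature extractor

    def dialect_of(self, audio: bytes) -> Optional[str]:
        return self.store.files.get(self.extract(audio))


class VoiceRecognitionModule:
    """Module 63: converts speech to text with the scheme for one dialect."""

    def __init__(self, schemes: Dict[str, Callable[[bytes], str]]) -> None:
        self.schemes = schemes

    def to_text(self, audio: bytes, dialect: str) -> str:
        return self.schemes[dialect](audio)


class SmartTvVoiceRecognitionSystem:
    """System 60: wires modules 61, 62, and 63 together."""

    def __init__(self, receiver: VoiceReceivingModule,
                 voiceprints: VoiceprintRecognitionModule,
                 recognizer: VoiceRecognitionModule) -> None:
        self.receiver = receiver
        self.voiceprints = voiceprints
        self.recognizer = recognizer

    def handle(self, audio: bytes) -> Optional[str]:
        received = self.receiver.receive(audio)
        dialect = self.voiceprints.dialect_of(received)
        if dialect is None:
            return None  # unknown speaker: a feature file would be created first
        return self.recognizer.to_text(received, dialect)


# Toy data: the "audio" is text of the form b"<speaker>|<utterance>".
store = UserVoiceprintFeatureModule()
store.create_file("grandfather", "cantonese")
system = SmartTvVoiceRecognitionSystem(
    VoiceReceivingModule(),
    VoiceprintRecognitionModule(store, lambda a: a.split(b"|", 1)[0].decode()),
    VoiceRecognitionModule({"cantonese": lambda a: a.split(b"|", 1)[1].decode()}),
)
result = system.handle(b"grandfather|I want to watch XX programs")
stranger = system.handle(b"visitor|hello")
print(result, stranger)  # I want to watch XX programs None
```

Because each module is passed in through the constructor, either recognition module could be swapped for a server-backed implementation without changing `handle`.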

Claims

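1. A smart television voice recognition method, used for a smart television to recognize the dialect of a user, characterized by comprising the following steps:

the smart television receives a voice instruction of a user's interactive operation;

the voiceprint recognition module determines the dialect type used by the user according to the voiceprint features of the voice instruction of the user's interactive operation;

the voice recognition module directly converts the voice instruction of the user's interactive operation into text according to the dialect type used by the user, so as to recognize the user's voice instruction.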

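2. The smart television voice recognition method according to claim 1, wherein before the smart television receives the voice instruction of the user's interactive operation, the method further comprises the following steps:

the smart television creates a corresponding voiceprint feature file for each user in advance;

the user selects and confirms the dialect type in the corresponding voiceprint feature file.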

3. The smart television voice recognition method according to claim 2, wherein when the voiceprint recognition module determines that the voiceprint features of the voice instruction of the user's interactive operation do not match any voiceprint feature file previously created for a user by the smart television, the smart television creates a new voiceprint feature file for the user with those voiceprint features, and the user selects and confirms the dialect type in that file.

4. The smart television voice recognition method according to any one of claims 1 to 3, wherein the voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart television over a network.

5. The smart television voice recognition method according to any one of claims 1 to 3, wherein the voice recognition module is implemented by a voice recognition server connected to the smart television over a network.

6. A smart television voice recognition system, used for a smart television to recognize the dialect of a user, characterized by comprising a voice receiving module, a voiceprint recognition module, and a voice recognition module;

the voice receiving module is used for receiving a voice instruction of a user's interactive operation;

the voiceprint recognition module is used for identifying the voiceprint features of the voice instruction of the user's interactive operation received by the voice receiving module and determining the dialect type used by the user;

the voice recognition module is used for directly converting the user's voice into text according to the dialect type corresponding to the voiceprint features, recognized by the voiceprint recognition module, of the voice instruction of the user's interactive operation, so as to recognize the user's voice instruction.

7. The system according to claim 6, further comprising a user voiceprint feature module, configured to create in advance a corresponding voiceprint feature file for each smart television user, the file including the dialect type corresponding to the user's voiceprint features.

8. The system according to claim 7, wherein when the voiceprint recognition module determines that the voiceprint features of the voice instruction of the user's interactive operation do not match any user voiceprint features in the user voiceprint feature module, the user voiceprint feature module creates a corresponding voiceprint feature file for the user with those voiceprint features and determines the corresponding dialect type.

9. The smart television voice recognition system according to any one of claims 6 to 8, wherein the voiceprint recognition module may be implemented by a voiceprint recognition server connected to the smart television over a network, and the voice recognition module may be implemented by a voice recognition server connected to the smart television over a network.

10. A readable storage medium, characterized in that the readable storage medium stores a smart television voice recognition program, and the program, when executed by a processor, implements the steps of the smart television voice recognition method according to any one of claims 1 to 5.