CN111741369B - Smart television set top box based on voice recognition

Info

Publication number: CN111741369B
Application number: CN202010662605.2A
Authority: CN
Inventors: 王利平; 李重; 李瑞生; 汤永哲
Original assignee: Anhui Xinzhi Technology Co ltd
Current assignee: Anhui Xinzhi Technology Co ltd
Priority date: 2020-07-10
Filing date: 2020-07-10
Publication date: 2021-11-16
Anticipated expiration: 2040-07-10
Also published as: CN111741369A

Abstract

The invention discloses a smart television set-top box based on voice recognition, comprising a recognition analysis module, a database, an automatic recommendation module, a voice recording module, a user registration module, a camera unit, a processor, an alarm module, an external acquisition module and a control module. The recognition analysis module is used for recognizing the voice of a user. The processor is used for comparing the real-time sound pressure, recognition voice and real-time gesture analyzed by the recognition analysis module with the sound pressure information, keywords and verification information in the database. The user registration module is used for the user to submit the number of users and the data information of the users for registration through a mobile phone terminal, and sends the number of successfully registered users and their data information to the database for storage. The set-top box is highly intelligent, can greatly reduce the waiting time of the user, and its intelligent control makes it more convenient to use.

Description

Smart television set top box based on voice recognition

Technical Field

The invention relates to the field of intelligent television set-top boxes, in particular to an intelligent television set-top box based on voice recognition.

Background

As an important piece of audio and video equipment, the television set-top box currently meets the entertainment and daily-life needs of most families and hotels. However, it is limited by the complexity of remote-controller operation. If commonly used functions such as searching and channel switching can be controlled directly by voice, and the set-top box can also serve as the central control of a smart home to control smart household devices such as lamps and air conditioners, users can free both hands and complete the desired operation with a single voice instruction. At present, mainstream smart-home central control schemes provide the related services through a separate, externally connected central control device. Integrating the whole function into the set-top box can provide better service, because the living room is the area where family members spend the most time each day, and it can also improve the utilization rate of the equipment.

Application No. CN201410482292.7 discloses a set-top box system, which includes: a first tuner, a second tuner and a third tuner for receiving satellite signals; a fourth tuner for receiving terrestrial signals; three first demodulation chips for respectively demodulating the intermediate frequency signals output by the first tuner, the second tuner and the third tuner; a second demodulation chip for demodulating the intermediate frequency signal output by the fourth tuner; and a central processing unit for processing the transport streams output by the three first demodulation chips and the second demodulation chip.

However, in that patent the user cannot be identified by voice, keywords or gestures, so the privacy of the user is poorly protected.

Disclosure of Invention

The invention aims to provide a smart television set-top box based on voice recognition.

The purpose of the invention can be realized by the following technical scheme:

an intelligent television set top box based on voice recognition comprises a recognition analysis module, a database, an automatic recommendation module, a voice recording module, a user registration module, a camera unit, a processor, an alarm module, an external acquisition module and a control module;

the external acquisition module is used for acquiring the sound pressure and the recognition voice of the current user in real time and marking the sound pressure as real-time sound pressure, and is also used for acquiring the real-time gesture input by the user when in use, and the external acquisition module is used for transmitting the real-time sound pressure, the recognition voice and the real-time gesture to the recognition analysis module;

the recognition analysis module receives the real-time sound pressure, recognition voice and real-time gesture transmitted by the external acquisition module, and is used for carrying out identity verification processing on them by combining the sound pressure information, keywords and verification information in the database; the specific steps of the identity verification processing are as follows:

step one: acquiring the real-time sound pressure, recognition voice and real-time gesture transmitted by the corresponding external acquisition module;

step two: comparing the recognized voice with the verification information stored in the corresponding database, wherein the specific comparison process comprises the following steps:

s1: converting the recognized voice into text information through a voice-to-text technology, and marking the text information as a word to be verified;

s2: matching the words to be verified with all the keywords, and finding out the user information corresponding to the keywords consistent with the words to be verified; if no corresponding user information exists, generating an authentication failure signal;

step three: acquiring corresponding user information, and acquiring verification gesture and sound pressure information corresponding to the user information;

step four: acquiring the real-time sound pressure and the real-time gesture; verifying the real-time gesture and assigning a verification potential value according to the verification result, wherein the specific processing steps are as follows:

s01: comparing the real-time gesture with the corresponding verification gesture; if they are consistent, assigning the verification potential value Yz = 1, and otherwise assigning Yz = 0;

s02: acquiring real-time sound pressure of a user, marking the real-time sound pressure as Sy, and marking corresponding sound pressure information as Y;

s03: obtaining a similarity value Xs by using the following formula:

Xs = (|Sy - Y| / Y) × Yz;

in the formula, |Sy - Y| is the absolute value of the difference between Sy and Y;

s04: obtaining a similarity value Xs;

step five: when Xs is lower than X1, generating a kernel communication signal and marking the corresponding user information as the user; X1 is a preset value;
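Steps one to five can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `RegisteredUser`, `VerificationResult` and `verify`, the string labels for signals and gestures, and the sample values are all hypothetical and do not come from the patent. Speech-to-text conversion and gesture recognition are assumed to have already produced a text word and a gesture label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegisteredUser:
    name: str
    keyword: str           # user-defined keyword stored at registration
    gesture: str           # label of the stored verification gesture
    sound_pressure: float  # stored sound pressure information Y


@dataclass
class VerificationResult:
    signal: Optional[str]            # "authentication_failure", "kernel_communication" or None
    user: Optional[RegisteredUser]
    yz: Optional[int] = None         # verification potential value Yz
    xs: Optional[float] = None       # similarity value Xs


def verify(users: List[RegisteredUser], word: str, gesture: str,
           sy: float, x1: float) -> VerificationResult:
    # Step two: match the word to be verified against all stored keywords.
    user = next((u for u in users if u.keyword == word), None)
    if user is None:
        return VerificationResult("authentication_failure", None)
    # s01: Yz is 1 when the real-time gesture matches the stored one, else 0.
    yz = 1 if gesture == user.gesture else 0
    # s03: Xs = (|Sy - Y| / Y) * Yz.
    xs = abs(sy - user.sound_pressure) / user.sound_pressure * yz
    # Step five: generate the kernel communication signal when Xs < X1.
    signal = "kernel_communication" if xs < x1 else None
    return VerificationResult(signal, user, yz, xs)


# Hypothetical sample data: stored Y = 60.0, measured Sy = 63.0, X1 = 0.1.
users = [RegisteredUser("Li", "open sesame", "fist", 60.0)]
result = verify(users, "open sesame", "fist", sy=63.0, x1=0.1)
print(result.signal, result.yz, result.xs)
```

Because Xs is multiplied by Yz, a mismatched gesture (Yz = 0) always gives Xs = 0, which is below any positive X1. The sketch therefore returns Yz alongside the signal so that the gesture result can be checked separately.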

the processor is used for comparing the real-time sound pressure, the recognized voice and the real-time gesture transmitted by the external acquisition module with the sound pressure information, the key words and the verification information in the database;

SS 1: if the processor receives the authentication failure signal, generating an alarm signal and sending the alarm signal to an alarm module;

SS 2: if the verification potential value Yz is given as 1, generating an accurate identification information signal and sending the accurate identification information signal to an automatic recommendation module;

if the verification potential value Yz is given as 0, generating an alarm signal and sending the alarm signal to an alarm module;

SS 3: if the processor receives the kernel communication signal, generating an accurate identification information signal and sending the accurate identification information signal to the automatic recommendation module; otherwise, generating an alarm signal and sending the alarm signal to the alarm module;

SS 4: after processing and generating the accurate identification information signal, managing the use of the user by combining the authority of the user in the database, wherein the specific management steps are as follows:

m1: if the age of the user is less than the set threshold value N1, the processor generates a time limit signal and sends the signal to the control unit;

m2: if the age of the user is greater than the set threshold N2, the processor generates a reduced brightness signal and sends the signal to the control unit.
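The processor logic in SS 1 to SS 4 can be sketched as a single dispatch function. The text does not say how the Yz check of SS 2 and the signal check of SS 3 combine, so this sketch assumes that both must succeed before the accurate identification information signal is issued. The function name `dispatch`, the module and signal labels, and the sample thresholds N1 = 12 and N2 = 60 are hypothetical.

```python
from typing import List, Optional, Tuple

Message = Tuple[str, str]  # (target module, signal name)

ALARM: Message = ("alarm_module", "alarm")


def dispatch(signal: Optional[str], yz: Optional[int], age: int,
             n1: int, n2: int) -> List[Message]:
    # SS 1: an authentication failure signal raises an alarm.
    if signal == "authentication_failure":
        return [ALARM]
    # SS 2: a verification potential value of 0 raises an alarm.
    if yz != 1:
        return [ALARM]
    # SS 3: without the kernel communication signal, raise an alarm.
    if signal != "kernel_communication":
        return [ALARM]
    # Identity confirmed: notify the automatic recommendation module.
    messages: List[Message] = [
        ("automatic_recommendation_module", "identification_accurate")
    ]
    # SS 4 / m1: users younger than N1 get a viewing time limit.
    if age < n1:
        messages.append(("control_unit", "time_limit"))
    # SS 4 / m2: users older than N2 get reduced screen brightness.
    if age > n2:
        messages.append(("control_unit", "reduce_brightness"))
    return messages


print(dispatch("kernel_communication", 1, age=8, n1=12, n2=60))
```

Returning a list of (module, signal) pairs keeps the sketch free of any real messaging mechanism, which the patent does not describe.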

Further, the automatic recommendation module is used for recommending suitable program types for the user; the processor compares the real-time sound pressure, recognized voice and real-time gesture transmitted by the external acquisition module with the sound pressure information, keywords and verification information in the database, and, once the user information is accurately recognized, generates accurate voice recognition information and sends it to the automatic recommendation module; the specific automatic recommendation steps are as follows:

p1: acquiring the number of times the user currently using the television has added programs to favorites within ten days, and marking it as Gu;

p2: acquiring the number of times the user currently using the television has watched programs within ten days, and marking it as Wu;

p3: acquiring the time the user currently using the television has spent browsing programs within ten days, and marking it as Ru;

p4: acquiring the number of times the user currently using the television has commented on programs within ten days, and marking it as Yu;

p5: obtaining an automatic recommendation coefficient Xu by using a formula, wherein mu is an error correction factor with a value of 0.95682; d7, d8, d9 and d10 are all preset proportionality coefficients, d7 + d8 + d9 + d10 = 1 and d7 > d8 > d9 > d10; and T1 and T2 are time proportionality coefficients;

p6: after the automatic recommendation coefficient Xu is calculated, the system automatically selects the three programs with the highest recommendation coefficients for recommendation.
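Step p6 can be sketched as a simple top-three selection. The formula for Xu is not reproduced in this text, so the sketch does not attempt to compute it: it assumes the coefficients have already been calculated and are supplied as a mapping from program name to Xu. The function name `top_three` and the sample programs and values are hypothetical.

```python
import heapq
from typing import Dict, List


def top_three(coefficients: Dict[str, float]) -> List[str]:
    """Return the names of the three programs with the highest Xu."""
    # heapq.nlargest iterates over the dict keys and ranks them by Xu.
    return heapq.nlargest(3, coefficients, key=coefficients.get)


# Hypothetical, precomputed recommendation coefficients.
xu_by_program = {
    "news": 0.41,
    "drama": 0.87,
    "sports": 0.63,
    "documentary": 0.55,
    "cartoon": 0.12,
}
print(top_three(xu_by_program))
```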

Further, the user registration module is used for the user to submit the number of the users and the data information of the users through the mobile phone terminal for registration and send the number of the users who are successfully registered and the data information of the users to the database for storage, wherein the data information of the users comprises names, ages and photos; the user registration module transmits an end signal to the database after the user registration is successful;

the database generates a voice recording start signal and sends the voice recording start signal to the voice recording module after receiving an end signal transmitted by the user registration module, the voice recording module is used for binding a user with voice key information to form registered user information and storing the registered user information in the database, the voice recording module informs the user of voice recording when receiving the voice recording start signal transmitted by the database, and at the moment, the voice recording module can record the voice key information of each user, wherein the voice key information comprises the sound pressure information and the key words of the voice of the user; the keywords are words set by a user in a self-defined way and are used for verifying the identity of the user;

the camera unit is used for collecting a verification gesture of the user during registration, the verification gesture being a gesture preset by the user and used for further verifying the identity of the user; the camera unit matches the verification gesture with the corresponding user to obtain verification information, and transmits the verification information to the database for storage.
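The registration sequence described above (user data first, then voice key information, then the verification gesture) can be sketched as a small in-memory store. This is a sketch under assumptions only: the class names `UserRecord` and `Database`, the method names, and the string used for the voice recording start signal are hypothetical, and a real device would persist the data rather than hold it in a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UserRecord:
    name: str
    age: int
    photo: bytes
    keyword: Optional[str] = None           # user-defined keyword
    sound_pressure: Optional[float] = None  # sound pressure information
    gesture: Optional[str] = None           # verification gesture label


class Database:
    def __init__(self) -> None:
        self.users: Dict[str, UserRecord] = {}

    def register(self, name: str, age: int, photo: bytes) -> str:
        # User registration module: store name, age and photo.
        self.users[name] = UserRecord(name, age, photo)
        # After the end signal, the database starts voice recording.
        return "voice_recording_start"

    def record_voice(self, name: str, keyword: str, sound_pressure: float) -> None:
        # Voice recording module: bind the voice key information to the user.
        user = self.users[name]
        user.keyword = keyword
        user.sound_pressure = sound_pressure

    def record_gesture(self, name: str, gesture: str) -> None:
        # Camera unit: bind the verification gesture to the user.
        self.users[name].gesture = gesture

    def is_complete(self, name: str) -> bool:
        # A user can be verified only once all three items are stored.
        user = self.users[name]
        return None not in (user.keyword, user.sound_pressure, user.gesture)


db = Database()
db.register("Li", 30, b"")
db.record_voice("Li", "open sesame", 60.0)
db.record_gesture("Li", "fist")
print(db.is_complete("Li"))
```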

Furthermore, the control module is used for controlling the operation of the television by voice; after the processor completes the comparison and identification, accurate voice recognition information is generated and sent to the control module; the control module comprises a television control unit, a home control unit, an entertainment unit, a weather query unit and a stock query unit, and the specific working process of each unit is as follows:

a. the television control unit can operate the television by voice:

a1, the user can turn on the television, play live broadcasts and change channels by voice;

a2, in any scene, the user can control the sound of the television by voice, increasing and decreasing the volume as with the remote controller;

b. the home control unit takes the set-top box as a central control entrance to control all networked intelligent devices in a user family:

b1, the user can directly control the temperature, mode, wind speed and other functional settings of the air conditioner through the set-top box;

b2, the user can directly turn on or off the lamp through the set-top box;

b3, the user can directly operate the curtain through the set-top box;

c. the entertainment unit may conduct entertainment through the set-top box:

c1, the user can listen to favorite news programs through the set-top box;

c2, the user can listen to favorite music through the set-top box, and can skip, pause and resume the music by voice;

c3, the user can play Tang poems through the set-top box;

c4, the user can communicate in a man-machine chatting mode through the set top box;

d. the weather query unit can query the weather of the current city directly from the voice of the user, can also query the weather of other cities and other times, and announces the result to the user through TTS after the query succeeds;

e. the stock query unit can query stock quotations for the user.
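How a recognized command might be routed to the five units can be sketched with a keyword table. The patent does not describe the routing method, so the trigger phrases, the unit labels and the function name `route` are all assumptions made for illustration. A real system would likely use a proper intent recognizer rather than substring matching.

```python
from typing import Optional, Tuple

# Hypothetical trigger phrases for each unit of the control module.
ROUTES: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("television_control_unit", ("channel", "volume", "turn on the tv")),
    ("home_control_unit", ("air conditioner", "lamp", "curtain")),
    ("entertainment_unit", ("music", "news", "poem", "chat")),
    ("weather_query_unit", ("weather",)),
    ("stock_query_unit", ("stock",)),
)


def route(text: str) -> Optional[str]:
    """Return the unit that should handle the recognized text, if any."""
    lowered = text.lower()
    for unit, triggers in ROUTES:
        # The first unit with a matching trigger phrase handles the command.
        if any(trigger in lowered for trigger in triggers):
            return unit
    return None


print(route("Turn up the volume"))
print(route("What is the weather in Hefei tomorrow"))
```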

Compared with the prior art, the invention has the beneficial effects that:

1. The recognition analysis module receives the real-time sound pressure, recognition voice and real-time gesture transmitted by the external acquisition module and carries out identity verification processing on them by combining the sound pressure information, keywords and verification information in the database. The processor compares the real-time sound pressure, recognized voice and real-time gesture transmitted by the external acquisition module with the sound pressure information, keywords and verification information in the database: if the processor receives the authentication failure signal, it generates an alarm signal and sends it to the alarm module; if the verification potential value Yz is 1, it generates an accurate identification information signal and sends it to the automatic recommendation module; if Yz is 0, it generates an alarm signal and sends it to the alarm module; if the processor receives the kernel communication signal, it generates an accurate identification information signal and sends it to the automatic recommendation module, and otherwise it generates an alarm signal and sends it to the alarm module. After the accurate identification information signal is generated, the use of the user is managed according to the authority of the user in the database. The system thus identifies the user through the user's real-time sound pressure, voice and real-time gesture, which prevents other people from using the system at will, prevents the information of the user from being leaked, and improves the security of the set-top box;

2. After comparing the real-time sound pressure, recognized voice and real-time gesture, the processor generates accurate voice recognition information and sends it to the automatic recommendation module. The number of times the user currently using the television has added programs to favorites, the number of times the user has watched programs, the time spent browsing programs and the number of times the user has commented on programs within ten days are obtained, and an automatic recommendation coefficient is obtained by using a formula. The system then automatically selects the three programs with the highest recommendation coefficients for recommendation, so that videos are recommended to the user and the time the user spends searching is reduced;

3. The control module is used for controlling the operation of the television by voice: after the processor completes the comparison and identification, accurate voice recognition information is generated and sent to the control module. The control module is divided into a television control unit, a home control unit, an entertainment unit, a weather query unit and a stock query unit. Through the television control unit, the user can turn on the television, play live broadcasts, switch programs and adjust the volume by voice, as with a remote controller. The home control unit uses the set-top box as the central control entrance to control all networked smart devices in the user's home. The weather query unit can query the weather of the current city directly from the voice of the user, can also query the weather of other cities and other times, and announces the result to the user through TTS after the query succeeds. The stock query unit can query stock quotations for the user. The invention is highly intelligent, can greatly reduce the waiting time of the user, and its intelligent control makes the set-top box more convenient to use.

Drawings

In order to facilitate understanding for those skilled in the art, the present invention will be further described with reference to the accompanying drawings.

Fig. 1 is a schematic block diagram of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, a smart television set-top box based on voice recognition includes a recognition analysis module, a database, an automatic recommendation module, a voice recording module, a user registration module, a camera unit, a processor, an alarm module, an external collection module, and a control module;

The automatic recommendation module is used for recommending suitable program types for the user; the processor compares the real-time sound pressure, recognized voice and real-time gesture transmitted by the external acquisition module with the sound pressure information, keywords and verification information in the database, and, once the user information is accurately recognized, generates accurate voice recognition information and sends it to the automatic recommendation module, which then performs the automatic recommendation steps p1 to p6 described above.

The user registration module is used for submitting the number of users and data information of the users to register through the mobile phone terminal by the users and sending the number of the users who successfully register and the data information of the users to the database for storage, wherein the data information of the users comprises names, ages and photos; the user registration module transmits an end signal to the database after the user registration is successful;

The control module is used for controlling the operation of the television by voice; after the processor completes the comparison and identification, accurate voice recognition information is generated and sent to the control module; the control module is divided into a television control unit, a home control unit, an entertainment unit, a weather query unit and a stock query unit, whose specific working processes are as described in items a to e above.

The working principle of the invention is as follows:

During operation of the smart television set-top box based on voice recognition, the recognition analysis module receives the real-time sound pressure, recognition voice and real-time gesture transmitted by the external acquisition module, and carries out identity verification processing on them by combining the sound pressure information, keywords and verification information in the database. The specific steps of the identity verification processing are: acquiring the real-time sound pressure, recognition voice and real-time gesture transmitted by the corresponding external acquisition module; comparing the recognition voice with the verification information stored in the database, by converting the recognition voice into text information through a voice-to-text technology, marking the text information as the word to be verified, matching the word to be verified against all the keywords, and finding the user information corresponding to the keyword consistent with the word to be verified; if no corresponding user information exists, generating an authentication failure signal; acquiring the corresponding user information, together with the verification gesture and sound pressure information corresponding to it; comparing the real-time gesture with the corresponding verification gesture, and assigning the verification potential value Yz = 1 if they are consistent and Yz = 0 otherwise; marking the real-time sound pressure of the user as Sy and the corresponding sound pressure information as Y; obtaining the similarity value Xs by using the formula; and, when Xs is lower than the preset value X1, generating a kernel communication signal and marking the corresponding user information as the user.

The processor compares the real-time sound pressure, recognized voice and real-time gesture transmitted by the external acquisition module with the sound pressure information, keywords and verification information in the database: if the processor receives the authentication failure signal, it generates an alarm signal and sends it to the alarm module; if the verification potential value Yz is 1, it generates an accurate identification information signal and sends it to the automatic recommendation module; if Yz is 0, it generates an alarm signal and sends it to the alarm module; if the processor receives the kernel communication signal, it generates an accurate identification information signal and sends it to the automatic recommendation module, and otherwise it generates an alarm signal and sends it to the alarm module. After the accurate identification information signal is generated, the use of the user is managed according to the authority of the user in the database: if the age of the user is less than the set threshold N1, the processor generates a time limit signal and sends it to the control unit; if the age of the user is greater than the set threshold N2, the processor generates a brightness reduction signal and sends it to the control unit. The system thus identifies the user through the user's real-time sound pressure, voice and real-time gesture, which prevents other people from using the system at will, prevents the information of the user from being leaked, and improves the security of the set-top box.

After comparing the real-time sound pressure, recognized voice and real-time gesture, the processor generates accurate voice recognition information and sends it to the automatic recommendation module. The number of times the user currently using the television has added programs to favorites, the number of times the user has watched programs, the time spent browsing programs and the number of times the user has commented on programs within ten days are obtained, and an automatic recommendation coefficient is obtained by using a formula. The system then automatically selects the three programs with the highest recommendation coefficients for recommendation, so that videos are recommended to the user and the time the user spends searching is reduced.

The control module is used for controlling the operation of the television by voice: after the processor completes the comparison and identification, accurate voice recognition information is generated and sent to the control module. The control module is divided into a television control unit, a home control unit, an entertainment unit, a weather query unit and a stock query unit. Through the television control unit, the user can turn on the television, play live broadcasts, switch programs and adjust the volume by voice, as with a remote controller. The home control unit uses the set-top box as the central control entrance to control all networked smart devices in the user's home. The weather query unit can query the weather of the current city directly from the voice of the user, can also query the weather of other cities and other times, and announces the result to the user through TTS after the query succeeds. The stock query unit can query stock quotations for the user. The invention is highly intelligent, can greatly reduce the waiting time of the user, and its intelligent control makes the set-top box more convenient to use.

The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. An intelligent television set top box based on voice recognition is characterized by comprising a recognition analysis module, a database, an automatic recommendation module, a voice recording module, a user registration module, a camera unit, a processor, an alarm module, an external acquisition module and a control module;

Xs = (|Sy - Y| / Y) × Yz;

s04: obtaining a similarity value Xs;

m2: if the age of the user is larger than a set threshold value N2, the processor generates a brightness reducing signal and sends the signal to the control unit;

p5: using formulas

Obtaining an automatic recommendation coefficient Xu, wherein mu is an error correction factor and takes the value of 0.95682; d7, d8, d9 and d10 are all preset proportionality coefficients, d7+ d8+ d9+ d10=1, d7 < d8 < d9 < d10, and T1 and T2 are time proportionality coefficients;

2. The smart television set-top box based on voice recognition as claimed in claim 1, wherein the user registration module is used for the user to submit the number of users and the data information of the users through the mobile phone terminal for registration and send the number of the users who successfully register and the data information of the users to the database for storage, and the data information of the users comprises names, ages and photos; the user registration module transmits an end signal to the database after the user registration is successful;

the database generates a voice recording start signal and sends the voice recording start signal to the voice recording module after receiving an end signal transmitted by the user registration module, the voice recording module is used for binding a user with voice key information to form registered user information and storing the registered user information in the database, the voice recording module informs the user of voice recording when receiving the voice recording start signal transmitted by the database, and at the moment, the voice recording module can record the voice key information of each user, wherein the voice key information comprises the sound pressure information and the key words of the voice of the user;

3. The smart television set-top box based on voice recognition as claimed in claim 1, wherein the control module is configured to control the operation of the television by voice; after the processor completes the comparison and identification, accurate voice recognition information is generated and sent to the control module; and the control module is divided into a television control unit, a home control unit, an entertainment unit, a weather query unit and a stock query unit, wherein the specific working processes of the units are as follows:

a. the television control unit can operate the television by voice:

b1, the user can directly control the temperature, mode and wind speed function setting of the air conditioner through the set-top box;

b2, the user can directly turn on or off the lamp through the set-top box;

b3, the user can directly operate the curtain through the set-top box;

c. the entertainment unit may conduct entertainment through the set-top box:

c1, the user can listen to favorite news programs through the set-top box;

c3, the user can play Tang poems through the set-top box;

e. the stock inquiring unit may inquire the quotation of stocks for the user.