CN104766607A - Television program recommendation method and system - Google Patents
- Publication number
- CN104766607A CN104766607A CN201510098643.9A CN201510098643A CN104766607A CN 104766607 A CN104766607 A CN 104766607A CN 201510098643 A CN201510098643 A CN 201510098643A CN 104766607 A CN104766607 A CN 104766607A
- Authority
- CN
- China
- Prior art keywords
- speech data
- dialect
- model
- submodel
- feature sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a television program recommendation method, which comprises the following steps: receiving a voice signal of a user; converting the voice signal into discrete speech data; identifying the dialect category used by the user according to the speech data; and recommending television programs related to the dialect category to the user. By adopting the embodiments of the invention, programs more in line with the user's cultural and linguistic background can be recommended, enhancing the user experience, especially for elderly users who are not fluent in Mandarin or skilled at man-machine operation. The embodiments of the invention also provide a television program recommendation system which can execute all the steps of the television program recommendation method.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a television program recommendation method and system.
Background technology
With the rapid development of digital television technology, a digital cable television system can deliver hundreds of programs under current coding and modulation schemes. Smart televisions with built-in operating systems can additionally access the massive video resources of the Internet, so television users find it difficult to select content of interest among so many videos. To solve this television information "overload" problem, the electronic program guide must be intelligent: it should automatically recommend television programs to the user in advance according to the user's interests, preferences and viewing history, and adjust its recommendations by tracking changes in user interest. This is the concept of a digital television program recommendation system.
Most existing television program recommendation methods recommend programs according to explicit and implicit user characteristics. Explicit characteristics are the attributes a user provides when registering with the recommendation system, such as gender, age and occupation; implicit characteristics are behavioral information such as the time periods in which the user watches television, the program categories, and the programs frequently watched.
The shortcomings of the prior-art television program recommendation methods are as follows. When explicit information is used, the user must register with the recommendation system and provide sufficient explicit information, which does not adequately consider elderly users who are not fluent in Mandarin or skilled at human-machine operation. When implicit information is used, the user characteristics it expresses are insufficient, so programs cannot be recommended to the user accurately enough.
Recently, voice-controlled television program adjustment schemes have also appeared on the market, which switch channels according to user speech. For example, when the user says "I want to watch Hunan TV", the television automatically switches to Hunan TV. Such schemes are not very intelligent: they can only recognize fixed statements, are essentially control systems, and cannot intelligently recommend television programs to the user.
Summary of the invention
The embodiments of the present invention propose a television program recommendation method and system that can recommend programs better matching the user's cultural and linguistic background, enhancing the user experience, particularly for elderly users who are not fluent in Mandarin or skilled at human-machine operation.
An embodiment of the present invention provides a television program recommendation method, comprising:
Receiving a voice signal of the user;
Converting the voice signal into discrete speech data;
Identifying the dialect category used by the user according to the speech data;
Recommending television programs related to the dialect category to the user.
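The four steps above can be sketched as a minimal pipeline. This is an illustrative sketch only: the function names, the lambda-based toy "models" and the program catalog are assumptions, not part of the patent.

```python
import numpy as np

def recommend_programs(signal, dialect_models, program_catalog):
    """Minimal sketch of the four steps: receive -> digitize -> identify -> recommend.
    `dialect_models` maps a dialect name to a scoring function returning a
    likelihood-like score; `program_catalog` maps a dialect name to programs
    labeled with that dialect category in advance."""
    # The signal is assumed already discrete (sampled speech data).
    speech = np.asarray(signal, dtype=np.float64)
    # Identify: pick the dialect whose model gives the highest score.
    dialect = max(dialect_models, key=lambda d: dialect_models[d](speech))
    # Recommend: look up the programs labeled with that dialect category.
    return dialect, program_catalog.get(dialect, [])

# Toy usage: two fake "models" scoring by how close the signal mean is.
models = {
    "cantonese": lambda s: -abs(s.mean() - 0.2),
    "hunan":     lambda s: -abs(s.mean() - 0.8),
}
catalog = {"cantonese": ["Guangdong TV News"], "hunan": ["Hunan TV Variety"]}
dialect, programs = recommend_programs([0.75, 0.85, 0.8], models, catalog)
# dialect -> "hunan"
```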
Further, identifying the dialect category used by the user according to the speech data specifically comprises:
Framing the speech data;
Obtaining the robust features of each frame of speech data to form a first feature sequence X = {x_1, x_2, …, x_M} of the speech data, wherein x_M denotes the robust features of the M-th frame of speech data;
Removing the silent segments from the first feature sequence X to obtain a second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, wherein y_N denotes the robust features of the N-th frame after the silent segments in the first feature sequence X are removed, N ≤ M;
Calculating the likelihood of the speech data under each dialect model according to the second feature sequence Y;
Determining the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
By extracting the robust features of each frame of speech data and removing silence, a characterization of the speech data is obtained: the second feature sequence. This sequence is then used to calculate the likelihood of the speech data under each dialect model. The higher the likelihood, the more similar the second feature sequence is to that dialect model; the dialect model with the highest likelihood is therefore judged to be the dialect category used by the user.
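The framing and silence-removal steps can be sketched as follows. The frame length, hop size and the simple energy threshold are assumptions (the patent does not specify its silence-detection method); real front ends may use a more elaborate voice activity detector.

```python
import numpy as np

def frame_signal(speech, frame_len=400, hop=160):
    """Split discrete speech data into overlapping frames (e.g. 25 ms frames
    with a 10 ms hop at 16 kHz; sizes are illustrative assumptions)."""
    n = 1 + max(0, (len(speech) - frame_len) // hop)
    return np.stack([speech[i * hop : i * hop + frame_len] for i in range(n)])

def remove_silence(features, energies, threshold_ratio=0.1):
    """Drop frames whose energy falls below a fraction of the peak energy,
    turning the first feature sequence X into the second sequence Y (N <= M).
    The energy threshold is an assumed stand-in for silence detection."""
    keep = energies >= threshold_ratio * energies.max()
    return features[keep]

signal = np.concatenate([np.zeros(800), np.ones(800)])  # silence, then speech
frames = frame_signal(signal)                           # first sequence X (M frames)
energies = (frames ** 2).sum(axis=1)                    # per-frame energy
Y = remove_silence(frames, energies)                    # second sequence Y (N frames)
```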
Further, calculating the likelihood of the speech data under each dialect model according to the second feature sequence Y is performed specifically according to the following formulas:

p(Y | λ_k) = ∏_{i=1}^{N} p(y_i | λ_k)

p(y_i | λ_k) = Σ_{j=1}^{J} ω_(k)j · N(y_i; μ_(k)j, C_(k)j)

Wherein, p(Y | λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i | λ_k) is the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model; ω_(k)j is the weight, C_(k)j the covariance, and μ_(k)j the mean of the j-th of the J Gaussian submodels of the k-th dialect model; N(·; μ, C) denotes the Gaussian probability density function.
It should be noted that the above formula is the likelihood function of the k-th dialect model. For a discrete likelihood function, the likelihood of the speech data under the k-th dialect model equals the product of the probabilities with which each element of the second feature sequence occurs. In this embodiment, the dialect models are Gaussian mixture models. A Gaussian mixture model quantizes a phenomenon precisely with Gaussian probability density functions (normal distribution curves), decomposing it into several models based on such functions; that is, each dialect model is a mixture of several Gaussian submodels. The probability with which each element occurs therefore equals the weighted sum of its probabilities under the Gaussian submodels of that dialect model, each submodel being assigned a different weight. Gaussian mixture models have the advantages of high stability and good convergence, which improves the accuracy of the likelihood calculation.
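The GMM likelihood scoring described above can be sketched as follows. The diagonal-covariance simplification and the log-space accumulation are assumptions made for numerical stability; the patent's formulas do not restrict the covariance form.

```python
import numpy as np

def gmm_log_likelihood(Y, weights, means, covs):
    """Log-likelihood log p(Y | lambda_k) of a feature sequence Y under one
    dialect GMM: the product over frames of the weighted sum over Gaussian
    submodels, accumulated in log space. Covariances are diagonal, stored as
    a (J, D) array of variances (an assumed simplification)."""
    Y = np.atleast_2d(Y)                        # (N, D)
    total = 0.0
    for y in Y:
        # Per-submodel log densities log N(y; mu_j, C_j) with diagonal C_j.
        diff2 = (y - means) ** 2 / covs         # (J, D)
        log_norm = -0.5 * (np.log(2 * np.pi * covs).sum(axis=1) + diff2.sum(axis=1))
        p = (weights * np.exp(log_norm)).sum()  # p(y_i | lambda_k)
        total += np.log(p)
    return total

def classify_dialect(Y, dialect_models):
    """Return the dialect whose model maximizes the likelihood of Y."""
    return max(dialect_models,
               key=lambda k: gmm_log_likelihood(Y, *dialect_models[k]))

# Toy models: 1-D features, one submodel each, centered at 0 and 5.
models = {
    "dialect_a": (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]])),
    "dialect_b": (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]])),
}
Y = np.array([[4.8], [5.1], [5.3]])
# classify_dialect(Y, models) -> "dialect_b"
```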
Further, before receiving the voice signal of the user, the method also comprises a step of constructing the dialect models, specifically comprising:
Obtaining second feature sequences based on a known dialect;
Clustering the robust features of each frame of the second feature sequences by decision-tree-based clustering, each cluster being characterized by one Gaussian submodel;
Calculating the weight, mean and covariance of the Gaussian submodel corresponding to each cluster from the robust features the cluster contains, by a maximum likelihood algorithm;
Generating the dialect model of the known dialect from the weights, means and covariances of the Gaussian submodels; wherein the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model is p(y_i | λ_k) = Σ_j ω_(k)j · N(y_i; μ_(k)j, C_(k)j), with ω_(k)j the weight, C_(k)j the covariance and μ_(k)j the mean of the j-th Gaussian submodel of the k-th dialect model.
It should be noted that obtaining second feature sequences based on a known dialect is in fact the step of obtaining sample data; the process of obtaining a second feature sequence is the same as described above, and each Gaussian submodel is assigned the robust features of at least one frame of speech data.
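Once the training frames have been clustered (the patent uses decision-tree-based clustering; the hard cluster labels below are assumed to be given by that step), the maximum-likelihood estimates of each submodel's weight, mean and covariance reduce to simple per-cluster statistics:

```python
import numpy as np

def fit_submodels(features, labels):
    """Maximum-likelihood weight, mean and (diagonal) covariance for the
    Gaussian submodel of each cluster. `labels[i]` is the cluster index of
    frame i; the clustering itself (decision-tree based in the patent) is
    assumed already done. Diagonal covariance is an assumed simplification."""
    features = np.atleast_2d(features)
    weights, means, covs = [], [], []
    for c in np.unique(labels):
        pts = features[labels == c]
        weights.append(len(pts) / len(features))  # ML weight: cluster's share of frames
        means.append(pts.mean(axis=0))            # ML mean: cluster average
        covs.append(pts.var(axis=0))              # ML variance (diagonal covariance)
    return np.array(weights), np.array(means), np.array(covs)

# Toy data: 5 one-dimensional frames assigned to two clusters.
feats = np.array([[0.0], [0.2], [-0.2], [5.0], [5.4]])
labels = np.array([0, 0, 0, 1, 1])
w, mu, C = fit_submodels(feats, labels)
# w -> [0.6, 0.4]; mu -> [[0.0], [5.2]]
```

Repeating this for each known dialect yields the parameter set of that dialect's model, as step S51-S54 describes.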
Further, the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients (MFCCs), and the first-order and second-order differences of the MFCCs.
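Given a per-frame MFCC matrix (an MFCC front end is assumed and not shown here), the remaining robust features are the frame energy and the first- and second-order MFCC differences, which can be sketched as:

```python
import numpy as np

def delta(features):
    """First-order difference along the frame axis, edge-padded so the output
    keeps the same number of frames. A simple two-point central difference;
    real front ends often use a regression window instead."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

def robust_features(frames, mfcc):
    """Concatenate per-frame energy, MFCCs, delta-MFCCs and delta-delta-MFCCs,
    matching the feature set the patent names."""
    energy = (frames ** 2).sum(axis=1, keepdims=True)
    d1 = delta(mfcc)       # first-order difference
    d2 = delta(d1)         # second-order difference
    return np.hstack([energy, mfcc, d1, d2])

frames = np.ones((4, 10))             # 4 toy frames of 10 samples each
mfcc = np.arange(12.0).reshape(4, 3)  # assumed 3 MFCCs per frame
X = robust_features(frames, mfcc)     # shape (4, 1 + 3 + 3 + 3) = (4, 10)
```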
Correspondingly, an embodiment of the present invention also provides a television program recommendation system that can implement all steps of the above method, comprising:
A signal receiving module, for receiving the voice signal of the user;
A signal conversion module, for converting the voice signal into discrete speech data;
An identification module, for identifying the dialect category used by the user according to the speech data;
A recommendation module, for recommending television programs related to the dialect category to the user.
Further, the identification module comprises:
A framing unit, for framing the speech data;
A first sequence acquisition unit, for obtaining the robust features of each frame of speech data to form the first feature sequence X = {x_1, x_2, …, x_M} of the speech data, wherein x_M denotes the robust features of the M-th frame of the speech data;
A second sequence acquisition unit, for removing the silent segments from the first feature sequence to obtain the second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, wherein y_N denotes the robust features of the N-th frame after the silent segments in the first feature sequence X are removed, N ≤ M;
A likelihood calculation unit, for calculating the likelihood of the speech data under each dialect model according to the second feature sequence Y;
A determination unit, for determining the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
Further, the likelihood calculation unit calculates the likelihood of the speech data under each dialect model specifically according to the following formulas:

p(Y | λ_k) = ∏_{i=1}^{N} p(y_i | λ_k), p(y_i | λ_k) = Σ_{j=1}^{J} ω_(k)j · N(y_i; μ_(k)j, C_(k)j)

Wherein, p(Y | λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i | λ_k) is the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model; ω_(k)j is the weight, C_(k)j the covariance, and μ_(k)j the mean of the j-th Gaussian submodel of the k-th dialect model; N(·; μ, C) denotes the Gaussian probability density function.
Further, the television program recommendation system also comprises a dialect model construction module, which specifically comprises:
A sample sequence acquisition unit, for obtaining second feature sequences based on a known dialect;
A clustering unit, for clustering the robust features of each frame of the second feature sequences by decision-tree-based clustering, each cluster being characterized by one Gaussian submodel;
A model parameter calculation unit, for calculating the weight, mean and covariance of each Gaussian submodel from the robust features assigned to it, by a maximum likelihood algorithm;
A model generation unit, for generating the dialect model of the known dialect from the weights, means and covariances of the Gaussian submodels; wherein the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model is p(y_i | λ_k) = Σ_j ω_(k)j · N(y_i; μ_(k)j, C_(k)j), with ω_(k)j the weight, C_(k)j the covariance and μ_(k)j the mean of the j-th Gaussian submodel of the k-th dialect model.
Further, the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients, and the first-order and second-order differences of the mel-frequency cepstral coefficients.
Implementing the embodiments of the present invention has the following beneficial effects. A television program recommendation method is provided that can judge the dialect category spoken by the user from the user's voice signal and, on that basis, recommend television programs in which the related dialect is dominant. This allows the television to recommend programs that better match the user's cultural and linguistic background, enhancing the user experience, particularly for elderly users who are not fluent in Mandarin or skilled at human-machine operation. The invention can be regarded as a method specializing in recommending dialect programs, complementing existing television program recommendation systems. Meanwhile, an embodiment of the present invention also provides a television program recommendation system that can perform all steps of the described television program recommendation method.
Description of the drawings
Fig. 1 is a schematic flow chart of the television program recommendation method provided by an embodiment of the present invention;
Fig. 2 is a schematic flow chart of step S3 in Fig. 1;
Fig. 3 is a schematic flow chart of step S5 in Fig. 1;
Fig. 4 is a schematic structural diagram of the television program recommendation system provided by the present invention;
Fig. 5 is a schematic structural diagram of the judgment module 3 in Fig. 4;
Fig. 6 is a schematic structural diagram of the dialect model construction module 5 in Fig. 4.
Embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which is a schematic flow chart of the television program recommendation method provided by an embodiment of the present invention, the method comprises the following steps:
S1, receiving a voice signal of the user;
S2, converting the voice signal into discrete speech data;
S3, identifying the dialect category used by the user according to the speech data;
S4, recommending television programs related to the dialect category to the user.
Chinese dialects can be divided into many kinds; modern Chinese is commonly divided into seven major dialect areas, comprising the Northern dialects, the Wu dialects, the Hunan (Xiang) dialects, the Hakka dialects, the Fujian (Min) dialects, the Guangdong (Yue) dialects and the Jiangxi (Gan) dialects. For example, when the user speaks the Guangdong dialect, the television program recommendation method of this embodiment can recommend television programs of Guangdong Province stations to the user; when the user speaks the Hunan dialect, it can recommend television programs of Hunan Province stations. Each television program should be labeled with its dialect category in advance.
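The advance labeling of programs with dialect categories, and the lookup performed in step S4, can be sketched as a simple mapping. The program names and category keys below are invented examples, not data from the patent:

```python
# Each program is labeled with its dialect category in advance, so step S4
# becomes a lookup. Program names and category keys are invented examples.
PROGRAM_CATALOG = {
    "yue":   ["Guangdong TV Evening News", "Cantonese Opera Hour"],
    "xiang": ["Hunan TV Variety Show"],
    "wu":    ["Shanghai Dialect Sitcom"],
}

def recommend(dialect_category):
    """Return the television programs labeled with the identified dialect
    category, or an empty list if no programs carry that label."""
    return PROGRAM_CATALOG.get(dialect_category, [])

# recommend("yue") -> ["Guangdong TV Evening News", "Cantonese Opera Hour"]
```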
In step S4, recommending the television programs related to the dialect category to the user may be done by displaying a program list on the television screen, or by announcing the programs by voice. The television program recommendation method of this embodiment can be applied to a television set or a set-top box.
Further, before step S1 the method may also comprise receiving a program recommendation instruction from the user. Steps S1-S4 are performed only after the user inputs the instruction, preventing accidental operation.
As shown in Fig. 2, which is a schematic flow chart of step S3 in Fig. 1, identifying the dialect category used by the user according to the speech data in step S3 specifically comprises:
S31, framing the speech data;
S32, obtaining the robust features of each frame of speech data to form the first feature sequence X = {x_1, x_2, …, x_M} of the speech data, wherein x_M denotes the robust features of the M-th frame of speech data;
S33, removing the silent segments from the first feature sequence X to obtain the second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, wherein y_N denotes the robust features of the N-th frame after the silent segments in the first feature sequence X are removed, N ≤ M;
S34, calculating the likelihood of the speech data under each dialect model according to the second feature sequence Y;
S35, determining the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
By extracting the robust features of each frame of speech data and removing silence, a characterization of the speech data is obtained: the second feature sequence. This sequence is then used to calculate the likelihood of the speech data under each dialect model. The higher the likelihood, the more similar the second feature sequence is to that dialect model; the dialect model with the highest likelihood is therefore judged to be the dialect category used by the user.
Particularly, in step S34, the likelihood of the speech data under each dialect model is calculated according to the second feature sequence Y, specifically by the following formulas:

p(Y | λ_k) = ∏_{i=1}^{N} p(y_i | λ_k)

p(y_i | λ_k) = Σ_{j=1}^{J} ω_(k)j · N(y_i; μ_(k)j, C_(k)j)

Wherein, p(Y | λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i | λ_k) is the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model; ω_(k)j is the weight, C_(k)j the covariance, and μ_(k)j the mean of the j-th Gaussian submodel of the k-th dialect model; N(·; μ, C) denotes the Gaussian probability density function.
It should be noted that the above formula is the likelihood function of the k-th dialect model. For a discrete likelihood function, the likelihood of the speech data under the k-th dialect model equals the product of the probabilities with which each element of the second feature sequence occurs. In this embodiment, the dialect models are Gaussian mixture models. A Gaussian mixture model quantizes a phenomenon precisely with Gaussian probability density functions (normal distribution curves), decomposing it into several models based on such functions; that is, each dialect model is a mixture of several Gaussian submodels. The probability with which each element occurs therefore equals the weighted sum of its probabilities under the Gaussian submodels of that dialect model, each submodel being assigned a different weight. Gaussian mixture models have the advantages of high stability and good convergence, which improves the accuracy of the likelihood calculation.
Further, before receiving the voice signal of the user, the method also comprises a step S5 of constructing the dialect models. As shown in Fig. 3, which is a schematic flow chart of step S5 in Fig. 1, it specifically comprises:
S51, obtaining second feature sequences based on a known dialect;
S52, clustering the robust features of each frame of the second feature sequences by decision-tree-based clustering, each cluster being characterized by one Gaussian submodel;
S53, calculating the weight, mean and covariance of the Gaussian submodel corresponding to each cluster from the robust features the cluster contains, by a maximum likelihood algorithm;
S54, generating the dialect model of the known dialect from the weights, means and covariances of the Gaussian submodels; wherein the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model is p(y_i | λ_k) = Σ_j ω_(k)j · N(y_i; μ_(k)j, C_(k)j), with ω_(k)j the weight, C_(k)j the covariance and μ_(k)j the mean of the j-th Gaussian submodel of the k-th dialect model.
It should be noted that obtaining second feature sequences based on a known dialect is in fact obtaining multiple groups of second feature sequences of the same known dialect as sample data; the process of obtaining a second feature sequence is identical to steps S31-S33, and each Gaussian submodel is assigned the robust features of at least one frame of speech data. By performing steps S51-S54 for different dialects, the parameters of each Gaussian submodel of each dialect model are obtained, thereby constructing each dialect model.
Further, the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients, and the first-order and second-order differences of the mel-frequency cepstral coefficients.
Correspondingly, an embodiment of the present invention also provides a television program recommendation system that can implement all steps of the above method. As shown in Fig. 4, which is a schematic structural diagram of the television program recommendation system provided by the present invention, this system comprises:
A signal input module 1, for inputting the voice signal of the user;
A signal conversion module 2, for converting the voice signal into discrete speech data;
A judgment module 3, for judging the dialect category used by the user according to the speech data;
A recommendation module 4, for recommending television programs related to the dialect category to the user.
As shown in Fig. 5, which is a schematic structural diagram of the judgment module 3 in Fig. 4, the judgment module 3 comprises:
A framing unit 31, for framing the speech data;
A first sequence acquisition unit 32, for obtaining the robust features of each frame of speech data to form the first feature sequence X = {x_1, x_2, …, x_M} of the speech data, wherein x_M denotes the robust features of the M-th frame of the speech data;
A second sequence acquisition unit 33, for removing the silent segments from the first feature sequence to obtain the second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, wherein y_N denotes the robust features of the N-th frame after the silent segments in the first feature sequence X are removed, N ≤ M;
A likelihood calculation unit 34, for calculating the likelihood of the speech data under each dialect model according to the second feature sequence Y;
A determination unit 35, for determining the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
Further, the likelihood calculation unit 34 calculates the likelihood of the speech data under each dialect model specifically according to the following formulas:

p(Y | λ_k) = ∏_{i=1}^{N} p(y_i | λ_k), p(y_i | λ_k) = Σ_{j=1}^{J} ω_(k)j · N(y_i; μ_(k)j, C_(k)j)

Wherein, p(Y | λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i | λ_k) is the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model; ω_(k)j is the weight, C_(k)j the covariance, and μ_(k)j the mean of the j-th Gaussian submodel of the k-th dialect model; N(·; μ, C) denotes the Gaussian probability density function.
Further, the television program recommendation system also comprises a dialect model construction module 5. As shown in Fig. 6, which is a schematic structural diagram of the dialect model construction module 5 in Fig. 4, the dialect model construction module 5 specifically comprises:
A sample sequence acquisition unit 51, for obtaining second feature sequences based on a known dialect;
A clustering unit 52, for clustering the robust features of each frame of the second feature sequences by decision-tree-based clustering, each cluster being characterized by one Gaussian submodel;
A model parameter calculation unit 53, for calculating the weight, mean and covariance of each Gaussian submodel from the robust features assigned to it, by a maximum likelihood algorithm;
A model generation unit 54, for generating the dialect model of the known dialect from the weights, means and covariances of the Gaussian submodels; wherein the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model is p(y_i | λ_k) = Σ_j ω_(k)j · N(y_i; μ_(k)j, C_(k)j), with ω_(k)j the weight, C_(k)j the covariance and μ_(k)j the mean of the j-th Gaussian submodel of the k-th dialect model.
Further, the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients, and the first-order and second-order differences of the mel-frequency cepstral coefficients.
Implementing the embodiments of the present invention has the following beneficial effects. A television program recommendation method is provided that can judge the dialect category spoken by the user from the user's voice signal and, on that basis, recommend television programs in which the related dialect is dominant. This allows the television to recommend programs that better match the user's cultural and linguistic background, enhancing the user experience, particularly for elderly users who are not fluent in Mandarin or skilled at human-machine operation. The invention can be regarded as a method specializing in recommending dialect programs, complementing existing television program recommendation systems. Meanwhile, an embodiment of the present invention also provides a television program recommendation system that can perform all steps of the described television program recommendation method.
Those of ordinary skill in the art will appreciate that all or part of the flows in the above method embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are preferred embodiments of the present invention. It should be pointed out that those skilled in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications are also considered to be within the scope of protection of the present invention.
Claims (10)
1. A television program recommendation method, characterized by comprising:
Receiving a voice signal of a user;
Converting the voice signal into discrete speech data;
Identifying the dialect category used by the user according to the speech data;
Recommending television programs related to the dialect category to the user.
2. The television program recommendation method as claimed in claim 1, characterized in that identifying the dialect category used by the user according to the speech data specifically comprises:
Framing the speech data;
Obtaining the robust features of each frame of speech data to form a first feature sequence X = {x_1, x_2, …, x_M} of the speech data, wherein x_M denotes the robust features of the M-th frame of speech data;
Removing the silent segments from the first feature sequence X to obtain a second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, wherein y_N denotes the robust features of the N-th frame after the silent segments in the first feature sequence X are removed, N ≤ M;
Calculating the likelihood of the speech data under each dialect model according to the second feature sequence Y;
Determining the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
3. The television program recommendation method as claimed in claim 2, characterized in that the likelihood of the speech data under each dialect model is calculated from the second feature sequence Y according to the following formula:

p(Y|λ_k) = ∏_{i=1}^{N} p(y_i|λ_k),  where  p(y_i|λ_k) = Σ_{j=1}^{J} ω_(k)j · N(y_i; μ_(k)j, C_(k)j)

Wherein p(Y|λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i|λ_k) is the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model; N(·; μ, C) denotes the Gaussian density with mean μ and covariance C, and J is the number of Gaussian submodels; ω_(k)j is the weight of the j-th Gaussian submodel of the k-th dialect model; C_(k)j is the covariance of the j-th Gaussian submodel of the k-th dialect model; and μ_(k)j is the mean of the j-th Gaussian submodel of the k-th dialect model.
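The mixture likelihood of claim 3 can be computed numerically as below; this is a generic diagonal-covariance Gaussian-mixture sketch under assumed toy parameters, not the patent's implementation:

```python
import numpy as np

def gmm_frame_prob(y, weights, means, variances):
    """p(y_i|lambda_k) = sum_j w_(k)j * N(y_i; mu_(k)j, C_(k)j),
    with diagonal covariances (an assumption of this sketch)."""
    y = np.asarray(y, dtype=float)
    prob = 0.0
    for w, mu, var in zip(weights, means, variances):
        norm = np.prod(1.0 / np.sqrt(2 * np.pi * var))
        prob += w * norm * np.exp(-0.5 * np.sum((y - mu) ** 2 / var))
    return prob

def log_likelihood(Y, model):
    """log p(Y|lambda_k): the claim-3 product over frames, taken in log domain."""
    return sum(np.log(gmm_frame_prob(y, *model)) for y in Y)

# Two toy one-dimensional dialect models, each with two Gaussian submodels
model_a = ([0.5, 0.5], [np.array([0.0]), np.array([1.0])], [np.array([1.0])] * 2)
model_b = ([0.5, 0.5], [np.array([5.0]), np.array([6.0])], [np.array([1.0])] * 2)
Y = [np.array([0.2]), np.array([0.8]), np.array([1.1])]
# The data lie near model_a's means, so model_a scores higher
assert log_likelihood(Y, model_a) > log_likelihood(Y, model_b)
```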
4. The television program recommendation method as claimed in claim 2 or claim 3, characterized in that, before receiving the voice signal of the user, the method further comprises a step of building the dialect models, which specifically comprises:
Obtaining a second feature sequence from speech data of a known dialect;
Clustering the robust features of each frame of speech data in the second feature sequence by a decision-tree-based clustering method, each resulting class being characterized by one Gaussian submodel;
Calculating, by a maximum-likelihood algorithm, the weight, mean and covariance of the Gaussian submodel corresponding to each class from the robust features contained in that class;
Generating the dialect model of the known dialect from the weights, means and covariances of the Gaussian submodels; wherein the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model is p(y_i|λ_k) = Σ_{j=1}^{J} ω_(k)j · N(y_i; μ_(k)j, C_(k)j); ω_(k)j is the weight of the j-th Gaussian submodel of the k-th dialect model; C_(k)j is the covariance of the j-th Gaussian submodel of the k-th dialect model; and μ_(k)j is the mean of the j-th Gaussian submodel of the k-th dialect model.
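A rough training sketch for the model-building steps above follows. Note one deliberate substitution: the patent clusters frames with a decision tree, while this sketch uses a simple two-center k-means-style clustering as a stand-in; the per-class maximum-likelihood estimates (weight = frame fraction, mean, variance) are as described:

```python
import numpy as np

def train_dialect_model(features, n_iter=10):
    """Build a two-submodel dialect model from 1-D frame features.
    Clustering here is k-means-like, NOT the patent's decision tree."""
    feats = np.asarray(features, dtype=float)
    centers = np.array([feats.min(), feats.max()])  # two Gaussian submodels
    for _ in range(n_iter):
        # Assign each frame feature to its nearest center, then re-estimate
        labels = np.argmin(np.abs(feats[:, None] - centers[None, :]), axis=1)
        centers = np.array([feats[labels == j].mean() for j in range(2)])
    # Maximum-likelihood estimates per cluster
    weights = np.array([(labels == j).mean() for j in range(2)])
    variances = np.array([feats[labels == j].var() + 1e-6 for j in range(2)])
    return weights, centers, variances

# Frames drawn at 0.1 and 4.9 should recover two submodels near those means
frames = np.concatenate([np.full(50, 0.1), np.full(50, 4.9)])
w, mu, var = train_dialect_model(frames)
assert np.allclose(sorted(mu.tolist()), [0.1, 4.9])
```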
5. The television program recommendation method as claimed in claim 4, characterized in that the robust features comprise, for each frame of speech data, the frame energy, the mel-frequency cepstral coefficients (MFCCs), and the first-order and second-order differences of the MFCCs.
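Assembling the per-frame robust feature vector of claim 5 might look like the following; the MFCC values themselves are toy stand-ins (real MFCC extraction is not shown), and np.gradient approximates the first- and second-order differences:

```python
import numpy as np

def add_deltas(mfcc):
    """Append first- and second-order differences along the time axis."""
    d1 = np.gradient(mfcc, axis=0)  # first-order difference
    d2 = np.gradient(d1, axis=0)    # second-order difference
    return np.hstack([mfcc, d1, d2])

def robust_feature_matrix(frames, mfcc):
    """One row per frame: [energy, MFCCs..., deltas..., delta-deltas...]."""
    energy = np.log(np.sum(frames.astype(float) ** 2, axis=1) + 1e-10)
    return np.hstack([energy[:, None], add_deltas(mfcc)])

frames = np.ones((5, 160))               # 5 toy frames of 160 samples each
mfcc = np.tile(np.arange(13.0), (5, 1))  # 5 frames x 13 toy coefficients
X = robust_feature_matrix(frames, mfcc)
assert X.shape == (5, 1 + 13 * 3)        # energy + MFCC + delta + delta-delta
```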
6. A television program recommendation system, characterized in that it comprises:
A signal receiving module, configured to receive a voice signal from a user;
A signal conversion module, configured to convert the voice signal into discrete speech data;
An identification module, configured to identify the dialect category used by the user according to the speech data;
A recommendation module, configured to recommend television programs relevant to the dialect category to the user.
7. The television program recommendation system as claimed in claim 6, characterized in that the identification module comprises:
A framing unit, configured to divide the speech data into frames;
A first sequence acquiring unit, configured to obtain the robust features of each frame of speech data and form a first feature sequence X = {x_1, x_2, ..., x_M} of the speech data, wherein x_M represents the robust features of the M-th frame of speech data;
A second sequence acquiring unit, configured to remove the silent segments from the first feature sequence and obtain a second feature sequence Y = {y_1, y_2, ..., y_N} of the speech data, wherein y_N represents the robust features of the N-th frame of speech data remaining after the silent segments are removed from the first feature sequence X, and N ≤ M;
A likelihood calculating unit, configured to calculate, according to the second feature sequence Y, the likelihood of the speech data under each of the different dialect models;
A determining unit, configured to determine the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
8. The television program recommendation system as claimed in claim 7, characterized in that the likelihood calculating unit calculates the likelihood of the speech data under each dialect model according to the following formula:

p(Y|λ_k) = ∏_{i=1}^{N} p(y_i|λ_k),  where  p(y_i|λ_k) = Σ_{j=1}^{J} ω_(k)j · N(y_i; μ_(k)j, C_(k)j)

Wherein p(Y|λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i|λ_k) is the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model; N(·; μ, C) denotes the Gaussian density with mean μ and covariance C, and J is the number of Gaussian submodels; ω_(k)j is the weight of the j-th Gaussian submodel of the k-th dialect model; C_(k)j is the covariance of the j-th Gaussian submodel of the k-th dialect model; and μ_(k)j is the mean of the j-th Gaussian submodel of the k-th dialect model.
9. The television program recommendation system as claimed in claim 7 or claim 8, characterized in that the television program recommendation system further comprises a dialect model construction module, which specifically comprises:
A sample sequence acquiring unit, configured to obtain a second feature sequence from speech data of a known dialect;
A clustering unit, configured to cluster the robust features of each frame of speech data in the second feature sequence by a decision-tree-based clustering method, each resulting class being characterized by one Gaussian submodel;
A model parameter calculating unit, configured to calculate, by a maximum-likelihood algorithm, the weight, mean and covariance of the Gaussian submodel corresponding to each class from the robust features contained in that class;
A model generating unit, configured to generate the dialect model of the known dialect from the weights, means and covariances of the Gaussian submodels; wherein the probability that the robust features y_i of the i-th frame of the second feature sequence appear under the k-th dialect model is p(y_i|λ_k) = Σ_{j=1}^{J} ω_(k)j · N(y_i; μ_(k)j, C_(k)j); ω_(k)j is the weight of the j-th Gaussian submodel of the k-th dialect model; C_(k)j is the covariance of the j-th Gaussian submodel of the k-th dialect model; and μ_(k)j is the mean of the j-th Gaussian submodel of the k-th dialect model.
10. The television program recommendation system as claimed in claim 9, characterized in that the robust features comprise, for each frame of speech data, the frame energy, the mel-frequency cepstral coefficients (MFCCs), and the first-order and second-order differences of the MFCCs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510098643.9A CN104766607A (en) | 2015-03-05 | 2015-03-05 | Television program recommendation method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104766607A true CN104766607A (en) | 2015-07-08 |
Family
ID=53648391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510098643.9A Pending CN104766607A (en) | 2015-03-05 | 2015-03-05 | Television program recommendation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104766607A (en) |
Citations (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1298533A (en) * | 1998-04-22 | 2001-06-06 | 国际商业机器公司 | Adaptation of a speech recognizer for dialectal and linguistic domain variations |
US6411930B1 (en) * | 1998-11-18 | 2002-06-25 | Lucent Technologies Inc. | Discriminative gaussian mixture models for speaker verification |
CN1735924A (en) * | 2002-11-21 | 2006-02-15 | 松下电器产业株式会社 | Standard model creating device and standard model creating method |
CN1763843A (en) * | 2005-11-18 | 2006-04-26 | 清华大学 | Pronunciation quality evaluating method for language learning machine |
CN101154380A (en) * | 2006-09-29 | 2008-04-02 | 株式会社东芝 | Method and device for registration and validation of speaker's authentication |
CN101241699A (en) * | 2008-03-14 | 2008-08-13 | 北京交通大学 | A speaker identification system for remote Chinese teaching |
CN101286317A (en) * | 2008-05-30 | 2008-10-15 | 同济大学 | Speech recognition device, model training method and traffic information service platform |
CN101436403A (en) * | 2007-11-16 | 2009-05-20 | 创新未来科技有限公司 | Method and system for recognizing tone |
CN101552004A (en) * | 2009-05-13 | 2009-10-07 | 哈尔滨工业大学 | Method for recognizing in-set speaker |
CN101573749A (en) * | 2006-12-15 | 2009-11-04 | 摩托罗拉公司 | Method and apparatus for robust speech activity detection |
CN101622660A (en) * | 2007-02-28 | 2010-01-06 | 日本电气株式会社 | Audio recognition device, audio recognition method, and audio recognition program |
CN101645269A (en) * | 2008-12-30 | 2010-02-10 | 中国科学院声学研究所 | Language recognition system and method |
US20110071823A1 (en) * | 2008-06-10 | 2011-03-24 | Toru Iwasawa | Speech recognition system, speech recognition method, and storage medium storing program for speech recognition |
CN102184732A (en) * | 2011-04-28 | 2011-09-14 | 重庆邮电大学 | Fractal-feature-based intelligent wheelchair voice identification control method and system |
CN102231281A (en) * | 2011-07-18 | 2011-11-02 | 渤海大学 | Voice visualization method based on integration characteristic and neural network |
CN102238190A (en) * | 2011-08-01 | 2011-11-09 | 安徽科大讯飞信息科技股份有限公司 | Identity authentication method and system |
CN102290047A (en) * | 2011-09-22 | 2011-12-21 | 哈尔滨工业大学 | Robust speech characteristic extraction method based on sparse decomposition and reconfiguration |
CN102394062A (en) * | 2011-10-26 | 2012-03-28 | 华南理工大学 | Method and system for automatically identifying voice recording equipment source |
CN103313108A (en) * | 2013-06-14 | 2013-09-18 | 山东科技大学 | Smart TV program recommending method based on context aware |
CN103474061A (en) * | 2013-09-12 | 2013-12-25 | 河海大学 | Automatic distinguishing method based on integration of classifier for Chinese dialects |
CN103491411A (en) * | 2013-09-26 | 2014-01-01 | 深圳Tcl新技术有限公司 | Method and device based on language recommending channels |
CN103546773A (en) * | 2013-08-15 | 2014-01-29 | Tcl集团股份有限公司 | Television program recommendation method and system |
CN103839545A (en) * | 2012-11-23 | 2014-06-04 | 三星电子株式会社 | Apparatus and method for constructing multilingual acoustic model |
CN103943104A (en) * | 2014-04-15 | 2014-07-23 | 海信集团有限公司 | Voice information recognition method and terminal equipment |
CN103945250A (en) * | 2013-01-17 | 2014-07-23 | 三星电子株式会社 | Image processing apparatus, control method thereof, and image processing system |
CN104038788A (en) * | 2014-06-19 | 2014-09-10 | 中山大学深圳研究院 | Community social network system and content recommendation method |
CN104123934A (en) * | 2014-07-23 | 2014-10-29 | 泰亿格电子(上海)有限公司 | Speech composition recognition method and system |
CN104200804A (en) * | 2014-09-19 | 2014-12-10 | 合肥工业大学 | Various-information coupling emotion recognition method for human-computer interaction |
2015-03-05: Application CN201510098643.9A filed in China; patent CN104766607A/en; status: Pending
Non-Patent Citations (6)
Title |
---|
CARLOS LIMA ET AL: ""A Robust Features Extraction for Automatic Speech Recognition in Noisy Environments"", 《ICSP PROCEEDINGS》 *
MING-LIANG GU ET AL: ""Chinese Dialect Identification Using SC-GMM"", 《ADVANCED MATERIALS RESEARCH》 * |
WUEI-HE TSAI ET AL: ""Discriminative training of Gaussian mixture bigram models with application to Chinese dialect identification"", 《ELSEVIER》 * |
杨澄宇 (YANG CHENGYU): ""Speaker verification system based on Gaussian mixture models"", 《计算机应用》 (Computer Applications) *
王岐学 等 (WANG QIXUE ET AL): ""Hunan dialect recognition based on differential features and Gaussian mixture models"", 《计算机工程与应用》 (Computer Engineering and Applications) *
顾明亮 (GU MINGLIANG): ""Chinese dialect identification system based on Gaussian mixture models"", 《计算机工程与应用》 (Computer Engineering and Applications) *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105677722A (en) * | 2015-12-29 | 2016-06-15 | 百度在线网络技术(北京)有限公司 | Method and apparatus for recommending friends in social software |
CN105683964A (en) * | 2016-01-07 | 2016-06-15 | 马岩 | Network social contact searching method and system |
WO2017117786A1 (en) * | 2016-01-07 | 2017-07-13 | 马岩 | Social network search method and system |
CN108172212A (en) * | 2017-12-25 | 2018-06-15 | 横琴国际知识产权交易中心有限公司 | A kind of voice Language Identification and system based on confidence level |
CN108172212B (en) * | 2017-12-25 | 2020-09-11 | 横琴国际知识产权交易中心有限公司 | Confidence-based speech language identification method and system |
CN108810566A (en) * | 2018-06-12 | 2018-11-13 | 忆东兴(深圳)科技有限公司 | A kind of smart television dialect is interpreted method |
CN114449342A (en) * | 2022-01-21 | 2022-05-06 | 腾讯科技(深圳)有限公司 | Video recommendation method and device, computer readable storage medium and computer equipment |
CN114449342B (en) * | 2022-01-21 | 2024-02-27 | 腾讯科技(深圳)有限公司 | Video recommendation method, device, computer readable storage medium and computer equipment |
CN115497475A (en) * | 2022-09-21 | 2022-12-20 | 深圳市人马互动科技有限公司 | Information recommendation method based on voice interaction system and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104766607A (en) | Television program recommendation method and system | |
CN108509619B (en) | Voice interaction method and device | |
CN108920648B (en) | Cross-modal matching method based on music-image semantic relation | |
CN105895087B (en) | Voice recognition method and device | |
CN104575504A (en) | Method for personalized television voice wake-up by voiceprint and voice identification | |
CN108538294A (en) | A kind of voice interactive method and device | |
CN113590850A (en) | Multimedia data searching method, device, equipment and storage medium | |
CN111508505B (en) | Speaker recognition method, device, equipment and storage medium | |
CN103970802A (en) | Song recommending method and device | |
CN107358947A (en) | Speaker recognition methods and system again | |
CN108710653B (en) | On-demand method, device and system for reading book | |
CN114676689A (en) | Sentence text recognition method and device, storage medium and electronic device | |
CN111309855A (en) | Text information processing method and system | |
CN112417132A (en) | New intention recognition method for screening negative samples by utilizing predicate guest information | |
CN106204103A (en) | The method of similar users found by a kind of moving advertising platform | |
CN111724766A (en) | Language identification method, related equipment and readable storage medium | |
CN110378190A (en) | Video content detection system and detection method based on topic identification | |
CN113220929A (en) | Music recommendation method based on time-staying and state-staying mixed model | |
Leng et al. | Audio scene recognition based on audio events and topic model | |
CN115438153A (en) | Training method and device for intention matching degree analysis model | |
CN112699831B (en) | Video hotspot segment detection method and device based on barrage emotion and storage medium | |
CN108595630A (en) | A kind of user behavior data analysis model and its construction method | |
CN111477248B (en) | Audio noise detection method and device | |
CN114758664A (en) | Voice data screening method and device, electronic equipment and readable storage medium | |
CN110347824B (en) | Method for determining optimal number of topics of LDA topic model based on vocabulary similarity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20150708 |
RJ01 | Rejection of invention patent application after publication |