CN104766607A - Television program recommendation method and system - Google Patents

Television program recommendation method and system

Info

Publication number
CN104766607A
CN104766607A
Authority
CN
China
Prior art keywords
speech data
dialect
model
submodel
feature sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510098643.9A
Other languages
Chinese (zh)
Inventor
雷延强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN201510098643.9A
Publication of CN104766607A
Legal status: Pending

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a television program recommendation method comprising the steps of: receiving a voice signal from a user; converting the voice signal into discrete speech data; recognizing the dialect category used by the user from the speech data; and recommending television programs related to that dialect category to the user. The method can recommend programs that match the user's cultural and linguistic background, improving the experience of users, in particular elderly users who are not fluent in Mandarin or skilled at man-machine operation. An embodiment of the invention further provides a television program recommendation system capable of executing all steps of the television program recommendation method.

Description

Television program recommendation method and system
Technical field
The present invention relates to the field of computer technology, and in particular to a television program recommendation method and system.
Background technology
With the rapid development of digital television technology, a digital cable television system can deliver hundreds of programs under current coding and modulation schemes. In addition, smart televisions with built-in operating systems can access the massive video resources of the Internet, so it is difficult for TV users to select the content they are interested in from so many videos. To solve this television information "overload" problem, an electronic program guide must be intelligent: it should automatically recommend television programs to users in advance according to their interests, preferences, and viewing history, and it should also adjust its recommendations by tracking changes in user interest. This is the concept of a digital television program recommendation system.
Most existing television program recommendation methods recommend programs according to users' explicit and implicit characteristics. Explicit characteristics are the personal attributes a user provides when registering with the recommendation system, such as gender, age, and occupation; implicit characteristics are information such as the time periods in which the user watches television, the program categories, and the programs frequently watched.
The shortcomings of prior-art television program recommendation methods are as follows. When explicit information is used, users must register with the recommendation system and provide enough explicit information, which does not adequately consider elderly users who have not mastered Mandarin or man-machine operation. When implicit information is used, the features expressing user characteristics are insufficient, so television programs cannot be recommended to the user accurately enough.
Recently, voice-controlled television program adjustment schemes have also appeared on the market, which switch channels according to the user's speech. For example, when the user says "I want to watch Hunan TV", the set switches to Hunan TV automatically. Such schemes are not very intelligent: they can only recognize fixed utterances, so they are better regarded as control systems and cannot intelligently recommend television programs to the user.
Summary of the invention
The embodiments of the present invention propose a television program recommendation method and system that can recommend programs better matching the user's cultural and linguistic background, enhancing the user experience, particularly for elderly users who have not mastered Mandarin or man-machine operation.
An embodiment of the present invention provides a television program recommendation method, comprising:
receiving a voice signal from a user;
converting the voice signal into discrete speech data;
recognizing the dialect category used by the user according to the speech data;
recommending television programs related to the dialect category to the user.
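The four steps above can be sketched in a few lines. All helper names (`to_discrete_samples`, `identify_dialect`) and the idea of scoring functions per dialect are illustrative assumptions, not names from the patent; the program catalog is assumed to be pre-labelled by dialect category, as the embodiments describe.

```python
def to_discrete_samples(signal):
    # Stand-in A/D conversion: the method only requires that the analog
    # voice signal become a discrete sample sequence.
    return list(signal)

def identify_dialect(samples, dialect_models):
    # Stand-in recognizer: score the samples under every dialect model
    # and return the best-scoring dialect category.
    return max(dialect_models, key=lambda name: dialect_models[name](samples))

def recommend_programs(voice_signal, dialect_models, program_catalog):
    speech_data = to_discrete_samples(voice_signal)           # step 2
    dialect = identify_dialect(speech_data, dialect_models)   # step 3
    # step 4: return the programs pre-labelled with that dialect category
    return [p for p in program_catalog if p["dialect"] == dialect]
```

In the full method the scoring functions would be the dialect-model likelihoods described below.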
Further, recognizing the dialect category used by the user according to the speech data specifically comprises:
dividing the speech data into frames;
obtaining the robust features of each frame of speech data to form a first feature sequence X = {x_1, x_2, …, x_M} of the speech data, where x_M represents the robust features of the M-th frame;
removing the silent segments from the first feature sequence X to obtain a second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, where y_N represents the robust features of the N-th remaining frame after silence removal, N ≤ M;
calculating the likelihood of the speech data under different dialect models according to the second feature sequence Y;
judging the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
By extracting the robust features of each frame of speech data and applying silence removal, a characterization of the speech data is obtained: the second feature sequence. This sequence is then used to calculate the likelihood of the speech data under each dialect model. The higher the likelihood, the more similar the second feature sequence is to that dialect model, so the dialect model with the highest likelihood is judged to be the dialect category used by the user.
Further, calculating the likelihood of the speech data under different dialect models according to the second feature sequence Y is performed specifically according to the following formula:

$$p(Y\mid\lambda_k)=\prod_{i} p(y_i\mid\lambda_k)=\prod_{i}\sum_{j}\omega_j^{(k)}\,\frac{1}{(2\pi)^{1/2}\,\bigl|C_j^{(k)}\bigr|^{1/2}}\exp\Bigl[-\frac{1}{2}\bigl(y_i-\mu_j^{(k)}\bigr)^{T}\bigl(C_j^{(k)}\bigr)^{-1}\bigl(y_i-\mu_j^{(k)}\bigr)\Bigr]$$

where p(Y | λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i | λ_k) is the probability that y_i, the robust features of the i-th frame of the second feature sequence, is generated by the k-th dialect model; ω_j^{(k)} is the weight of the j-th Gaussian sub-model of the k-th dialect model; C_j^{(k)} is its covariance; and μ_j^{(k)} is its mean.
It should be noted that the above formula is the likelihood function of the k-th dialect model. For a discrete likelihood function, the likelihood of the speech data under the k-th dialect model equals the product of the probabilities of the individual elements of the second feature sequence. In this embodiment, the dialect model is a Gaussian mixture model (GMM). A GMM quantizes a phenomenon with Gaussian probability density functions (normal distribution curves), decomposing it into several models each based on such a density; that is, one dialect model is a mixture of several Gaussian sub-models. The probability of each element is therefore the weighted sum of its probabilities under the Gaussian sub-models of that dialect model, with each sub-model assigned a different weight. Gaussian mixture models have the advantages of high stability and good convergence, which improves the accuracy of the likelihood calculation.
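Assuming each dialect model λ_k is such a GMM with full covariances, the formula can be computed directly as below. Two hedges: the code uses the general d-dimensional normalizer (2π)^(d/2), whereas the patent's formula writes the one-dimensional (2π)^(1/2); and the log form is used because it is monotonically equivalent for picking the most likely dialect while avoiding numerical underflow on long sequences. Function names are illustrative.

```python
import numpy as np

def gmm_frame_prob(y, weights, means, covs):
    # p(y_i | lambda_k): weighted sum over Gaussian sub-models j of
    # w_j * N(y_i; mu_j, C_j), matching the term inside the product.
    d = len(y)
    p = 0.0
    for w, mu, C in zip(weights, means, covs):
        diff = y - mu
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(C))
        p += w * np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff) / norm
    return p

def log_likelihood(Y, model):
    # log p(Y | lambda_k) = sum_i log p(y_i | lambda_k)
    return sum(np.log(gmm_frame_prob(y, *model)) for y in Y)

def judge_dialect(Y, dialect_models):
    # The judged dialect category is the model with the highest likelihood.
    return max(dialect_models, key=lambda k: log_likelihood(Y, dialect_models[k]))
```

For a single standard Gaussian in two dimensions, `gmm_frame_prob` at the mean gives 1/(2π), which is a quick sanity check on the normalizer.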
Further, before the user's voice signal is received, the method also comprises a step of building the dialect models, specifically comprising:
obtaining second feature sequences based on known dialects;
clustering the robust features of the frames of the second feature sequence by decision-tree clustering, each cluster being characterized by one Gaussian sub-model;
calculating, from the robust features contained in each cluster, the weight, mean, and covariance of the corresponding Gaussian sub-model by a maximum-likelihood algorithm;
generating the dialect model of the known dialect from the weights, means, and covariances of the Gaussian sub-models; wherein the probability that y_i, the robust features of the i-th frame of the second feature sequence, is generated by the k-th dialect model is

$$p(y_i\mid\lambda_k)=\sum_{j}\omega_j^{(k)}\,\frac{1}{(2\pi)^{1/2}\,\bigl|C_j^{(k)}\bigr|^{1/2}}\exp\Bigl[-\frac{1}{2}\bigl(y_i-\mu_j^{(k)}\bigr)^{T}\bigl(C_j^{(k)}\bigr)^{-1}\bigl(y_i-\mu_j^{(k)}\bigr)\Bigr]$$

where ω_j^{(k)}, C_j^{(k)}, and μ_j^{(k)} are the weight, covariance, and mean of the j-th Gaussian sub-model of the k-th dialect model.
It should be noted that obtaining the second feature sequences based on known dialects is in fact the step of obtaining sample data, and the process of obtaining a second feature sequence is the same as described above; each Gaussian sub-model is assigned the robust features of at least one frame of speech data.
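A minimal sketch of this construction step, under two stated assumptions: plain k-means stands in for the patent's decision-tree clustering, and the per-cluster maximum-likelihood estimates are taken as weight = cluster fraction, mean = cluster mean, covariance = cluster covariance (lightly regularized so it stays invertible). The function name and defaults are invented for the example.

```python
import numpy as np

def build_dialect_model(features, n_clusters=2, iters=20):
    # Cluster the frame features, then fit one Gaussian sub-model per
    # cluster; the patent uses decision-tree clustering, k-means here.
    X = np.asarray(features, dtype=float)
    # Deterministic initialization: spread initial centers over the data.
    centers = X[np.linspace(0, len(X) - 1, n_clusters, dtype=int)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    weights, means, covs = [], [], []
    for j in range(n_clusters):
        Xj = X[labels == j]
        weights.append(len(Xj) / len(X))          # maximum-likelihood weight
        means.append(Xj.mean(axis=0))             # maximum-likelihood mean
        covs.append(np.cov(Xj.T) + 1e-6 * np.eye(X.shape[1]))  # regularized
    return weights, means, covs
```

The returned triple (weights, means, covs) is exactly the parameter set the generated dialect model consists of.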
Further, the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients (MFCCs), and the first-order and second-order differences of the MFCCs.
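The per-frame feature vector described above can be assembled as follows. Computing the MFCCs themselves is assumed to be done by an external routine (e.g. a library such as librosa), so the sketch takes them as input and adds the energy column and simple central-difference deltas; the function names and the exact delta formula are assumptions.

```python
import numpy as np

def deltas(c):
    # First-order difference delta_c[t] = (c[t+1] - c[t-1]) / 2 over a
    # (frames x coefficients) matrix, with edge frames repeated.
    padded = np.pad(c, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

def robust_features(mfcc, frame_energy):
    # Per-frame vector: [energy, MFCC, delta MFCC, delta-delta MFCC].
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return np.hstack([np.asarray(frame_energy)[:, None], mfcc, d1, d2])
```

With 12 or 13 MFCCs per frame this yields the usual 37- or 40-dimensional feature vector per frame.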
Correspondingly, an embodiment of the present invention also provides a television program recommendation system that can implement all steps of the above method, comprising:
a signal receiving module for receiving the voice signal of the user;
a signal conversion module for converting the voice signal into discrete speech data;
an identification module for recognizing the dialect category used by the user according to the speech data;
a recommendation module for recommending television programs related to the dialect category to the user.
Further, the identification module comprises:
a framing unit for dividing the speech data into frames;
a first sequence acquisition unit for obtaining the robust features of each frame of speech data to form the first feature sequence X = {x_1, x_2, …, x_M} of the speech data, where x_M represents the robust features of the M-th frame;
a second sequence acquisition unit for removing the silent segments from the first feature sequence to obtain the second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, where y_N represents the robust features of the N-th remaining frame after silence removal, N ≤ M;
a likelihood calculation unit for calculating the likelihood of the speech data under different dialect models according to the second feature sequence Y;
a judging unit for judging the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
Further, the likelihood calculation unit calculates the likelihood of the speech data under different dialect models specifically according to the following formula:

$$p(Y\mid\lambda_k)=\prod_{i} p(y_i\mid\lambda_k)=\prod_{i}\sum_{j}\omega_j^{(k)}\,\frac{1}{(2\pi)^{1/2}\,\bigl|C_j^{(k)}\bigr|^{1/2}}\exp\Bigl[-\frac{1}{2}\bigl(y_i-\mu_j^{(k)}\bigr)^{T}\bigl(C_j^{(k)}\bigr)^{-1}\bigl(y_i-\mu_j^{(k)}\bigr)\Bigr]$$

where p(Y | λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i | λ_k) is the probability that y_i, the robust features of the i-th frame of the second feature sequence, is generated by the k-th dialect model; ω_j^{(k)} is the weight of the j-th Gaussian sub-model of the k-th dialect model; C_j^{(k)} is its covariance; and μ_j^{(k)} is its mean.
Further, the television program recommendation system also comprises a dialect model construction module, which specifically comprises:
a sample sequence acquisition unit for obtaining second feature sequences based on known dialects;
a clustering unit for clustering the robust features of the frames of the second feature sequence by decision-tree clustering, each cluster being characterized by one Gaussian sub-model;
a model parameter calculation unit for calculating the weight, mean, and covariance of each Gaussian sub-model from the robust features assigned to it by a maximum-likelihood algorithm;
a model generation unit for generating the dialect model of the known dialect from the weights, means, and covariances of the Gaussian sub-models; wherein the probability that y_i, the robust features of the i-th frame of the second feature sequence, is generated by the k-th dialect model is

$$p(y_i\mid\lambda_k)=\sum_{j}\omega_j^{(k)}\,\frac{1}{(2\pi)^{1/2}\,\bigl|C_j^{(k)}\bigr|^{1/2}}\exp\Bigl[-\frac{1}{2}\bigl(y_i-\mu_j^{(k)}\bigr)^{T}\bigl(C_j^{(k)}\bigr)^{-1}\bigl(y_i-\mu_j^{(k)}\bigr)\Bigr]$$

where ω_j^{(k)}, C_j^{(k)}, and μ_j^{(k)} are the weight, covariance, and mean of the j-th Gaussian sub-model of the k-th dialect model.
Further, the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients (MFCCs), and the first-order and second-order differences of the MFCCs.
Implementing the embodiments of the present invention has the following beneficial effects. The embodiments provide a television program recommendation method that can judge the dialect category spoken by the user from the user's voice signal and, on that basis, recommend television programs mainly related to that dialect. The television set can thus recommend programs that better match the user's cultural and linguistic background, enhancing the user experience, particularly for elderly users who have not mastered Mandarin or man-machine operation. The invention can be regarded as a dedicated method for recommending dialect programs, complementing existing television program recommendation systems. An embodiment of the invention also provides a television program recommendation system that can execute all steps of the method.
Accompanying drawing explanation
Fig. 1 is a flow diagram of the television program recommendation method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of step S3 in Fig. 1;
Fig. 3 is a flow diagram of step S5 in Fig. 1;
Fig. 4 is a structural diagram of the television program recommendation system provided by the invention;
Fig. 5 is a structural diagram of the judge module 3 in Fig. 4;
Fig. 6 is a structural diagram of the dialect model construction module 5 in Fig. 4.
Embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Referring to Fig. 1, a flow diagram of the television program recommendation method provided by an embodiment of the present invention, the method comprises the following steps:
S1: receive a voice signal from a user;
S2: convert the voice signal into discrete speech data;
S3: recognize the dialect category used by the user according to the speech data;
S4: recommend television programs related to the dialect category to the user.
Dialects can be divided into many categories; modern Chinese is commonly divided into seven major dialect areas: the Northern (Mandarin) dialects, Wu, Xiang (Hunan), Hakka, Min (Fujian), Yue (Guangdong), and Gan (Jiangxi). For example, when a user speaks the Guangdong (Yue) dialect, the method of this embodiment can recommend television programs from stations in Guangdong Province to the user; when a user speaks the Hunan (Xiang) dialect, it can recommend television programs from stations in Hunan Province. Each television program should be labelled in advance with the dialect category it belongs to.
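As a toy illustration of such pre-labelling, a lookup from the recognized dialect category to labelled programs might look like the following; all dialect keys and program names here are invented for the example, not taken from the patent.

```python
# Hypothetical pre-labelled catalog: dialect category -> program names.
PROGRAMS_BY_DIALECT = {
    "yue":   ["Guangdong TV News", "Cantonese Opera Hour"],
    "xiang": ["Hunan TV Gala", "Xiang Storytelling"],
}

def programs_for(dialect):
    # Unlabelled or unknown dialect categories yield no recommendations.
    return PROGRAMS_BY_DIALECT.get(dialect, [])
```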
In step S4, recommending television programs related to the dialect category to the user may be done by displaying a program list on the television screen or by announcing the programs by voice. The television program recommendation method of this embodiment can be applied to a television set or a set-top box.
Further, a step of receiving a program recommendation instruction from the user may be included before step S1, so that steps S1 to S4 are performed only after the user has input the instruction, preventing erroneous operation.
As shown in Fig. 2, a flow diagram of step S3 in Fig. 1, recognizing the dialect category used by the user according to the speech data specifically comprises:
S31: divide the speech data into frames;
S32: obtain the robust features of each frame of speech data to form the first feature sequence X = {x_1, x_2, …, x_M} of the speech data, where x_M represents the robust features of the M-th frame;
S33: remove the silent segments from the first feature sequence X to obtain the second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, where y_N represents the robust features of the N-th remaining frame after silence removal, N ≤ M;
S34: calculate the likelihood of the speech data under different dialect models according to the second feature sequence Y;
S35: judge the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
By extracting the robust features of each frame of speech data and applying silence removal, a characterization of the speech data is obtained: the second feature sequence. This sequence is then used to calculate the likelihood of the speech data under each dialect model. The higher the likelihood, the more similar the second feature sequence is to that dialect model, so the dialect model with the highest likelihood is judged to be the dialect category used by the user.
Specifically, in step S34, the likelihood of the speech data under different dialect models is calculated according to the second feature sequence Y, specifically by the following formula:

$$p(Y\mid\lambda_k)=\prod_{i} p(y_i\mid\lambda_k)=\prod_{i}\sum_{j}\omega_j^{(k)}\,\frac{1}{(2\pi)^{1/2}\,\bigl|C_j^{(k)}\bigr|^{1/2}}\exp\Bigl[-\frac{1}{2}\bigl(y_i-\mu_j^{(k)}\bigr)^{T}\bigl(C_j^{(k)}\bigr)^{-1}\bigl(y_i-\mu_j^{(k)}\bigr)\Bigr]$$

where p(Y | λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i | λ_k) is the probability that y_i, the robust features of the i-th frame of the second feature sequence, is generated by the k-th dialect model; ω_j^{(k)} is the weight of the j-th Gaussian sub-model of the k-th dialect model; C_j^{(k)} is its covariance; and μ_j^{(k)} is its mean.
It should be noted that the above formula is the likelihood function of the k-th dialect model. For a discrete likelihood function, the likelihood of the speech data under the k-th dialect model equals the product of the probabilities of the individual elements of the second feature sequence. In this embodiment, the dialect model is a Gaussian mixture model: one dialect model is a mixture of several Gaussian sub-models, so the probability of each element equals the weighted sum of its probabilities under the Gaussian sub-models of that dialect model, with each sub-model assigned a different weight. Gaussian mixture models have the advantages of high stability and good convergence, which improves the accuracy of the likelihood calculation.
Further, before the user's voice signal is received, the method also comprises a step S5 of building the dialect models. As shown in Fig. 3, a flow diagram of step S5 in Fig. 1, it specifically comprises:
S51: obtain second feature sequences based on a known dialect;
S52: cluster the robust features of the frames of the second feature sequences by decision-tree clustering, each cluster being characterized by one Gaussian sub-model;
S53: calculate, from the robust features contained in each cluster, the weight, mean, and covariance of the corresponding Gaussian sub-model by a maximum-likelihood algorithm;
S54: generate the dialect model of the known dialect from the weights, means, and covariances of the Gaussian sub-models; wherein the probability that y_i, the robust features of the i-th frame of the second feature sequence, is generated by the k-th dialect model is

$$p(y_i\mid\lambda_k)=\sum_{j}\omega_j^{(k)}\,\frac{1}{(2\pi)^{1/2}\,\bigl|C_j^{(k)}\bigr|^{1/2}}\exp\Bigl[-\frac{1}{2}\bigl(y_i-\mu_j^{(k)}\bigr)^{T}\bigl(C_j^{(k)}\bigr)^{-1}\bigl(y_i-\mu_j^{(k)}\bigr)\Bigr]$$

where ω_j^{(k)}, C_j^{(k)}, and μ_j^{(k)} are the weight, covariance, and mean of the j-th Gaussian sub-model of the k-th dialect model.
It should be noted that obtaining the second feature sequences based on a known dialect is in fact the step of obtaining several groups of second feature sequences as sample data, and the process of obtaining each second feature sequence is the same as steps S31 to S33; each Gaussian sub-model is assigned the robust features of at least one frame of speech data. By performing steps S51 to S54 for different dialects, the parameters of the Gaussian sub-models of each dialect model are obtained, and each dialect model is thereby built.
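The per-dialect loop described above can be sketched in a few lines. `fit_gmm` below is a deliberately degenerate one-component stand-in for the clustering and maximum-likelihood steps S52 and S53, kept only to show how one model λ_k per known dialect is produced; both function names are invented for the example.

```python
def fit_gmm(frames):
    # One-component "mixture": weight 1, per-dimension sample mean and
    # variance. A real implementation would cluster first (S52) and fit
    # one Gaussian sub-model per cluster (S53).
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return {"weights": [1.0], "means": [mean], "vars": [var]}

def build_all_dialect_models(samples_by_dialect):
    # S51-S54 repeated per known dialect: each labelled sample set yields
    # one parameter set, i.e. one dialect model.
    return {dialect: fit_gmm(frames)
            for dialect, frames in samples_by_dialect.items()}
```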
Further, the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients (MFCCs), and the first-order and second-order differences of the MFCCs.
Correspondingly, an embodiment of the present invention also provides a television program recommendation system that can implement all steps of the above method. As shown in Fig. 4, a structural diagram of the television program recommendation system provided by the invention, the system comprises:
a signal input module 1 for inputting the voice signal of the user;
a signal conversion module 2 for converting the voice signal into discrete speech data;
a judge module 3 for judging the dialect category used by the user according to the speech data;
a recommendation module 4 for recommending television programs related to the dialect category to the user.
As shown in Fig. 5, a structural diagram of the judge module 3 in Fig. 4, the judge module 3 comprises:
a framing unit 31 for dividing the speech data into frames;
a first sequence acquisition unit 32 for obtaining the robust features of each frame of speech data to form the first feature sequence X = {x_1, x_2, …, x_M} of the speech data, where x_M represents the robust features of the M-th frame;
a second sequence acquisition unit 33 for removing the silent segments from the first feature sequence to obtain the second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, where y_N represents the robust features of the N-th remaining frame after silence removal, N ≤ M;
a likelihood calculation unit 34 for calculating the likelihood of the speech data under different dialect models according to the second feature sequence Y;
a judging unit 35 for judging the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
Further, the likelihood calculation unit 34 calculates the likelihood of the speech data under different dialect models specifically according to the following formula:

$$p(Y\mid\lambda_k)=\prod_{i} p(y_i\mid\lambda_k)=\prod_{i}\sum_{j}\omega_j^{(k)}\,\frac{1}{(2\pi)^{1/2}\,\bigl|C_j^{(k)}\bigr|^{1/2}}\exp\Bigl[-\frac{1}{2}\bigl(y_i-\mu_j^{(k)}\bigr)^{T}\bigl(C_j^{(k)}\bigr)^{-1}\bigl(y_i-\mu_j^{(k)}\bigr)\Bigr]$$

where p(Y | λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i | λ_k) is the probability that y_i, the robust features of the i-th frame of the second feature sequence, is generated by the k-th dialect model; ω_j^{(k)} is the weight of the j-th Gaussian sub-model of the k-th dialect model; C_j^{(k)} is its covariance; and μ_j^{(k)} is its mean.
Further, the television program recommendation system also comprises a dialect model construction module 5. As shown in Fig. 6, a structural diagram of the dialect model construction module 5 in Fig. 4, the module specifically comprises:
a sample sequence acquisition unit 51 for obtaining second feature sequences based on known dialects;
a clustering unit 52 for clustering the robust features of the frames of the second feature sequences by decision-tree clustering, each cluster being characterized by one Gaussian sub-model;
a model parameter calculation unit 53 for calculating the weight, mean, and covariance of each Gaussian sub-model from the robust features assigned to it by a maximum-likelihood algorithm;
a model generation unit 54 for generating the dialect model of the known dialect from the weights, means, and covariances of the Gaussian sub-models; wherein the probability that y_i, the robust features of the i-th frame of the second feature sequence, is generated by the k-th dialect model is

$$p(y_i\mid\lambda_k)=\sum_{j}\omega_j^{(k)}\,\frac{1}{(2\pi)^{1/2}\,\bigl|C_j^{(k)}\bigr|^{1/2}}\exp\Bigl[-\frac{1}{2}\bigl(y_i-\mu_j^{(k)}\bigr)^{T}\bigl(C_j^{(k)}\bigr)^{-1}\bigl(y_i-\mu_j^{(k)}\bigr)\Bigr]$$

where ω_j^{(k)}, C_j^{(k)}, and μ_j^{(k)} are the weight, covariance, and mean of the j-th Gaussian sub-model of the k-th dialect model.
Further, the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients (MFCCs), and the first-order and second-order differences of the MFCCs.
Implementing the embodiments of the present invention has the following beneficial effects. The embodiments provide a television program recommendation method that can judge the dialect category spoken by the user from the user's voice signal and, on that basis, recommend television programs mainly related to that dialect. The television set can thus recommend programs that better match the user's cultural and linguistic background, enhancing the user experience, particularly for elderly users who have not mastered Mandarin or man-machine operation. The invention can be regarded as a dedicated method for recommending dialect programs, complementing existing television program recommendation systems. An embodiment of the invention also provides a television program recommendation system that can execute all steps of the method.
Those of ordinary skill in the art will appreciate that all or part of the flow of the above-described embodiment methods can be accomplished by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are preferred embodiments of the present invention. It should be pointed out that those skilled in the art can also make several improvements and modifications without departing from the principles of the invention, and such improvements and modifications are likewise considered to be within the scope of protection of the invention.

Claims (10)

1. A television program recommendation method, characterized by comprising:
receiving a voice signal from a user;
converting the voice signal into discrete speech data;
recognizing the dialect category used by the user according to the speech data;
recommending television programs related to the dialect category to the user.
2. The television program recommendation method as claimed in claim 1, characterized in that recognizing the dialect category used by the user according to the speech data specifically comprises:
dividing the speech data into frames;
obtaining the robust features of each frame of speech data to form a first feature sequence X = {x_1, x_2, …, x_M} of the speech data, where x_M represents the robust features of the M-th frame;
removing the silent segments from the first feature sequence X to obtain a second feature sequence Y = {y_1, y_2, …, y_N} of the speech data, where y_N represents the robust features of the N-th remaining frame after silence removal, N ≤ M;
calculating the likelihood of the speech data under different dialect models according to the second feature sequence Y;
judging the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
3. The television program recommendation method as claimed in claim 2, characterized in that the likelihood of the speech data under different dialect models is calculated from the second feature sequence Y of the speech data specifically according to the following formula:

p(Y|λ_k) = ∏_i p(y_i|λ_k) = ∏_i Σ_j ω_(k)j · (1 / ((2π)^(1/2) · |C_(k)j|^(1/2))) · exp[−(1/2) · (y_i − μ_(k)j)^T · C_(k)j^(−1) · (y_i − μ_(k)j)]

where p(Y|λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i|λ_k) is the probability that the robust features y_i of the i-th frame of speech data of the second feature sequence appear under the k-th dialect model; ω_(k)j is the weight of the j-th Gaussian submodel of the k-th dialect model; C_(k)j is the covariance of the j-th Gaussian submodel of the k-th dialect model; and μ_(k)j is the mean of the j-th Gaussian submodel of the k-th dialect model.
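The mixture likelihood of claim 3 can be sketched as below. For numerical stability the product over frames is computed in the log domain (log-sum-exp over the Gaussian submodels), and the standard multivariate-Gaussian normalizer is used; the toy single-component "dialect models" are illustrative assumptions.

```python
import numpy as np

def log_gaussian(y, mean, cov):
    """Log-density of a full-covariance Gaussian N(mean, cov) at point y."""
    d = len(mean)
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet
                   + diff @ np.linalg.inv(cov) @ diff)

def log_likelihood(Y, weights, means, covs):
    """log p(Y | lambda_k) = sum_i log sum_j w_(k)j N(y_i; mu_(k)j, C_(k)j)."""
    total = 0.0
    for y in Y:
        comps = np.array([np.log(w) + log_gaussian(y, m, c)
                          for w, m, c in zip(weights, means, covs)])
        top = comps.max()
        total += top + np.log(np.exp(comps - top).sum())  # log-sum-exp
    return total

def classify_dialect(Y, models):
    """Return the dialect whose model gives the highest likelihood for Y."""
    return max(models, key=lambda k: log_likelihood(Y, *models[k]))

# Toy usage: two single-component "dialect models" with different means.
models = {
    "cantonese": ([1.0], [np.zeros(2)], [np.eye(2)]),
    "hakka":     ([1.0], [np.full(2, 5.0)], [np.eye(2)]),
}
Y = np.array([[0.1, -0.2], [0.3, 0.1]])
print(classify_dialect(Y, models))  # -> cantonese
```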
4. The television program recommendation method as claimed in claim 2 or claim 3, characterized in that, before receiving the voice signal of the user, the method further comprises a step of building the dialect models, which specifically comprises:
obtaining a second feature sequence based on a known dialect;
clustering the robust features of each frame of speech data of the second feature sequence by decision-tree clustering, each cluster being characterized by a Gaussian submodel;
calculating, by a maximum-likelihood algorithm, the weight, mean and covariance of the Gaussian submodel corresponding to each cluster according to the robust features the cluster contains;
generating the dialect model of the known dialect according to the weight, mean and covariance of each Gaussian submodel; wherein the probability that the robust features y_i of the i-th frame of speech data of the second feature sequence appear under the k-th dialect model is

p(y_i|λ_k) = Σ_j ω_(k)j · (1 / ((2π)^(1/2) · |C_(k)j|^(1/2))) · exp[−(1/2) · (y_i − μ_(k)j)^T · C_(k)j^(−1) · (y_i − μ_(k)j)]

where ω_(k)j is the weight of the j-th Gaussian submodel of the k-th dialect model; C_(k)j is the covariance of the j-th Gaussian submodel of the k-th dialect model; and μ_(k)j is the mean of the j-th Gaussian submodel of the k-th dialect model.
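The model-building step of claim 4 can be sketched as below. Plain k-means is used as a stand-in for the decision-tree clustering named in the claim, and the maximum-likelihood estimates are the hard-assignment ones (cluster fraction, sample mean, sample covariance); both substitutions are assumptions made for brevity.

```python
import numpy as np

def kmeans(Y, k, iters=20, seed=0):
    """Plain k-means, standing in for the decision-tree clustering of claim 4."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    # final assignment against the converged centers
    return np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)

def train_dialect_model(Y, k):
    """Hard-assignment maximum-likelihood estimates: each cluster becomes one
    Gaussian submodel with a weight (its fraction of frames), a mean and a
    covariance. Assumes every cluster keeps at least two frames."""
    labels = kmeans(Y, k)
    weights, means, covs = [], [], []
    for j in range(k):
        pts = Y[labels == j]
        weights.append(len(pts) / len(Y))
        means.append(pts.mean(axis=0))
        # small ridge keeps the covariance invertible for the likelihood formula
        covs.append(np.cov(pts, rowvar=False) + 1e-6 * np.eye(Y.shape[1]))
    return np.array(weights), np.array(means), np.array(covs)
```

A quick usage: training on two well-separated blobs of 2-D "robust features" yields two submodels whose weights sum to one.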
5. The television program recommendation method as claimed in claim 4, characterized in that the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients, and the first-order and second-order differences of the mel-frequency cepstral coefficients.
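The per-frame feature vector of claim 5 can be assembled as sketched below. Computing the MFCCs themselves is outside the scope of this sketch (a random matrix stands in), and the two-point symmetric difference used for the deltas is an assumption; regression-based deltas are also common.

```python
import numpy as np

def deltas(feat):
    """First-order difference along time: (c[t+1] - c[t-1]) / 2,
    with edge padding so the output keeps the input's shape."""
    padded = np.pad(feat, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

def build_robust_features(energy, mfcc):
    """Per claim 5: concatenate frame energy, MFCCs, and the first- and
    second-order differences of the MFCCs into one vector per frame."""
    d1 = deltas(mfcc)   # first-order difference
    d2 = deltas(d1)     # second-order difference
    return np.hstack([energy[:, None], mfcc, d1, d2])

# Toy usage: 50 frames of 13 stand-in MFCC coefficients.
mfcc = np.random.randn(50, 13)
energy = np.random.rand(50)
X = build_robust_features(energy, mfcc)
print(X.shape)  # -> (50, 40): 1 energy + 13 MFCC + 13 delta + 13 delta-delta
```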
6. A television program recommendation system, characterized in that it comprises:
a signal receiving module, configured to receive a voice signal of a user;
a signal conversion module, configured to convert the voice signal into discrete speech data;
an identification module, configured to identify the dialect category used by the user according to the speech data;
a recommending module, configured to recommend to the user television programs related to the dialect category.
7. The television program recommendation system as claimed in claim 6, characterized in that the identification module comprises:
a framing unit, configured to divide the speech data into frames;
a first sequence acquiring unit, configured to obtain the robust features of each frame of speech data and form a first feature sequence X = {x_1, x_2, ..., x_M} of the speech data, where x_M represents the robust features of the M-th frame of speech data;
a second sequence acquiring unit, configured to remove the silent segments in the first feature sequence and obtain a second feature sequence Y = {y_1, y_2, ..., y_N} of the speech data, where y_N represents the robust features of the N-th frame of speech data after the silent segments in the first feature sequence X are removed, N ≤ M;
a likelihood calculating unit, configured to calculate, according to the second feature sequence Y of the speech data, the likelihood of the speech data under different dialect models;
a determining unit, configured to determine the dialect category used by the user according to the likelihoods of the speech data under the different dialect models.
8. The television program recommendation system as claimed in claim 7, characterized in that the likelihood calculating unit calculates the likelihood of the speech data under the different dialect models specifically according to the following formula:

p(Y|λ_k) = ∏_i p(y_i|λ_k) = ∏_i Σ_j ω_(k)j · (1 / ((2π)^(1/2) · |C_(k)j|^(1/2))) · exp[−(1/2) · (y_i − μ_(k)j)^T · C_(k)j^(−1) · (y_i − μ_(k)j)]

where p(Y|λ_k) is the likelihood of the speech data under the k-th dialect model; p(y_i|λ_k) is the probability that the robust features y_i of the i-th frame of speech data of the second feature sequence appear under the k-th dialect model; ω_(k)j is the weight of the j-th Gaussian submodel of the k-th dialect model; C_(k)j is the covariance of the j-th Gaussian submodel of the k-th dialect model; and μ_(k)j is the mean of the j-th Gaussian submodel of the k-th dialect model.
9. The television program recommendation system as claimed in claim 7 or 8, characterized in that the television program recommendation system further comprises a dialect model construction module, which specifically comprises:
a sample sequence acquiring unit, configured to obtain a second feature sequence based on a known dialect;
a clustering unit, configured to cluster the robust features of each frame of speech data of the second feature sequence by decision-tree clustering, each cluster being characterized by a Gaussian submodel;
a model parameter calculating unit, configured to calculate, by a maximum-likelihood algorithm, the weight, mean and covariance of the Gaussian submodel corresponding to each cluster according to the robust features the cluster contains;
a model generating unit, configured to generate the dialect model of the known dialect according to the weight, mean and covariance of each Gaussian submodel; wherein the probability that the robust features y_i of the i-th frame of speech data of the second feature sequence appear under the k-th dialect model is

p(y_i|λ_k) = Σ_j ω_(k)j · (1 / ((2π)^(1/2) · |C_(k)j|^(1/2))) · exp[−(1/2) · (y_i − μ_(k)j)^T · C_(k)j^(−1) · (y_i − μ_(k)j)]

where ω_(k)j is the weight of the j-th Gaussian submodel of the k-th dialect model; C_(k)j is the covariance of the j-th Gaussian submodel of the k-th dialect model; and μ_(k)j is the mean of the j-th Gaussian submodel of the k-th dialect model.
10. The television program recommendation system as claimed in claim 9, characterized in that the robust features comprise the energy of each frame of speech data, the mel-frequency cepstral coefficients, and the first-order and second-order differences of the mel-frequency cepstral coefficients.
CN201510098643.9A 2015-03-05 2015-03-05 Television program recommendation method and system Pending CN104766607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510098643.9A CN104766607A (en) 2015-03-05 2015-03-05 Television program recommendation method and system


Publications (1)

Publication Number Publication Date
CN104766607A true CN104766607A (en) 2015-07-08

Family

ID=53648391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510098643.9A Pending CN104766607A (en) 2015-03-05 2015-03-05 Television program recommendation method and system

Country Status (1)

Country Link
CN (1) CN104766607A (en)


Citations (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1298533A (en) * 1998-04-22 2001-06-06 国际商业机器公司 Adaptation of a speech recognizer for dialectal and linguistic domain variations
US6411930B1 (en) * 1998-11-18 2002-06-25 Lucent Technologies Inc. Discriminative gaussian mixture models for speaker verification
CN1735924A (en) * 2002-11-21 2006-02-15 松下电器产业株式会社 Standard model creating device and standard model creating method
CN1763843A (en) * 2005-11-18 2006-04-26 清华大学 Pronunciation quality evaluating method for language learning machine
CN101154380A (en) * 2006-09-29 2008-04-02 株式会社东芝 Method and device for registration and validation of speaker's authentication
CN101241699A (en) * 2008-03-14 2008-08-13 北京交通大学 A speaker identification system for remote Chinese teaching
CN101286317A (en) * 2008-05-30 2008-10-15 同济大学 Speech recognition device, model training method and traffic information service platform
CN101436403A (en) * 2007-11-16 2009-05-20 创新未来科技有限公司 Method and system for recognizing tone
CN101552004A (en) * 2009-05-13 2009-10-07 哈尔滨工业大学 Method for recognizing in-set speaker
CN101573749A (en) * 2006-12-15 2009-11-04 摩托罗拉公司 Method and apparatus for robust speech activity detection
CN101622660A (en) * 2007-02-28 2010-01-06 日本电气株式会社 Audio recognition device, audio recognition method, and audio recognition program
CN101645269A (en) * 2008-12-30 2010-02-10 中国科学院声学研究所 Language recognition system and method
US20110071823A1 (en) * 2008-06-10 2011-03-24 Toru Iwasawa Speech recognition system, speech recognition method, and storage medium storing program for speech recognition
CN102184732A (en) * 2011-04-28 2011-09-14 重庆邮电大学 Fractal-feature-based intelligent wheelchair voice identification control method and system
CN102231281A (en) * 2011-07-18 2011-11-02 渤海大学 Voice visualization method based on integration characteristic and neural network
CN102238190A (en) * 2011-08-01 2011-11-09 安徽科大讯飞信息科技股份有限公司 Identity authentication method and system
CN102290047A (en) * 2011-09-22 2011-12-21 哈尔滨工业大学 Robust speech characteristic extraction method based on sparse decomposition and reconfiguration
CN102394062A (en) * 2011-10-26 2012-03-28 华南理工大学 Method and system for automatically identifying voice recording equipment source
CN103313108A (en) * 2013-06-14 2013-09-18 山东科技大学 Smart TV program recommending method based on context aware
CN103474061A (en) * 2013-09-12 2013-12-25 河海大学 Automatic distinguishing method based on integration of classifier for Chinese dialects
CN103491411A (en) * 2013-09-26 2014-01-01 深圳Tcl新技术有限公司 Method and device based on language recommending channels
CN103546773A (en) * 2013-08-15 2014-01-29 Tcl集团股份有限公司 Television program recommendation method and system
CN103839545A (en) * 2012-11-23 2014-06-04 三星电子株式会社 Apparatus and method for constructing multilingual acoustic model
CN103945250A (en) * 2013-01-17 2014-07-23 三星电子株式会社 Image processing apparatus, control method thereof, and image processing system
CN103943104A (en) * 2014-04-15 2014-07-23 海信集团有限公司 Voice information recognition method and terminal equipment
CN104038788A (en) * 2014-06-19 2014-09-10 中山大学深圳研究院 Community social network system and content recommendation method
CN104123934A (en) * 2014-07-23 2014-10-29 泰亿格电子(上海)有限公司 Speech composition recognition method and system
CN104200804A (en) * 2014-09-19 2014-12-10 合肥工业大学 Various-information coupling emotion recognition method for human-computer interaction


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CARLOS LIMA ET AL.: "A Robust Features Extraction for Automatic Speech Recognition in Noisy Environments", ICSP Proceedings *
MING-LIANG GU ET AL.: "Chinese Dialect Identification Using SC-GMM", Advanced Materials Research *
WUEI-HE TSAI ET AL.: "Discriminative training of Gaussian mixture bigram models with application to Chinese dialect identification", Elsevier *
YANG Chengyu (杨澄宇): "A speaker verification system based on Gaussian mixture models", Computer Applications (《计算机应用》) *
WANG Qixue et al. (王岐学 等): "Hunan dialect identification based on differential features and Gaussian mixture models", Computer Engineering and Applications (《计算机工程与应用》) *
GU Mingliang (顾明亮): "A Chinese dialect identification system based on Gaussian mixture models", Computer Engineering and Applications (《计算机工程与应用》) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677722A (en) * 2015-12-29 2016-06-15 百度在线网络技术(北京)有限公司 Method and apparatus for recommending friends in social software
CN105683964A (en) * 2016-01-07 2016-06-15 马岩 Network social contact searching method and system
WO2017117786A1 (en) * 2016-01-07 2017-07-13 马岩 Social network search method and system
CN108172212A (en) * 2017-12-25 2018-06-15 横琴国际知识产权交易中心有限公司 A kind of voice Language Identification and system based on confidence level
CN108172212B (en) * 2017-12-25 2020-09-11 横琴国际知识产权交易中心有限公司 Confidence-based speech language identification method and system
CN108810566A (en) * 2018-06-12 2018-11-13 忆东兴(深圳)科技有限公司 A kind of smart television dialect is interpreted method
CN114449342A (en) * 2022-01-21 2022-05-06 腾讯科技(深圳)有限公司 Video recommendation method and device, computer readable storage medium and computer equipment
CN114449342B (en) * 2022-01-21 2024-02-27 腾讯科技(深圳)有限公司 Video recommendation method, device, computer readable storage medium and computer equipment
CN115497475A (en) * 2022-09-21 2022-12-20 深圳市人马互动科技有限公司 Information recommendation method based on voice interaction system and related device

Similar Documents

Publication Publication Date Title
CN104766607A (en) Television program recommendation method and system
CN108509619B (en) Voice interaction method and device
CN110838286B (en) Model training method, language identification method, device and equipment
CN106503236B (en) Artificial intelligence based problem classification method and device
CN110619051B (en) Question sentence classification method, device, electronic equipment and storage medium
CN108538294A (en) A kind of voice interactive method and device
CN113590850A (en) Multimedia data searching method, device, equipment and storage medium
CN106302987A (en) A kind of audio frequency recommends method and apparatus
CN111508505B (en) Speaker recognition method, device, equipment and storage medium
CN103970802A (en) Song recommending method and device
CN102073631A (en) Video news unit dividing method by using association rule technology
CN107358947A (en) Speaker recognition methods and system again
CN108710653B (en) On-demand method, device and system for reading book
CN114676689A (en) Sentence text recognition method and device, storage medium and electronic device
CN112417132A (en) New intention recognition method for screening negative samples by utilizing predicate guest information
CN101419799A (en) Speaker identification method based mixed t model
CN106204103A (en) The method of similar users found by a kind of moving advertising platform
CN111724766A (en) Language identification method, related equipment and readable storage medium
CN110378190A (en) Video content detection system and detection method based on topic identification
CN104331717B (en) The image classification method that a kind of integration characteristics dictionary structure is encoded with visual signature
CN113220929A (en) Music recommendation method based on time-staying and state-staying mixed model
CN107507627B (en) Voice data heat analysis method and system
CN108153875A (en) Language material processing method, device, intelligent sound box and storage medium
Leng et al. Audio scene recognition based on audio events and topic model
CN113111855B (en) Multi-mode emotion recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150708

RJ01 Rejection of invention patent application after publication