CN104853236A - Smart television switching control method and device thereof - Google Patents

Smart television switching control method and device thereof


Publication number
CN104853236A
Authority
CN
China
Prior art keywords
signal
sound
subelement
tone signal
sound signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510020151.8A
Other languages
Chinese (zh)
Inventor
于忠清
王亮
原泉
姚书磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haiersoft Co Ltd
Original Assignee
Qingdao Haiersoft Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haiersoft Co Ltd
Priority to CN201510020151.8A
Publication of CN104853236A

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiments of the invention provide a smart television switching control method and a device thereof. The method includes: receiving a first sound signal sent by a current object whose identity is to be recognized, and performing first preprocessing on the first sound signal; extracting feature parameters of the first sound signal from the preprocessed first sound signal; matching the feature parameters of the first sound signal against the voiceprint templates stored in a voiceprint template library, and recognizing the identity of the current object according to the matching result, wherein the voiceprint template library is a collection of voiceprint templates, each corresponding to an object and obtained by training on a second sound signal sent in advance by at least one object; and performing switching control of the smart television according to the recognized identity information. With the method and device provided by the invention, intelligent control of smart television switching can be realized.

Description

A power on/off control method for a smart television and a device thereof
Technical field
The embodiments of the present application relate to the technical field of information control and processing, and in particular to a power on/off control method for a smart television and a device thereof.
Background
With the advance of information technology, smart televisions have become increasingly widespread and are now an important everyday means of entertainment, leisure, and access to information. A smart television is "smart" in many respects: for example, it can automatically obtain television programs over the network and push information according to the user's settings. This intelligence brings people many conveniences.
However, the inventors have found that powering a television on and off remains cumbersome, both for current smart televisions and for traditional televisions. A traditional television usually has a physical switch button on its panel (or on the right side, the underside, and so on), and the user powers it on or off by physically pressing this button. A current smart television is usually powered on and off with a remote control. The remote control contains a radio-frequency device; when the user presses the power button, the device sends a power-on or power-off radio-frequency signal to the television, and once the receiver in the television receives the signal, it switches the television's power supply on or off. Compared with physically pressing a button on a traditional television, this approach controls the television's power reasonably well and makes life more convenient to some extent. Nevertheless, although it is a major step toward intelligence, many problems remain. For example, a user may find it troublesome when the remote control cannot be found before powering the television on or off; the remote control needs periodic battery replacement and must be paired with a specific television; and it also adds to the overall cost of the television to some extent.
Summary of the invention
To solve the above problems, the embodiments of the present application provide a power on/off control method for a smart television and a device thereof, so as to change the existing control mode and raise the intelligence level of the smart television.
The power on/off control method for a smart television provided by the embodiments of the present application comprises:
Receiving a first sound signal sent by a current object whose identity is to be identified, and performing first preprocessing on the first sound signal;
Extracting feature parameters of the first sound signal from the first sound signal after the first preprocessing;
Matching the extracted feature parameters of the first sound signal against the voiceprint templates stored in a voiceprint template library, and identifying the identity of the current object according to the matching result, wherein the voiceprint template library is the set of voiceprint templates, one per object, obtained by training on a second sound signal sent in advance by at least one object;
Performing power on/off control of the smart television according to the identified identity information.
Preferably, training on the second sound signal sent in advance by an object to obtain the voiceprint template corresponding to that object specifically comprises:
Receiving the second sound signal sent by the object, and performing second preprocessing on the second sound signal;
Extracting feature parameters of the second sound signal from the second sound signal after the second preprocessing;
Training a preset sound model with the feature parameters of the second sound signal to obtain the voiceprint template corresponding to the object.
Preferably, performing the second preprocessing on the second sound signal comprises: performing analog-to-digital conversion on the second sound signal; or performing endpoint detection on the second sound signal; or performing analog-to-digital conversion together with framing and/or endpoint detection on the second sound signal, wherein:
Performing endpoint detection on the second sound signal specifically comprises:
Obtaining the energy of the second sound signal;
Comparing the energy with a preset energy value, and determining the part of the second sound signal whose energy exceeds the preset energy value to be the valid speech part;
Performing framing on the second sound signal specifically comprises:
Applying a window to the second sound signal so as to divide it into frame signals of a preset length.
Preferably, extracting the feature parameters of the second sound signal from the second sound signal after the second preprocessing specifically comprises:
Performing a Fourier transform on the second sound signal after the second preprocessing;
Filtering the Fourier-transformed signal;
Performing discrete cosine transform detection on the filtered signal and extracting Mel-frequency cepstral coefficients, the extracted Mel-frequency cepstral coefficients serving as the feature parameters of the second sound signal.
Preferably, extracting the Mel-frequency cepstral coefficients specifically comprises:
Performing nonlinear compression on the signal after the discrete cosine transform detection to obtain each filter bank;
Solving for the loudness of each filter bank to obtain the loudness;
Transforming the loudness to obtain an auditory autocorrelation curve;
Extracting linear prediction coefficients from the auditory autocorrelation curve;
Calculating the Mel-frequency cepstral coefficients from the linear prediction coefficients.
Preferably, training the preset sound model with the feature parameters of the second sound signal to obtain the voiceprint template corresponding to the object specifically comprises:
Training a general hidden Markov model with the feature parameters of the second sound signal to obtain a dedicated hidden Markov model that uniquely corresponds to the object, and using the dedicated hidden Markov model as the voiceprint template corresponding to the object.
Preferably, performing the first preprocessing on the first sound signal specifically comprises:
Performing analog-to-digital conversion on the first sound signal; or performing endpoint detection on the first sound signal; or performing analog-to-digital conversion together with framing and/or endpoint detection on the first sound signal, wherein:
Performing endpoint detection on the first sound signal specifically comprises:
Obtaining the energy of the first sound signal;
Comparing the energy with a preset energy value, and determining the part of the first sound signal whose energy exceeds the preset energy value to be the valid speech part;
Performing framing on the first sound signal specifically comprises:
Applying a window to the first sound signal so as to divide it into frame signals of a preset length;
Extracting the feature parameters of the first sound signal from the first sound signal after the first preprocessing specifically comprises:
Performing a Fourier transform on the first sound signal after the first preprocessing;
Filtering the Fourier-transformed signal;
Performing discrete cosine transform detection on the filtered signal and extracting Mel-frequency cepstral coefficients, the extracted Mel-frequency cepstral coefficients serving as the feature parameters of the first sound signal.
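The feature-extraction steps listed above (Fourier transform, filtering, discrete cosine transform) resemble the conventional Mel-frequency cepstral coefficient recipe. The following is a minimal pure-Python sketch of that conventional recipe for a single frame, not the exact procedure of this application. The triangular Mel filter bank, the logarithm step, the filter count, the coefficient count, and all function names are assumptions made for illustration.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the Mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (bins 0 .. N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale (assumed design)."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + i * (high - low) / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):       # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    """Fourier transform -> Mel filtering -> log -> DCT for one frame."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10))
        for row in bank
    ]
    return dct_ii(log_energies, n_coeffs)
```

For example, `mfcc(frame, 8000)` applied to a 256-sample frame returns a list of 12 coefficients. A production system would normally use an FFT rather than the naive DFT shown here, which is kept only to make the sketch dependency-free.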
The embodiments of the present application further provide a power on/off control device for a smart television. The device comprises a first receiving unit, a first preprocessing unit, a first extraction unit, a matching and identification unit, and a control unit, wherein:
The first receiving unit is configured to receive a first sound signal sent by a current object whose identity is to be identified;
The first preprocessing unit is configured to perform first preprocessing on the first sound signal;
The first extraction unit is configured to extract feature parameters of the first sound signal from the first sound signal after the first preprocessing;
The matching and identification unit is configured to match the extracted feature parameters of the first sound signal against the voiceprint templates stored in a voiceprint template library and to identify the identity of the current object according to the matching result, wherein the voiceprint template library is the set of voiceprint templates, one per object, obtained by training on a second sound signal sent in advance by at least one object;
The control unit is configured to perform power on/off control of the smart television according to the identified identity information.
Preferably, the device further comprises a training unit configured to train on the second sound signal sent in advance by an object to obtain the voiceprint template corresponding to that object. The training unit comprises a second receiving subunit, a second preprocessing subunit, a second extraction subunit, and a model training subunit, wherein:
The second receiving subunit is configured to receive the second sound signal sent by the object;
The second preprocessing subunit is configured to perform second preprocessing on the second sound signal;
The second extraction subunit is configured to extract feature parameters of the second sound signal from the second sound signal after the second preprocessing;
The model training subunit is configured to train a preset sound model with the feature parameters of the second sound signal to obtain the voiceprint template corresponding to the object.
Preferably, the second preprocessing subunit comprises a second analog-to-digital conversion subunit; or a second endpoint detection subunit; or a second analog-to-digital conversion subunit together with a second framing subunit and/or a second endpoint detection subunit, wherein:
The second endpoint detection subunit is configured to obtain the energy of the second sound signal, compare the energy with a preset energy value, and determine the part of the second sound signal whose energy exceeds the preset energy value to be the valid speech part;
The second framing subunit is configured to apply a window to the second sound signal so as to divide it into frame signals of a preset length.
Preferably, the second extraction subunit comprises a second transform subunit, a second filtering subunit, a second detection subunit, and a second parameter extraction subunit, wherein:
The second transform subunit is configured to perform a Fourier transform on the second sound signal after the second preprocessing;
The second filtering subunit is configured to filter the Fourier-transformed signal;
The second detection subunit is configured to perform discrete cosine transform detection on the filtered signal;
The second parameter extraction subunit is configured to extract Mel-frequency cepstral coefficients and to use the extracted Mel-frequency cepstral coefficients as the feature parameters of the second sound signal.
Preferably, the first preprocessing unit comprises a first analog-to-digital conversion subunit; or a first endpoint detection subunit; or a first analog-to-digital conversion subunit together with a first framing subunit and/or a first endpoint detection subunit, wherein:
The first endpoint detection subunit is configured to obtain the energy of the first sound signal, compare the energy with a preset energy value, and determine the part of the first sound signal whose energy exceeds the preset energy value to be the valid speech part;
The first framing subunit is configured to apply a window to the first sound signal so as to divide it into frame signals of a preset length.
Preferably, the first extraction unit comprises a first transform subunit, a first filtering subunit, a first detection subunit, and a first parameter extraction subunit, wherein:
The first transform subunit is configured to perform a Fourier transform on the first sound signal after the first preprocessing;
The first filtering subunit is configured to filter the Fourier-transformed signal;
The first detection subunit is configured to perform discrete cosine transform detection on the filtered signal;
The first parameter extraction subunit is configured to extract Mel-frequency cepstral coefficients and to use the extracted Mel-frequency cepstral coefficients as the feature parameters of the first sound signal.
The embodiments of the present application preprocess the received sound signal, extract its feature parameters, match the extracted feature parameters against the voiceprint templates in the voiceprint template library, identify the speaker according to the matching result, and then perform power on/off control of the smart television based on the identified identity information. Compared with the prior art, this approach requires neither a physical contact operation nor pressing a button on a remote control; the smart television is powered on and off by capturing sound, which greatly facilitates television users. Moreover, during power control the identity of the speaking object is also identified through the voiceprint template, and only an identified object can control the power of the smart television, thereby achieving the effect of "customized control".
Brief description of the drawings
The above and other objects, features, and advantages of the exemplary embodiments of the invention will become easier to understand by reading the detailed description below with reference to the accompanying drawings. In the drawings, several embodiments of the invention are shown by way of example and not by way of limitation, wherein:
Fig. 1 is an example of an application scenario to which the present application is applicable;
Fig. 2 is a flow chart of an embodiment of the power on/off control method for a smart television;
Fig. 3 is a flow chart of an embodiment of establishing a voiceprint template according to the present application;
Fig. 4 is a structural block diagram of an embodiment of the power on/off control device for a smart television.
Detailed description
The principle and spirit of the invention are described below with reference to several exemplary embodiments. It should be understood that these embodiments are provided only to enable those skilled in the art to better understand and then implement the invention, and not to limit the scope of the invention in any way. Rather, they are provided so that the disclosure of the present application is thorough and complete and fully conveys the scope of the disclosure to those skilled in the art.
Referring to Fig. 1, the figure shows a scenario to which the present application can be applied. In this scenario, a sound-receiving device, such as a built-in microphone, is arranged inside the smart television 11. The smart television receives a sound from one or more users 12 intended to power the television on or off. The sound can reach the sound-receiving device of the smart television through a natural medium such as air, or through a sound-conducting medium such as a dedicated line. After the smart television 11 receives the sound from the user, it processes the sound internally to identify whether the user currently speaking is a predetermined user and whether the spoken content is a predetermined instruction. According to the recognition result, it then activates the switch component of the smart television to perform the power on/off action, thereby realizing power on/off control of the smart television. Other application scenarios may include other components or devices. In all of these application scenarios, whether identical or different, the power on/off control method for a smart television provided by the embodiments of the present application can be used.
Referring to Fig. 2, the figure shows the flow of an embodiment of the power on/off control method for a smart television provided by the present application. The flow comprises:
Step S21: receiving a first sound signal sent by a current object whose identity is to be identified, and performing first preprocessing on the first sound signal;
As described above, the present application controls the power of the smart television by sound. To this end, a sound signal must first be received from the outside. The sound signal comes from an object, namely the current object whose identity needs to be identified. This object may be a person who can produce speech and who wishes to control the power of the smart television with his or her own voice, for example to switch the smart television from the standby state (or power-off state) to the power-on state, or from the power-on state to the standby state (or power-off state). After the sound signal is obtained, certain measures can be taken to preprocess it so that it is better suited to subsequent operations. For example, a sound signal is normally an analog signal, so the preprocessing here may convert the analog sound signal into a digital signal, which makes operations such as feature extraction and matching more convenient.
Step S22: extracting feature parameters of the first sound signal from the first sound signal after the first preprocessing;
After the sound signal has been preprocessed in the preceding step, feature parameters can be extracted from it. The feature parameters may take various forms. In the present application, control of the television by sound can involve two aspects (the second in particular): first, control by the content of the sound; second, control by the identity of the speaking object. The extracted feature parameters should therefore reflect both the content of the sound signal and the identity of the object. Although both parts may be included, they may appear in the feature parameters in a certain proportion, to reflect the different emphasis users place on the specific way sound controls the power of the smart television. In practice, the extracted feature parameters may be the spectral envelope parameters, linear prediction coefficients, autocorrelation coefficients, reflection coefficients, log area ratios, linear prediction residual, and so on, of the sound signal. Alternatively, after these parameters are extracted, they may be combined in a certain way, and the resulting combined parameters used as the feature parameters.
Step S23: matching the extracted feature parameters of the first sound signal against the voiceprint templates stored in the voiceprint template library, and identifying the identity of the current object according to the matching result, wherein the voiceprint template library is the set of voiceprint templates, one per object, obtained by training on a second sound signal sent in advance by at least one object;
The voiceprint template library stores multiple voiceprint templates. Each voiceprint template reflects the sound characteristics of an object to a large extent, so that a close correspondence with that object can be established through the template. A voiceprint template is obtained by training on the sound signal sent by one or more objects: the sound of an object is taken as a sound sample and trained according to a certain model, forming the template corresponding to the speaking object. Two points need to be explained here.
The first is the relationship between the "object" in this step and the "current object whose identity is to be identified" mentioned in step S21. The current object in step S21 is a specific object that currently wishes to control the power of the smart television with its own voice. The "object" in this step is a non-specific object serving as the source of a sound sample, that is, an object that wants to be able, at some future time, to control the power of the smart television with its own voice. To achieve this, the object must first establish its own voiceprint template in the voiceprint template library for subsequent matching. However, it is also possible that this object never actually controls the power of the smart television by sound after establishing its voiceprint template. It follows that the "current object whose identity is to be identified" may or may not be one of the objects that established voiceprint templates. Similarly, the "object" in this step may or may not be an object that actually controls the power of the smart television by sound. In other words, on the whole, the set of "current objects whose identity is to be identified" and the set of objects that have established voiceprint templates at least intersect; this is the relationship between the two.
The second is the relationship between the "first sound signal" and the "second sound signal". The ordinals "first" and "second" are added before "sound signal" to distinguish the two. In the present application, the first sound signal is the sound signal received when the power of the smart television is currently being controlled by sound, and the second sound signal is the sample signal received while a voiceprint template is being established.
After the feature parameters of the current sound signal have been extracted in the preceding step, they can be matched one by one against the voiceprint templates in the voiceprint template library. The matching can be exact matching, that is, a binary "yes" or "no" match, or interval matching within a certain threshold range. Exact matching has two possible outcomes. In the first, a match is found: a voiceprint template consistent with the currently extracted feature parameters exists in the voiceprint template library, and the identity of the current object can be confirmed. In the second, no match is found: no voiceprint template consistent with the currently extracted feature parameters exists in the voiceprint template library, and the current object is generally considered not to have a legitimate identity, that is, it is not an object that established a voiceprint template in the library in advance. With interval matching, the extracted feature parameters may not be fully consistent with a voiceprint template in the library, but if the degree of consistency falls within a preset threshold range, the matching result can still be regarded as positive, and the object is identified according to this positive result. If the degree of consistency falls outside the preset threshold range, the matching result is regarded as negative, and the object is identified according to this negative result.
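The interval (threshold) matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the matching procedure of this application: each template is represented here as a single feature vector and compared with cosine similarity, whereas the application describes hidden Markov model templates. The names `cosine_similarity`, `identify_speaker`, and the threshold value 0.9 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(features, template_library, threshold=0.9):
    """Match features against every template; return the best speaker id
    if its score reaches the threshold, otherwise None (negative result)."""
    best_id, best_score = None, -1.0
    for speaker_id, template in template_library.items():
        score = cosine_similarity(features, template)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score >= threshold:
        return best_id
    return None
```

With a library such as `{"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}`, a feature vector close to one template is attributed to that speaker, while a vector equally far from both falls below the threshold and is rejected.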
Step S24: performing power on/off control of the smart television according to the identified identity information;
After the identity information of the current object has been identified in the preceding step, it can be used to control the power of the smart television. Power on/off control covers three cases: first, powering on the smart television according to the identified identity information; second, powering off the smart television according to the identified identity information; third, the identified identity may be illegitimate, in which case the smart television may make no noticeable response, that is, it remains in its original state.
The embodiments of the present application preprocess the received sound signal, extract its feature parameters, match the extracted feature parameters against the voiceprint templates in the voiceprint template library, identify the speaker according to the matching result, and then perform power on/off control of the smart television based on the identified identity information. Compared with the prior art, this approach no longer requires a physical contact operation or a remote control; the smart television is powered on and off by capturing the user's voice, which greatly facilitates television users, reduces the overall cost of the smart television, and raises its intelligence level. Moreover, during power control the identity of the speaking object is also identified through the voiceprint template, and only an identified object can control the power of the smart television. The people who can control the power of the smart television are thus limited to a particular group, achieving the effect of "customized control".
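The three cases above can be sketched as a small state transition. This is a minimal illustration; the `PowerState` enum, the command strings `"power_on"` and `"power_off"`, and the convention that an unidentified speaker is represented by `None` are all assumptions made for the example.

```python
from enum import Enum


class PowerState(Enum):
    STANDBY = "standby"
    ON = "on"


def next_power_state(current, speaker_id, command):
    """Return the television's next power state.

    speaker_id is None when no voiceprint template matched (illegitimate
    identity); in that case the television stays in its original state.
    """
    if speaker_id is None:
        return current              # case three: no noticeable response
    if command == "power_on":
        return PowerState.ON        # case one: power on
    if command == "power_off":
        return PowerState.STANDBY   # case two: power off
    return current                  # unrecognized instruction content
```

Keeping the decision in a pure function of the current state, the identification result, and the recognized instruction makes the "no response for an illegitimate identity" behavior easy to check in isolation.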
Step S23 above mentions that a voiceprint template in the voiceprint template library can be the template corresponding to an object, obtained by training on the second sound signal sent in advance by that object. In practice, a voiceprint template can be established in various ways. For example, as described above, spectral envelope parameters can be extracted from the obtained sound signal, and a voiceprint template can then be established from this information. Referring to Fig. 3, the figure shows the flow of a preferred embodiment of establishing a voiceprint template provided by the present application. The flow comprises:
Step S31: receiving the second sound signal sent by the object, and performing second preprocessing on the second sound signal;
Step S32: extracting feature parameters of the second sound signal from the second sound signal after the second preprocessing;
In practice, different feature parameters may call for different extraction methods. As mentioned above, if the feature parameters are spectral envelope parameters, the preprocessed sound signal can first be fed into a filter bank, and the filter outputs can then be sampled at a suitable rate to obtain the feature parameters of the sound signal. Feature parameters of this kind are extracted on the basis of the physiological structure of the vocal organs, such as the glottis, the vocal tract, and the nasal cavity. In addition, parameters that reflect auditory characteristics can also be extracted as feature parameters.
Step S33: training a preset sound model with the feature parameters of the second sound signal to obtain the voiceprint template corresponding to the object.
The preliminary treatment mentioned in above-mentioned steps S31 can be various pretreatment mode, such as, carries out analog-to-digital conversion etc.In addition, the application can also comprise following exemplary approach:
One of exemplary approach: " framing " processes.Voice signal is nonlinear time-varying signal, has the feature of short-term stationarity, and for this reason, the preliminary treatment of the application can carry out short-time analysis to voice signal, is divided into by voice signal length to be the signal one by one of preset length.The mode that " windowing " can be taked during specific implementation to process carries out short-time analysis, such as, adopts " Hamming window function " to analyze it, can reduce the signal discontinuity at frame starting and ending place in this way.Length for each frame split can set according to the feature of voice signal under normal circumstances.Such as, generally, the voice signal of people is relatively stable in 10 ~ 30ms, therefore, frame length can be set between 10 ~ 30ms.
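The framing approach just described can be sketched as follows: the signal is cut into fixed-length frames and each frame is multiplied by a Hamming window. This is a minimal sketch; the 25 ms frame length is one value inside the 10 to 30 ms range given above, and the 10 ms hop (which makes consecutive frames overlap) is an assumption that the text does not specify. The function names are hypothetical.

```python
import math


def hamming(n_samples):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def split_into_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split samples into Hamming-windowed frames of frame_ms milliseconds,
    advancing hop_ms milliseconds between frames (assumed overlap)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

At a 16 kHz sampling rate, a 25 ms frame holds 400 samples. The window tapers each frame toward 0.08 of its amplitude at both ends, which is what reduces the discontinuity at frame boundaries.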
Exemplary approach two: endpoint detection. The received sound signal is usually noisy, containing for example background noise and ambient noise. Endpoint detection can accurately determine the start point and end point of the effective sound signal and thereby filter out the noise. In a specific implementation, this can be done using short-time energy. Within a sound signal, the effective sound signal (for example, speech) and the noise differ in energy: the energy of a segment containing the effective sound signal is generally greater than that of a segment containing only noise. Therefore, the energy of the sound signal can first be obtained and compared with a preset energy value; according to the comparison result, the part of the sound signal whose energy is greater than the preset energy value is determined to be the effective sound part, and this effective sound part is then used in subsequent operations.
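The energy comparison described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the energy is computed per frame as a sum of squared samples, the threshold value is arbitrary, and the names `short_time_energy` and `detect_endpoints` are hypothetical. The sketch returns the first and last frames above the threshold as the start and end points.

```python
from typing import List, Optional, Sequence, Tuple


def short_time_energy(frame: Sequence[float]) -> float:
    """Energy of one frame: the sum of squared sample values."""
    return sum(s * s for s in frame)


def detect_endpoints(frames: Sequence[Sequence[float]],
                     energy_threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the effective sound part.

    A frame counts as effective when its energy exceeds the preset
    threshold; None is returned when no frame does.
    """
    active = [i for i, f in enumerate(frames)
              if short_time_energy(f) > energy_threshold]
    if not active:
        return None
    return active[0], active[-1]


# Toy input: two quiet frames, two loud frames, one quiet frame.
toy_frames: List[List[float]] = [
    [0.01] * 4,
    [0.02] * 4,
    [0.5, -0.6, 0.7, -0.4],
    [0.3, -0.2, 0.4, -0.3],
    [0.01] * 4,
]
span = detect_endpoints(toy_frames, energy_threshold=0.1)
effective = toy_frames[span[0]:span[1] + 1] if span else []
```

In practice the preset energy value would need to be tuned to the microphone and the room; the value 0.1 here only serves the toy data.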
Regarding the feature parameter extraction mentioned in step S32 above: as noted earlier, different types of feature parameters call for different extraction methods. In the present application, Mel-frequency cepstral coefficients (MFCCs) are used as the feature parameters, and feature parameter extraction is carried out as follows:
Performing a Fourier transform on the second sound signal that has undergone the second preprocessing;
Filtering the Fourier-transformed signal;
Performing discrete cosine transform detection on the filtered signal and extracting the Mel-frequency cepstral coefficients, which are used as the feature parameters of the second sound signal. The Mel-frequency cepstral coefficients can be extracted as follows:
Applying non-linear compression to the signal after discrete transform detection to obtain the output of each filter bank, expressed as:
$$\tilde{S}(b) = \sum_{k=1}^{N-1} |H_b(k)|^2 \, |X(k)|^2$$
where b denotes the b-th Mel band, b = 1, ..., K, and $H_b(k)$ is the designed spectral window.
Solving for the loudness of each filter bank, expressed as:
$$Y(b) = \sqrt[3]{\tilde{S}(b)}$$
Transforming the loudness to obtain the auditory autocorrelation curve, expressed as:
$$\tilde{R}(m) = \frac{1}{K} \sum_{b=1}^{K} Y(b) \cos\left(\frac{\pi b m}{K}\right) + \frac{(-1)^m}{2K} Y(K)$$
Extracting the linear prediction coefficients from the auditory autocorrelation curve, expressed as:
$$\tilde{R}(m) = \sum_{k=1}^{p} \tilde{a}_k \, \tilde{R}(|m-k|)$$
Calculating the Mel-frequency cepstral coefficients from the linear prediction coefficients, expressed as:
$$\tilde{c}(m) = \tilde{a}_m + \sum_{k=1}^{m-1} \frac{k}{m} \, \tilde{c}(k) \, \tilde{a}_{m-k}, \quad 1 \le m \le p$$
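The chain from filter-bank outputs to cepstral coefficients can be sketched as follows. This is a hedged illustration, not the applicant's code. It assumes the loudness step is a cube-root compression, solves the linear prediction equations with the Levinson-Durbin recursion (a standard solver that the source does not name), and takes the weight in the final recursion to be k/m, as in the standard conversion from prediction coefficients to cepstrum. The band energies `S` and the order `p = 4` are made-up example values, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def loudness(S: Sequence[float]) -> List[float]:
    """Y(b) = cube root of the band energy S(b) (assumed compression law)."""
    return [s ** (1.0 / 3.0) for s in S]


def auditory_autocorrelation(Y: Sequence[float], p: int) -> List[float]:
    """R(m) = (1/K) * sum_b Y(b) cos(pi*b*m/K) + ((-1)^m / (2K)) * Y(K), m = 0..p."""
    K = len(Y)
    R = []
    for m in range(p + 1):
        total = sum(Y[b - 1] * math.cos(math.pi * b * m / K) for b in range(1, K + 1))
        R.append(total / K + ((-1) ** m) * Y[K - 1] / (2 * K))
    return R


def levinson_durbin(R: Sequence[float], p: int) -> List[float]:
    """Solve R(m) = sum_{k=1..p} a_k R(|m-k|), m = 1..p, for a_1..a_p."""
    a = [0.0] * (p + 1)  # a[0] is unused so that a[k] matches a_k
    err = R[0]
    for i in range(1, p + 1):
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def lpc_to_cepstrum(a: Sequence[float]) -> List[float]:
    """c(m) = a_m + sum_{k=1..m-1} (k/m) * c(k) * a_{m-k}, for 1 <= m <= p."""
    c: List[float] = []
    for m in range(1, len(a) + 1):
        acc = a[m - 1]
        for k in range(1, m):
            acc += (k / m) * c[k - 1] * a[m - k - 1]
        c.append(acc)
    return c


# Made-up energies for K = 8 Mel bands and prediction order p = 4.
S = [4.0, 9.0, 16.0, 12.0, 7.0, 3.0, 2.0, 1.0]
p = 4
Y = loudness(S)
R = auditory_autocorrelation(Y, p)
a = levinson_durbin(R, p)
c = lpc_to_cepstrum(a)
```

The first cepstral coefficient equals the first prediction coefficient, and each later coefficient adds a correction built from the earlier ones, so the recursion needs only the p prediction coefficients as input.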
Step S33 above mentions training a preset sound model with the feature parameters to obtain the voiceprint template. In practice, the feature parameters of the second sound signal can be used to train a general hidden Markov model, yielding a dedicated hidden Markov model that uniquely corresponds to the object; this dedicated hidden Markov model serves as the voiceprint template corresponding to the object. A hidden Markov model is a stochastic model based on transition probabilities and emission (output) probabilities. It treats speech as a random process made up of a sequence of observable symbols, where the symbol sequence is the output of the state sequence of the speech production system. When building the templates, a speech production model is built for each speaker, and the state transition probability matrix and the symbol output probability matrix are obtained by training. At recognition time, the maximum probability of the unknown speech over the state transitions is computed for each model, and the decision is made in favor of the model that yields the highest probability.
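The decision rule described above can be sketched with a small discrete hidden Markov model. This sketch rests on several assumptions that the source does not state: the feature vectors are taken to be already quantized into integer symbols, the maximum probability over state transitions is computed with the Viterbi best-path score, and the model parameters are written by hand instead of being trained. The class `DiscreteHMM`, the function `identify`, and the speaker names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _log(x: float) -> float:
    """Natural log that maps a zero probability to negative infinity."""
    return math.log(x) if x > 0.0 else float("-inf")


class DiscreteHMM:
    """HMM with a start vector, a state transition matrix, and a symbol output matrix."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]) -> None:
        self.start, self.trans, self.emit = start, trans, emit

    def best_path_log_prob(self, obs: Sequence[int]) -> float:
        """Log probability of the most likely state path (Viterbi score)."""
        n = len(self.start)
        delta = [_log(self.start[i]) + _log(self.emit[i][obs[0]]) for i in range(n)]
        for o in obs[1:]:
            delta = [
                max(delta[i] + _log(self.trans[i][j]) for i in range(n))
                + _log(self.emit[j][o])
                for j in range(n)
            ]
        return max(delta)


def identify(templates: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Pick the speaker whose model gives the observation the highest score."""
    return max(templates, key=lambda name: templates[name].best_path_log_prob(obs))


# Hand-written toy templates: "alice" mostly emits symbol 0, "bob" mostly symbol 1.
templates = {
    "alice": DiscreteHMM([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.8, 0.2]]),
    "bob": DiscreteHMM([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.1, 0.9], [0.2, 0.8]]),
}
speaker = identify(templates, [0, 0, 1, 0, 0])
```

A real system would estimate the two probability matrices from enrollment speech, for example with Baum-Welch training, and would usually model continuous feature vectors directly instead of quantized symbols.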
In the embodiments above, steps such as preprocessing and feature parameter extraction were described mainly from the perspective of building voiceprint templates from the second sound signal. In practice, although the preprocessing and feature parameter extraction applied to the first sound signal may differ from those applied to the second sound signal, the present application preferably applies to the first sound signal preprocessing and feature parameter extraction that are the same as or similar to those described above for the second sound signal. Specifically, performing the first preprocessing on the first sound signal can comprise: performing analog-to-digital conversion on the first sound signal; or performing endpoint detection on the first sound signal; or performing analog-to-digital conversion together with framing and/or endpoint detection on the first sound signal, wherein:
Endpoint detection on the first sound signal can be performed as follows: obtaining the energy of the first sound signal; comparing the energy with a preset energy value; and determining the part of the first sound signal whose energy is greater than the preset energy value to be the effective sound part.
Framing of the first sound signal can be performed as follows: windowing the first sound signal to divide it into frame signals of a preset length.
Extracting the feature parameters of the first sound signal from the first sound signal that has undergone the first preprocessing is carried out in the following steps:
Performing a Fourier transform on the first sound signal that has undergone the first preprocessing;
Filtering the Fourier-transformed signal;
Performing discrete cosine transform detection on the filtered signal and extracting the Mel-frequency cepstral coefficients, which are used as the feature parameters of the first sound signal.
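The three steps above (Fourier transform, filtering, discrete cosine transform) can be sketched for a single frame as follows. The sketch follows the conventional Mel-frequency cepstral coefficient recipe with a triangular Mel filter bank and a logarithm before the cosine transform, which may differ in detail from the variant used in the application. The filter count, coefficient count, Mel-scale formula constants, and all function names are assumptions; a direct O(N^2) transform is used only to keep the example free of dependencies.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|X(k)|^2 for k = 0..N/2, computed with a direct discrete Fourier transform."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose centers are equally spaced on the Mel scale."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [f * n_fft / sample_rate for f in edges_hz]  # fractional bin positions
    bank = []
    for b in range(1, num_filters + 1):
        lo, mid, hi = bins[b - 1], bins[b], bins[b + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(x: Sequence[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II: sum_i x[i] * cos(pi * m * (i + 0.5) / n)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * m * (i + 0.5) / n) for i in range(n))
            for m in range(num_coeffs)]


def mfcc(frame: Sequence[float], sample_rate: int,
         num_filters: int = 12, num_coeffs: int = 8) -> List[float]:
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * s for w, s in zip(row, spec)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # small floor avoids log(0)
    return dct2(log_energies, num_coeffs)


# Example: a 64-sample frame of a 1 kHz tone at 8 kHz (assumed test input).
frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
coeffs = mfcc(frame, 8000)
```

Applying the same function to the enrollment signal and to the signal to be recognized keeps the two feature sets comparable, which is why the text prefers identical processing for both.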
The power on/off control method for a smart television of the present application has been described in detail above. Correspondingly, the embodiments of the present application further provide an embodiment of a power on/off control device for a smart television. Referring to Fig. 4, which shows the structure of an embodiment of the smart television power on/off control device of the present application, this device embodiment comprises: a first receiving unit U41, a first preprocessing unit U42, a first extraction unit U43, a matching and recognition unit U44, and a control unit U45, wherein:
The first receiving unit U41 is configured to receive the first sound signal emitted by the current object whose identity is to be recognized;
The first preprocessing unit U42 is configured to perform the first preprocessing on the first sound signal;
The first extraction unit U43 is configured to extract the feature parameters of the first sound signal from the first sound signal that has undergone the first preprocessing;
The matching and recognition unit U44 is configured to match the extracted feature parameters of the first sound signal against the voiceprint templates stored in the voiceprint template library and to recognize the identity of the current object according to the matching result, wherein the voiceprint template library is a set of voiceprint templates, each corresponding to an object, obtained in advance by training on second sound signals emitted by at least one object;
The control unit U45 is configured to perform power on/off control of the smart television according to the recognized identity information.
The device embodiment works as follows: the first receiving unit U41 receives the first sound signal emitted by the current object whose identity is to be recognized; the first preprocessing unit U42 then performs the first preprocessing on the first sound signal; the first extraction unit U43 extracts the feature parameters of the first sound signal from the preprocessed first sound signal; the matching and recognition unit U44 matches the extracted feature parameters against the voiceprint templates stored in the voiceprint template library and recognizes the identity of the current object according to the matching result; finally, the control unit U45 performs power on/off control of the smart television according to the recognized identity information. This device embodiment can achieve technical effects that are the same as or similar to those of the method embodiments of the present application described above, which are not repeated here.
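The flow through units U41 to U45 can be sketched as a small pipeline. This is only a structural illustration. The class `SwitchController`, the per-identity permission table `authorized`, the rule of toggling power for an authorized identity, and the toy feature and matching functions are all assumptions, since the source only says that control follows the recognized identity information.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Signal = Sequence[float]
Features = List[float]


@dataclass
class SwitchController:
    preprocess: Callable[[Signal], Signal]      # role of preprocessing unit U42
    extract: Callable[[Signal], Features]       # role of extraction unit U43
    match: Callable[[Features], Optional[str]]  # role of unit U44; None means no match
    authorized: Dict[str, bool]                 # hypothetical permission table
    powered_on: bool = False

    def on_sound(self, signal: Signal) -> bool:
        """Receive a signal (role of U41) and apply power control (role of U45)."""
        identity = self.match(self.extract(self.preprocess(signal)))
        if identity is not None and self.authorized.get(identity, False):
            self.powered_on = not self.powered_on
        return self.powered_on


# Toy stand-ins: the feature is the mean absolute amplitude, and each
# template is a single reference value for that feature.
toy_templates = {"parent": 0.8, "child": 0.2}


def toy_match(features: Features) -> Optional[str]:
    name = min(toy_templates, key=lambda n: abs(toy_templates[n] - features[0]))
    return name if abs(toy_templates[name] - features[0]) < 0.1 else None


tv = SwitchController(
    preprocess=lambda s: s,
    extract=lambda s: [sum(abs(x) for x in s) / len(s)],
    match=toy_match,
    authorized={"parent": True, "child": False},
)
after_parent = tv.on_sound([0.8, -0.8, 0.8])  # recognized and authorized
after_child = tv.on_sound([0.2, -0.2])        # recognized but not authorized
after_unknown = tv.on_sound([0.5, 0.5])       # matches no template
```

Passing the three stages in as callables mirrors the point made in the text that the units need not be physically separate: any of them can be swapped without changing the control logic.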
In the device embodiment above, the voiceprint templates can be built by the device itself, or can be built by other means and then loaded into the device for direct use. That is, the component that builds voiceprint templates can be arranged inside the device embodiment. In that case, the device embodiment can comprise a training unit U46, which trains on the second sound signal emitted in advance by an object to obtain the voiceprint template corresponding to that object. Furthermore, the training unit U46 can comprise a second receiving subunit U461, a second preprocessing subunit U462, a second extraction subunit U463, and a model training subunit U464, wherein: the second receiving subunit U461 is configured to receive the second sound signal emitted by the object; the second preprocessing subunit U462 is configured to perform the second preprocessing on the second sound signal; the second extraction subunit U463 is configured to extract the feature parameters of the second sound signal from the second sound signal that has undergone the second preprocessing; and the model training subunit U464 is configured to train a preset sound model with the feature parameters of the second sound signal to obtain the voiceprint template corresponding to the object.
The second preprocessing subunit in the training unit above can perform various kinds of preprocessing, and different kinds of preprocessing can be realized with different internal structures. If the preprocessing is analog-to-digital conversion, the second preprocessing subunit can comprise a second analog-to-digital conversion subunit; if the preprocessing is noise reduction, the second preprocessing subunit can comprise a second endpoint detection subunit and/or a second framing subunit, wherein: the second endpoint detection subunit is configured to obtain the energy of the second sound signal, compare the energy with a preset energy value, and determine the part of the second sound signal whose energy is greater than the preset energy value to be the effective sound part; the second framing subunit is configured to window the second sound signal to divide it into frame signals of a preset length. Of course, these different preprocessing modes can all be included in the training unit at the same time, or only some of the functional units can be included depending on the actual technical situation, for example only the analog-to-digital conversion subunit, or the analog-to-digital conversion subunit together with the second endpoint detection subunit.
The second extraction subunit in the training unit above can further comprise a second transform subunit, a second filtering subunit, a second detection subunit, and a second parameter extraction subunit, wherein: the second transform subunit is configured to perform a Fourier transform on the second sound signal that has undergone the second preprocessing; the second filtering subunit is configured to filter the Fourier-transformed signal; the second detection subunit is configured to perform discrete cosine transform detection on the filtered signal; and the second parameter extraction subunit is configured to extract the Mel-frequency cepstral coefficients and use them as the feature parameters of the second sound signal. Besides extracting feature parameters in this way, those skilled in the art can also adopt other approaches on this basis.
The device embodiment above mentions that the training unit can perform preprocessing through the second preprocessing subunit and feature parameter extraction through the second extraction subunit. In practice, these functional units and their functions are equally applicable to the first preprocessing unit U42 and the first extraction unit U43. Specifically, the first preprocessing unit U42 can comprise a first analog-to-digital conversion subunit U421; or a first endpoint detection subunit U422; or the first analog-to-digital conversion subunit together with a first framing subunit U423 and/or the first endpoint detection subunit U422, wherein: the first endpoint detection subunit U422 is configured to obtain the energy of the first sound signal, compare the energy with a preset energy value, and determine the part of the first sound signal whose energy is greater than the preset energy value to be the effective sound part; the first framing subunit U423 is configured to window the first sound signal to divide it into frame signals of a preset length. The first extraction unit U43 can comprise a first transform subunit U431, a first filtering subunit U432, a first detection subunit U433, and a first parameter extraction subunit U434, wherein: the first transform subunit U431 is configured to perform a Fourier transform on the first sound signal that has undergone the first preprocessing; the first filtering subunit U432 is configured to filter the Fourier-transformed signal; the first detection subunit U433 is configured to perform discrete cosine transform detection on the filtered signal; and the first parameter extraction subunit U434 is configured to extract the Mel-frequency cepstral coefficients and use them as the feature parameters of the first sound signal.
It should be noted that, for ease of description, the embodiments of this specification and their variants each emphasize the differences from other embodiments or variants; for parts that are the same or similar, the descriptions can be cross-referenced. In particular, the improvements of the device embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments. The units of the device embodiments described above may or may not be physically separate, and may be located in one place or distributed across multiple network environments. In practice, some or all of the units can be selected according to actual needs to achieve the purpose of the scheme of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
It should also be noted that, although the spirit and principles of the invention have been described above with reference to several embodiments, it should be understood that the invention is not limited to the disclosed embodiments, and that the division into aspects does not mean that features in those aspects cannot be combined; the division is merely for convenience of presentation. The invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (13)

1. A power on/off control method for a smart television, characterized in that the method comprises:
Receiving a first sound signal emitted by a current object whose identity is to be recognized, and performing first preprocessing on the first sound signal;
Extracting feature parameters of the first sound signal from the first sound signal that has undergone the first preprocessing;
Matching the extracted feature parameters of the first sound signal against the voiceprint templates stored in a voiceprint template library, and recognizing the identity of the current object according to the matching result, wherein the voiceprint template library is a set of voiceprint templates, each corresponding to an object, obtained in advance by training on second sound signals emitted by at least one object;
Performing power on/off control of the smart television according to the recognized identity information.
2. The method according to claim 1, characterized in that training in advance on the second sound signal emitted by an object to obtain the voiceprint template corresponding to the object specifically comprises:
Receiving the second sound signal emitted by the object, and performing second preprocessing on the second sound signal;
Extracting feature parameters of the second sound signal from the second sound signal that has undergone the second preprocessing;
Training a preset sound model with the feature parameters of the second sound signal to obtain the voiceprint template corresponding to the object.
3. The method according to claim 2, characterized in that performing the second preprocessing on the second sound signal comprises:
Performing analog-to-digital conversion on the second sound signal; or performing endpoint detection on the second sound signal; or performing analog-to-digital conversion together with framing and/or endpoint detection on the second sound signal, wherein:
Performing endpoint detection on the second sound signal specifically comprises:
Obtaining the energy of the second sound signal;
Comparing the energy with a preset energy value, and determining the part of the second sound signal whose energy is greater than the preset energy value to be the effective sound part;
Performing framing on the second sound signal specifically comprises:
Windowing the second sound signal to divide it into frame signals of a preset length.
4. The method according to claim 2, characterized in that extracting the feature parameters of the second sound signal from the second sound signal that has undergone the second preprocessing specifically comprises:
Performing a Fourier transform on the second sound signal that has undergone the second preprocessing;
Filtering the Fourier-transformed signal;
Performing discrete cosine transform detection on the filtered signal and extracting Mel-frequency cepstral coefficients, which are used as the feature parameters of the second sound signal.
5. The method according to claim 4, characterized in that extracting the Mel-frequency cepstral coefficients specifically comprises:
Applying non-linear compression to the signal after discrete transform detection to obtain the output of each filter bank;
Solving for the loudness of each filter bank;
Transforming the loudness to obtain an auditory autocorrelation curve;
Extracting linear prediction coefficients from the auditory autocorrelation curve;
Calculating the Mel-frequency cepstral coefficients from the linear prediction coefficients.
6. The method according to claim 2, characterized in that training the preset sound model with the feature parameters of the second sound signal to obtain the voiceprint template corresponding to the object specifically comprises:
Training a general hidden Markov model with the feature parameters of the second sound signal to obtain a dedicated hidden Markov model that uniquely corresponds to the object, and using the dedicated hidden Markov model as the voiceprint template corresponding to the object.
7. The method according to claim 1, characterized in that performing the first preprocessing on the first sound signal specifically comprises:
Performing analog-to-digital conversion on the first sound signal; or performing endpoint detection on the first sound signal; or performing analog-to-digital conversion together with framing and/or endpoint detection on the first sound signal, wherein:
Performing endpoint detection on the first sound signal specifically comprises:
Obtaining the energy of the first sound signal;
Comparing the energy with a preset energy value, and determining the part of the first sound signal whose energy is greater than the preset energy value to be the effective sound part;
Performing framing on the first sound signal specifically comprises:
Windowing the first sound signal to divide it into frame signals of a preset length;
Extracting the feature parameters of the first sound signal from the first sound signal that has undergone the first preprocessing specifically comprises:
Performing a Fourier transform on the first sound signal that has undergone the first preprocessing;
Filtering the Fourier-transformed signal;
Performing discrete cosine transform detection on the filtered signal and extracting Mel-frequency cepstral coefficients, which are used as the feature parameters of the first sound signal.
8. A power on/off control device for a smart television, characterized in that the device comprises: a first receiving unit, a first preprocessing unit, a first extraction unit, a matching and recognition unit, and a control unit, wherein:
The first receiving unit is configured to receive a first sound signal emitted by a current object whose identity is to be recognized;
The first preprocessing unit is configured to perform first preprocessing on the first sound signal;
The first extraction unit is configured to extract feature parameters of the first sound signal from the first sound signal that has undergone the first preprocessing;
The matching and recognition unit is configured to match the extracted feature parameters of the first sound signal against the voiceprint templates stored in a voiceprint template library and to recognize the identity of the current object according to the matching result, wherein the voiceprint template library is a set of voiceprint templates, each corresponding to an object, obtained in advance by training on second sound signals emitted by at least one object;
The control unit is configured to perform power on/off control of the smart television according to the recognized identity information.
9. The device according to claim 8, characterized in that the device further comprises a training unit configured to train in advance on the second sound signal emitted by an object to obtain the voiceprint template corresponding to the object, the training unit comprising a second receiving subunit, a second preprocessing subunit, a second extraction subunit, and a model training subunit, wherein:
The second receiving subunit is configured to receive the second sound signal emitted by the object;
The second preprocessing subunit is configured to perform second preprocessing on the second sound signal;
The second extraction subunit is configured to extract feature parameters of the second sound signal from the second sound signal that has undergone the second preprocessing;
The model training subunit is configured to train a preset sound model with the feature parameters of the second sound signal to obtain the voiceprint template corresponding to the object.
10. The device according to claim 9, characterized in that the second preprocessing subunit comprises a second analog-to-digital conversion subunit; or a second endpoint detection subunit; or the second analog-to-digital conversion subunit together with a second framing subunit and/or the second endpoint detection subunit, wherein:
The second endpoint detection subunit is configured to obtain the energy of the second sound signal, compare the energy with a preset energy value, and determine the part of the second sound signal whose energy is greater than the preset energy value to be the effective sound part;
The second framing subunit is configured to window the second sound signal to divide it into frame signals of a preset length.
11. The device according to claim 9, characterized in that the second extraction subunit comprises a second transform subunit, a second filtering subunit, a second detection subunit, and a second parameter extraction subunit, wherein:
The second transform subunit is configured to perform a Fourier transform on the second sound signal that has undergone the second preprocessing;
The second filtering subunit is configured to filter the Fourier-transformed signal;
The second detection subunit is configured to perform discrete cosine transform detection on the filtered signal;
The second parameter extraction subunit is configured to extract Mel-frequency cepstral coefficients and use them as the feature parameters of the second sound signal.
12. The device according to claim 8, characterized in that the first preprocessing unit comprises a first analog-to-digital conversion subunit; or a first endpoint detection subunit; or the first analog-to-digital conversion subunit together with a first framing subunit and/or the first endpoint detection subunit, wherein:
The first endpoint detection subunit is configured to obtain the energy of the first sound signal, compare the energy with a preset energy value, and determine the part of the first sound signal whose energy is greater than the preset energy value to be the effective sound part;
The first framing subunit is configured to window the first sound signal to divide it into frame signals of a preset length.
13. The device according to claim 8, characterized in that the first extraction unit comprises a first transform subunit, a first filtering subunit, a first detection subunit, and a first parameter extraction subunit, wherein:
The first transform subunit is configured to perform a Fourier transform on the first sound signal that has undergone the first preprocessing;
The first filtering subunit is configured to filter the Fourier-transformed signal;
The first detection subunit is configured to perform discrete cosine transform detection on the filtered signal;
The first parameter extraction subunit is configured to extract Mel-frequency cepstral coefficients and use them as the feature parameters of the first sound signal.
CN201510020151.8A 2015-01-15 2015-01-15 Smart television switching control method and device thereof Pending CN104853236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510020151.8A CN104853236A (en) 2015-01-15 2015-01-15 Smart television switching control method and device thereof

Publications (1)

Publication Number Publication Date
CN104853236A true CN104853236A (en) 2015-08-19

Family

ID=53852531

Country Status (1)

Country Link
CN (1) CN104853236A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1403953A (en) * 2002-09-06 2003-03-19 浙江大学 Palm acoustic-print verifying system
CN1567431A (en) * 2003-07-10 2005-01-19 上海优浪信息科技有限公司 Method and system for identifying status of speaker
CN101938610A (en) * 2010-09-27 2011-01-05 冠捷显示科技(厦门)有限公司 Novel voiceprint recognition-based television device
CN202679498U (en) * 2012-06-14 2013-01-16 青岛海尔电子有限公司 Television set capable of controlling switching with voice
CN103686292A (en) * 2013-12-27 2014-03-26 乐视致新电子科技(天津)有限公司 Startup and shutdown method and device for intelligent electronic device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A. REVATHI ; Y. VENKATARAMANI: "Text Independent Composite Speaker Identification/Verification Using Multiple Features", 《2009 WRI WORLD CONGRESS ON COMPUTER SCIENCE AND INFORMATION ENGINEERING》 *
PC WOODL , MJF GALES , D PYE , SJ YOUNG: "THE DEVELOPMENT OF THE 1996 HTK BROADCAST NEWS TRANSCRIPTION SYSTEM", 《DARPA SPEECH RECOGNITION WORKSHOP》 *
ZHU, XUAN CHEN. YINING LIU, JIA LIU, RUNSHENG: "FEATURE SELECTION IN MANDARIN LARGE VOCABULARY", 《2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847261A (en) * 2016-12-23 2017-06-13 伟乐视讯科技股份有限公司 The Voiceprint Recognition System and method for recognizing sound-groove of a kind of Set Top Box
CN107393539A (en) * 2017-07-17 2017-11-24 傅筱萸 A kind of sound cipher control method
CN108718419A (en) * 2018-03-22 2018-10-30 江苏大丰和顺电子有限公司 A kind of television set intelligently remote control and its working method based on voiceprint lock
CN110689892A (en) * 2018-07-06 2020-01-14 上海博泰悦臻网络技术服务有限公司 Exclusive control method and system based on voiceprint, storage medium and vehicle-mounted terminal
CN111613229A (en) * 2020-05-13 2020-09-01 深圳康佳电子科技有限公司 Voiceprint control method of television loudspeaker box, storage medium and smart television
CN113506550A (en) * 2021-07-29 2021-10-15 北京花兰德科技咨询服务有限公司 Artificial intelligent reading display and display method
CN113506550B (en) * 2021-07-29 2022-07-05 北京花兰德科技咨询服务有限公司 Artificial intelligent reading display and display method

Similar Documents

Publication Publication Date Title
CN104853236A (en) Smart television switching control method and device thereof
CN102005070A (en) Voice identification gate control system
CN107886967B (en) Bone conduction speech enhancement method based on a deep bidirectional gated recurrent neural network
CN107112026A (en) System, method and devices for intelligent speech recognition and processing
CN107799126A (en) Voice endpoint detection method and device based on supervised machine learning
CN105448303A (en) Voice signal processing method and apparatus
CN112102846B (en) Audio processing method and device, electronic equipment and storage medium
CN104575504A (en) Method for personalized television voice wake-up by voiceprint and voice identification
CN103236260A (en) Voice recognition system
CN108198569A (en) Audio processing method, device, equipment and readable storage medium
CN110992932B (en) Self-learning voice control method, system and storage medium
CN108597505A (en) Audio recognition method, device and terminal device
CN111105796A (en) Wireless earphone control device and control method, and voice control setting method and system
CN104123930A (en) Guttural identification method and device
WO2019075829A1 (en) Voice translation method and apparatus, and translation device
CN107910006A (en) Audio recognition method, device and multiple source speech differentiation identifying system
CN106782498A (en) Voice message playing method, device and terminal
CN104851423A (en) Sound message processing method and device
JP2000349865A (en) Voice communication apparatus
CN110556114B (en) Speaker identification method and device based on attention mechanism
WO2018001125A1 (en) Method and device for audio recognition
CN106971712A (en) Adaptive rapid voiceprint recognition method and system
CN104240705A (en) Intelligent voice-recognition locking system for safe box
CN106981287A (en) Method and system for improving voiceprint recognition speed
CN113012694A (en) Light-life voice recognition control system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150819
