CN102568478B - Video play control method and system based on voice recognition - Google Patents

Publication number
CN102568478B
Authority
CN
China
Prior art keywords
user
phonetic feature
voice
feature
uid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210025924.8A
Other languages
Chinese (zh)
Other versions
CN102568478A (en)
Inventor
吴昊宇
邓龙
姚键
邱丹
潘柏宇
卢述奇
刘睿姝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Alibaba Music Technology Co Ltd
Original Assignee
1Verge Internet Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 1Verge Internet Technology Beijing Co Ltd filed Critical 1Verge Internet Technology Beijing Co Ltd
Priority to CN201210025924.8A priority Critical patent/CN102568478B/en
Publication of CN102568478A publication Critical patent/CN102568478A/en
Application granted granted Critical
Publication of CN102568478B publication Critical patent/CN102568478B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a video control method based on speech recognition. The method comprises: training on the user's voice to extract voice features and storing them in a voice feature library; receiving a voice control command from the user and comparing it with the stored voice features of the user; and, when the voice features of the user match the user voice features in a server, extracting the voice control command and controlling video playback based on it. This technical solution overcomes the drawback of the prior art that speech recognition is applied only on a standalone machine or requires software to be downloaded. In addition, because the voice features are stored in the voice feature library for a specific person, recognition based on that specific person's voice can be achieved, and speech recognition and control performed in this way have high accuracy. The invention also discloses a video control system based on speech recognition.

Description

A video playback control method and system based on speech recognition
Technical field
The present invention relates to a video control method, and in particular to a video playback control method based on speech recognition, belonging to the field of speech recognition.
Background art
At present, the task of computer speech recognition is to enable a computer to understand the statements or commands spoken by humans and to act on them accordingly.
Since the 1970s, research on computer speech recognition has made breakthrough progress. Today the technology is widely used in many fields, such as voice dialing, voice search and voice control. However, existing computer speech recognition systems all have some problems. Because speech recognition requires a large amount of computation, existing systems essentially run on a single standalone machine, or require specific software to be downloaded and installed before recognition can be performed, and are not well integrated with Internet technology. The speech recognition built into operating systems can only complete specific simple tasks and is not connected with other programs or Internet applications, so it cannot meet the demands of the rapidly developing Internet.
Because human languages are varied, and different people pronounce the same word differently, computer speech recognition systems can be divided, according to their degree of dependence on the speaker's voice and the way the acoustic model is built, into speaker-dependent and speaker-independent recognition systems.
Summary of the invention
To address the shortcomings of the prior art, the present invention provides a video playback control method based on speech recognition that offers more flexible video control. In addition, the invention also discloses a video playback control system based on speech recognition.
According to a first object of the present invention, a video playback control method based on speech recognition is provided, comprising:
training on the user's voice to extract voice features and saving them in a voice feature library;
receiving a voice control command from the user and comparing it with the saved voice features of the user;
wherein, when the voice features of the user match the user voice features in the server, the voice control command is extracted and video playback is controlled based on that voice control command.
Further, preferably, training on the user's voice to extract voice features and saving them in the voice feature library specifically comprises:
calculating the acoustic parameters of the user's voice, extracting key feature parameters that reflect the characteristics of the speech signal, and performing dimensionality reduction;
obtaining training utterances of control commands input by the user several times;
after pre-processing and voice feature extraction, obtaining the voice feature vector parameters of the specific user and storing them in the voice feature library in a network server.
Further, preferably, the key feature parameters are MFCC parameters.
Further, preferably, receiving the voice control command of the user and comparing it with the saved user voice features specifically comprises:
performing a similarity measurement between the voice control command subsequently input by the user and each command voice feature stored in the server, to judge whether the user's voice control command matches a feature in the voice feature library.
Further, preferably, the video control method is based on a FLASH player, and further comprises:
completing recognition of the user's voice control command within 10 seconds and, once recognition returns successfully, performing the corresponding video control action.
With the above technical solution, the present invention overcomes the technical drawback that speech recognition in the prior art is applied only on a standalone machine or requires software to be downloaded. Further, because the voice features of the present application are saved in the voice feature library, recognition based on a specific person's voice can be achieved, and speech recognition and control performed in this way have higher accuracy.
According to another object of the present invention, a video playback control system based on speech recognition is provided, comprising:
a voice feature training unit, for training on the user's voice to extract voice features and saving them in a voice feature library;
a voice feature recognition unit, for receiving a voice control command from the user and comparing it with the saved voice features of the user;
a video control unit, for extracting the voice control command and controlling video playback based on it when the voice features of the user match the user voice features in the server.
Further, preferably, the voice feature training unit specifically comprises:
a feature parameter extraction subunit, for calculating the acoustic parameters of the user's voice, extracting key feature parameters that reflect the characteristics of the speech signal, and performing dimensionality reduction;
a feature parameter training subunit, for obtaining training utterances of control commands input by the user several times and, after pre-processing and voice feature extraction, obtaining the voice feature vector parameters of the specific user;
a sending subunit, for storing the above voice feature vector parameters in the voice feature library in a network server.
Further, preferably, the key feature parameters are MFCC parameters.
Further, preferably, the voice feature recognition unit specifically comprises:
a comparison subunit, for performing a similarity measurement between the voice control command subsequently input by the user and each command voice feature stored in the server, to judge whether the user's voice control command matches a feature in the voice feature library.
Further, preferably, the video control unit further comprises:
a FLASH player subunit;
a player control subunit, for completing recognition of the user's voice control command within 10 seconds and, once recognition returns successfully, performing the corresponding video control action.
With the above technical solution, the system has all the advantages of the foregoing method: the application overcomes the technical drawback that speech recognition in the prior art is applied only on a standalone machine or requires software to be downloaded. Further, because the voice features of the present application are saved in the voice feature library, recognition based on a specific person's voice can be achieved, and speech recognition and control performed in this way have higher accuracy.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, claims and accompanying drawings.
Brief description of the drawings
The present invention is described in detail below with reference to the accompanying drawings, so that its above advantages become clearer.
Fig. 1 is a schematic flowchart of the video playback control method based on speech recognition according to the present invention;
Fig. 2 is a schematic diagram of voice and video processing in one embodiment of the present invention;
Fig. 3 is a schematic diagram of voice training in one embodiment of the present invention;
Fig. 4 is a schematic flowchart of video control by speech recognition in one embodiment of the present invention;
Fig. 5 is a schematic flowchart of video control by speech recognition in another embodiment of the present invention;
Fig. 6 is a structural diagram of the video playback control system based on speech recognition according to the present invention;
Fig. 7 is a schematic diagram of the voice feature training unit in one embodiment of the present invention;
Fig. 8 is an architecture diagram of the voice feature training unit in one embodiment of the present invention;
Fig. 9 is a schematic diagram of the voice feature recognition unit in one embodiment of the present invention;
Fig. 10 is a schematic diagram of the video control unit in one embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
Method embodiment one
The present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of the video playback control method based on speech recognition according to the present invention, and Fig. 2 is a schematic diagram of voice and video processing in one embodiment of the present invention.
According to this embodiment, the video playback control method based on speech recognition comprises:
S101: training on the voice of a specific user to extract voice features;
S102: saving the voice features of the above specific user in a voice feature library;
S103: receiving a voice control command from the user;
S104: comparing the received voice control command of the user with the saved voice features of the user;
S105: when the voice features of the user match the user voice features in the server, extracting the voice control command and controlling video playback based on that voice control command.
In step S102, the user name or account may be saved in the voice feature library together with the specific voice features. In a preferred embodiment, the voice feature library is a database in an Internet server.
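The storage arrangement in step S102 can be sketched as a small keyed store. This is a minimal illustration only: the class name VoiceFeatureLibrary, its methods, and the in-memory dictionary standing in for the server-side database are all assumptions made for the example and are not specified by the source.

```python
from typing import Dict, List

# One template is a sequence of per-frame feature vectors (for example MFCC frames).
Template = List[List[float]]


class VoiceFeatureLibrary:
    """Hypothetical in-memory stand-in for the server-side feature database."""

    def __init__(self) -> None:
        # account id -> command word -> trained template
        self._records: Dict[int, Dict[str, Template]] = {}

    def save(self, uid: int, command: str, template: Template) -> None:
        """Store the trained template for one command word under the account."""
        self._records.setdefault(uid, {})[command] = template

    def is_trained(self, uid: int) -> bool:
        """True once at least one template has been stored for the account."""
        return bool(self._records.get(uid))

    def templates_for(self, uid: int) -> Dict[str, Template]:
        """Return a shallow copy so callers cannot add or remove entries."""
        return dict(self._records.get(uid, {}))
```

Keying every record by account is what would let a later session on a different device look up the same templates without retraining.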
Further, step S103 comprises:
performing a similarity measurement between the voice control command subsequently input by the user and each command voice feature stored in the server, to judge whether the user's voice control command matches a feature in the voice feature library.
The video control method described in the present application is based on a FLASH player, and further comprises:
completing recognition of the user's voice control command within 10 seconds and, once recognition returns successfully, performing the corresponding video control action.
With the above technical solution, the present invention overcomes the technical drawback that speech recognition in the prior art is applied only on a standalone machine or requires software to be downloaded. Further, because the voice features of the present application are saved in the voice feature library, recognition based on a specific person's voice can be achieved, and speech recognition and control performed in this way have higher accuracy.
Method embodiment two:
The present invention is further described below. The application mainly comprises a voice feature training step, a voice feature recognition step and a video control step; these three steps are described in detail below.
Fig. 3 is a schematic diagram of voice training in one embodiment of the present invention.
As shown in Fig. 3, the method mainly comprises the following steps:
A specific registered user opens a web page, in which a speech recognition FLASH application is displayed. FLASH is well-known prior art and is not described in detail here.
When the system finds that the user has not yet carried out voice feature training, it prompts the user to do so; otherwise it proceeds directly to the next step.
The system provides some basic words, for example: start, pause, play, volume up, fast forward, and so on, and the user carries out voice feature training following these prompts.
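A small vocabulary of this kind maps naturally onto a dispatch table. The sketch below is illustrative only: the Player class, its fields, the step sizes and the English command strings are assumptions for the example; the source lists the words but does not define a player interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Player:
    """Hypothetical minimal player state, used only for illustration."""
    playing: bool = False
    volume: int = 5
    position: float = 0.0

    def play(self) -> None:
        self.playing = True

    def pause(self) -> None:
        self.playing = False

    def volume_up(self) -> None:
        self.volume = min(self.volume + 1, 10)  # clamp at an assumed maximum

    def fast_forward(self) -> None:
        self.position += 10.0  # assumed 10-second skip


# Recognized command word -> player action.
COMMANDS: Dict[str, Callable[[Player], None]] = {
    "start": Player.play,
    "play": Player.play,
    "pause": Player.pause,
    "volume up": Player.volume_up,
    "fast forward": Player.fast_forward,
}


def dispatch(player: Player, word: str) -> bool:
    """Run the action for a recognized word; return False if the word is unknown."""
    action = COMMANDS.get(word)
    if action is None:
        return False
    action(player)
    return True
```

Keeping the vocabulary in one table means the set of words offered for training and the set of actions the player accepts cannot drift apart.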
The voice feature training step comprises:
Voice feature extraction stage: the acoustic parameters of the speech are calculated, voice features are computed, and key feature parameters that reflect the characteristics of the speech signal are extracted, achieving dimensionality reduction.
The speech recognition uses MFCC and DTW. MFCC (Mel Frequency Cepstrum Coefficient) is one of the most commonly used feature coefficients in frequency-domain analysis of audio and is widely applied. Its distinguishing feature is that it takes into account the nonlinear characteristics of the human auditory system, using a linear scale at low frequencies and a logarithmic scale at high frequencies. MFCC can therefore partition the audio signal more reasonably. For a segment of audio, n groups of MFCC parameters are obtained, where n is the number of audio frames. The subsequent recognition process uses these n groups of parameters.
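The near-linear behaviour at low frequencies and logarithmic behaviour at high frequencies can be illustrated with a commonly used mel-scale mapping. The source does not state which formula it uses; the constants 2595 and 700 below are the widely cited convention and are an assumption here, as are the function names.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to mels using a common convention (assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Under this mapping a 1000 Hz step near the bottom of the spectrum covers far more mels than the same step near 8000 Hz, which is why mel-spaced filters are dense at low frequencies and sparse at high ones.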
In isolated word recognition systems, DTW (Dynamic Time Warping) is the most commonly used algorithm. It uses dynamic programming to solve the template matching problem caused by utterances of different lengths, and is a classic algorithm in speech recognition. The DTW algorithm first requires training a template for each isolated word to be recognized. The training samples also differ in length, so how to select the template is a problem that must be considered.
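The dynamic-programming idea can be sketched as follows. This is a textbook form of DTW with a Euclidean frame distance and no path constraints, written for illustration; the source does not specify the local distance, step pattern or any windowing, so those choices are assumptions.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: List[Frame], t: List[Frame]) -> float:
    """Accumulated cost of the best monotonic alignment between s and t."""
    n, m = len(s), len(t)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    # cost[i][j] = best accumulated cost aligning s[:i] with t[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(s[i - 1], t[j - 1])
            # Extend the cheapest of: advance s only, advance t only, advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because a frame may be matched to several frames of the other sequence, a slowly spoken word and a quickly spoken one can still align with low cost.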
A common approach is to first calculate the average length of the audio samples, then take the sample closest to the average length as the template and use the other samples as training samples to train and adjust the template values. Finally, for a sample of the same length as the template, the similarity and distance can be calculated and recognition performed.
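The template selection rule described above can be sketched in a few lines. The function name select_template is hypothetical, and the sketch covers only the selection step; the subsequent adjustment of template values from the remaining samples is not detailed in the source and is left out.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]
Sample = List[Frame]


def select_template(samples: List[Sample]) -> Tuple[Sample, List[Sample]]:
    """Pick the sample whose frame count is closest to the mean frame count.

    Returns the chosen template and the remaining samples, which would then
    serve as training samples for adjusting the template.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    mean_len = sum(len(s) for s in samples) / len(samples)
    idx = min(range(len(samples)), key=lambda i: abs(len(samples[i]) - mean_len))
    return samples[idx], samples[:idx] + samples[idx + 1:]
```

If two samples are equally close to the mean, this sketch keeps the earlier one; the source does not say how ties should be broken.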
The present application mainly uses MFCC parameters; with them, the voice features as a whole have good noise immunity and robustness.
Training stage: the user inputs training utterances several times, and the system, through the two stages of pre-processing and voice feature extraction, obtains the feature vector of the specific user.
Finally, the web page asks the user whether to upload the voice features; following this prompt, the user chooses whether to upload his or her voice features to the specific-person voice feature library or to keep them on the local computer.
Once the user's voice features have been trained, the user can proceed to the subsequent steps such as speech recognition and video control.
Method embodiment three:
The speech recognition step comprises:
receiving the voice input by the user;
performing a similarity measurement between the voice control command subsequently input by the user and each command voice feature stored in the voice feature library;
judging, according to the degree of similarity between the two, whether the user's voice control command matches a feature in the voice feature library.
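The matching decision can be sketched as a nearest-template search with a rejection threshold. The sketch is deliberately generic: the distance function is passed in (a DTW distance over MFCC frames would be one candidate), and the threshold value is an assumption, since the source does not say how similar two features must be to count as a match.

```python
from typing import Callable, Dict, Optional, TypeVar

F = TypeVar("F")  # whatever representation the features use


def best_match(
    utterance: F,
    templates: Dict[str, F],
    distance: Callable[[F, F], float],
    threshold: float,
) -> Optional[str]:
    """Return the command whose template is closest, or None if nothing is close enough."""
    best_cmd: Optional[str] = None
    best_dist = float("inf")
    for cmd, tmpl in templates.items():
        d = distance(utterance, tmpl)
        if d < best_dist:
            best_cmd, best_dist = cmd, d
    # Reject when even the closest template exceeds the assumed threshold.
    return best_cmd if best_dist <= threshold else None
```

For instance, with scalar stand-in features and absolute difference as the distance, an utterance close to the "play" template returns "play", while one far from every template returns None.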
In one embodiment, the user needs to click a specific voice operation button while watching a video. Fig. 4 is a schematic flowchart of video control by speech recognition in one embodiment of the present invention.
After clicking the operation button, the user speaks a voice control command within a specific time, for example 10 seconds. An operation command spoken within these 10 seconds is considered valid; it is recognized and matched to the corresponding operation command, and the system responds.
In another embodiment, while watching, the user first needs to say a certain trigger word, for example "begin", into the microphone. Fig. 5 is a schematic flowchart of video control by speech recognition in another embodiment of the present invention.
After the speech recognition program recognizes the trigger word, the user speaks a voice control command within a specific time, for example 10 seconds. An operation command spoken within these 10 seconds is considered valid; it is recognized and matched to the corresponding operation command, and the system responds.
Further, if no voice control command is recognized within 10 seconds after the speech recognition program recognizes the trigger word, the program returns to the waiting phase. The user must then say the trigger word into the microphone again before voice control can be carried out.
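The trigger-word behaviour amounts to a two-state machine: waiting, and armed for a limited window. The sketch below is one possible reading; the class name TriggerGate, the English trigger word "begin", and the choice to pass timestamps in explicitly are assumptions made so the example stays deterministic.

```python
from typing import Optional

TRIGGER_WORD = "begin"   # English stand-in for the source's example trigger word
WINDOW_SECONDS = 10.0    # the example window given in the source


class TriggerGate:
    """Accept a command only if it follows the trigger word within the window."""

    def __init__(self, window: float = WINDOW_SECONDS) -> None:
        self.window = window
        self._armed_at: Optional[float] = None  # None means "waiting"

    def on_word(self, word: str, now: float) -> Optional[str]:
        """Feed one recognized word with its timestamp; return a command or None."""
        # An expired window drops the gate back to the waiting state.
        if self._armed_at is not None and now - self._armed_at > self.window:
            self._armed_at = None
        if self._armed_at is None:
            if word == TRIGGER_WORD:
                self._armed_at = now
            return None
        # Armed: this word is the command, then return to waiting.
        self._armed_at = None
        return word
```

The button-click mode would use the same gate, with the click taking the place of the trigger word as the event that arms the window.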
The above technical solution solves the problem that, because the speech recognition program would otherwise monitor the user's microphone at all times, accidental operations during viewing could degrade the viewing experience; it therefore has a good technical effect.
In addition, because the server stores the user's voice features, the next time the user opens the speech recognition program on another computer or a mobile device, no retraining is needed; the saved voice features are used to perform speech recognition and control the video player. The application thus provides voice control based on a specific person and overcomes the drawback of not being usable across multiple clients.
For example, after a user completes voice training and uploads the resulting voice features to the server, the user can later use the speech recognition flash program on the local machine, another machine or a mobile device without retraining: the user directly selects one of the two recognition modes to start speech recognition, and thereby achieves voice control.
The present application uses flash technology, which is widely used on the Internet and offers high coverage, easy distribution, ease of use and multi-terminal cooperation. HTML5 technology may of course also be used; this is known to those skilled in the art and is not described in detail here.
Method embodiment four:
Application examples of the present invention are described below:
1. User A has UID=1 and downloads, for the first time, the speech recognition flash program prompted by the web page. The specific-person voice feature library has not yet established voice features for the user with UID=1, so the user is prompted that voice training is required before speech recognition can be used, and voice training instructions are given. After training is complete, user A can use speech recognition to control the video by voice.
2. User A has UID=1 and has completed voice training. Afterwards, whether on the local machine, another machine or a mobile device, to use speech recognition user A only needs to download or open the flash speech recognition extension, and can turn on speech recognition directly without training again. In mode 1 of the recognition stage, the user clicks the start button and gives the command "play" within 10 seconds; the system completes recognition and then plays the video. If the user has further commands, the user must click the start button again and give the control command within 10 seconds. In mode 2, the user says the trigger word "begin", and the system waits 10 seconds for a subsequent command; if the user gives the command "play" within 10 seconds, the system responds and then returns to the state of waiting for the trigger word. If the user has further commands, the user must say the trigger word again before giving the next command.
3. User B attempts to use user A's ID for speech recognition and gives a command after clicking to start playback. The server looks up the voice features for UID=1 and finds that the voice features of this voice command do not match those of UID=1 in the specific-person voice feature library. It then displays a message prompting the user to register or log in to his or her own account before carrying out speech recognition.
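The three cases above (untrained account, matching speaker, mismatched speaker) can be sketched as a single server-side check. Everything here is illustrative: the function name, the message texts, the dictionary standing in for the feature library, and the threshold are assumptions. The sketch also simplifies by treating any utterance that is far from all of the account's templates as a mismatch, so it cannot tell a different speaker from an untrained word.

```python
from typing import Callable, Dict, Tuple, TypeVar

F = TypeVar("F")

MSG_TRAIN = "Please complete voice training first."
MSG_MISMATCH = "Voice does not match this account; please register or log in to your own account."


def handle_command(
    uid: int,
    utterance: F,
    library: Dict[int, Dict[str, F]],
    distance: Callable[[F, F], float],
    threshold: float,
) -> Tuple[bool, str]:
    """Return (True, command) on a match, otherwise (False, prompt message)."""
    templates = library.get(uid)
    if not templates:
        return False, MSG_TRAIN  # case 1: no features stored for this account
    cmd, d = min(
        ((c, distance(utterance, t)) for c, t in templates.items()),
        key=lambda pair: pair[1],
    )
    if d > threshold:
        return False, MSG_MISMATCH  # case 3: voice does not match stored features
    return True, cmd  # case 2: recognized command for this account
```

A real system would likely separate speaker verification from command recognition; the source describes them as one comparison against the features stored under the UID, which is what this sketch follows.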
In light of the above description, the technical advantages of the present invention are as follows:
1. High coverage: 99% of browsers are equipped with the flash plug-in, and many current mobile devices also support it, so the solution can be deployed widely without special support.
2. Easy to distribute: the speech recognition solution does not require installing a specific program; the speech recognition program only needs to be downloaded automatically and can then run on flash.
3. Easy to use: for voice control of online video, the voice recognition commands are simple, and specific playback control functions can be achieved with a small number of voice commands.
4. Multi-terminal support: because the server records the user's voice features, voice control can be carried out after changing computers or mobile devices without retraining.
System embodiment one:
The system of the present invention is described in detail below with reference to the accompanying drawings. Fig. 6 is a structural diagram of the video playback control system based on speech recognition according to the present invention.
As shown in Fig. 6, the video control system based on speech recognition comprises:
a voice feature training unit, for training on the user's voice to extract voice features and saving them in a voice feature library;
a voice feature recognition unit, for receiving a voice control command from the user and comparing it with the saved voice features of the user;
a video control unit, for extracting the voice control command and controlling video playback based on it when the voice features of the user match the user voice features in the server.
Fig. 7 is a schematic diagram of the voice feature training unit in one embodiment of the present invention; Fig. 8 is an architecture diagram of the voice feature training unit in one embodiment of the present invention.
The voice feature training unit specifically comprises:
a feature parameter extraction subunit, for calculating the acoustic parameters of the user's voice, extracting key feature parameters that reflect the characteristics of the speech signal, and performing dimensionality reduction;
a feature parameter training subunit, for obtaining training utterances of control commands input by the user several times and, after pre-processing and voice feature extraction, obtaining the voice feature vector parameters of the specific user;
a sending subunit, for storing the above voice feature vector parameters in the voice feature library in a network server.
The key feature parameters are MFCC parameters.
Fig. 9 is a schematic diagram of the voice feature recognition unit in one embodiment of the present invention.
The voice feature recognition unit specifically comprises:
a comparison subunit, for performing a similarity measurement between the voice control command subsequently input by the user and each command voice feature stored in the voice feature library, to judge whether the user's voice control command matches a feature in the voice feature library.
Fig. 10 is a schematic diagram of the video control unit in one embodiment of the present invention.
As shown in Fig. 10, the video control unit further comprises:
a FLASH player subunit;
a player control subunit, for completing recognition of the user's voice control command within 10 seconds and, once recognition returns successfully, performing the corresponding video control action.
The application overcomes the technical drawback that speech recognition in the prior art is applied only on a standalone machine or requires software to be downloaded. Further, because the voice features of the present application are saved in the voice feature library, recognition based on a specific person's voice can be achieved, and speech recognition and control performed in this way have higher accuracy.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes read-only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk, terminal phone software, optical disc, and other media capable of storing program code.
Finally, it should be noted that the foregoing are only preferred embodiments of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1., based on a video control method for speech recognition, comprising:
When user does not carry out phonetic feature training, training is carried out to the voice of user and extracts phonetic feature, and described UID preserved together with concrete phonetic feature in phonetic feature storehouse in the server based on the UID of user, wherein, this phonetic feature storehouse is the database in an Internet Server;
After the phonetic feature of UID and user described in phonetic feature library storage, receive the voice control command that the user with described UID inputs on the machine, his machine or mobile device, the user vocal feature preserved with described phonetic feature storehouse contrasts;
Wherein, after the user vocal feature in the phonetic feature and server of this user matches, extract this voice control command and carry out the control of video playback based on this voice control command;
Described carry out training to user speech extract phonetic feature and described UID is kept in phonetic feature storehouse by UID based on user together with concrete phonetic feature, specifically comprise:
Calculate the parameters,acoustic of voice of user, extract the key characterization parameter that can reflect phonic signal character and carry out dimensionality reduction;
Obtain the training utterance of the several times control command of user's input;
After pre-service and phonetic feature, obtain the speech characteristic vector parameter of specific user and be stored in the phonetic feature storehouse in the webserver together with the UID of user;
The voice control command that the described user with described UID inputs on the machine, his machine or mobile device, contrasts with the user vocal feature of described preservation, specifically comprises:
Each instruction voice feature corresponding with the UID of this user stored in phonetic feature storehouse for the voice control command of follow-up for the user with described UID input is carried out similarity measure, judges whether the voice control command of user mates the feature in phonetic feature storehouse;
Wherein, user, in viewing process, needs first to say certain function word in classical Chinese writings facing to microphone;
Speech recognition program is after identification function word in classical Chinese writings, and the operational order in special time is considered to effective, and identifies, mates corresponding operational order, and makes a response;
Wherein, after speech recognition program identification function word in classical Chinese writings special time, if do not identify voice control command, so again carry out loitering phase, at this time need again to say function word in classical Chinese writings to microphone, just can carry out Voice command afterwards;
Wherein, when user B attempts to use the UID of user A for speech recognition and issues a command after clicking to start playback, the server looks up the voice features for the UID of user A, finds that the voice feature of the spoken command does not match the voice feature of user A corresponding to that UID in the specific voice feature library, and then presents a message prompting user B to register or log in to his or her own account before performing speech recognition operations.
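The comparison and mismatch handling recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory dictionary standing in for the server-side voice feature library, the use of cosine similarity as the similarity measurement, the 0.9 threshold, and all function names are assumptions made only for illustration.

```python
import math

# Hypothetical in-memory stand-in for the server-side voice feature library:
# UID -> {command name -> stored feature vector}.
FEATURE_LIBRARY = {}


def enroll(uid, command, vector):
    """Store one trained command feature vector under the user's UID."""
    FEATURE_LIBRARY.setdefault(uid, {})[command] = list(vector)


def cosine_similarity(a, b):
    """Cosine similarity, used here as one possible similarity measurement."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_command(uid, vector, threshold=0.9):
    """Return the best-matching command for this UID, or None if none reaches the threshold."""
    best_command, best_score = None, threshold
    for command, stored in FEATURE_LIBRARY.get(uid, {}).items():
        score = cosine_similarity(vector, stored)
        if score >= best_score:
            best_command, best_score = command, score
    return best_command


def handle_utterance(uid, vector):
    """Execute the matched command, or prompt the speaker to use their own account."""
    command = match_command(uid, vector)
    if command is None:
        return "Voice does not match this account; please register or log in to your own account."
    return "execute:" + command
```

In this sketch, a speaker whose feature vector is far from every vector enrolled under user A's UID receives the register-or-log-in prompt instead of a playback action.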
2. The video control method based on speech recognition according to claim 1, characterized in that the key feature parameters are MFCC parameters.
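For readers unfamiliar with MFCC (Mel-frequency cepstral coefficient) parameters, the sketch below shows only the mel-scale frequency warping on which they are built, using the commonly cited formula mel = 2595 * log10(1 + f / 700). The claim does not say which mel formula or filter layout is used, so the function names, the default frequency range, and the filter count are assumptions; a full MFCC pipeline (framing, spectrum, filterbank, logarithm, cosine transform) is not shown.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale (commonly used formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_min=0.0, f_max=8000.0):
    """Center frequencies (Hz) of n_filters filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
```

Because the centers are evenly spaced in mel rather than in Hz, they are denser at low frequencies, which is the property that makes the resulting features a compact description of speech.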
3. The video control method based on speech recognition according to claim 1, characterized in that the video control method is based on a FLASH player, and further comprises:
Completing the recognition step for the corresponding user voice control command within 10 seconds, and performing the corresponding video control action when success is returned.
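The 10-second limit in claim 3 can be sketched as a deadline around the recognition call. The claim does not describe how the limit is enforced, so the worker-thread approach, the `control_video` function, and the `recognize` and `player_actions` parameters below are hypothetical; only the 10-second figure comes from the claim.

```python
import concurrent.futures

RECOGNITION_TIMEOUT_S = 10.0  # time limit taken from claim 3


def control_video(recognize, audio, player_actions, timeout_s=RECOGNITION_TIMEOUT_S):
    """Run recognition with a deadline; perform the player action only on success.

    recognize: callable taking the audio and returning a command name or None.
    player_actions: dict mapping command names to zero-argument callables.
    Returns True if a video control action was performed, otherwise False.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(recognize, audio)
    try:
        command = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        command = None  # recognition did not finish in time, so nothing is executed
    finally:
        pool.shutdown(wait=False)  # do not block the caller on a slow recognizer

    action = player_actions.get(command) if command is not None else None
    if action is None:
        return False
    action()
    return True
```

A recognizer that returns late is simply ignored here; whether a real player would retry or notify the user is outside what the claim states.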
4. A video control system based on speech recognition, comprising:
A voice feature training unit, configured to, when the user has not yet undergone voice feature training, train on the user's voice to extract voice features and, based on the user's UID, store the UID together with the specific voice features in a voice feature library on a server, wherein the voice feature library is a database on an Internet server;
A voice feature recognition unit, configured to, after the voice feature library has stored said UID and the user's voice features, receive a voice control command entered by the user having said UID on the local machine, another machine, or a mobile device, and compare it with the user voice features stored in the voice feature library;
Wherein, when the voice feature of the user matches the user voice feature stored on the server, the voice control command is extracted and video playback is controlled based on that voice control command;
Said voice feature training unit specifically comprises:
A feature parameter extraction subunit, configured to calculate the acoustic parameters of the user's voice, extract the key feature parameters that reflect the characteristics of the speech signal, and perform dimensionality reduction;
A feature training subunit, configured to obtain training utterances of the control commands entered by the user several times and, after preprocessing and voice feature extraction, obtain the voice feature vector parameters of the specific user;
Said voice feature recognition unit specifically comprises:
A comparison subunit, configured to perform a similarity measurement between a voice control command subsequently entered by the user having said UID and each command voice feature stored in the voice feature library for that user's UID, and to judge whether the user's voice control command matches a feature in the voice feature library;
A sending subunit, configured to store the above voice feature vector parameters, together with the user's UID, in the voice feature library on the network server, wherein, while watching, the user must first speak a certain function word (used as a trigger word) into the microphone;
After the speech recognition program recognizes the function word, operation commands given within a specific time are considered valid, and the program recognizes and matches the corresponding operation command and responds;
Wherein, if no voice control command is recognized within the specific time after the speech recognition program recognizes the function word, the program returns to the waiting phase, at which point the function word must be spoken into the microphone again before voice control can be performed;
Wherein, when user B attempts to use the UID of user A for speech recognition and issues a command after clicking to start playback, the server looks up the voice features for the UID of user A, finds that the voice feature of the spoken command does not match the voice feature of user A corresponding to that UID in the specific voice feature library, and then presents a message prompting user B to register or log in to his or her own account before performing speech recognition operations.
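The waiting phase and the time-limited command window described in claim 4 behave like a small state machine, sketched below in Python. The class name, the example words, and the 5-second window are hypothetical; the claim speaks only of "a specific time" and does not fix its length. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
class TriggerWordGate:
    """Waiting phase -> command window after the trigger word -> back to waiting."""

    def __init__(self, trigger_word, commands, window_s=5.0):
        self.trigger_word = trigger_word
        self.commands = set(commands)
        self.window_s = window_s  # assumed length of the "specific time"
        self.window_opened_at = None  # None means the gate is in the waiting phase

    def on_recognized(self, word, now):
        """Feed one recognized word; return a valid command name or None."""
        # An expired window puts the gate back into the waiting phase.
        if self.window_opened_at is not None and now - self.window_opened_at > self.window_s:
            self.window_opened_at = None
        # The trigger word opens (or reopens) the command window.
        if word == self.trigger_word:
            self.window_opened_at = now
            return None
        # Commands count only while the window is open.
        if self.window_opened_at is not None and word in self.commands:
            return word
        return None
```

For example, with a gate built as `TriggerWordGate("hello", {"pause", "play"})`, the word "pause" is ignored until "hello" has been heard, and is ignored again once the window has lapsed.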
5. The video control system based on speech recognition according to claim 4, characterized in that the key feature parameters are MFCC parameters.
6. The video control system based on speech recognition according to claim 4, wherein said video control unit further comprises:
A FLASH player subunit;
A player control subunit, configured to complete the recognition step for the corresponding user voice control command within 10 seconds, and to perform the corresponding video control action when success is returned.
CN201210025924.8A 2012-02-07 2012-02-07 Video play control method and system based on voice recognition Expired - Fee Related CN102568478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210025924.8A CN102568478B (en) 2012-02-07 2012-02-07 Video play control method and system based on voice recognition

Publications (2)

Publication Number Publication Date
CN102568478A CN102568478A (en) 2012-07-11
CN102568478B true CN102568478B (en) 2015-01-07

Family

ID=46413734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210025924.8A Expired - Fee Related CN102568478B (en) 2012-02-07 2012-02-07 Video play control method and system based on voice recognition

Country Status (1)

Country Link
CN (1) CN102568478B (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103543930A (en) * 2012-07-13 2014-01-29 腾讯科技(深圳)有限公司 E-book operating and controlling method and device
CN102880392B (en) * 2012-09-29 2017-08-11 广东欧珀移动通信有限公司 A kind of method, device and mobile terminal of browsing pictures
CN103778915A (en) * 2012-10-17 2014-05-07 三星电子(中国)研发中心 Speech recognition method and mobile terminal
CN103839547A (en) * 2012-11-27 2014-06-04 英业达科技有限公司 System for loading corresponding instruction elements by comparing voice operation signals and method thereof
CN103366744B (en) * 2013-07-04 2015-10-14 三星半导体(中国)研究开发有限公司 Based on the method and apparatus of Voice command portable terminal
CN103426429B (en) * 2013-07-15 2017-04-05 三星半导体(中国)研究开发有限公司 Sound control method and device
CN104423980B (en) * 2013-08-26 2018-12-14 联想(北京)有限公司 Information processing method and information processing equipment
CN104699676B (en) * 2013-12-04 2019-03-26 中国电信股份有限公司 Information search method and system based on speech recognition
CN104754261A (en) * 2013-12-26 2015-07-01 深圳市快播科技有限公司 Projection equipment and projection method
CN104766608A (en) * 2014-01-07 2015-07-08 深圳市中兴微电子技术有限公司 Voice control method and voice control device
CN104269170B (en) * 2014-09-17 2018-04-20 成都博智维讯信息技术有限公司 A kind of ERP authorities audio recognition method
CN104200807B (en) * 2014-09-18 2017-11-17 温州大学 A kind of ERP sound control methods
CN104320255A (en) * 2014-09-30 2015-01-28 百度在线网络技术(北京)有限公司 Method for generating account authentication data, and account authentication method and apparatus
CN104598138B (en) * 2014-12-24 2017-10-17 三星电子(中国)研发中心 electronic map control method and device
CN105185384B (en) * 2015-06-11 2018-11-30 南京舒尔茨智能技术有限公司 Sound control play system and control method with environmental simulation function
US20180130467A1 (en) * 2015-09-09 2018-05-10 Mitsubishi Electric Corporation In-vehicle speech recognition device and in-vehicle equipment
CN105872619A (en) * 2015-12-15 2016-08-17 乐视网信息技术(北京)股份有限公司 Video playing record matching method and matching device
CN105897686A (en) * 2015-12-21 2016-08-24 乐视致新电子科技(天津)有限公司 Smart television user account speech management method and smart television
CN107527613A (en) * 2016-06-21 2017-12-29 中兴通讯股份有限公司 A kind of video traffic control method, mobile terminal and service server
CN106162987A (en) * 2016-07-01 2016-11-23 深圳市盛莱普智能科技有限公司 The sound control method of lighting
CN106409289B (en) * 2016-09-23 2019-06-28 合肥美的智能科技有限公司 Environment self-adaption method, speech recognition equipment and the household electrical appliance of speech recognition
CN106504743B (en) * 2016-11-14 2020-01-14 北京光年无限科技有限公司 Voice interaction output method for intelligent robot and robot
CN106409285A (en) * 2016-11-16 2017-02-15 杭州联络互动信息科技股份有限公司 Method and apparatus for intelligent terminal device to identify language type according to voice data
CN109979442A (en) * 2017-12-27 2019-07-05 珠海市君天电子科技有限公司 A kind of sound control method, device and electronic equipment
CN108538293B (en) * 2018-04-27 2021-05-28 海信视像科技股份有限公司 Voice awakening method and device and intelligent device
CN108831458A (en) * 2018-05-29 2018-11-16 广东声将军科技有限公司 A kind of offline voice is to order transform method and system
CN108766466A (en) * 2018-06-19 2018-11-06 河南孚点电子科技有限公司 A kind of video control method based on voice signal alarm
CN110867188A (en) * 2018-08-13 2020-03-06 珠海格力电器股份有限公司 Method and device for providing content service, storage medium and electronic device
CN108986792B (en) * 2018-09-11 2021-02-12 苏州思必驰信息科技有限公司 Training and scheduling method and system for voice recognition model of voice conversation platform
CN118588061A (en) * 2024-08-02 2024-09-03 深圳唯创知音电子有限公司 Equipment control method, system, equipment and storage medium based on voice recognition

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493987A (en) * 2008-01-24 2009-07-29 深圳富泰宏精密工业有限公司 Sound control remote-control system and method for mobile phone
CN201845550U (en) * 2010-10-28 2011-05-25 庄鸿 Voice recognition system of compact disc/digital video disc (CD/DVD) player

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332262B (en) * 2011-09-23 2012-12-19 哈尔滨工业大学深圳研究生院 Method for intelligently identifying songs based on audio features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王昆仑.语音特征的降维变换与特征鲁棒性.《新疆师范大学学报(自然科学版)》.2000,第19卷(第3期),15-19. *
语音识别控制在音频、视频系统中的应用;吴智量等;《微计算机信息(测控自动化)》;20041231;第20卷(第7期);113-114 *

Also Published As

Publication number Publication date
CN102568478A (en) 2012-07-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee after: Youku network technology (Beijing) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: 1VERGE INTERNET TECHNOLOGY (BEIJING) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20200624

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: Youku network technology (Beijing) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20210301

Address after: 100102 room 801, 8th floor, building 9, District 4, Wangjing East Garden, Chaoyang District, Beijing

Patentee after: Beijing Alibaba Music Technology Co.,Ltd.

Address before: 310052 room 508, 5th floor, building 4, No. 699 Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee before: Alibaba (China) Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150107

Termination date: 20210207
