CN107066477A

CN107066477A - Method and device for intelligently recommending videos

Info

Publication number: CN107066477A
Application number: CN201611147664.6A
Authority: CN
Inventors: 张莹; 梁治刚; 林岳; 顾思斌; 潘柏宇; 王冀
Original assignee: 1Verge Internet Technology Beijing Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2016-12-13
Filing date: 2016-12-13
Publication date: 2017-08-18

Abstract

The invention discloses a method and system for intelligently recommending videos. The method includes: receiving external media sound and identifying final target speech from the external media sound, the final target speech including a speech source and a noise source; performing noise reduction on the final target speech using a noise reduction algorithm; converting the noise-reduced final target speech into target text according to a speech conversion function; obtaining a target keyword from the target text; and matching the target keyword against keywords in back-end big data to obtain a video to be recommended. The invention searches the back-end big data for the corresponding video to be recommended, so that videos are recommended to the user in a targeted manner; the user finds videos more conveniently and efficiently; and recommendation accuracy is high, which further improves the user experience.

Description

Method and device for intelligently recommending videos

Technical field

The present invention relates to the field of video technology, and in particular to a method and device for intelligently recommending videos.

Background art

As quality of life improves, the mobile phone, a tool that combines communication, entertainment and consumption, has become increasingly indispensable. It meets our needs for instant communication and instant entertainment, so it rarely leaves our side. In this situation, divided attention is inevitable. For example, after coming home from work, a person habitually turns the television to a favourite channel (news, a ball game, a TV series and so on) and then watches television while using the phone. The sound of this external video is a good resource for learning which videos the user likes: if the user's video preferences can be judged from the sound the television is playing, and videos can then be intelligently recommended on the user's phone, the experience of watching videos is further improved.

There are currently two main methods of intelligently recommending videos. In the first, the user actively selects preferred video categories, and videos of the same categories are recommended according to that selection. In the second, videos in the same category as those the user has watched are recommended according to the user's viewing history. Both methods rely on user operations. However, accidental operations and trial operations inevitably occur during use, and the videos involved are not ones the user actually likes. Neither method therefore reflects the user's true interests, and the intelligence, accuracy and comprehensiveness of the recommendations are low.

Summary of the invention

To solve the above technical problem, the present invention proposes a method and device for intelligently recommending videos.

The present invention is realized with the following technical solution:

A method for intelligently recommending videos, the method including:

Receiving external media sound and identifying final target speech from the external media sound, the final target speech including a speech source and a noise source;

Performing noise reduction on the final target speech using a noise reduction algorithm;

Converting the noise-reduced final target speech into target text according to a speech conversion function;

Obtaining a target keyword from the target text;

Matching the target keyword against keywords in back-end big data to obtain a video to be recommended.

Further, identifying the final target speech from the external media sound includes:

Judging whether the frequency of the external media sound is greater than 5 kHz; if so, the external media sound is preliminary target speech, and the signal source of the preliminary target speech is then judged.

Further, judging the signal source of the preliminary target speech includes:

If the target speech is a single-tone signal waveform, performing noise reduction on the preliminary target speech, the preliminary target speech being the final target speech.

Further, the noise reduction algorithm includes:

Producing a reference signal by an adaptive noise cancellation method;

Performing noise reduction on the final target speech using the reference signal.

Further, obtaining the target keyword from the target text includes:

Segmenting the target text into words using a word segmentation technique to obtain target words;

Judging whether the probability of a target word is greater than a preset probability; if so, the target word is a target keyword.

Preferably, the external media sound includes the sound of a mobile device or the sound of a non-mobile device.

A device for intelligently recommending videos, the device including:

A sound receiving module, configured to receive external media sound and identify final target speech from the external media sound, the final target speech including a speech source and a noise source;

A noise reduction module, configured to perform noise reduction on the final target speech using a noise reduction algorithm;

A speech conversion module, configured to convert the noise-reduced final target speech into target text according to a speech conversion function;

A keyword acquisition module, configured to obtain a target keyword from the target text;

A keyword matching module, configured to match the target keyword against keywords in back-end big data to obtain a video to be recommended.

Further, the sound receiving module includes:

A frequency judging unit, configured to judge whether the frequency of the external media sound is greater than 5 kHz; if so, the external media sound is preliminary target speech, and the signal source of the preliminary target speech is then judged.

Further, the sound receiving module also includes a signal judging module, configured to perform noise reduction on the preliminary target speech when the target speech is judged to be a single-tone signal waveform, the preliminary target speech being the final target speech.

Further, the noise reduction module includes:

A reference signal generation unit, configured to produce a reference signal by an adaptive noise cancellation method;

A noise reduction processing unit, configured to perform noise reduction on the final target speech using the reference signal.

Further, the keyword acquisition module includes:

A word segmentation unit, configured to segment the target text into words using a word segmentation technique to obtain target words;

A probability judging unit, configured to judge whether the probability of a target word is greater than a preset probability; if so, the target word is a target keyword.

Preferably, the external media sound includes the sound of a mobile device and/or the sound of a non-mobile device.

The beneficial effects of the present invention are as follows: by obtaining external media sound, the invention uses algorithms to judge the external media sound and to reduce noise in the target speech; it also converts the speech into text, records and matches target keywords, and searches the back-end big data for the corresponding video to be recommended, so that videos are recommended to the user in a targeted manner. The user finds videos more conveniently and efficiently, and recommendation accuracy is high, which further improves the user experience.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of the method for intelligently recommending videos provided by Embodiment one;

Fig. 2 is a schematic diagram of single-channel speech enhancement provided by Embodiment one;

Fig. 3 is a block diagram of the device for intelligently recommending videos provided by Embodiment two;

Fig. 4 is a schematic diagram of the working principle of the adaptive noise cancellation method provided by Embodiment two;

Fig. 5 is a schematic structural diagram of a terminal provided by Embodiment three.

Detailed description of the embodiments

To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some of the embodiments of the present invention, and the invention is not limited to them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

It should be noted that the terms "comprising" and "having", and any variations of them, are intended to cover a non-exclusive inclusion. For example, a process, method, device, product or equipment that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or equipment.

It should also be noted that any implementation that uses such a method to intelligently recommend videos to a user terminal falls within the scope of protection of this patent. It is further contemplated that the technical solution need not convert speech into text: the processed speech may be used directly to recommend videos.

Embodiment one:

As shown in Fig. 1, this embodiment provides a method for intelligently recommending videos, the method including:

S101. Receiving external media sound and identifying final target speech from the external media sound, the final target speech including a speech source and a noise source;

The external media sound includes the sound of a mobile device and/or the sound of a non-mobile device.

Further, identifying the final target speech from the external media sound includes judging whether the frequency of the external media sound is greater than 5 kHz; if so, the external media sound is preliminary target speech, and the signal source of the preliminary target speech is then judged.

It should be noted that, when the frequency of the external media sound is judged, that frequency is the audio sampling frequency. To explain further: an analog-to-digital (A/D) converter samples the sound wave tens of thousands of times per second. The number of samples taken per second is called the sampling frequency, measured in Hz (hertz). Each sampling operation records the state of the original analog sound wave at one instant, called a sample; connecting a series of samples forms a segment of the sound wave.

Sampling frequencies are generally divided into three grades: 22.05 kHz, 44.1 kHz and 48 kHz. 22.05 kHz gives the sound quality of FM broadcasting, 44.1 kHz is the theoretical limit of CD quality, and 48 kHz is more precise still. The human ear cannot perceive the difference at sampling frequencies above 48 kHz, so they have little practical value. Further, a 5 kHz sampling rate only reaches the quality of human speech; an 11 kHz sampling rate is the minimum standard for playing short sound clips and is a quarter of CD quality; a 22 kHz sampling rate reaches half of CD quality, and most websites currently use it; a 44 kHz sampling rate is standard CD quality and gives a good listening experience.

Preferably, a MIC (microphone) sensor is used to detect whether the frequency of the external media sound is greater than 5 kHz. If the frequency of the external media sound does not reach the quality of human speech, that is, if it is less than or equal to 5 kHz, video recommendation is meaningless and no recommendation is made.

Further, judging the signal source of the preliminary target speech includes: if the target speech is a single-tone signal waveform, noise reduction is performed on the preliminary target speech, and the preliminary target speech is the final target speech; if the target speech is a complex-tone waveform, no recommendation is made.

A single-tone signal waveform is a sound signal made up of a single frequency and amplitude. A complex-tone waveform is a sound signal made up of several sine waves of different frequencies and amplitudes. Most sounds that occur in nature are complex tones.
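The two checks in step S101 (the 5 kHz frequency threshold and the single-tone versus complex-tone decision) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: the function names, the naive DFT, and the 10% peak threshold `PEAK_RATIO` are hypothetical, and a waveform is treated as single-tone when its magnitude spectrum contains exactly one peak.

```python
import cmath
import math

MIN_FREQUENCY_HZ = 5_000  # threshold stated in the text
PEAK_RATIO = 0.1          # assumption: bins above 10% of the strongest bin count as peaks


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 1 .. N/2 - 1 (the DC bin is excluded)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(1, n // 2)
    ]


def count_spectral_peaks(samples):
    """Count contiguous runs of bins whose magnitude exceeds PEAK_RATIO * maximum."""
    mags = magnitude_spectrum(samples)
    strongest = max(mags, default=0.0)
    if strongest == 0.0:
        return 0
    runs, inside = 0, False
    for m in mags:
        above = m > PEAK_RATIO * strongest
        if above and not inside:
            runs += 1
        inside = above
    return runs


def identify_final_target(samples, frequency_hz):
    """Return the samples as final target speech, or None when no recommendation is made."""
    if frequency_hz <= MIN_FREQUENCY_HZ:
        return None  # at or below 5 kHz: recommendation is skipped
    if count_spectral_peaks(samples) != 1:
        return None  # complex-tone waveform: no recommendation
    return samples   # single-tone waveform: passed on to noise reduction


def make_tone(frequencies_hz, rate_hz=8000, n=256):
    """Sum of unit-amplitude sine waves, used only to exercise the sketch."""
    return [
        sum(math.sin(2 * math.pi * f * i / rate_hz) for f in frequencies_hz)
        for i in range(n)
    ]


pure = make_tone([1000])         # one frequency component
mixed = make_tone([1000, 2000])  # two frequency components
print(identify_final_target(pure, 8000) is not None)
print(identify_final_target(mixed, 8000) is not None)
```

Counting contiguous runs rather than individual bins is a design choice of this sketch: a real tone rarely falls exactly on one DFT bin, so its energy leaks into neighbouring bins that should still count as a single peak.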

S102. Performing noise reduction on the final target speech using a noise reduction algorithm;

Reducing noise in the final target speech is a process of speech enhancement. The goal of speech enhancement is to extract the original speech from the noisy speech signal in as pure a form as possible. However, because interference is random, extracting completely pure speech from noisy speech is almost impossible; the main purpose of speech enhancement is therefore to improve speech quality and remove background noise.

To reduce the noise picked up by a signal during transmission and improve the quality of speech transmission, several general speech enhancement methods can be used.

The noise cancellation method treats the noise interference as the object of processing and suppresses or strongly attenuates it, improving the signal-to-noise ratio of signal transmission and reception. The harmonic frequency suppression method relies on the periodicity of the noise and uses adaptive comb filtering of the harmonic noise, with fundamental frequency tracking, to achieve noise reduction. The vocoder resynthesis method uses iteration to estimate model parameters on the basis of a speech production model and resynthesizes a noise-free signal from the parameters that describe the speech signal. Spectral subtraction subtracts an estimate of the noise spectrum from the estimated spectrum of the noisy speech, giving a purer speech spectrum.

Further, the noise reduction algorithm includes producing a reference signal by an adaptive noise cancellation method and performing noise reduction on the final target speech using the reference signal. Compared with the other methods, adaptive noise cancellation is an effective noise reduction method: it reduces noise by a larger margin, and the speech after noise reduction is also better in clarity and naturalness.

Specifically, as shown in Fig. 2, adaptive noise cancellation can exploit the periodicity of voiced speech by generating a reference signal. The reference signal is formed by delaying the main signal by one period, which requires a complex pitch estimation algorithm. Within each speech frame, a fast Fourier transform (FFT) gives the short-time magnitude and phase spectra of the noisy speech. The estimated noise magnitude spectrum is subtracted, and the resulting magnitude spectrum is inverse-transformed using the phase of the original noisy speech. Enhancement is carried out frame by frame. The method first decomposes the contaminated speech into frequency bands using a band-pass filter bank, and then estimates the noise power of each subband during periods without speech. Noise suppression is obtained by applying an attenuation factor, which corresponds to the ratio of the estimated noise power of each subband to the instantaneous signal power.
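The frame-wise subtraction of an estimated noise magnitude spectrum, with the phase of the noisy frame reused for the inverse transform, might look like the sketch below. It is an assumption-laden illustration rather than the method of the invention: the function names are hypothetical, a naive DFT stands in for the FFT, negative magnitudes are simply clamped to zero, and the noise estimate is taken from frames assumed to contain no speech. The subband attenuation factor is not modelled.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform; returns the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * i / n) for k, c in enumerate(spectrum)) / n).real
        for i in range(n)
    ]


def estimate_noise_magnitude(noise_frames):
    """Average the magnitude spectra of frames assumed to contain no speech."""
    spectra = [[abs(c) for c in dft(frame)] for frame in noise_frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def spectral_subtract(noisy_frame, noise_magnitude):
    """Subtract the noise magnitude per bin and rebuild the frame with the noisy phase."""
    cleaned = []
    for c, noise_mag in zip(dft(noisy_frame), noise_magnitude):
        magnitude = max(abs(c) - noise_mag, 0.0)  # clamp negative results to zero
        cleaned.append(cmath.rect(magnitude, cmath.phase(c)))
    return idft(cleaned)


def sine(bin_index, amplitude, n=32):
    """One frame of a sine wave that falls exactly on a DFT bin (test signal)."""
    return [amplitude * math.sin(2 * math.pi * bin_index * i / n) for i in range(n)]


speech = sine(2, 1.0)   # stands in for the speech component
noise = sine(5, 0.5)    # stands in for stationary noise
noisy = [s + v for s, v in zip(speech, noise)]
enhanced = spectral_subtract(noisy, estimate_noise_magnitude([noise]))
```

In this toy case the noise occupies bins the speech does not use, so the subtraction recovers the speech frame almost exactly; with real, overlapping spectra the result is only an approximation.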

S103. Converting the noise-reduced final target speech into target text according to a speech conversion function;

The final target speech includes a speech source and a noise source. Noise reduction on the final target speech yields purer speech, which speech recognition technology then converts into text as the precondition for obtaining the video to be recommended. Technology that converts enhanced speech into text already exists in application software and input methods.

S104. Obtaining a target keyword from the target text;

Further, obtaining the target keyword from the target text includes segmenting the target text into words using a word segmentation technique to obtain target words, and judging whether the probability of a target word is greater than a preset probability; if so, the target word is a target keyword.

Specifically, after the enhanced speech is converted into text, the text is divided into several words. Whether a word obtained by segmentation is a target keyword is decided by judging whether its probability is greater than the preset probability; if so, the word can serve as a target keyword. The preset probability is consistent with the probability of the keywords in the back-end big data.
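A minimal sketch of the keyword selection in step S104 follows. The text does not say how a word's probability is computed, so the sketch assumes a hypothetical lookup table `word_probability` supplied by the back end; the whitespace segmenter is likewise only a stand-in for a real word segmentation technique, and all names are illustrative.

```python
def segment(text):
    """Stand-in segmenter (assumption): lowercase the text and split on whitespace."""
    return text.lower().split()


def extract_keywords(text, word_probability, preset_probability):
    """Keep each segmented word whose probability exceeds the preset probability."""
    keywords = []
    for word in segment(text):
        probability = word_probability.get(word, 0.0)  # unknown words never qualify
        if probability > preset_probability and word not in keywords:
            keywords.append(word)
    return keywords


# Hypothetical probabilities, used only to exercise the sketch.
word_probability = {"football": 0.8, "final": 0.6, "the": 0.05}
keywords = extract_keywords("The football final the football", word_probability, 0.5)
print(keywords)
```

Duplicates are dropped so that each target keyword is matched against the back end only once; this is a choice of the sketch, not something the text specifies.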

S105. Matching the target keyword against keywords in back-end big data to obtain a video to be recommended.

Specifically, the target keyword is matched against the keywords in the back-end big data. If the match succeeds, the video to be recommended is looked up by the matching keyword in the back-end big data and pushed to the user terminal; if the match fails, the next target keyword is matched.
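The matching loop of step S105 can be illustrated with a dictionary standing in for the back-end big data. The index structure, the function name and the video identifiers are assumptions made for this sketch; the text does not describe how the back-end keywords are stored or how the push to the user terminal is performed.

```python
def find_videos_to_recommend(target_keywords, backend_index):
    """Try each target keyword in turn and return the videos of the first match."""
    for keyword in target_keywords:
        videos = backend_index.get(keyword)
        if videos:
            return videos  # match succeeded: these would be pushed to the user terminal
        # match failed: fall through to the next target keyword
    return []              # no target keyword matched


# Hypothetical back-end index mapping keywords to video identifiers.
backend_index = {"final": ["video_17", "video_42"], "news": ["video_3"]}
recommended = find_videos_to_recommend(["football", "final"], backend_index)
print(recommended)
```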

In summary, the method for intelligently recommending videos provided by this embodiment identifies and judges external media sound, so it can accurately learn which videos the user likes and recommend videos in a targeted manner.

Embodiment two:

As shown in Fig. 3, this embodiment provides a device for intelligently recommending videos. The device is used to perform the method for intelligently recommending videos provided in Embodiment one, and includes:

A sound receiving module 210, configured to receive external media sound and identify final target speech from the external media sound, the final target speech including a speech source and a noise source;

A noise reduction module 220, configured to perform noise reduction on the final target speech using a noise reduction algorithm;

A speech conversion module 230, configured to convert the noise-reduced final target speech into target text according to a speech conversion function;

The final target speech includes a speech source and a noise source. Noise reduction on the final target speech yields purer speech, which speech recognition technology then converts into text as the precondition for obtaining the video to be recommended. Technology that converts enhanced speech into text already exists in application software and input methods.

A keyword acquisition module 240, configured to obtain a target keyword from the target text;

A keyword matching module 250, configured to match the target keyword against keywords in back-end big data to obtain a video to be recommended.

Further, the sound receiving module 210 includes:

A frequency judging unit 211, configured to judge whether the frequency of the external media sound is greater than 5 kHz; if so, the external media sound is preliminary target speech, and the signal source of the preliminary target speech is then judged.

Further, the sound receiving module 210 also includes a signal judging module 212, configured to perform noise reduction on the preliminary target speech when the target speech is judged to be a single-tone signal waveform, the preliminary target speech being the final target speech.

Preferably, a MIC sensor is used to detect whether the frequency of the external media sound is greater than 5 kHz.

Further, the noise reduction module 220 includes:

A reference signal generation unit 221, configured to produce a reference signal by an adaptive noise cancellation method;

A noise reduction processing unit 222, configured to perform noise reduction on the final target speech using the reference signal.

The working principle of the adaptive noise cancellation method is shown in Fig. 4. The adaptive noise canceller is an extension of adaptive filtering theory. It has two input sensors. In addition to the signal s, the first sensor receives a noise n₀ that is uncorrelated with the signal; the input s + n₀ serves as the primary input of the canceller. The second sensor receives a noise n₁ that is uncorrelated with the signal but correlated with n₀ in some unknown way; it supplies the reference input of the canceller. The adaptive filter filters n₁ to produce an output y that matches n₀. Subtracting this output from the primary input gives:

ε = s + n₀ - y

In general, the transmission characteristics of the channels from the noise source to the primary branch and to the reference branch are unknown, or only approximately known, and are not fixed. A filter with fixed parameters therefore cannot produce an output y that matches n₀. An adaptive filter, whose adaptive algorithm is driven by the output error signal and adjusts the parameters automatically at all times, can achieve a good cancellation effect.
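The canceller described above can be sketched with a least-mean-squares (LMS) update, one common choice of adaptive algorithm. The text does not name the algorithm, so LMS, the filter length, the step size and the synthetic signals below are all assumptions made for illustration. The primary input is s + n₀, the reference input is n₁, and the returned error is ε = s + n₀ - y.

```python
import math
import random


def lms_cancel(primary, reference, num_taps=4, step_size=0.01):
    """Return the error signal (primary minus filter output y) for each sample."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # estimate of n0
        error = d - y                                     # epsilon = s + n0 - y
        # The error signal drives the weight update (Widrow-Hoff LMS rule).
        weights = [w + 2 * step_size * error * h for w, h in zip(weights, history)]
        output.append(error)
    return output


def demo(num_samples=5000, seed=0):
    """Compare residual noise power after cancellation with the input noise power."""
    rng = random.Random(seed)
    n1 = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]  # reference noise
    # n0 reaches the primary sensor through a channel the filter does not know.
    n0 = [0.8 * n1[t] + (0.3 * n1[t - 1] if t else 0.0) for t in range(num_samples)]
    s = [math.sin(2 * math.pi * t / 50) for t in range(num_samples)]  # wanted signal
    primary = [a + b for a, b in zip(s, n0)]
    cleaned = lms_cancel(primary, n1)
    tail = range(num_samples - 1000, num_samples)  # evaluate after convergence
    residual_power = sum((cleaned[t] - s[t]) ** 2 for t in tail) / 1000
    noise_power = sum(n0[t] ** 2 for t in tail) / 1000
    return residual_power, noise_power


residual_power, noise_power = demo()
print(residual_power < noise_power)
```

A small step size keeps the weights stable at the cost of slower convergence, which is why the demo measures the residual only over the last 1000 samples.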

Further, the keyword acquisition module 240 includes:

A word segmentation unit 241, configured to segment the target text into words using a word segmentation technique to obtain target words;

A probability judging unit 242, configured to judge whether the probability of a target word is greater than a preset probability; if so, the target word is a target keyword.

In summary, in the device for intelligently recommending videos provided by this embodiment, the noise reduction module reduces noise in the sound received by the sound receiving module, and the keyword matching module matches the words obtained by segmentation to obtain the video to be recommended. The device can reflect the user's true interests, and the intelligence and accuracy of its recommendations are high, which improves the user experience.

Embodiment three:

As shown in Fig. 5, an embodiment of the present invention provides a terminal. The terminal can be used to implement the method for intelligently recommending videos provided in Embodiment one, and can also include the device for intelligently recommending videos provided in Embodiment two. Specifically:

The terminal 800 may include an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (wireless fidelity) module 170, a processor 180 including one or more processing cores, a power supply 190 and other components. Those skilled in the art will understand that the terminal structure shown in Fig. 5 does not limit the terminal, which may include more or fewer components than shown, combine some components, or arrange the components differently. In particular:

The RF circuit 110 can be used to receive and send signals while information is sent and received or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 180 for processing; it also sends uplink data to the base station. The RF circuit 110 typically includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer and so on. The RF circuit 110 can also communicate with networks and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service) and so on.

The memory 120 can be used to store software programs and modules; the processor 180 performs various functional applications and data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area. The program storage area can store the operating system and the applications required by functions (such as a sound playback function or an image playback function); the data storage area can store data created through use of the terminal 800 (such as audio data or a phone book). The memory 120 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 120 may also include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.

The input unit 130 can be used to receive entered numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control. Specifically, the input unit 130 may include a touch-sensitive surface 131 and other input devices 132. The touch-sensitive surface 131, also called a touch display or touchpad, collects the user's touch operations on or near it (such as operations performed on or near the touch-sensitive surface 131 with a finger, a stylus or any other suitable object or accessory) and drives the corresponding connected device according to a preset program. Optionally, the touch-sensitive surface 131 may include two parts, a touch detection device and a touch controller. The touch detection device detects the position of the user's touch and the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 180, and receives and executes commands sent by the processor 180. The touch-sensitive surface 131 can be implemented in several types, such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch-sensitive surface 131, the input unit 130 may include other input devices 132. Specifically, the other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or an on/off key), a trackball, a mouse, a joystick and so on.

The display unit 140 can be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the terminal 800, which can be composed of graphics, text, icons, video and any combination of these. The display unit 140 may include a display panel 141, which can optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display or the like. Further, the touch-sensitive surface 131 may cover the display panel 141. When the touch-sensitive surface 131 detects a touch operation on or near it, it passes the operation to the processor 180 to determine the type of touch event, and the processor 180 then provides the corresponding visual output on the display panel 141 according to that type. Although in Fig. 5 the touch-sensitive surface 131 and the display panel 141 are two separate components that implement the input and output functions, in some embodiments the touch-sensitive surface 131 and the display panel 141 can be integrated to implement the input and output functions.

The terminal 800 may also include at least one sensor 150, such as a light sensor, a motion sensor and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 141 according to the ambient light, and the proximity sensor can turn off the display panel 141 and/or the backlight when the terminal 800 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the terminal's attitude (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). The terminal 800 can also be fitted with other sensors such as a gyroscope, barometer, hygrometer, thermometer and infrared sensor, which are not described further here.

The audio circuit 160, a loudspeaker 161 and a microphone 162 can provide an audio interface between the user and the terminal 800. The audio circuit 160 can convert received audio data into an electrical signal and transmit it to the loudspeaker 161, which converts it into a sound signal for output. In the other direction, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data; the audio data is output to the processor 180 for processing and then sent through the RF circuit 110 to, for example, another terminal, or is output to the memory 120 for further processing. The audio circuit 160 may also include an earphone jack to provide communication between a peripheral earphone and the terminal 800.

WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the terminal 800 can help the user send and receive e-mail, browse web pages, access streaming media and so on, providing the user with wireless broadband Internet access. Although Fig. 5 shows the WiFi module 170, it is not an essential component of the terminal 800 and can be omitted as needed without changing the essence of the invention.

The processor 180 is the control centre of the terminal 800. It connects all parts of the terminal through various interfaces and lines, and performs the terminal's functions and processes data by running or executing the software programs and/or modules stored in the memory 120 and calling the data stored in the memory 120, thereby monitoring the terminal as a whole. Optionally, the processor 180 may include one or more processing cores. Preferably, the processor 180 can integrate an application processor, which mainly handles the operating system, user interface and applications, and a modem processor, which mainly handles wireless communication. The modem processor may also not be integrated into the processor 180.

The terminal 800 also includes a power supply 190 (such as a battery) that powers the components. Preferably, the power supply is logically connected to the processor 180 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 190 may also include any of one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and similar components.

Although not shown, the terminal 800 may also include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment the display unit of the terminal is a touch-screen display, and the terminal also includes a memory and one or more programs. The one or more programs are stored in the memory and configured to be executed by one or more processors, and they contain instructions for performing the following operations:

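Receiving external media sound and identifying a final target speech from the external media sound, where the final target speech includes a speech source and a noise source;

Performing noise reduction on the final target speech using a noise subtraction algorithm;

Converting the noise-reduced final target speech into target text according to a speech conversion function;

Obtaining the target keywords in the target text;

Matching the target keywords against the keywords in the background big data to obtain the videos to be recommended.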

Further, the programs also include instructions for: judging whether the frequency of the external media sound is greater than 5 kHz; if so, the external media sound is a preliminary target speech, and the signal source of the preliminary target speech is then judged.
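As an illustration only, the frequency judgment above might be sketched as follows in Python. The text does not say how the frequency of the sound is measured; this sketch assumes it means the dominant frequency of a short frame, estimated with a naive discrete Fourier transform. The names `dominant_frequency`, `is_preliminary_target`, and `sine` are hypothetical and do not come from the patent.

```python
import cmath
import math

THRESHOLD_HZ = 5000.0  # the 5 kHz threshold stated in the text


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring DC."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        # Naive DFT coefficient for bin k (O(n^2) overall; fine for short frames).
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def is_preliminary_target(samples, sample_rate, threshold_hz=THRESHOLD_HZ):
    """Assumed reading of the test: dominant frequency greater than 5 kHz."""
    return dominant_frequency(samples, sample_rate) > threshold_hz


def sine(freq_hz, sample_rate, n):
    """Generate n samples of a unit-amplitude sine wave (demo helper)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

For example, `is_preliminary_target(sine(6000, 16000, 256), 16000)` passes the test, while a 1 kHz tone at the same sample rate does not. A real implementation would likely use an FFT library rather than the quadratic-time loop shown here.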

Further, the programs also include instructions for: if the target speech is a single-tone signal waveform, performing noise reduction on the preliminary target speech, the preliminary target speech being the final target speech.

Further, the programs also include instructions for: producing a reference signal by an adaptive noise cancellation method, and performing noise reduction on the final target speech using the reference signal.
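The text names adaptive noise cancellation but does not specify an algorithm. One common choice is a least-mean-squares (LMS) adaptive filter, and the following Python sketch assumes that choice. It also assumes a separate reference input correlated with the noise is available. The function names, the tap count, and the step size `mu` are illustrative assumptions, not values from the patent.

```python
import math
import random


def lms_noise_cancel(primary, reference, taps=4, mu=0.01):
    """Subtract an adaptively filtered copy of `reference` from `primary`.

    primary:   target speech plus noise
    reference: signal correlated with the noise only (assumed available)
    Returns the error signal, which serves as the denoised estimate.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate
        # LMS update: nudge each weight in proportion to error times input.
        weights = [w + 2 * mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def make_demo_signals(n=2000, seed=0):
    """Build a synthetic speech-like sine, white noise, and their mixture."""
    rng = random.Random(seed)
    speech = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    primary = [s + 0.8 * v for s, v in zip(speech, noise)]
    return speech, noise, primary
```

Calling `lms_noise_cancel(primary, noise)` on the demo signals returns a sequence that, once the weights have adapted, tracks the clean sine far more closely than the noisy mixture does.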

Further, the programs also include instructions for: segmenting the target text into words using a word segmentation technique to obtain target words, and judging whether the probability of a target word is greater than a preset probability; if so, the target word is a target keyword.
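A minimal Python sketch of this keyword step follows. It rests on two assumptions: a whitespace split stands in for a real word segmenter (Chinese text would need a dedicated one), and the "probability" of a word is taken to be its relative frequency in the target text, which the text does not define. The names `segment` and `target_keywords` and the default threshold are hypothetical.

```python
from collections import Counter


def segment(text):
    """Stand-in segmenter: split on whitespace, strip punctuation, lowercase."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def target_keywords(text, preset_probability=0.15):
    """Return words whose relative frequency exceeds the preset probability."""
    words = segment(text)
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    # Keep a word only if its share of all words is above the threshold.
    return sorted(w for w, c in counts.items() if c / total > preset_probability)
```

With the default threshold, a word must account for more than 15% of the segmented words to be kept, so one-off words in a longer transcript are discarded.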

Preferably, the memory includes at least one external media sound memory. The terminal may be, for example, the user's handheld mobile phone: while the user plays with the phone, a favorite program is playing on a television nearby. The phone can then continuously receive the sound of the television program in the background and save it to the external media sound memory. When the user wants to watch video on the phone and opens the video player, the phone invokes the external media sound memory and then begins executing the operation instructions above.
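The background capture described above could be modeled, purely as a sketch, by a small buffer that the capture loop writes to and the player reads from on launch. The class name, its methods, and the bounded size are assumptions; the text does not say whether the external media sound memory is bounded.

```python
from collections import deque


class ExternalMediaSoundMemory:
    """Bounded store for recently captured audio frames (hypothetical)."""

    def __init__(self, max_frames=3):
        # A deque with maxlen drops the oldest frame when a new one arrives.
        self._frames = deque(maxlen=max_frames)

    def save(self, frame):
        """Called by the background capture loop for each received frame."""
        self._frames.append(frame)

    def drain(self):
        """Return buffered frames in capture order and clear the store."""
        frames = list(self._frames)
        self._frames.clear()
        return frames
```

In this model, opening the video player would call `drain()` and pass the returned frames to the recognition and noise reduction steps.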

In summary, the terminal provided by this embodiment can carry out the intelligent video recommendation method provided in Embodiment 1 and can also implement the intelligent video recommendation device provided in Embodiment 2. By removing the noise from the external media sound, converting the denoised speech into text, and then performing keyword matching, it can accurately learn which videos the user likes and recommend videos to the user accurately and comprehensively.

Embodiment 4:

This embodiment provides a storage medium. The readable storage medium may be the readable storage medium contained in the memory, or it may be a standalone readable storage medium that is not assembled into a terminal.

The readable storage medium stores one or more programs, and the programs contain instructions for performing the following operations:

Step 1: Receive external media sound and identify a final target speech from the external media sound, where the final target speech includes a speech source and a noise source;

Step 2: Perform noise reduction on the final target speech using a noise subtraction algorithm;

Step 3: Convert the noise-reduced final target speech into target text according to a speech conversion function;

Step 4: Obtain the target keywords in the target text;

Step 5: Match the target keywords against the keywords in the background big data to obtain the videos to be recommended.
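The fifth step can be sketched in Python as a ranking by keyword overlap. The structure of the background big data is not given in the text, so this sketch assumes a plain dictionary from video title to keyword set; the name `recommend_videos`, the `top_n` parameter, and the overlap scoring rule are all illustrative assumptions.

```python
def recommend_videos(target_keywords, video_index, top_n=3):
    """Rank videos by how many target keywords appear among their keywords.

    video_index: dict mapping a video title to a set of keywords. It stands
    in for the background big data, whose real structure is not specified.
    """
    wanted = set(target_keywords)
    scored = []
    for title, keywords in video_index.items():
        overlap = len(wanted & set(keywords))
        if overlap:
            scored.append((overlap, title))
    # Highest overlap first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored[:top_n]]
```

Videos that share no keyword with the target set are never returned, so an unrelated transcript yields an empty recommendation list.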

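Further, the programs also include instructions for the following operation: identifying the final target speech from the external media sound includes judging whether the frequency of the external media sound is greater than 5 kHz; if so, the external media sound is a preliminary target speech, and the signal source of the preliminary target speech is further judged.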

Further, the programs also include instructions for the following operation: judging the signal source of the preliminary target speech includes, if the target speech is a single-tone signal waveform, performing noise reduction on the preliminary target speech, the preliminary target speech being the final target speech.

Further, the programs also include instructions for the following operation: the noise subtraction algorithm includes producing a reference signal by an adaptive noise cancellation method and performing noise reduction on the final target speech using the reference signal.

Further, the programs also include instructions for the following operation: obtaining the target keywords in the target text includes segmenting the target text into words using a word segmentation technique to obtain target words, and judging whether the probability of a target word is greater than a preset probability; if so, the target word is a target keyword.

In summary, the storage medium provided by this embodiment can store the instructions corresponding to the method of Embodiment 1. By identifying and judging the external media sound, it can accurately learn the user's video preferences, so recommendations are highly accurate and the user can find videos more conveniently and efficiently.

In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

The modules in the technical solution of the present invention can be implemented by a computer terminal or other equipment. The computer terminal and other equipment include a processor and a memory. The memory stores the program instructions/modules of the present invention, and the processor implements the corresponding functions of the present invention by running the program instructions/modules stored in the memory.

The part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention.

The division into modules/units described in the present invention is only a division by logical function; other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not performed. Some or all of the modules/units may be selected according to actual needs to achieve the purpose of the solution of the present invention.

In addition, the modules/units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.

The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art can make improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims

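1. A method for intelligently recommending videos, characterized in that the method includes:

receiving external media sound and identifying a final target speech from the external media sound, the final target speech including a speech source and a noise source;

performing noise reduction on the final target speech using a noise subtraction algorithm;

converting the noise-reduced final target speech into target text according to a speech conversion function;

obtaining the target keywords in the target text;

matching the target keywords against the keywords in the background big data to obtain the videos to be recommended.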

2. the method for intelligent recommendation video according to claim 1, it is characterised in that described from the foreign medium sound In identify that final goal voice includes,

Judge whether the frequency of the foreign medium sound is more than 5KHz, if so, then the foreign medium sound is advance target Voice, determines whether the signal source of the advance target voice.

3. the method for intelligent recommendation video according to claim 2, it is characterised in that the judgement advance target language The signal source of sound includes,

The advance target voice is final goal voice.

4. the method for intelligent recommendation video according to claim 1, it is characterised in that the noise subtraction algorithm includes,

Reference signal is produced by adaptive noise cancellation method,

5. the method for intelligent recommendation video according to claim 1, it is characterised in that in the acquisition target text Target keyword include,

For target keyword.

6. the method for intelligent recommendation video according to claim 1, it is characterised in that the foreign medium sound includes moving The sound of dynamic equipment and/or the sound of non-mobile device.

7. A device for intelligently recommending videos, characterized in that the device includes:

8. The device for intelligently recommending videos according to claim 7, characterized in that the sound receiving module includes:

9. The device for intelligently recommending videos according to claim 8, characterized in that the sound receiving module further includes:

a signal judgment module, configured to perform noise reduction on the preliminary target speech when the target speech is judged to be a single-tone signal waveform, the preliminary target speech being the final target speech.

10. The device for intelligently recommending videos according to claim 7, characterized in that the noise reduction module includes:

a reference signal generation unit, configured to produce a reference signal by an adaptive noise cancellation method;

a noise reduction processing unit, configured to perform noise reduction on the final target speech using the reference signal.

11. The device for intelligently recommending videos according to claim 7, characterized in that the keyword acquisition module includes:

a word segmentation unit, configured to segment the target text into words using a word segmentation technique to obtain target words;

a probability judgment unit, configured to judge whether the probability of a target word is greater than a preset probability; if so, the target word is a target keyword.

12. The device for intelligently recommending videos according to claim 7, characterized in that the external media sound includes sound from a mobile device and/or sound from a non-mobile device.