CN107679098A - Multimedia data processing method, device and storage medium

Info

Publication number: CN107679098A
Application number: CN201710807348.5A
Authority: CN
Inventors: 陈珊
Original assignee: MIGU Video Technology Co Ltd
Current assignee: MIGU Video Technology Co Ltd
Priority date: 2017-09-08
Filing date: 2017-09-08
Publication date: 2018-02-09

Abstract

The invention discloses a multimedia data processing method, including: monitoring voice information; determining, based on the monitored voice information, multimedia data corresponding to the voice information; determining, among the determined multimedia data, the multimedia data to be played; determining a play position in the multimedia data to be played; and playing the multimedia data to be played based on the play position. The invention also discloses a multimedia data processing device and a storage medium.

Description

Multimedia data processing method, device and storage medium

Technical field

The present invention relates to communication technology, and in particular to a multimedia data processing method, device and storage medium.

Background

In the prior art, after a user finds multimedia data by means of a voice instruction, the terminal device can only play that multimedia data from its beginning. Users, however, often only need playback to start from a specific node in the multimedia data that was found. The conventional search-and-play process does not let the user start playback from a specific part of the multimedia data, so the user must either spend time playing from the beginning or adjust the playback progress before playing, which is inconvenient for the user.

Summary of the invention

In view of this, embodiments of the present invention are intended to provide a multimedia data processing method, device and storage medium that can, by recognizing the user's voice information, start playing the found multimedia data from a specified position within it.

To achieve the above objective, the technical solution of the embodiments of the present invention is implemented as follows:

An embodiment of the present invention provides a multimedia data processing method, including:

monitoring voice information;

determining, based on the monitored voice information, multimedia data corresponding to the voice information;

determining, among the determined multimedia data, the multimedia data to be played;

determining a play position in the multimedia data to be played;

playing the multimedia data to be played based on the play position.

In the above solution, determining the multimedia data corresponding to the voice information based on the monitored voice information includes:

matching the voice information against candidate multimedia data, and determining, from the candidate multimedia data and according to the matching degree, the multimedia data corresponding to the voice information.

In the above solution, selecting candidate multimedia data according to the matching degree as the multimedia data corresponding to the voice information includes:

selecting the candidate multimedia data whose matching degree exceeds a first matching degree threshold as the multimedia data corresponding to the voice information.

In the above solution, determining the multimedia data to be played among the determined multimedia data includes:

determining, according to a received selection instruction, the candidate multimedia data indicated by the instruction among the determined multimedia data as the multimedia data to be played; or

determining the multimedia data to be played according to the matching degree of the determined multimedia data;

where the matching degree is the matching result between the multimedia data and the first matching degree threshold.

When the matching degree of the determined multimedia data exceeds the first matching degree threshold, the multimedia data to be played is determined among the determined multimedia data according to a preset strategy;

the preset strategy includes at least one of:

selecting the multimedia data with the highest historical play count;

selecting the multimedia data with the most bullet comments (barrage);

selecting multimedia data that has been played before.

When the matching degree of the determined multimedia data exceeds the first matching degree threshold, the multimedia data to be played is determined, according to a preset strategy, within the multimedia data set corresponding to the multimedia data;

the preset strategy includes at least one of:

selecting the multimedia data with the earliest storage date;

selecting the multimedia data with the latest storage date;

selecting multimedia data that has not been played.

In the above solution, determining the play position in the multimedia data to be played includes:

matching the multimedia data to be played against the voice features of the voice information, and determining the play position according to the matching result.

In the above solution, matching the multimedia data to be played against the voice features of the voice information and determining the play position according to the matching result includes:

when the result of matching the multimedia data to be played against the voice features of the voice information is less than a preset second matching degree threshold, performing one of the following operations:

determining, as the play position, the position in the multimedia data to be played that corresponds to the start position of the voice information;

determining the start position of the multimedia data to be played as the play position.

When the result of matching the multimedia data to be played against the voice features of the voice information is greater than or equal to the preset second matching degree threshold, performing one of the following operations:

determining, as the play position, the position in the multimedia data to be played that corresponds to the end position of the voice information;

determining, as the play position, the position in the multimedia data to be played that corresponds to the position one unit distance after the end position of the voice information.

An embodiment of the present invention further provides a multimedia data processing device, including:

a monitoring module, configured to monitor voice information;

a multimedia data determining module, configured to determine multimedia data corresponding to the voice information;

a to-be-played multimedia data determining module, configured to determine, among the determined multimedia data, the multimedia data to be played;

a play position determining module, configured to determine a play position in the multimedia data to be played;

a playing module, configured to play the multimedia data to be played from the play position.

In the above solution,

the multimedia data determining module is configured to match the voice information against candidate multimedia data;

the multimedia data determining module is configured to determine, from the candidate multimedia data and according to the matching degree, the multimedia data corresponding to the voice information.

In the above solution,

the multimedia data determining module is configured to select the candidate multimedia data whose matching degree exceeds the first matching degree threshold as the multimedia data corresponding to the voice information.

In the above solution,

the to-be-played multimedia data determining module is configured to determine, according to a received selection instruction, the candidate multimedia data indicated by the instruction among the determined multimedia data as the multimedia data to be played;

the to-be-played multimedia data determining module is configured to determine the multimedia data to be played according to the matching degree of the determined multimedia data.

In the above solution,

the to-be-played multimedia data determining module is configured to determine, according to a preset strategy, the multimedia data to be played among the determined multimedia data;

the preset strategy includes at least one of:

selecting the multimedia data with the highest historical play count;

selecting the multimedia data with the most bullet comments (barrage);

selecting multimedia data that has been played before.

In the above solution,

the to-be-played multimedia data determining module is configured to determine, according to a preset strategy, the multimedia data to be played within the multimedia data set corresponding to the multimedia data;

the preset strategy includes at least one of:

selecting the multimedia data with the earliest storage date;

selecting the multimedia data with the latest storage date;

selecting multimedia data that has not been played.

In the above solution,

the play position determining module is configured to match the multimedia data to be played against the voice features of the voice information;

the play position determining module is configured to determine the play position according to the matching result.

In the above solution,

the play position determining module is configured to determine, as the play position, the position in the multimedia data to be played that corresponds to the start position of the voice information;

the play position determining module is configured to determine the start position of the multimedia data to be played as the play position.

In the above solution,

the play position determining module is configured to determine, as the play position, the position in the multimedia data to be played that corresponds to the end position of the voice information;

the play position determining module is configured to determine, as the play position, the position in the multimedia data to be played that corresponds to the position one unit distance after the end position of the voice information.

An embodiment of the present invention further provides a multimedia data processing device, the multimedia data processing device including:

a processor and a memory for storing a computer program runnable on the processor,

where the processor is configured to perform the following operations when running the computer program:

monitoring voice information;

determining a play position in the multimedia data to be played;

playing the multimedia data to be played based on the play position.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs:

monitoring voice information;

determining a play position in the multimedia data to be played;

playing the multimedia data to be played based on the play position.

In the embodiments of the present invention, multimedia data corresponding to the monitored voice information can be determined from that voice information; the multimedia data to be played is then determined among the determined multimedia data; a play position is determined in the multimedia data to be played; and finally the multimedia data to be played is played based on the play position. In this way, content the user is not interested in can be skipped to a certain extent, and the content the user is interested in is played accurately.

Brief description of the drawings

Fig. 1 is an optional schematic flowchart of the multimedia data processing method provided by an embodiment of the present invention;

Fig. 2 is an optional schematic flowchart of the multimedia data processing method provided by an embodiment of the present invention;

Fig. 3 is an optional schematic flowchart of the multimedia data processing method provided by an embodiment of the present invention;

Fig. 4 is an optional schematic structural diagram of the multimedia data processing device provided by an embodiment of the present invention;

Fig. 5 is another optional schematic structural diagram of the multimedia data processing device provided by an embodiment of the present invention.

Detailed description

To provide a more thorough understanding of the features and technical content of the embodiments of the present invention, the implementation of the embodiments is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the present invention.

Before the present invention is described in further detail, the nouns and terms involved in the embodiments of the present invention are explained; they are subject to the following explanations.

First matching degree threshold: used to determine the multimedia data to be played among the determined multimedia data;

Second matching degree threshold: used to determine the play position in the multimedia data to be played;

where the second matching degree threshold is greater than the first matching degree threshold;

Unit distance: includes, but is not limited to, a time unit distance of 1 second and a text length unit of one byte. For example, when the voice information is text, the position one unit distance after the end position of the voice information is the position of the byte following that end position; when the voice information is a melody, it is the position 1 second after that end position.
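The unit distance defined above can be illustrated with a minimal sketch. The names `UNIT_DISTANCE` and `position_after_end` are hypothetical and not part of the described method; positions are assumed to be byte offsets for text and seconds for a melody.

```python
# Hypothetical unit distances, following the definition above:
# text positions are byte offsets (unit: 1 byte),
# melody positions are times in seconds (unit: 1 second).
UNIT_DISTANCE = {"text": 1, "melody": 1}


def position_after_end(end_position: int, kind: str) -> int:
    """Return the position one unit distance after the end of the voice information."""
    if kind not in UNIT_DISTANCE:
        raise ValueError(f"unsupported kind of voice information: {kind}")
    return end_position + UNIT_DISTANCE[kind]
```

For text ending at byte 41, the next position is byte 42; for a melody ending at second 73, it is second 74.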

Fig. 1 is an optional schematic flowchart of the multimedia data processing method provided by an embodiment of the present invention. As shown in Fig. 1, an optional flow of the method includes the following steps:

Step 101: Monitor voice information;

In practice, before monitoring of voice information begins, the speech recognition processing mode may also be started according to a received speech recognition start instruction;

The speech recognition start instruction may be received in at least one of the following ways:

a physical button and/or virtual key corresponding to the speech recognition start instruction is activated;

voice information corresponding to the speech recognition start instruction appears in the monitored voice information;

a preset specific operation corresponding to the speech recognition start instruction is detected;

Step 102: Determine, based on the monitored voice information, multimedia data corresponding to the voice information;

In practice, the voice information may be matched against candidate multimedia data, and the corresponding candidate multimedia data is selected according to the matching degree as the multimedia data corresponding to the voice information;

Specifically, a first matching degree threshold may be preset, and the candidate multimedia data whose matching degree exceeds it is selected as the multimedia data corresponding to the voice information. If the first matching degree threshold is set inappropriately, no multimedia data may pass the screening; in that case, the candidates may instead be sorted in descending order of matching degree, and at least one candidate selected from the sorted result as the multimedia data corresponding to the voice information;

Specifically, the user may select at least one candidate multimedia data as the multimedia data corresponding to the voice information; alternatively, at least one candidate multimedia data with the highest matching degree may be selected automatically as the multimedia data corresponding to the voice information;
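The screening in Step 102 can be sketched as follows. This is an illustrative sketch only: the function name `select_candidates`, the `fallback_count` parameter, and the representation of matching degrees as a title-to-score dictionary are assumptions, and how the matching degree itself is computed is left open, as in the text.

```python
from typing import Dict, List


def select_candidates(scores: Dict[str, float],
                      first_threshold: float,
                      fallback_count: int = 1) -> List[str]:
    """Return the titles of candidates that correspond to the voice information.

    scores maps each candidate title to its (assumed precomputed) matching degree.
    """
    # Sort candidates in descending order of matching degree.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Keep candidates whose matching degree exceeds the first threshold.
    above = [title for title in ranked if scores[title] > first_threshold]
    if above:
        return above
    # Threshold screened out everything: fall back to the top of the ranking.
    return ranked[:fallback_count]
```

With scores `{"a": 0.7, "b": 0.4, "c": 0.65}` and a threshold of 0.6, the candidates `a` and `c` are kept; with a threshold of 0.9 nothing passes, so the highest-ranked candidate `a` is returned instead.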

Step 103: Determine, among the determined multimedia data, the multimedia data to be played;

In practice, the multimedia data to be played may be determined among the determined multimedia data according to a received selection instruction; or

Specifically, when the matching degree of the determined multimedia data exceeds the first matching degree threshold, the multimedia data to be played may also be determined among the determined multimedia data according to a preset strategy;

The preset strategy includes:

selecting the multimedia data with the highest historical play count; or

selecting the multimedia data with the most corresponding bullet comments (barrage); or

selecting multimedia data that has been played before;

If several multimedia data share the highest matching value, the multimedia data to be played may be determined according to a preset strategy, where the preset strategy may be to select the multimedia data with the highest play count, the multimedia data with the most corresponding bullet comments, multimedia data the user has played before, multimedia data the user has liked or shared, and so on.

To suit different user habits, when the matching degree of the determined multimedia data exceeds the first matching degree threshold, the multimedia data to be played may also be determined, according to a preset strategy, within the multimedia data set corresponding to the multimedia data;

The preset strategy includes:

selecting the multimedia data with the earliest storage date; or

selecting the multimedia data with the latest storage date; or

selecting multimedia data that has not been played. Specifically, when the multiple multimedia data belong to one multimedia data set, for example one TV series, the multimedia data with the earliest episode or issue number, the multimedia data with the latest episode or issue number, or the multimedia data with the earliest episode or issue number among the videos the user has not watched may be determined as the multimedia data to be played.
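The preset strategies listed in Step 103 can be sketched as interchangeable selection functions. Everything named here (`Item`, its fields, `STRATEGIES`, `pick`) is hypothetical; the episode number is assumed to stand in for the ordering within a multimedia data set, following the TV series example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    title: str
    play_count: int = 0          # historical play count
    barrage_count: int = 0       # number of bullet comments
    played_before: bool = False  # played on this terminal or account
    episode: int = 0             # assumed position within the data set


def first_unwatched(items: List[Item]) -> Item:
    # Earliest episode among those not yet played; if all were played,
    # fall back to the earliest episode overall (an assumption).
    unwatched = [i for i in items if not i.played_before]
    return min(unwatched or items, key=lambda i: i.episode)


STRATEGIES = {
    "most_played": lambda items: max(items, key=lambda i: i.play_count),
    "most_barrage": lambda items: max(items, key=lambda i: i.barrage_count),
    "played_before": lambda items: next(
        (i for i in items if i.played_before), items[0]),
    "earliest_episode": lambda items: min(items, key=lambda i: i.episode),
    "latest_episode": lambda items: max(items, key=lambda i: i.episode),
    "first_unwatched": first_unwatched,
}


def pick(items: List[Item], strategy: str) -> Item:
    """Apply one preset strategy to a non-empty list of determined items."""
    return STRATEGIES[strategy](items)
```

Keeping each strategy as a separate function makes it straightforward to switch strategies per user habit, which is the motivation the text gives for offering several of them.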

Fig. 2 is an optional schematic flowchart of the multimedia data processing method provided by an embodiment of the present invention. As shown in Fig. 2, an optional flow of the method includes the following steps:

Step 201: Monitor voice information;

Step 202: Determine, based on the monitored voice information, multimedia data corresponding to the voice information;

Specifically, a first matching degree threshold may be preset, and the candidate multimedia data whose matching degree exceeds it is selected as the multimedia data corresponding to the voice information;

In certain extreme cases the first matching degree threshold may be set unreasonably; in the technical solution of the present invention, the candidates may then also be sorted in descending order of matching degree and at least one candidate multimedia data selected as the multimedia data corresponding to the voice information;

Step 203: Determine, among the determined multimedia data, the multimedia data to be played;

The preset strategy includes:

selecting the multimedia data with the highest historical play count; or

selecting the multimedia data with the most corresponding bullet comments (barrage); or

selecting multimedia data that has been played before;

The preset strategy includes:

selecting the multimedia data with the earliest storage date; or

selecting the multimedia data with the latest storage date; or

selecting multimedia data that has not been played. Specifically, when the multiple multimedia data belong to one multimedia data set, for example one TV series, the multimedia data with the earliest episode or issue number, the multimedia data with the latest episode or issue number, or the multimedia data with the earliest episode or issue number among the videos the user has not watched may be determined as the multimedia data to be played;

Multimedia data that has been played before also includes: multimedia data previously played on the current terminal, or multimedia data previously played under the user's account.

It should be noted that, when storing multimedia data, the terminal device or the server device corresponding to the terminal device may assign a content ID to the multimedia data. Specifically, multimedia data belonging to the same multimedia data set may be assigned the same content ID, and multimedia data not belonging to the same multimedia data set may be assigned different content IDs. After a content ID is assigned, an episode label may also be assigned to the multimedia data to identify its episode position within the multimedia data set to which it belongs.

Step 204: Determine a play position in the multimedia data to be played;

In practice, the multimedia data to be played may be matched against the voice features of the voice information, and the play position determined according to the matching result;

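Specifically, when the result of matching the multimedia data to be played against the voice features of the voice information is less than the preset second matching degree threshold, one of the following operations is performed:

the position in the multimedia data to be played that corresponds to the start position of the voice information is determined as the play position;

the start position of the multimedia data to be played is determined as the play position.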

It should be noted that the second matching degree threshold in the present invention should be greater than the first matching degree threshold; optionally, the first matching degree threshold is 0.4 and the second matching degree threshold is 0.9.

Step 205: Play the multimedia data to be played based on the play position.

Fig. 3 is an optional schematic flowchart of the multimedia data processing method provided by an embodiment of the present invention. As shown in Fig. 3, an optional flow of the method includes the following steps:

Step 301: The terminal is triggered to perform speech recognition by a button on the terminal device (a physical button or a virtual button), a preset sound, a preset gesture, or the like.

Specifically, if the terminal has a screen, a speech recognition interface may be displayed upon triggering; if the terminal has no screen, a voice prompt may be given upon triggering.

Specifically, in one embodiment of the present application, the terminal device may designate a physical key or virtual key as a shortcut for starting speech recognition. After the key is pressed or tapped, the speech recognition function is started; the terminal device may then turn on the microphone and monitor whether the user inputs voice.

In another embodiment of the present application, the terminal device may also set a specific word in advance as the identifier for starting speech recognition. When the user opens the media player on the terminal device, the microphone on the terminal device may be turned on at the same time and monitoring started. When it is detected that the user has input the specific word, the terminal device may start the speech recognition function.

In another embodiment of the present application, the terminal device may also set a specific operation in advance, for example a slide along a preset trajectory, as the identifier for starting speech recognition. The user may turn on the terminal device and then perform the specific operation on its display screen; once the terminal device detects that the user has input the specific operation, it may start the speech recognition function, turn on the microphone, and monitor whether the user inputs voice.

In the present application, when the terminal device starts the speech recognition function, it may display a speech recognition interface or emit a voice prompt to inform the user that the speech recognition function has been started.

It should be noted that, in the present application, if the speech recognition function is started while no media player is open, the terminal device may subsequently invoke the default media player, or a media player selected by the user, to perform speech recognition.

Step 302: When it is detected that the user inputs voice, the voice input by the user is recognized and the corresponding multimedia data is determined;

Specifically, the voice input by the user may include a melody, words, and so on. After detecting that the user inputs voice, the terminal device may recognize it against the multimedia data stored locally and/or the multimedia data stored on a corresponding server device, where the corresponding server device may be a device that has established a communication connection with the terminal device.

It should be noted that recognition of the user's voice against the stored multimedia data may be based on an existing technique or on a combination of existing techniques; the present invention does not limit this.

In the present invention, a matching value between the voice input by the user and each multimedia data may be obtained from the above recognition process. The terminal device may preset a first matching degree threshold and then determine the multimedia data whose matching value exceeds the first matching degree threshold as the multimedia data corresponding to the voice input by the user. In the present invention, the terminal device may determine multiple pieces of media information corresponding to the voice input by the user.

It should be noted that the first matching degree threshold may be used to measure whether a given multimedia data is relatively similar to the voice input by the user; with it, the multimedia data relatively similar to the voice input by the user can be screened from multiple multimedia data.

In one example, the first matching degree threshold may be 0.6 and the matching value between a given multimedia data and the voice input by the user may be 0.4. Comparing the two values, the matching value is below the threshold, so this multimedia data is not considered relatively similar to the voice input by the user.

In another example, the first matching degree threshold may again be 0.6 and the matching value between another multimedia data and the voice input by the user may be 0.7. Comparing the two values, the matching value is above the threshold, so this multimedia data is considered relatively similar to the voice input by the user.

After the multimedia data relatively similar to the voice input by the user has been determined according to the preset first matching degree threshold, that multimedia data may be determined as the multimedia data corresponding to the voice input by the user.

In the present invention, after the multimedia data corresponding to the voice input by the user is determined, the terminal device may display that multimedia data to the user so that the user can select from it (for example by tapping or by voice). When displaying, the terminal device may list the multimedia data in descending order of matching value.

Step 303: After the multimedia data corresponding to the voice input by the user is determined, the terminal device may determine the multimedia data to be played;

Specifically, in the present invention, the determined multimedia data may be displayed to the user, and the terminal device monitors whether the user makes a selection among that multimedia data within a period of time. If the user makes a selection within that period, the multimedia data selected by the user may be determined as the multimedia data to be played; if no selection is detected within that period, the multimedia data with the highest matching value may be determined as the multimedia data to be played.

In the present invention, the determined multimedia data may also not be displayed to the user; instead, the multimedia data with the highest matching value is directly determined as the multimedia data to be played.
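The choice described in Step 303 can be sketched as a selection with a fallback. The function name `choose_to_play` and the convention that `user_choice` is `None` when the waiting period elapses (or when nothing was displayed) are assumptions made for illustration; the actual waiting and display logic is outside the sketch.

```python
from typing import Dict, Optional


def choose_to_play(scores: Dict[str, float],
                   user_choice: Optional[str] = None) -> str:
    """Return the title of the multimedia data to be played.

    scores maps each determined title to its matching value.
    user_choice is the title the user picked within the waiting period,
    or None if the user made no selection or nothing was displayed.
    """
    if not scores:
        raise ValueError("no determined multimedia data to choose from")
    if user_choice is not None and user_choice in scores:
        return user_choice
    # No valid selection: fall back to the highest matching value.
    return max(scores, key=scores.get)
```

Passing `user_choice=None` covers both the timeout case and the variant in which the determined multimedia data is never displayed.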

In the present invention, if several multimedia data share the highest matching value, the multimedia data to be played may be determined according to a preset strategy, where the preset strategy may be to select the multimedia data with the highest play count, the multimedia data with the most corresponding bullet comments (barrage), multimedia data the user has played before, multimedia data the user has liked or shared, and so on.

It should be noted that, when the multiple multimedia data belong to one multimedia data set, for example one TV series, the multimedia data with the earliest episode or issue number, the multimedia data with the latest episode or issue number, or the multimedia data with the earliest episode or issue number among the videos the user has not watched may also be determined as the multimedia data to be played.

In the present invention, when storing multimedia data, the terminal device or the server device corresponding to the terminal device may assign a content ID to the multimedia data. Specifically, multimedia data belonging to the same multimedia data set may be assigned the same content ID, and multimedia data not belonging to the same multimedia data set may be assigned different content IDs. After a content ID is assigned, an episode label may also be assigned to the multimedia data to identify its episode position within the multimedia data set to which it belongs.

The content ID and the episode label may be digits, characters, and so on; the present invention does not limit this.

In the present invention, after the multimedia data corresponding to the voice input by the user is determined, the terminal device may obtain the content ID and episode label of each determined multimedia data and then judge whether any of the obtained content IDs are identical. If so, the corresponding multimedia data belong to the same multimedia data set; the terminal device may then determine, from the obtained episode labels, the episode position of each multimedia data within that set, and determine the multimedia data to be played according to the episode number.
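The grouping by content ID and ordering by episode label can be sketched as below. The function name `group_into_sets` and the tuple layout are hypothetical, and the episode labels are assumed to be integers so that they sort naturally; the text allows other label formats.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_into_sets(
        items: Iterable[Tuple[str, str, int]]) -> Dict[str, List[str]]:
    """Group determined multimedia data into data sets.

    items yields (title, content_id, episode_label) tuples.
    Returns, for each content ID shared by more than one item,
    the titles ordered by episode label.
    """
    groups = defaultdict(list)
    for title, content_id, episode in items:
        groups[content_id].append((episode, title))
    return {
        content_id: [title for _, title in sorted(members)]
        for content_id, members in groups.items()
        if len(members) > 1  # identical content IDs indicate one data set
    }
```

For the input `[("Ep2", "c1", 2), ("Movie", "c2", 1), ("Ep1", "c1", 1)]` the two items sharing content ID `c1` form one set ordered as `Ep1`, `Ep2`, while `Movie` stands alone and is not reported as a set.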

Step 304: After the media information to be played has been determined, the play position of the media information may be determined according to the matching value between the media information and the voice input by the user, and the media information is then played from the determined play position.

Specifically, in the present invention, the play position of the multimedia data to be played may be determined according to the matching value between the voice input by the user and the determined multimedia data to be played. The terminal device may determine the play position by comparing this matching value with a preset second matching degree threshold. The second matching degree threshold may be greater than the first matching degree threshold.

It should be noted that, in the present invention, the second matching degree threshold may be used to judge whether the multimedia data to be played is highly similar to the voice input by the user.

In one example, the second matching degree threshold may be 0.9, and the matching value between the multimedia data to be played and the voice input by the user may be 0.8. Because the matching value is less than the preset second matching degree threshold, the multimedia data to be played may not be highly similar to the voice input by the user.

In another example, the second matching degree threshold may again be 0.9, and the matching value between the multimedia data to be played and the voice input by the user may be 0.96. Because the matching value is greater than the preset second matching degree threshold, the multimedia data to be played may be highly similar to the voice input by the user.

In the present invention, if the matching value between the multimedia data to be played and the voice input by the user is less than the second matching degree threshold preset by the terminal device, this may indicate that the user is fairly familiar with the multimedia data. In this case, the position corresponding to where the user started inputting the voice, or the start position of the multimedia data, may be determined as the play position of the media information. If the matching value is greater than the second matching degree threshold preset by the terminal device, this may indicate that the user is very familiar with the multimedia data. In this case, it may be detected whether the user is still inputting voice. If the user is still inputting voice, the position corresponding to the voice the user is about to input is determined as the play position of the media information (so that playback of the media information follows the voice input by the user). If the user has finished the voice input, the position corresponding to where the user finished inputting the voice is determined as the play position of the media information.

For example, the matching value between the multimedia data to be played and the voice input by the user may be 0.8 and the preset second matching degree threshold may be 0.9, so the matching value is less than the second matching degree threshold preset by the terminal device. Assume that the multimedia data to be played is a song and that the voice input by the user corresponds to the 11th to 13th lines of the song. After determining that the matching value is less than the preset second matching degree threshold, either the 11th line or the 1st line of the song may be determined as the play position. Whether the 11th line or the 1st line is used may be customized by the user or set by default.

As another example, the matching value between the multimedia data to be played and the voice input by the user may be 0.97 and the preset second matching degree threshold may again be 0.9, so the matching value is greater than the second matching degree threshold preset by the terminal device. Assume that the multimedia data to be played is a song, that the voice already input by the user corresponds to the 11th to 13th lines of the song, and that the user is still inputting voice. After determining that the matching value is greater than the preset second matching degree threshold, the 14th line of the song may be determined as the play position.
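The decision described in the preceding paragraphs can be sketched as follows. This is only an illustration under stated assumptions: positions are modeled as 1-based lyric line numbers, and the function name and the two flags `low_match_from_spoken_start` and `resume_after_end` are hypothetical stand-ins for the user-defined or default settings the text mentions.

```python
def determine_play_position(match_value, second_threshold, spoken_start,
                            spoken_end, still_speaking,
                            low_match_from_spoken_start=True,
                            resume_after_end=True):
    """Return the 1-based line at which playback should start.

    spoken_start / spoken_end: first and last line covered by the user's voice.
    """
    if match_value < second_threshold:
        # Below the threshold: start where the user started, or at the beginning.
        return spoken_start if low_match_from_spoken_start else 1
    if still_speaking:
        # Follow the user: continue with the line about to be input.
        return spoken_end + 1
    # Input has ended: use the unit after the end position, or the end position itself.
    return spoken_end + 1 if resume_after_end else spoken_end

# The two worked examples from the text (lines 11 to 13 were sung).
low = determine_play_position(0.8, 0.9, 11, 13, still_speaking=False)
high = determine_play_position(0.97, 0.9, 11, 13, still_speaking=True)
```

With the values from the text, `low` is line 11 (or line 1 when `low_match_from_spoken_start` is false) and `high` is line 14.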

In the present invention, after the play position of the multimedia data to be played has been determined, the multimedia data to be played may be played from the determined play position.

It should be noted that, in the present invention, while the multimedia data to be played is being played from the determined play position, the user's operations may be monitored. If the user performs a specified operation within a period of time, for example within 10 seconds, the play position may be switched according to the specified operation, and the multimedia data is played from the newly switched play position.

In the present invention, if the media information to be played contains multiple parts that have a high matching value with the voice input by the user, the part to be played may be determined first, and the play position is then determined within that part.

When determining the part to be played, the high-matching part that occurs earliest may be determined as the part to be played. Alternatively, the high-matching part that has been played more often, or that has more corresponding bullet comments, may be determined as the part to be played.

In one example, the user may input a melody by voice, and the melody has a high matching degree with the climax part of a song. Because the song contains multiple climax parts, the song contains multiple parts that have a high matching degree with the voice input by the user. In this case, the terminal device may determine the part to be played from among these high-matching parts.

It may be assumed that the climax part of the song has 8 lines in total and occurs 3 times in the song, at the 11th to 17th lines, the 23rd to 30th lines, and the 42nd to 48th lines respectively. When the terminal device determines the earliest high-matching part as the part to be played, the 11th to 17th lines may be determined as the part to be played.
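Choosing among several high-matching parts can be sketched as below. The `Segment` class, its fields, the strategy names and the play counts are hypothetical; only the three line ranges come from the example above.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    first_line: int
    last_line: int
    play_count: int = 0        # assumed per-segment statistic
    bullet_comments: int = 0   # assumed per-segment statistic

def choose_segment(segments, strategy="earliest"):
    """Pick the part to be played among high-matching segments."""
    if strategy == "earliest":
        return min(segments, key=lambda s: s.first_line)
    if strategy == "most_played":
        return max(segments, key=lambda s: s.play_count)
    if strategy == "most_bullet_comments":
        return max(segments, key=lambda s: s.bullet_comments)
    raise ValueError(f"unknown strategy: {strategy}")

# The three climax occurrences from the example; counts are made up.
climaxes = [
    Segment(11, 17, play_count=40, bullet_comments=5),
    Segment(23, 30, play_count=90, bullet_comments=2),
    Segment(42, 48, play_count=10, bullet_comments=9),
]
earliest = choose_segment(climaxes)
```

Under the "earliest" strategy the result is the segment spanning lines 11 to 17, matching the example in the text.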

After the part to be played has been determined, the terminal device may determine the play position of the multimedia data to be played within that part according to the matching value between the voice input by the user and the determined multimedia data to be played.

In one example, the multimedia data to be played may be a song, and the part to be played determined from the song may be its 11th to 17th lines. If the matching value between the voice input by the user and the song is less than the preset second matching degree threshold, the position corresponding to the 11th line of the song may be determined as the play position of the song. If the matching value is greater than or equal to the preset second matching degree threshold, it may be detected whether the user is still inputting voice. If the user is still inputting and is about to finish the 13th line of the song, the terminal device may determine the position corresponding to the voice about to be input, that is, the 14th line of the song, as the play position, and play the song from the 14th line. If the user has finished inputting, having stopped after the 12th line of the song, the position corresponding to the 13th line of the song may be determined as the play position.

It should be noted that, after the play position has been determined within the part to be played and that part has been played, playback may proceed according to the user's selection, for example by continuing to play the media information or by jumping directly to the next high-matching part of the media information.

Fig. 4 is a schematic diagram of an optional composition of the multimedia data processing device provided by an embodiment of the present invention. As shown in Fig. 4, the optional composition of the multimedia data processing device provided by the embodiment of the present invention includes:

a monitoring module 401, configured to monitor voice information;

In practical applications, the monitoring module 401 may be configured to monitor a speech recognition start instruction;

The device further includes:

a speech recognition module, configured to start a speech recognition processing mode;

The monitoring module 401 is further configured to monitor a physical key and/or a virtual key corresponding to the speech recognition start instruction;

The monitoring module 401 is further configured to monitor a preset specific operation corresponding to the speech recognition start instruction;

The speech recognition module is configured to recognize voice information corresponding to the speech recognition start instruction;

a multimedia data determining module 402, configured to determine multimedia data corresponding to the voice information;

In practical applications, the multimedia data determining module 402 is configured to match the voice information against candidate multimedia data;

The multimedia data determining module 402 is configured to select, according to the matching degree, the corresponding candidate multimedia data as the multimedia data corresponding to the voice information;

Specifically, the multimedia data determining module 402 is configured to select candidate multimedia data whose matching degree exceeds a first matching degree threshold as the multimedia data corresponding to the voice information;

The multimedia data determining module 402 is configured to sort the candidates in descending order of matching degree and select at least one candidate multimedia data item as the multimedia data corresponding to the voice information;
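The two selection modes attributed to the multimedia data determining module 402 (threshold filtering and descending-order selection) might look like the following sketch. The function name, the dictionary input and the sample scores are assumptions made for illustration; how the matching degree itself is computed is outside this sketch.

```python
def select_candidates(match_degrees, first_threshold=None, top_n=None):
    """Return candidate names ordered by descending matching degree.

    match_degrees: mapping of candidate name -> matching degree in [0, 1].
    first_threshold: if given, keep only candidates strictly above it.
    top_n: if given, keep at most this many of the best candidates.
    """
    ranked = sorted(match_degrees.items(), key=lambda kv: kv[1], reverse=True)
    if first_threshold is not None:
        ranked = [(name, deg) for name, deg in ranked if deg > first_threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [name for name, _ in ranked]

# Hypothetical matching degrees for three candidates.
degrees = {"song_a": 0.95, "song_b": 0.40, "song_c": 0.70}
above_threshold = select_candidates(degrees, first_threshold=0.6)
best_one = select_candidates(degrees, top_n=1)
```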

a to-be-played multimedia data determining module 403, configured to determine the multimedia data to be played among the determined multimedia data;

Specifically, the to-be-played multimedia data determining module 403 is configured to determine, according to a received selection instruction, the multimedia data to be played among the determined multimedia data;

The to-be-played multimedia data determining module 403 is configured to determine the multimedia data to be played according to the matching degree of the determined multimedia data;

When the matching degree of the determined multimedia data exceeds the first matching degree threshold, the to-be-played multimedia data determining module 403 is configured to determine, according to a preset strategy, the multimedia data to be played among the determined multimedia data;

The preset strategy includes:

selecting the multimedia data with the highest historical play count; or

selecting the multimedia data with the most corresponding bullet comments; or

selecting multimedia data that has been played before.

If there are multiple multimedia data items with the highest matching value, the multimedia data to be played may be determined according to a preset strategy. The preset strategy may select the multimedia data with the highest play count, the multimedia data with the most corresponding bullet comments, multimedia data that the user has played before, multimedia data that the user has liked or shared, and so on.
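A minimal sketch of such a preset strategy is given below. The field names, the strategy names and the fallback to the first candidate when none has been played before are assumptions for illustration and are not taken from the source.

```python
def pick_by_strategy(candidates, strategy):
    """Choose the item to play among equally well-matched candidates."""
    if strategy == "most_played":
        return max(candidates, key=lambda c: c["play_count"])
    if strategy == "most_bullet_comments":
        return max(candidates, key=lambda c: c["bullet_comments"])
    if strategy == "previously_played":
        played = [c for c in candidates if c["played_by_user"]]
        # Assumption: fall back to the first candidate if none was played before.
        return played[0] if played else candidates[0]
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical candidates that tie on matching value.
tied = [
    {"title": "Cover version", "play_count": 120, "bullet_comments": 30, "played_by_user": True},
    {"title": "Original", "play_count": 900, "bullet_comments": 12, "played_by_user": False},
]
chosen = pick_by_strategy(tied, "most_played")
```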

To accommodate different user habits, when the matching degree of the determined multimedia data exceeds the first matching degree threshold, the to-be-played multimedia data determining module 403 may also be configured to determine, according to a preset strategy, the multimedia data to be played from the multimedia data set corresponding to the multimedia data;

The preset strategy includes:

selecting the multimedia data with the earliest storage date; or

selecting the multimedia data with the latest storage date; or

selecting multimedia data that has not been played.

Specifically, when the multiple multimedia data items belong to one multimedia data set, for example one TV series, the item with the earliest episode or issue number, the item with the latest episode or issue number, or the item with the earliest episode or issue number among those the user has not watched may also be determined as the multimedia data to be played.
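Selection within a multimedia data set can be sketched in the same spirit. Episode numbers stand in for the episode or issue ordering described above; the `watched` flag, the strategy names and the `None` result when every episode has been watched are assumptions of this sketch.

```python
def pick_from_set(episodes, strategy):
    """Choose one episode of a series according to a preset strategy."""
    if strategy == "earliest":
        return min(episodes, key=lambda e: e["episode"])
    if strategy == "latest":
        return max(episodes, key=lambda e: e["episode"])
    if strategy == "first_unwatched":
        unwatched = [e for e in episodes if not e["watched"]]
        # Assumption: return None when everything has already been watched.
        return min(unwatched, key=lambda e: e["episode"]) if unwatched else None
    raise ValueError(f"unknown strategy: {strategy}")

# A hypothetical three-episode series of which the first episode was watched.
series = [
    {"episode": 2, "watched": False},
    {"episode": 1, "watched": True},
    {"episode": 3, "watched": False},
]
next_up = pick_from_set(series, "first_unwatched")
```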

a play position determining module 404, configured to determine a play position in the multimedia data to be played;

Specifically, the play position determining module 404 is configured to match the multimedia data to be played against the voice feature of the voice information;

The play position determining module 404 is configured to determine the play position according to the matching result;

The play position determining module 404 is configured to determine the position in the multimedia data to be played that corresponds to the start position of the voice information as the play position;

The play position determining module 404 is configured to determine the start position of the multimedia data to be played as the play position;

The play position determining module 404 is configured to determine the position in the multimedia data to be played that corresponds to the end position of the voice information as the play position;

The play position determining module 404 is configured to determine the position in the multimedia data to be played that corresponds to the position one unit distance after the end position of the voice information as the play position;

a playing module 405, configured to play the multimedia data to be played from the play position.

Fig. 5 is a schematic diagram of another optional composition of the multimedia data processing device provided by an embodiment of the present invention. As shown in the figure, the multimedia data processing device 500 may be a mobile phone, computer, digital broadcast terminal, information transceiver device, game console, tablet device, medical device, fitness device, personal digital assistant, or similar device with an audio monitoring function. The multimedia data processing device 500 shown in Fig. 5 includes at least one processor 501, a memory 502, at least one network interface 504, and a user interface 503. The components of the multimedia data processing device 500 are coupled together by a bus system 505. It can be understood that the bus system 505 is used to implement connection and communication between these components. In addition to a data bus, the bus system 505 includes a power bus, a control bus, and a status signal bus. For clarity of description, however, all of these buses are labeled as the bus system 505 in Fig. 5.

The user interface 503 may include a display, keyboard, mouse, trackball, click wheel, keys, buttons, touchpad, touch screen, or the like.

It can be understood that the memory 502 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, ferromagnetic random access memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), SyncLink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory 502 described in the embodiments of the present invention is intended to include these and any other suitable types of memory.

The memory 502 in the embodiments of the present invention is used to store various types of data to support the operation of the multimedia data processing device 500. Examples of such data include any computer program to be run on the multimedia data processing device 500, such as an operating system 5021 and application programs 5022, as well as contact data, phone book data, messages, pictures, videos, and so on. The operating system 5021 includes various system programs, such as a framework layer, a core library layer, and a driver layer, which are used to implement various basic services and to process hardware-based tasks. The application programs 5022 may include various applications, such as a media player (Media Player) and a browser (Browser), which are used to implement various application services. A program implementing the multimedia processing method of the embodiments of the present invention may be included in the application programs 5022.

The user interface 503 can implement the voice monitoring function. Specifically, the voice information monitored by the user interface 503 may be transmitted through the network interface 504.

The method disclosed in the above embodiments of the present invention may be applied to the processor 501 or implemented by the processor 501. The processor 501 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 501 or by instructions in the form of software. The processor 501 may be a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor 501 can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium, and the storage medium is located in the memory 502. The processor 501 reads the information in the memory 502 and completes the steps of the foregoing method in combination with its hardware.

In an exemplary embodiment, the multimedia data processing device 500 may be implemented by one or more application specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array), general-purpose processors, controllers, microcontrollers (MCU, Micro Controller Unit), microprocessors (Microprocessor), or other electronic components, for executing the multimedia processing method.

In an exemplary embodiment, an embodiment of the present invention further provides a computer-readable storage medium, for example the memory 502 containing a computer program. The computer program can be executed by the processor 501 of the multimedia data processing device 500 to complete the steps of the foregoing method. The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disc, or CD-ROM; it may also be any of various devices that include one or any combination of the above memories, such as a mobile phone, computer, tablet device, or personal digital assistant.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, it performs:

monitoring voice information;

determining a play position in the multimedia data to be played;

playing the multimedia data to be played based on the play position.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the embodiments of the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including disk memory, optical memory, and the like) that contain computer-usable program code.

The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus. The instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A multimedia data processing method, characterized in that the method comprises:

monitoring voice information;

determining a play position in the multimedia data to be played;

playing the multimedia data to be played based on the play position.

2. The method according to claim 1, characterized in that the determining, based on the monitored voice information, of multimedia data corresponding to the voice information comprises:

matching the voice information against candidate multimedia data, and determining, according to the matching degree, the multimedia data corresponding to the voice information from the candidate multimedia data.

3. The method according to claim 2, characterized in that the selecting, according to the matching degree, of the corresponding candidate multimedia data as the multimedia data corresponding to the voice information comprises:

selecting candidate multimedia data whose matching degree exceeds a first matching degree threshold as the multimedia data corresponding to the voice information.

4. The method according to claim 2, characterized in that the determining of the multimedia data to be played among the determined multimedia data comprises:

determining, according to a received selection instruction, the candidate multimedia data indicated by the instruction among the determined multimedia data as the multimedia data to be played; or

the matching degree of the determined multimedia data is the result of matching the multimedia data against a preset first matching degree threshold.

5. The method according to claim 2, characterized in that the determining of the multimedia data to be played among the determined multimedia data comprises:

when the matching degree of the determined multimedia data exceeds a preset first matching degree threshold, determining, according to a preset strategy, the multimedia data to be played among the determined multimedia data;

the preset strategy includes at least one of the following:

selecting the multimedia data with the highest historical play count;

selecting the multimedia data with the most bullet comments;

selecting multimedia data that has been played before;

the matching degree is the result of matching the multimedia data against the preset first matching degree threshold.

6. The method according to claim 1, characterized in that the determining of the multimedia data to be played among the determined multimedia data comprises:

when the matching degree of the determined multimedia data exceeds the first matching degree threshold, determining, according to a preset strategy, the multimedia data to be played from the multimedia data set corresponding to the multimedia data;

the preset strategy includes at least one of the following:

selecting the multimedia data with the earliest storage date;

selecting the multimedia data with the latest storage date;

selecting multimedia data that has not been played.

7. The method according to claim 1, characterized in that the determining of a play position in the multimedia data to be played comprises:

matching the multimedia data to be played against the voice feature of the voice information, and determining the play position according to the matching result.

8. The method according to claim 7, characterized in that the matching of the multimedia data to be played against the voice feature of the voice information and the determining of the play position according to the matching result comprise:

when the result of matching the multimedia data to be played against the voice feature of the voice information is less than a preset second matching degree threshold, performing one of the following operations:

determining the position in the multimedia data to be played that corresponds to the start position of the voice information as the play position;

9. The method according to claim 7, characterized in that the matching of the multimedia data to be played against the voice feature of the voice information and the determining of the play position according to the matching result comprise:

when the result of matching the multimedia data to be played against the voice feature of the voice information is greater than or equal to a preset second matching degree threshold, performing one of the following operations:

determining the position in the multimedia data to be played that corresponds to the end position of the voice information as the play position;

determining the position in the multimedia data to be played that corresponds to the position one unit distance after the end position of the voice information as the play position.

10. A multimedia data processing device, characterized in that the device comprises:

a monitoring module, configured to monitor voice information;

a to-be-played multimedia data determining module, configured to determine the multimedia data to be played among the determined multimedia data;

11. The device according to claim 10, characterized in that

the multimedia data determining module is configured to determine, according to the matching degree, the multimedia data corresponding to the voice information from the candidate multimedia data.

12. The device according to claim 11, characterized in that

the multimedia data determining module is configured to select candidate multimedia data whose matching degree exceeds a first matching degree threshold as the multimedia data corresponding to the voice information.

13. The device according to claim 11, characterized in that

the to-be-played multimedia data determining module is configured to determine, according to a received selection instruction, the candidate multimedia data indicated by the instruction among the determined multimedia data as the multimedia data to be played;

the to-be-played multimedia data determining module is configured to determine the multimedia data to be played according to the matching degree of the determined multimedia data;

the matching degree is the result of matching the multimedia data against a preset first matching degree threshold.

14. The device according to claim 11, characterized in that

the to-be-played multimedia data determining module is configured to determine, according to a preset strategy, the multimedia data to be played among the determined multimedia data;

the preset strategy includes at least one of the following:

selecting the multimedia data with the highest historical play count;

selecting the multimedia data with the most bullet comments;

selecting multimedia data that has been played before;

15. The device according to claim 10, characterized in that

the to-be-played multimedia data determining module is configured to determine, according to a preset strategy, the multimedia data to be played from the multimedia data set corresponding to the multimedia data;

the preset strategy includes at least one of the following:

selecting the multimedia data with the earliest storage date;

selecting the multimedia data with the latest storage date;

selecting multimedia data that has not been played.

16. The device according to claim 10, characterized in that

the play position determining module is configured to match the multimedia data to be played against the voice feature of the voice information;

17. The device according to claim 16, characterized in that

the play position determining module is configured to determine the position in the multimedia data to be played that corresponds to the start position of the voice information as the play position;

the play position determining module is configured to determine the start position of the multimedia data to be played as the play position.

18. The device according to claim 16, characterized in that

the play position determining module is configured to determine the position in the multimedia data to be played that corresponds to the end position of the voice information as the play position;

the play position determining module is configured to determine the position in the multimedia data to be played that corresponds to the position one unit distance after the end position of the voice information as the play position.

19. A multimedia data processing device, characterized in that the multimedia data processing device comprises:

monitoring voice information;

determining a play position in the multimedia data to be played;

playing the multimedia data to be played based on the play position.

20. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs:

monitoring voice information;

determining a play position in the multimedia data to be played;

playing the multimedia data to be played based on the play position.