CN106897304B - Multimedia data processing method and device - Google Patents


Info

Publication number
CN106897304B
Authority
CN
China
Prior art keywords
data
image
multimedia data
time period
media
Prior art date
Legal status
Active
Application number
CN201510959105.4A
Other languages: Chinese (zh)
Other versions: CN106897304A (en)
Inventor
邢学博
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority claimed from CN201510959105.4A
Publication of CN106897304A
Application granted
Publication of CN106897304B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43: Querying
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/685: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73: Querying
    • G06F 16/738: Presentation of query results
    • G06F 16/739: Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7837: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F 16/784: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content, the detected or recognised objects being people

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiment of the invention provides a multimedia data processing method and device, wherein the method comprises the following steps: determining multimedia data to be identified; searching for one or more frames of media feature images characterizing the multimedia data; and, when the multimedia data is triggered, displaying the one or more frames of media feature images. The embodiment of the invention spares the user from re-watching the entire multimedia data in order to pick out the parts of interest, which greatly reduces time consumption, reduces wasted bandwidth resources, and improves efficiency.

Description

Multimedia data processing method and device
Technical Field
The present invention relates to the technical field of multimedia processing, and in particular, to a method and an apparatus for processing multimedia data.
Background
With the rapid development of the internet, the amount of information on the internet has increased dramatically, and it includes a large amount of video data such as news videos, variety shows, TV series, and movies.
A user's knowledge of video data mostly comes from a synopsis of the entire video, and the user may choose whether or not to watch based on that synopsis.
However, video data is generally long: a TV-series episode may run 40 minutes, a series may span dozens of episodes, and a movie may run 2 hours or more.
Such long videos contain a large amount of information, but not all of it interests the user. To pick out the parts of interest, the user would have to browse the entire video, which consumes a lot of time, wastes considerable bandwidth, and is inefficient.
Disclosure of Invention
In view of the above problems, the present invention has been made to provide a multimedia data processing method and a corresponding multimedia data processing apparatus that overcome or at least partially solve the above problems.
According to an aspect of the present invention, there is provided a method for processing multimedia data, including:
determining multimedia data to be identified;
searching one or more frames of media characteristic images representing the multimedia data;
and when the multimedia data is triggered, displaying the one or more frames of media feature images.
Optionally, the step of determining multimedia data to be identified includes:
detecting a target time period set for multimedia data;
and determining the multimedia data in the target time period as the multimedia data to be identified.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
and when the multimedia data is video data, extracting first frame video data in the target time period and/or one frame video data in the target time period after a preset time is passed as a media characteristic image.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are video data, carrying out face detection on the video data in the target time period;
and extracting one or more frames of video data as a media characteristic image according to the number of the detected faces.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are video data, acquiring one or more frames of image data obtained based on the screenshot;
judging whether the image data belongs to the video data in the target time period; and if so, adopting the image data as a media characteristic image.
Optionally, the step of determining whether the image data belongs to the video data in the target time period includes:
reading the video identification and the time information carried by the image data;
judging whether the video identification is matched with the video data; if yes, judging whether the time information is in the target time period;
when the time information is within the target time period, determining that the image data belongs to video data within the target time period.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are audio data, matching the audio data in the target time period with a preset audio model;
when the matching is successful, extracting a style label corresponding to the audio model;
and searching image data matched with the style label as a media characteristic image.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are audio data, searching lyric data of the audio data in the target time period;
generating text abstract information by adopting the lyric data;
and searching image data matched with the text abstract information as a media characteristic image.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are audio data, inquiring video data corresponding to the audio data;
one or more frames of image data are extracted from the video data as a media feature image.
Optionally, when the multimedia data is triggered, the step of presenting the one or more frames of media feature images includes:
detecting a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and displaying the one or more frames of media feature images in response to the hover operation.
According to another aspect of the present invention, there is provided a multimedia data processing apparatus including:
the multimedia data determining module is suitable for determining multimedia data to be identified;
the media characteristic image searching module is suitable for searching one or more frames of media characteristic images representing the multimedia data;
and the media feature image display module is adapted to display the one or more frames of media feature images when the multimedia data is triggered.
Optionally, the multimedia data determination module is further adapted to:
detecting a target time period set for multimedia data;
and determining the multimedia data in the target time period as the multimedia data to be identified.
Optionally, the media feature image lookup module is further adapted to:
and when the multimedia data is video data, extracting first frame video data in the target time period and/or one frame video data in the target time period after a preset time is passed as a media characteristic image.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are video data, carrying out face detection on the video data in the target time period;
and extracting one or more frames of video data as a media characteristic image according to the number of the detected faces.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are video data, acquiring one or more frames of image data obtained based on the screenshot;
judging whether the image data belongs to the video data in the target time period; and if so, adopting the image data as a media characteristic image.
Optionally, the media feature image lookup module is further adapted to:
reading the video identification and the time information carried by the image data;
judging whether the video identification is matched with the video data; if yes, judging whether the time information is in the target time period;
when the time information is within the target time period, determining that the image data belongs to video data within the target time period.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are audio data, matching the audio data in the target time period with a preset audio model;
when the matching is successful, extracting a style label corresponding to the audio model;
and searching image data matched with the style label as a media characteristic image.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are audio data, searching lyric data of the audio data in the target time period;
generating text abstract information by adopting the lyric data;
and searching image data matched with the text abstract information as a media characteristic image.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are audio data, inquiring video data corresponding to the audio data;
one or more frames of image data are extracted from the video data as a media feature image.
Optionally, the media feature image presentation module is further adapted to:
detect a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and display the one or more frames of media feature images in response to the hover operation.
According to the embodiment of the invention, media feature images are mined for the multimedia data and displayed when the multimedia data is triggered. This spares the user from re-watching the entire multimedia data in order to pick out the parts of interest, greatly reducing time consumption, reducing wasted bandwidth resources, and improving efficiency.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating steps of an embodiment of a method for processing multimedia data according to an embodiment of the present invention; and
fig. 2 is a block diagram illustrating an embodiment of a multimedia data processing apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a method for processing multimedia data according to an embodiment of the present invention is shown, which may specifically include the following steps:
Step 101, determining multimedia data to be identified;
in a specific implementation, in a video website or other scenes, multimedia data may be stored in a database in advance.
When needed, the multimedia data can be extracted from the database so that its media feature images can be identified.
In an alternative embodiment of the present invention, step 101 may comprise the following sub-steps:
a substep S11 of detecting a target time period set for the multimedia data;
and a substep S12 of determining the multimedia data in the target time period as the multimedia data to be recognized.
In a specific implementation, when a user requests that an online video website play certain video data, the user's preferences regarding that video data are expressed in his or her behavior data.
In the embodiment of the invention, the user's behavior data for certain video data can be collected, for example from the online video website's log information, in order to mine valuable video clips.
In an alternative example of the embodiment of the present invention, the sub-step S11 may include the following sub-steps:
substep S111, when a first marking operation for the multimedia data is detected, recording a starting time point corresponding to the first marking operation;
a substep S112, when a second marking operation for the multimedia data is detected, recording a termination time point corresponding to the second marking operation;
and a substep S113, composing the starting time point and the ending time point into a target time period.
In the embodiment of the present invention, the first marking operation and the second marking operation may be deliberate marking operations performed consciously by the user.
For example, the online video website may provide an AB repeat function: the user triggering the A key is equivalent to triggering the first marking operation, the user triggering the B key is equivalent to triggering the second marking operation, and the starting time point of the A key and the ending time point of the B key form the target time period.
Alternatively, the first marking operation and the second marking operation may be marking operations performed by the user without conscious intent.
For example, when playing certain video data, if the user is not interested in the current segment, the user generally adjusts the playback progress to skip it, by dragging the progress bar, pressing the right-arrow "→" key, clicking a shortcut control, and so on.
Therefore, the operation ending such a progress adjustment may be regarded as the first marking operation, the operation starting a progress adjustment may be regarded as the second marking operation, and the corresponding start and end time points are composed into a target time period.
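The composition of a target time period from two marking operations can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name and the choice to sort the two time points are assumptions.

```python
# Hypothetical sketch: compose a target time period from the two marking
# operations described above. Times are in seconds from the start of playback.

def compose_target_period(first_mark_time, second_mark_time):
    """Combine the starting time point (first marking operation) and the
    termination time point (second marking operation) into a target
    time period, normalizing the order in case the marks are reversed."""
    start, end = sorted((first_mark_time, second_mark_time))
    return (start, end)

# Example: the user triggers the A key at 120 s and the B key at 185 s.
period = compose_target_period(120.0, 185.0)
print(period)  # (120.0, 185.0)
```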
Step 102, searching one or more frames of media characteristic images representing the multimedia data;
in the embodiment of the invention, the multimedia data within the target time period can be regarded as valuable, so its media feature image, that is, an image characterizing the multimedia data within the target time period, can be mined.
In a specific implementation, since multimedia data comprises both video data and audio data, whose characteristics differ, media feature images can be mined separately for the two cases.
First: video data.
In one approach to selecting a media feature image, the part of the video within the target time period that interests the user generally begins at or near the start of the period: either exactly at the start time point that was set, or slightly later, e.g., 1 second after it.
Therefore, when the multimedia data is video data, the first frame within the target time period and/or a frame a preset time (e.g., 1 second) into the target time period may be extracted as the media feature image.
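The frame-selection rule above can be sketched as follows. For illustration the video is modeled as a list of (timestamp, frame) pairs; a real system would decode frames from the stream, and the function and constant names are assumptions.

```python
# Illustrative sketch: extract the first frame of the target time period
# and the first frame at least a preset time (e.g., 1 second) into it.

PRESET_DELAY = 1.0  # seconds after the start of the target time period

def extract_feature_frames(frames, period):
    """frames: list of (timestamp_seconds, frame); period: (start, end)."""
    start, end = period
    in_period = [(t, f) for t, f in frames if start <= t <= end]
    if not in_period:
        return []
    picks = [in_period[0][1]]  # first frame in the target time period
    for t, f in in_period:
        if t >= start + PRESET_DELAY:
            picks.append(f)     # first frame >= 1 s into the period
            break
    return picks

# Frames every 0.1 s over a 10 s clip; target period is 2.0-5.0 s.
frames = [(t / 10.0, "frame-%d" % t) for t in range(0, 100)]
print(extract_feature_frames(frames, (2.0, 5.0)))  # ['frame-20', 'frame-30']
```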
In another approach, in video data such as TV series and movies, frames containing more characters generally correspond to richer plot moments, and such scenes are more likely to appeal to the user.
Therefore, when the multimedia data is video data, the face detection is carried out on the video data in the target time period;
and extracting one or more frames of video data as a media characteristic image according to the number of the detected faces.
For example, when the number of detected faces exceeds a certain threshold, such as 5, the frame can be used as a media feature image.
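The face-count rule can be sketched as below. Face detection itself (e.g., a cascade or neural detector) is abstracted away: each frame is paired with a precomputed face count, and the names and threshold are assumptions for illustration.

```python
# Hedged sketch: keep frames whose detected face count exceeds a threshold,
# as candidate media feature images.

FACE_THRESHOLD = 5

def select_by_faces(detections, threshold=FACE_THRESHOLD):
    """detections: list of (frame, face_count). Returns frames whose
    face count exceeds the threshold."""
    return [frame for frame, count in detections if count > threshold]

detections = [("f1", 2), ("f2", 6), ("f3", 5), ("f4", 8)]
print(select_by_faces(detections))  # ['f2', 'f4']
```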
In another approach, exciting and popular video clips are often the ones users like most, and users tend to share screenshots of them.
Therefore, when the multimedia data is video data, one or more frames of image data obtained from screenshots can be collected through channels such as forums, microblogs, and news sites;
it is then judged whether the image data belongs to the video data within the target time period, and if so, the image data is adopted as the media feature image.
Further, when determining the attribution of the image data, the video identifier and the time information carried by the image data may be read.
Judging whether the video identification is matched with the video data; if yes, judging whether the time information is in the target time period;
when the time information is within the target time period, it is determined that the image data belongs to the video data within the target time period.
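The attribution check above can be sketched as follows. The Screenshot structure and its field names are assumptions; the logic simply mirrors the two judgments in the text (video identification match, then time within the target period).

```python
# Sketch: decide whether a shared screenshot belongs to the video data
# within the target time period.

from dataclasses import dataclass

@dataclass
class Screenshot:
    video_id: str   # video identification carried by the image data
    time: float     # time information (seconds into the video)

def belongs_to_period(shot, video_id, period):
    if shot.video_id != video_id:      # video identification mismatch
        return False
    start, end = period
    return start <= shot.time <= end   # time within the target period

shot = Screenshot(video_id="v42", time=130.0)
print(belongs_to_period(shot, "v42", (120.0, 185.0)))  # True
```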
Second: audio data.
In one approach, audio models can be generated in advance for audio data of different styles, such as music styles (jazz, classical, pop) and mood styles (joy, sadness, pleasure).
Therefore, when the multimedia data is audio data, the audio data within the target time period can be matched against the preset audio models; when a match succeeds, the style label corresponding to the matched audio model is extracted.
Image data matching the style label is then retrieved from a preset database or a third-party server to serve as the media feature image.
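The style-label path can be sketched as below. The "audio model" is reduced to a labeled reference feature vector with a distance threshold; a real system would use trained classifiers, and all names, vectors, and the threshold here are assumptions.

```python
# Illustrative sketch: match audio features against preset models, then
# look up an image tagged with the matched style label.

AUDIO_MODELS = {            # style label -> reference feature vector
    "jazz": [0.9, 0.1],
    "classical": [0.1, 0.9],
}
IMAGE_LIBRARY = {"jazz": "saxophone.jpg", "classical": "orchestra.jpg"}

def match_style(features, threshold=0.5):
    """Return the label of the closest audio model within the threshold,
    or None if nothing matches."""
    best_label, best_dist = None, threshold
    for label, ref in AUDIO_MODELS.items():
        dist = sum((a - b) ** 2 for a, b in zip(features, ref)) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

label = match_style([0.85, 0.15])   # features from the target time period
print(label, IMAGE_LIBRARY.get(label))  # jazz saxophone.jpg
```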
In another approach, when the multimedia data is audio data, the lyric data of the audio data within the target time period is looked up in a preset database or a third-party server;
text abstract information is generated from the lyric data using a text summarization algorithm (such as TextTeaser);
and image data matching the text abstract information is retrieved from a preset database or a third-party server to serve as the media feature image.
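The lyric-summarization step can be sketched with a toy frequency-based extractive scorer. The patent names TextTeaser as one possible algorithm; this is not TextTeaser, only an assumed minimal illustration of picking the most representative lyric line.

```python
# Minimal extractive-summary sketch: score each lyric line by the average
# corpus frequency of its words and keep the highest-scoring line as the
# text abstract information.

from collections import Counter

def summarize_lyrics(lines):
    words = Counter(w.lower() for line in lines for w in line.split())
    def score(line):
        ws = line.split()
        return sum(words[w.lower()] for w in ws) / max(len(ws), 1)
    return max(lines, key=score)  # the most representative lyric line

lyrics = ["city lights at night", "night after night", "lights in the night"]
print(summarize_lyrics(lyrics))  # night after night
```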
In another approach, when the multimedia data is audio data, video data corresponding to the audio data may be queried, such as the corresponding MV or concert video, or a TV series or movie that uses the audio data as its score.
One or more frames of image data are extracted from the video data as a media feature image.
Of course, the above-mentioned identification method of the media characteristic image is only an example, and when implementing the embodiment of the present invention, the identification method of the media characteristic image may be set according to actual situations, which is not limited in the embodiment of the present invention. In addition, besides the above-mentioned identification method of the media characteristic image, a person skilled in the art may also adopt other identification methods of the media characteristic image according to actual needs, and the embodiment of the present invention is not limited to this.
Step 103, when the multimedia data is triggered, displaying the one or more frames of media feature images.
In a specific implementation, while the multimedia data is being played, a hover operation on the portion of the playback progress bar corresponding to the target time period is detected, and the one or more frames of media feature images are displayed in response to the hover operation.
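The presentation step can be sketched as below: map a hover position on the progress bar to a playback time, and return the stored media feature images when the hover falls inside the target time period. UI plumbing is omitted and all names are assumptions.

```python
# Sketch: hover-to-time mapping on the playback progress bar, gated by
# the target time period.

def hover_to_time(hover_x, bar_width, duration):
    """Convert a hover x-coordinate on the progress bar to seconds."""
    return (hover_x / bar_width) * duration

def images_for_hover(hover_x, bar_width, duration, period, images):
    t = hover_to_time(hover_x, bar_width, duration)
    start, end = period
    return images if start <= t <= end else []

# 800-px bar, 3600 s video, target period 120-185 s, hover at x=30 (135 s).
print(images_for_hover(30, 800, 3600, (120.0, 185.0), ["thumb1.png"]))
```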
According to the embodiment of the invention, media feature images are mined for the multimedia data and displayed when the multimedia data is triggered. This spares the user from re-watching the entire multimedia data in order to pick out the parts of interest, greatly reducing time consumption, reducing wasted bandwidth resources, and improving efficiency.
For simplicity of explanation, the method embodiments are described as a series of acts or combinations, but those skilled in the art will appreciate that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 2, a block diagram of an embodiment of a multimedia data processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
a multimedia data determination module 201 adapted to determine multimedia data to be identified;
a media characteristic image searching module 202, adapted to search one or more frames of media characteristic images representing the multimedia data;
the media feature image presentation module 203 is adapted to present the one or more frames of media feature images when the multimedia data is triggered.
In an optional embodiment of the present invention, the multimedia data determination module 201 may be further adapted to:
detecting a target time period set for multimedia data;
and determining the multimedia data in the target time period as the multimedia data to be identified.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
and when the multimedia data is video data, extracting first frame video data in the target time period and/or one frame video data in the target time period after a preset time is passed as a media characteristic image.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are video data, carrying out face detection on the video data in the target time period;
and extracting one or more frames of video data as a media characteristic image according to the number of the detected faces.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are video data, acquiring one or more frames of image data obtained based on the screenshot;
judging whether the image data belongs to the video data in the target time period; and if so, adopting the image data as a media characteristic image.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
reading the video identification and the time information carried by the image data;
judging whether the video identification is matched with the video data; if yes, judging whether the time information is in the target time period;
when the time information is within the target time period, determining that the image data belongs to video data within the target time period.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are audio data, matching the audio data in the target time period with a preset audio model;
when the matching is successful, extracting a style label corresponding to the audio model;
and searching image data matched with the style label as a media characteristic image.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are audio data, searching lyric data of the audio data in the target time period;
generating text abstract information by adopting the lyric data;
and searching image data matched with the text abstract information as a media characteristic image.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are audio data, inquiring video data corresponding to the audio data;
one or more frames of image data are extracted from the video data as a media feature image.
In an optional embodiment of the present invention, the media feature image presentation module 203 may be further adapted to:
detecting a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and displaying the one or more frames of media feature images in response to the hover operation.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from those of the embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and they may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract, and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a device for processing multimedia data according to an embodiment of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (16)

1. A method of processing multimedia data, comprising:
determining multimedia data to be identified;
finding one or more media feature images characterizing the multimedia data;
when the multimedia data is triggered, displaying the one or more media feature images;
wherein the step of determining multimedia data to be identified comprises:
detecting a target time period set for the multimedia data;
determining the multimedia data within the target time period as the multimedia data to be identified;
and wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is audio data, finding lyric data of the audio data within the target time period;
generating text summary information from the lyric data;
and retrieving image data matching the text summary information as a media feature image.
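For illustration only (the patent does not specify a summarization algorithm; the word-frequency summary and the tag index below are assumptions), the lyric-to-image lookup of claim 1 might look like:

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "and", "i", "you", "to", "of", "in", "my", "on"}

def summarize_lyrics(lyric_lines, top_k=3):
    """Crude text-summary information for the lyrics of the target
    time period: the top-k most frequent content words."""
    words = [
        w for line in lyric_lines
        for w in line.lower().split()
        if w.isalpha() and w not in STOPWORDS
    ]
    return [w for w, _ in Counter(words).most_common(top_k)]

def find_matching_image(summary, image_index):
    """Return the first image whose tag set overlaps the summary.
    `image_index` (image name -> set of tags) is a stand-in for a
    real image search backend."""
    for name, tags in image_index.items():
        if tags & set(summary):
            return name
    return None
```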
2. The method of claim 1, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is video data, extracting the first frame of video data within the target time period and/or a frame of video data at a preset time offset into the target time period as a media feature image.
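In frame terms, the extraction rule of this claim can be sketched as follows (assuming a constant frame rate; the function and parameter names are illustrative, not from the patent):

```python
def feature_frame_indices(start_s, fps, preset_offset_s=None):
    """Frame indices to extract as media feature images: the first
    frame of the target time period and, optionally, the frame one
    preset time offset later."""
    first = int(start_s * fps)
    if preset_offset_s is None:
        return [first]
    return [first, int((start_s + preset_offset_s) * fps)]
```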
3. The method of claim 1, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is video data, performing face detection on the video data within the target time period;
and extracting one or more frames of video data as media feature images according to the number of detected faces.
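A minimal sketch of selecting frames by detected face count (the face detector itself is elided; `face_counts` is assumed to be its per-frame output, and the "most faces first" policy is one plausible reading of the claim):

```python
def frames_by_face_count(face_counts, n=1, min_faces=1):
    """Given {frame_index: detected_face_count} for the target time
    period, return up to n frame indices with the most faces."""
    candidates = [(c, i) for i, c in face_counts.items() if c >= min_faces]
    # Most faces first; break ties by the earlier frame.
    candidates.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in candidates[:n]]
```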
4. The method of claim 1, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is video data, acquiring one or more frames of image data obtained from screenshots;
determining whether the image data belongs to the video data within the target time period; and if so, adopting the image data as a media feature image.
5. The method of claim 4, wherein the step of determining whether the image data belongs to the video data within the target time period comprises:
reading a video identifier and time information carried by the image data;
determining whether the video identifier matches the video data; if so, determining whether the time information falls within the target time period;
and when the time information is within the target time period, determining that the image data belongs to the video data within the target time period.
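Claim 5's membership test reduces to two checks on the metadata a screenshot carries; a sketch (the metadata field names are assumptions, not from the patent):

```python
def screenshot_in_period(meta, video_id, period):
    """Check whether a screenshot belongs to the video within the
    target time period.  `meta` carries the screenshot's video
    identifier and capture time, e.g. {"video_id": ..., "time_s": ...}."""
    if meta.get("video_id") != video_id:
        return False                      # identifier does not match
    start, end = period
    return start <= meta.get("time_s", -1.0) <= end
```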
6. The method of claim 1, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is audio data, matching the audio data within the target time period against a preset audio model;
when the matching succeeds, extracting a style label corresponding to the audio model;
and retrieving image data matching the style label as a media feature image.
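One plausible reading of "matching against a preset audio model" is nearest-model matching over audio feature vectors; a sketch with made-up models, feature dimensions, and a cosine-similarity threshold (all values hypothetical, not from the patent):

```python
import math

# Hypothetical preset models: style label -> audio feature vector.
STYLE_MODELS = {
    "rock":   [0.9, 0.2, 0.1],
    "ballad": [0.1, 0.8, 0.6],
}

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def match_style(features, threshold=0.8):
    """Match audio features against each preset model; return the style
    label of the best match above the threshold, else None."""
    best_label, best_score = None, threshold
    for label, model in STYLE_MODELS.items():
        score = cosine(features, model)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```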
7. The method of any one of claims 1-6, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is audio data, querying video data corresponding to the audio data;
and extracting one or more frames of image data from the video data as media feature images.
8. The method of any one of claims 1-6, wherein the step of displaying the one or more media feature images when the multimedia data is triggered comprises:
detecting a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and displaying the one or more media feature images in response to the hover operation.
9. A device for processing multimedia data, comprising:
a multimedia data determination module, adapted to determine multimedia data to be identified;
a media feature image lookup module, adapted to find one or more media feature images characterizing the multimedia data;
a media feature image presentation module, adapted to display the one or more media feature images when the multimedia data is triggered;
wherein the multimedia data determination module is further adapted to:
detect a target time period set for the multimedia data;
and determine the multimedia data within the target time period as the multimedia data to be identified;
and wherein the media feature image lookup module is further adapted to:
when the multimedia data is audio data, find lyric data of the audio data within the target time period;
generate text summary information from the lyric data;
and retrieve image data matching the text summary information as a media feature image.
10. The device of claim 9, wherein the media feature image lookup module is further adapted to:
when the multimedia data is video data, extract the first frame of video data within the target time period and/or a frame of video data at a preset time offset into the target time period as a media feature image.
11. The device of claim 9, wherein the media feature image lookup module is further adapted to:
when the multimedia data is video data, perform face detection on the video data within the target time period;
and extract one or more frames of video data as media feature images according to the number of detected faces.
12. The device of claim 9, wherein the media feature image lookup module is further adapted to:
when the multimedia data is video data, acquire one or more frames of image data obtained from screenshots;
determine whether the image data belongs to the video data within the target time period; and if so, adopt the image data as a media feature image.
13. The device of claim 12, wherein the media feature image lookup module is further adapted to:
read a video identifier and time information carried by the image data;
determine whether the video identifier matches the video data; if so, determine whether the time information falls within the target time period;
and when the time information is within the target time period, determine that the image data belongs to the video data within the target time period.
14. The device of claim 9, wherein the media feature image lookup module is further adapted to:
when the multimedia data is audio data, match the audio data within the target time period against a preset audio model;
when the matching succeeds, extract a style label corresponding to the audio model;
and retrieve image data matching the style label as a media feature image.
15. The device of any one of claims 9-14, wherein the media feature image lookup module is further adapted to:
when the multimedia data is audio data, query video data corresponding to the audio data;
and extract one or more frames of image data from the video data as media feature images.
16. The device of any one of claims 9-14, wherein the media feature image presentation module is further adapted to:
detect a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and display the one or more media feature images in response to the hover operation.
CN201510959105.4A 2015-12-18 2015-12-18 Multimedia data processing method and device Active CN106897304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510959105.4A CN106897304B (en) 2015-12-18 2015-12-18 Multimedia data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510959105.4A CN106897304B (en) 2015-12-18 2015-12-18 Multimedia data processing method and device

Publications (2)

Publication Number Publication Date
CN106897304A CN106897304A (en) 2017-06-27
CN106897304B true CN106897304B (en) 2021-01-29

Family

ID=59190418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510959105.4A Active CN106897304B (en) 2015-12-18 2015-12-18 Multimedia data processing method and device

Country Status (1)

Country Link
CN (1) CN106897304B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108924576A (en) * 2018-07-10 2018-11-30 武汉斗鱼网络科技有限公司 A kind of video labeling method, device, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1798267A (en) * 2004-12-22 2006-07-05 上海乐金广电电子有限公司 Device for matching image to words of a song in Kara-Ok system
CN101600118A (en) * 2008-06-06 2009-12-09 株式会社日立制作所 Audio/video content information draw-out device and method
CN102572356A (en) * 2012-01-16 2012-07-11 华为技术有限公司 Conference recording method and conference system
CN103080991A (en) * 2010-10-07 2013-05-01 阿姆司教育株式会社 Music-based language-learning method, and learning device using same
CN104837059A (en) * 2014-04-15 2015-08-12 腾讯科技(北京)有限公司 Video processing method, device and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1979464A (en) * 2005-12-07 2007-06-13 联想(北京)有限公司 Method for realizing playing according to request of user in digital media player
CN101324897A (en) * 2008-07-28 2008-12-17 北京搜狗科技发展有限公司 Method and apparatus for looking up lyric
US20120109563A1 (en) * 2010-10-29 2012-05-03 President And Fellows Of Harvard College Method and apparatus for quantifying a best match between series of time uncertain measurements
CN103873512A (en) * 2012-12-13 2014-06-18 深圳市赛格导航科技股份有限公司 Method for vehicle-mounted wireless music transmission based on face recognition technology
CN103381280B (en) * 2013-07-10 2015-07-15 上海泰亿格康复医疗科技股份有限公司 Visual and auditory integrated rehabilitation training system and method based on visible brain wave induction technology
CN104090883B (en) * 2013-11-15 2017-05-17 广州酷狗计算机科技有限公司 Playing control processing method and playing control processing device for audio file
CN104202657B (en) * 2014-08-29 2018-09-18 北京奇虎科技有限公司 The method and device that multiple videos selection in same theme video group is played

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on the Application of Picture Charts in Kindergarten Music Activities; Zhang Yanli; China Masters' Theses Full-text Database, Social Sciences II; 20140215 (No. 02); H128-19 *

Also Published As

Publication number Publication date
CN106897304A (en) 2017-06-27

Similar Documents

Publication Publication Date Title
AU2020260513B2 (en) Targeted ad redistribution
CN112753225B (en) Video processing for embedded information card positioning and content extraction
CN110149558B (en) Video playing real-time recommendation method and system based on content identification
CN110582025B (en) Method and apparatus for processing video
CN109547819B (en) Live list display method and device and electronic equipment
US9374411B1 (en) Content recommendations using deep data
CN108366278B (en) User interaction implementation method and device in video playing
JP5651231B2 (en) Media fingerprint for determining and searching content
US9256601B2 (en) Media fingerprinting for social networking
US20190392866A1 (en) Video summarization and collaboration systems and methods
US11748408B2 (en) Analyzing user searches of verbal media content
CN112602077A (en) Interactive video content distribution
US20150172787A1 (en) Customized movie trailers
CN105184616B (en) Method and device for directionally delivering business object
CN112753227A (en) Audio processing for detecting the occurrence of crowd noise in a sporting event television program
US9635337B1 (en) Dynamically generated media trailers
US10897658B1 (en) Techniques for annotating media content
CN106899879B (en) Multimedia data processing method and device
US20170272793A1 (en) Media content recommendation method and device
CN108769831B (en) Video preview generation method and device
CN114845149B (en) Video clip method, video recommendation method, device, equipment and medium
US20230052033A1 (en) Systems and methods for recommending content using progress bars
CN106897304B (en) Multimedia data processing method and device
WO2017096883A1 (en) Video recommendation method and system
CN117014649A (en) Video processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240116

Address after: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.
