CN106897304B - Multimedia data processing method and device - Google Patents


Info

Publication number
CN106897304B
Authority
CN
China
Prior art keywords
data
image
multimedia data
time period
media
Prior art date
Legal status
Active
Application number
CN201510959105.4A
Other languages: Chinese (zh)
Other versions: CN106897304A (en)
Inventor
邢学博
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority claimed from CN201510959105.4A
Publication of CN106897304A
Application granted
Publication of CN106897304B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43: Querying
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/685: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73: Querying
    • G06F 16/738: Presentation of query results
    • G06F 16/739: Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7837: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F 16/784: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content, the detected or recognised objects being people

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiment of the invention provides a multimedia data processing method and device, wherein the method comprises the following steps: determining multimedia data to be identified; searching for one or more frames of media feature images characterizing the multimedia data; and, when the multimedia data is triggered, displaying the one or more frames of media feature images. The embodiment of the invention spares the user from re-watching the entire multimedia data in order to pick out the parts of interest, which greatly reduces time consumption, reduces wasted bandwidth resources, and improves efficiency.

Description

Multimedia data processing method and device
Technical Field
The present invention relates to the technical field of multimedia processing, and in particular, to a method and an apparatus for processing multimedia data.
Background
With the rapid development of the internet, the amount of information on the internet has increased dramatically, and it includes a large amount of video data such as news videos, variety shows, TV series, and movies.
A user's knowledge of video data mostly comes from a synopsis of the entire video, and the user may choose whether or not to watch based on that synopsis.
However, video data is generally long: a TV-series episode may run 40 minutes, a series may span dozens of episodes, and a movie may run 2 hours or more.
Such long videos contain a large amount of information, but not all of it interests the user. To pick out the parts of interest, the user would have to browse the entire video, which consumes a lot of time, wastes considerable bandwidth, and is inefficient.
Disclosure of Invention
In view of the above problems, the present invention has been made to provide a multimedia data processing method and a corresponding multimedia data processing apparatus that overcome or at least partially solve the above problems.
According to an aspect of the present invention, there is provided a method for processing multimedia data, including:
determining multimedia data to be identified;
searching one or more frames of media characteristic images representing the multimedia data;
and when the multimedia data is triggered, displaying the one or more frames of media feature images.
Optionally, the step of determining multimedia data to be identified includes:
detecting a target time period set for multimedia data;
and determining the multimedia data in the target time period as the multimedia data to be identified.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
and when the multimedia data is video data, extracting first frame video data in the target time period and/or one frame video data in the target time period after a preset time is passed as a media characteristic image.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are video data, carrying out face detection on the video data in the target time period;
and extracting one or more frames of video data as a media characteristic image according to the number of the detected faces.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are video data, acquiring one or more frames of image data obtained based on the screenshot;
judging whether the image data belongs to the video data in the target time period; and if so, adopting the image data as a media characteristic image.
Optionally, the step of determining whether the image data belongs to the video data in the target time period includes:
reading the video identification and the time information carried by the image data;
judging whether the video identification is matched with the video data; if yes, judging whether the time information is in the target time period;
when the time information is within the target time period, determining that the image data belongs to video data within the target time period.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are audio data, matching the audio data in the target time period with a preset audio model;
when the matching is successful, extracting a style label corresponding to the audio model;
and searching image data matched with the style label as a media characteristic image.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are audio data, searching lyric data of the audio data in the target time period;
generating text abstract information by adopting the lyric data;
and searching image data matched with the text abstract information as a media characteristic image.
Optionally, the step of searching for one or more frames of media feature images characterizing the multimedia data includes:
when the multimedia data are audio data, inquiring video data corresponding to the audio data;
one or more frames of image data are extracted from the video data as a media feature image.
Optionally, when the multimedia data is triggered, the step of presenting the one or more frames of media feature images includes:
detecting a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and displaying the one or more frames of media feature images in response to the hover operation.
According to another aspect of the present invention, there is provided a multimedia data processing apparatus including:
the multimedia data determining module is suitable for determining multimedia data to be identified;
the media characteristic image searching module is suitable for searching one or more frames of media characteristic images representing the multimedia data;
and the media feature image display module is adapted to display the one or more frames of media feature images when the multimedia data is triggered.
Optionally, the multimedia data determination module is further adapted to:
detecting a target time period set for multimedia data;
and determining the multimedia data in the target time period as the multimedia data to be identified.
Optionally, the media feature image lookup module is further adapted to:
and when the multimedia data is video data, extracting first frame video data in the target time period and/or one frame video data in the target time period after a preset time is passed as a media characteristic image.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are video data, carrying out face detection on the video data in the target time period;
and extracting one or more frames of video data as a media characteristic image according to the number of the detected faces.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are video data, acquiring one or more frames of image data obtained based on the screenshot;
judging whether the image data belongs to the video data in the target time period; and if so, adopting the image data as a media characteristic image.
Optionally, the media feature image lookup module is further adapted to:
reading the video identification and the time information carried by the image data;
judging whether the video identification is matched with the video data; if yes, judging whether the time information is in the target time period;
when the time information is within the target time period, determining that the image data belongs to video data within the target time period.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are audio data, matching the audio data in the target time period with a preset audio model;
when the matching is successful, extracting a style label corresponding to the audio model;
and searching image data matched with the style label as a media characteristic image.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are audio data, searching lyric data of the audio data in the target time period;
generating text abstract information by adopting the lyric data;
and searching image data matched with the text abstract information as a media characteristic image.
Optionally, the media feature image lookup module is further adapted to:
when the multimedia data are audio data, inquiring video data corresponding to the audio data;
one or more frames of image data are extracted from the video data as a media feature image.
Optionally, the media feature image presentation module is further adapted to:
detect a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and display the one or more frames of media feature images in response to the hover operation.
According to the embodiment of the invention, media feature images are mined for the multimedia data and displayed when the multimedia data is triggered. This spares the user from re-watching the entire multimedia data in order to pick out the parts of interest, greatly reducing time consumption, reducing wasted bandwidth resources, and improving efficiency.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating steps of an embodiment of a method for processing multimedia data according to an embodiment of the present invention; and
fig. 2 is a block diagram illustrating an embodiment of a multimedia data processing apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a method for processing multimedia data according to an embodiment of the present invention is shown, which may specifically include the following steps:
Step 101, determining multimedia data to be identified;
in a specific implementation, in a video website or other scenes, multimedia data may be stored in a database in advance.
When needed, the multimedia data can be extracted from the database so that its media feature images can be identified.
In an alternative embodiment of the present invention, step 101 may comprise the following sub-steps:
a substep S11 of detecting a target time period set for the multimedia data;
and a substep S12 of determining the multimedia data in the target time period as the multimedia data to be recognized.
In a specific implementation, when a user requests that an online video website play certain video data, the user's preferences regarding that video data are expressed in his or her behavior data.
In the embodiment of the invention, the user's behavior data for certain video data can be collected, for example from the online video website's log information, in order to mine valuable video clips.
In an alternative example of the embodiment of the present invention, the sub-step S11 may include the following sub-steps:
substep S111, when a first marking operation for the multimedia data is detected, recording a starting time point corresponding to the first marking operation;
a substep S112, when a second marking operation for the multimedia data is detected, recording a termination time point corresponding to the second marking operation;
and a substep S113, composing the starting time point and the ending time point into a target time period.
In the embodiment of the present invention, the first marking operation and the second marking operation may be deliberate marking operations performed consciously by the user.
For example, the online video website may provide an AB repeat function: the user triggering the A key is equivalent to triggering the first marking operation, the user triggering the B key is equivalent to triggering the second marking operation, and the starting time point of the A key and the ending time point of the B key form the target time period.
Alternatively, the first marking operation and the second marking operation may be marking operations performed by the user without conscious intent.
For example, when playing certain video data, if the user is not interested in the current segment, the user generally adjusts the playback progress to skip it, by dragging the progress bar, pressing the right-arrow "→" key, clicking a shortcut control, and so on.
Therefore, the operation ending such a progress adjustment may be regarded as the first marking operation, the operation starting a progress adjustment may be regarded as the second marking operation, and the corresponding start and end time points are composed into a target time period.
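The composition of a target time period from two marking operations can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name and the choice to sort the two time points are assumptions.

```python
# Hypothetical sketch: compose a target time period from the two marking
# operations described above. Times are in seconds from the start of playback.

def compose_target_period(first_mark_time, second_mark_time):
    """Combine the starting time point (first marking operation) and the
    termination time point (second marking operation) into a target
    time period, normalizing the order in case the marks are reversed."""
    start, end = sorted((first_mark_time, second_mark_time))
    return (start, end)

# Example: the user triggers the A key at 120 s and the B key at 185 s.
period = compose_target_period(120.0, 185.0)
print(period)  # (120.0, 185.0)
```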
Step 102, searching one or more frames of media characteristic images representing the multimedia data;
in the embodiment of the invention, the multimedia data within the target time period can be regarded as valuable, so its media feature image, that is, an image characterizing the multimedia data within the target time period, can be mined.
In a specific implementation, since multimedia data comprises both video data and audio data, whose characteristics differ, media feature images can be mined separately for the two cases.
First: video data.
In one approach to selecting a media feature image, the part of the video within the target time period that interests the user generally begins at or near the start of the period: either exactly at the start time point that was set, or slightly later, e.g., 1 second after it.
Therefore, when the multimedia data is video data, the first frame within the target time period and/or a frame a preset time (e.g., 1 second) into the target time period may be extracted as the media feature image.
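The frame-selection rule above can be sketched as follows. For illustration the video is modeled as a list of (timestamp, frame) pairs; a real system would decode frames from the stream, and the function and constant names are assumptions.

```python
# Illustrative sketch: extract the first frame of the target time period
# and the first frame at least a preset time (e.g., 1 second) into it.

PRESET_DELAY = 1.0  # seconds after the start of the target time period

def extract_feature_frames(frames, period):
    """frames: list of (timestamp_seconds, frame); period: (start, end)."""
    start, end = period
    in_period = [(t, f) for t, f in frames if start <= t <= end]
    if not in_period:
        return []
    picks = [in_period[0][1]]  # first frame in the target time period
    for t, f in in_period:
        if t >= start + PRESET_DELAY:
            picks.append(f)     # first frame >= 1 s into the period
            break
    return picks

# Frames every 0.1 s over a 10 s clip; target period is 2.0-5.0 s.
frames = [(t / 10.0, "frame-%d" % t) for t in range(0, 100)]
print(extract_feature_frames(frames, (2.0, 5.0)))  # ['frame-20', 'frame-30']
```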
In another approach, in video data such as TV series and movies, frames containing more characters generally correspond to richer plot moments, and such scenes are more likely to appeal to the user.
Therefore, when the multimedia data is video data, the face detection is carried out on the video data in the target time period;
and extracting one or more frames of video data as a media characteristic image according to the number of the detected faces.
For example, when the number of detected faces exceeds a certain threshold, such as 5, the frame can be used as a media feature image.
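The face-count rule can be sketched as below. Face detection itself (e.g., a cascade or neural detector) is abstracted away: each frame is paired with a precomputed face count, and the names and threshold are assumptions for illustration.

```python
# Hedged sketch: keep frames whose detected face count exceeds a threshold,
# as candidate media feature images.

FACE_THRESHOLD = 5

def select_by_faces(detections, threshold=FACE_THRESHOLD):
    """detections: list of (frame, face_count). Returns frames whose
    face count exceeds the threshold."""
    return [frame for frame, count in detections if count > threshold]

detections = [("f1", 2), ("f2", 6), ("f3", 5), ("f4", 8)]
print(select_by_faces(detections))  # ['f2', 'f4']
```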
In another approach, exciting and popular video clips are often the ones users like most, and users tend to share screenshots of them.
Therefore, when the multimedia data is video data, one or more frames of image data obtained from screenshots can be collected through channels such as forums, microblogs, and news sites;
it is then judged whether the image data belongs to the video data within the target time period, and if so, the image data is adopted as the media feature image.
Further, when determining the attribution of the image data, the video identifier and the time information carried by the image data may be read.
Judging whether the video identification is matched with the video data; if yes, judging whether the time information is in the target time period;
when the time information is within the target time period, it is determined that the image data belongs to the video data within the target time period.
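The attribution check above can be sketched as follows. The Screenshot structure and its field names are assumptions; the logic simply mirrors the two judgments in the text (video identification match, then time within the target period).

```python
# Sketch: decide whether a shared screenshot belongs to the video data
# within the target time period.

from dataclasses import dataclass

@dataclass
class Screenshot:
    video_id: str   # video identification carried by the image data
    time: float     # time information (seconds into the video)

def belongs_to_period(shot, video_id, period):
    if shot.video_id != video_id:      # video identification mismatch
        return False
    start, end = period
    return start <= shot.time <= end   # time within the target period

shot = Screenshot(video_id="v42", time=130.0)
print(belongs_to_period(shot, "v42", (120.0, 185.0)))  # True
```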
Second: audio data.
In one approach, audio models can be generated in advance for audio data of different styles, such as music styles (jazz, classical, pop) and mood styles (joy, sadness, pleasure).
Therefore, when the multimedia data is audio data, the audio data within the target time period can be matched against the preset audio models; when a match succeeds, the style label corresponding to the matched audio model is extracted.
Image data matching the style label is then retrieved from a preset database or a third-party server to serve as the media feature image.
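The style-label path can be sketched as below. The "audio model" is reduced to a labeled reference feature vector with a distance threshold; a real system would use trained classifiers, and all names, vectors, and the threshold here are assumptions.

```python
# Illustrative sketch: match audio features against preset models, then
# look up an image tagged with the matched style label.

AUDIO_MODELS = {            # style label -> reference feature vector
    "jazz": [0.9, 0.1],
    "classical": [0.1, 0.9],
}
IMAGE_LIBRARY = {"jazz": "saxophone.jpg", "classical": "orchestra.jpg"}

def match_style(features, threshold=0.5):
    """Return the label of the closest audio model within the threshold,
    or None if nothing matches."""
    best_label, best_dist = None, threshold
    for label, ref in AUDIO_MODELS.items():
        dist = sum((a - b) ** 2 for a, b in zip(features, ref)) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

label = match_style([0.85, 0.15])   # features from the target time period
print(label, IMAGE_LIBRARY.get(label))  # jazz saxophone.jpg
```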
In another approach, when the multimedia data is audio data, the lyric data of the audio data within the target time period is looked up in a preset database or a third-party server;
text abstract information is generated from the lyric data using a text summarization algorithm (such as TextTeaser);
and image data matching the text abstract information is retrieved from a preset database or a third-party server to serve as the media feature image.
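The lyric-summarization step can be sketched with a toy frequency-based extractive scorer. The patent names TextTeaser as one possible algorithm; this is not TextTeaser, only an assumed minimal illustration of picking the most representative lyric line.

```python
# Minimal extractive-summary sketch: score each lyric line by the average
# corpus frequency of its words and keep the highest-scoring line as the
# text abstract information.

from collections import Counter

def summarize_lyrics(lines):
    words = Counter(w.lower() for line in lines for w in line.split())
    def score(line):
        ws = line.split()
        return sum(words[w.lower()] for w in ws) / max(len(ws), 1)
    return max(lines, key=score)  # the most representative lyric line

lyrics = ["city lights at night", "night after night", "lights in the night"]
print(summarize_lyrics(lyrics))  # night after night
```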
In another approach, when the multimedia data is audio data, video data corresponding to the audio data may be queried, such as the corresponding MV or concert video, or a TV series or movie that uses the audio data as its score.
One or more frames of image data are extracted from the video data as a media feature image.
Of course, the above-mentioned identification method of the media characteristic image is only an example, and when implementing the embodiment of the present invention, the identification method of the media characteristic image may be set according to actual situations, which is not limited in the embodiment of the present invention. In addition, besides the above-mentioned identification method of the media characteristic image, a person skilled in the art may also adopt other identification methods of the media characteristic image according to actual needs, and the embodiment of the present invention is not limited to this.
Step 103, when the multimedia data is triggered, displaying the one or more frames of media feature images.
In a specific implementation, while the multimedia data is being played, a hover operation on the portion of the playback progress bar corresponding to the target time period is detected, and the one or more frames of media feature images are displayed in response to the hover operation.
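The presentation step can be sketched as below: map a hover position on the progress bar to a playback time, and return the stored media feature images when the hover falls inside the target time period. UI plumbing is omitted and all names are assumptions.

```python
# Sketch: hover-to-time mapping on the playback progress bar, gated by
# the target time period.

def hover_to_time(hover_x, bar_width, duration):
    """Convert a hover x-coordinate on the progress bar to seconds."""
    return (hover_x / bar_width) * duration

def images_for_hover(hover_x, bar_width, duration, period, images):
    t = hover_to_time(hover_x, bar_width, duration)
    start, end = period
    return images if start <= t <= end else []

# 800-px bar, 3600 s video, target period 120-185 s, hover at x=30 (135 s).
print(images_for_hover(30, 800, 3600, (120.0, 185.0), ["thumb1.png"]))
```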
According to the embodiment of the invention, media feature images are mined for the multimedia data and displayed when the multimedia data is triggered. This spares the user from re-watching the entire multimedia data in order to pick out the parts of interest, greatly reducing time consumption, reducing wasted bandwidth resources, and improving efficiency.
For simplicity of explanation, the method embodiments are described as a series of acts or combinations, but those skilled in the art will appreciate that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 2, a block diagram of an embodiment of a multimedia data processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
a multimedia data determination module 201 adapted to determine multimedia data to be identified;
a media characteristic image searching module 202, adapted to search one or more frames of media characteristic images representing the multimedia data;
the media feature image presentation module 203 is adapted to present the one or more frames of media feature images when the multimedia data is triggered.
In an optional embodiment of the present invention, the multimedia data determination module 201 may be further adapted to:
detecting a target time period set for multimedia data;
and determining the multimedia data in the target time period as the multimedia data to be identified.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
and when the multimedia data is video data, extracting first frame video data in the target time period and/or one frame video data in the target time period after a preset time is passed as a media characteristic image.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are video data, carrying out face detection on the video data in the target time period;
and extracting one or more frames of video data as a media characteristic image according to the number of the detected faces.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are video data, acquiring one or more frames of image data obtained based on the screenshot;
judging whether the image data belongs to the video data in the target time period; and if so, adopting the image data as a media characteristic image.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
reading the video identification and the time information carried by the image data;
judging whether the video identification is matched with the video data; if yes, judging whether the time information is in the target time period;
when the time information is within the target time period, determining that the image data belongs to video data within the target time period.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are audio data, matching the audio data in the target time period with a preset audio model;
when the matching is successful, extracting a style label corresponding to the audio model;
and searching image data matched with the style label as a media characteristic image.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are audio data, searching lyric data of the audio data in the target time period;
generating text abstract information by adopting the lyric data;
and searching image data matched with the text abstract information as a media characteristic image.
In an optional embodiment of the present invention, the media feature image lookup module 202 may be further adapted to:
when the multimedia data are audio data, inquiring video data corresponding to the audio data;
one or more frames of image data are extracted from the video data as a media feature image.
In an optional embodiment of the present invention, the media feature image presentation module 203 may be further adapted to:
detecting a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and displaying the one or more frames of media feature images in response to the hover operation.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from those of the embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and they may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract, and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a device for processing multimedia data according to an embodiment of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (16)

1. A method of processing multimedia data, comprising:
determining multimedia data to be identified;
finding one or more media feature images characterizing the multimedia data;
when the multimedia data is triggered, displaying the one or more media feature images;
wherein the step of determining multimedia data to be identified comprises:
detecting a target time period set for the multimedia data;
determining the multimedia data within the target time period as the multimedia data to be identified;
and wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is audio data, finding lyric data of the audio data within the target time period;
generating text summary information from the lyric data;
and retrieving image data matching the text summary information as a media feature image.
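For illustration only (the patent does not specify a summarization algorithm; the word-frequency summary and the tag index below are assumptions), the lyric-to-image lookup of claim 1 might look like:

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "and", "i", "you", "to", "of", "in", "my", "on"}

def summarize_lyrics(lyric_lines, top_k=3):
    """Crude text-summary information for the lyrics of the target
    time period: the top-k most frequent content words."""
    words = [
        w for line in lyric_lines
        for w in line.lower().split()
        if w.isalpha() and w not in STOPWORDS
    ]
    return [w for w, _ in Counter(words).most_common(top_k)]

def find_matching_image(summary, image_index):
    """Return the first image whose tag set overlaps the summary.
    `image_index` (image name -> set of tags) is a stand-in for a
    real image search backend."""
    for name, tags in image_index.items():
        if tags & set(summary):
            return name
    return None
```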
2. The method of claim 1, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is video data, extracting the first frame of video data within the target time period and/or a frame of video data at a preset time offset into the target time period as a media feature image.
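In frame terms, the extraction rule of this claim can be sketched as follows (assuming a constant frame rate; the function and parameter names are illustrative, not from the patent):

```python
def feature_frame_indices(start_s, fps, preset_offset_s=None):
    """Frame indices to extract as media feature images: the first
    frame of the target time period and, optionally, the frame one
    preset time offset later."""
    first = int(start_s * fps)
    if preset_offset_s is None:
        return [first]
    return [first, int((start_s + preset_offset_s) * fps)]
```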
3. The method of claim 1, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is video data, performing face detection on the video data within the target time period;
and extracting one or more frames of video data as media feature images according to the number of detected faces.
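A minimal sketch of selecting frames by detected face count (the face detector itself is elided; `face_counts` is assumed to be its per-frame output, and the "most faces first" policy is one plausible reading of the claim):

```python
def frames_by_face_count(face_counts, n=1, min_faces=1):
    """Given {frame_index: detected_face_count} for the target time
    period, return up to n frame indices with the most faces."""
    candidates = [(c, i) for i, c in face_counts.items() if c >= min_faces]
    # Most faces first; break ties by the earlier frame.
    candidates.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in candidates[:n]]
```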
4. The method of claim 1, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is video data, acquiring one or more frames of image data obtained from screenshots;
determining whether the image data belongs to the video data within the target time period; and if so, adopting the image data as a media feature image.
5. The method of claim 4, wherein the step of determining whether the image data belongs to the video data within the target time period comprises:
reading a video identifier and time information carried by the image data;
determining whether the video identifier matches the video data; if so, determining whether the time information falls within the target time period;
and when the time information is within the target time period, determining that the image data belongs to the video data within the target time period.
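Claim 5's membership test reduces to two checks on the metadata a screenshot carries; a sketch (the metadata field names are assumptions, not from the patent):

```python
def screenshot_in_period(meta, video_id, period):
    """Check whether a screenshot belongs to the video within the
    target time period.  `meta` carries the screenshot's video
    identifier and capture time, e.g. {"video_id": ..., "time_s": ...}."""
    if meta.get("video_id") != video_id:
        return False                      # identifier does not match
    start, end = period
    return start <= meta.get("time_s", -1.0) <= end
```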
6. The method of claim 1, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is audio data, matching the audio data within the target time period against a preset audio model;
when the matching succeeds, extracting a style label corresponding to the audio model;
and retrieving image data matching the style label as a media feature image.
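One plausible reading of "matching against a preset audio model" is nearest-model matching over audio feature vectors; a sketch with made-up models, feature dimensions, and a cosine-similarity threshold (all values hypothetical, not from the patent):

```python
import math

# Hypothetical preset models: style label -> audio feature vector.
STYLE_MODELS = {
    "rock":   [0.9, 0.2, 0.1],
    "ballad": [0.1, 0.8, 0.6],
}

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def match_style(features, threshold=0.8):
    """Match audio features against each preset model; return the style
    label of the best match above the threshold, else None."""
    best_label, best_score = None, threshold
    for label, model in STYLE_MODELS.items():
        score = cosine(features, model)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```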
7. The method of any one of claims 1-6, wherein the step of finding one or more media feature images characterizing the multimedia data comprises:
when the multimedia data is audio data, querying video data corresponding to the audio data;
and extracting one or more frames of image data from the video data as media feature images.
8. The method of any one of claims 1-6, wherein the step of displaying the one or more media feature images when the multimedia data is triggered comprises:
detecting a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and displaying the one or more media feature images in response to the hover operation.
9. A device for processing multimedia data, comprising:
a multimedia data determination module, adapted to determine multimedia data to be identified;
a media feature image lookup module, adapted to find one or more media feature images characterizing the multimedia data;
a media feature image presentation module, adapted to display the one or more media feature images when the multimedia data is triggered;
wherein the multimedia data determination module is further adapted to:
detect a target time period set for the multimedia data;
and determine the multimedia data within the target time period as the multimedia data to be identified;
and wherein the media feature image lookup module is further adapted to:
when the multimedia data is audio data, find lyric data of the audio data within the target time period;
generate text summary information from the lyric data;
and retrieve image data matching the text summary information as a media feature image.
10. The device of claim 9, wherein the media feature image lookup module is further adapted to:
when the multimedia data is video data, extract the first frame of video data within the target time period and/or a frame of video data at a preset time offset into the target time period as a media feature image.
11. The device of claim 9, wherein the media feature image lookup module is further adapted to:
when the multimedia data is video data, perform face detection on the video data within the target time period;
and extract one or more frames of video data as media feature images according to the number of detected faces.
12. The device of claim 9, wherein the media feature image lookup module is further adapted to:
when the multimedia data is video data, acquire one or more frames of image data obtained from screenshots;
determine whether the image data belongs to the video data within the target time period; and if so, adopt the image data as a media feature image.
13. The device of claim 12, wherein the media feature image lookup module is further adapted to:
read a video identifier and time information carried by the image data;
determine whether the video identifier matches the video data; if so, determine whether the time information falls within the target time period;
and when the time information is within the target time period, determine that the image data belongs to the video data within the target time period.
14. The device of claim 9, wherein the media feature image lookup module is further adapted to:
when the multimedia data is audio data, match the audio data within the target time period against a preset audio model;
when the matching succeeds, extract a style label corresponding to the audio model;
and retrieve image data matching the style label as a media feature image.
15. The device of any one of claims 9-14, wherein the media feature image lookup module is further adapted to:
when the multimedia data is audio data, query video data corresponding to the audio data;
and extract one or more frames of image data from the video data as media feature images.
16. The device of any one of claims 9-14, wherein the media feature image presentation module is further adapted to:
detect a hover operation on the portion of the playback progress bar corresponding to the target time period while the multimedia data is being played;
and display the one or more media feature images in response to the hover operation.
CN201510959105.4A 2015-12-18 2015-12-18 Multimedia data processing method and device Active CN106897304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510959105.4A CN106897304B (en) 2015-12-18 2015-12-18 Multimedia data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510959105.4A CN106897304B (en) 2015-12-18 2015-12-18 Multimedia data processing method and device

Publications (2)

Publication Number Publication Date
CN106897304A CN106897304A (en) 2017-06-27
CN106897304B true CN106897304B (en) 2021-01-29

Family

ID=59190418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510959105.4A Active CN106897304B (en) 2015-12-18 2015-12-18 Multimedia data processing method and device

Country Status (1)

Country Link
CN (1) CN106897304B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108924576A (en) * 2018-07-10 2018-11-30 武汉斗鱼网络科技有限公司 A kind of video labeling method, device, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1798267A (en) * 2004-12-22 2006-07-05 上海乐金广电电子有限公司 Device for matching image to words of a song in Kara-Ok system
CN101600118A (en) * 2008-06-06 2009-12-09 株式会社日立制作所 Audio/video content information draw-out device and method
CN102572356A (en) * 2012-01-16 2012-07-11 华为技术有限公司 Conference recording method and conference system
CN103080991A (en) * 2010-10-07 2013-05-01 阿姆司教育株式会社 Music-based language-learning method, and learning device using same
CN104837059A (en) * 2014-04-15 2015-08-12 腾讯科技(北京)有限公司 Video processing method, device and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1979464A (en) * 2005-12-07 2007-06-13 联想(北京)有限公司 Method for realizing playing according to request of user in digital media player
CN101324897A (en) * 2008-07-28 2008-12-17 北京搜狗科技发展有限公司 Method and apparatus for looking up lyric
US20120109563A1 (en) * 2010-10-29 2012-05-03 President And Fellows Of Harvard College Method and apparatus for quantifying a best match between series of time uncertain measurements
CN103873512A (en) * 2012-12-13 2014-06-18 深圳市赛格导航科技股份有限公司 Method for vehicle-mounted wireless music transmission based on face recognition technology
CN103381280B (en) * 2013-07-10 2015-07-15 上海泰亿格康复医疗科技股份有限公司 Visual and auditory integrated rehabilitation training system and method based on visible brain wave induction technology
CN104090883B (en) * 2013-11-15 2017-05-17 广州酷狗计算机科技有限公司 Playing control processing method and playing control processing device for audio file
CN104202657B (en) * 2014-08-29 2018-09-18 北京奇虎科技有限公司 The method and device that multiple videos selection in same theme video group is played

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on the Application of Picture Charts in Kindergarten Music Activities; Zhang Yanli; China Masters' Theses Full-text Database, Social Sciences II; 20140215 (No. 02); H128-19 *

Also Published As

Publication number Publication date
CN106897304A (en) 2017-06-27

Similar Documents

Publication Publication Date Title
AU2020260513B2 (en) Targeted ad redistribution
CN112753225B (en) Video processing for embedded information card positioning and content extraction
CN110149558B (en) Video playing real-time recommendation method and system based on content identification
CN110582025B (en) Method and apparatus for processing video
CN109547819B (en) Live list display method and device and electronic equipment
US9374411B1 (en) Content recommendations using deep data
CN108366278B (en) User interaction implementation method and device in video playing
JP5651231B2 (en) Media fingerprint for determining and searching content
US9256601B2 (en) Media fingerprinting for social networking
US20190392866A1 (en) Video summarization and collaboration systems and methods
US11748408B2 (en) Analyzing user searches of verbal media content
CN112602077A (en) Interactive video content distribution
US20150172787A1 (en) Customized movie trailers
CN105184616B (en) Method and device for directionally delivering business object
CN112753227A (en) Audio processing for detecting the occurrence of crowd noise in a sporting event television program
US9635337B1 (en) Dynamically generated media trailers
US10897658B1 (en) Techniques for annotating media content
CN106899879B (en) Multimedia data processing method and device
US20170272793A1 (en) Media content recommendation method and device
CN108769831B (en) Video preview generation method and device
CN114845149B (en) Video clip method, video recommendation method, device, equipment and medium
US20230052033A1 (en) Systems and methods for recommending content using progress bars
CN106897304B (en) Multimedia data processing method and device
WO2017096883A1 (en) Video recommendation method and system
CN117014649A (en) Video processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240116

Address after: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.
