CN107766571B - Multimedia resource retrieval method and device - Google Patents

Info

Publication number
CN107766571B
Authority
CN
China
Prior art keywords
information
multimedia resource
multimedia
query request
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201711108216.XA
Other languages
Chinese (zh)
Other versions
CN107766571A (en
Inventor
柳军飞
麻志毅
杨寒
李宏强
孙博
范红杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201711108216.XA
Publication of CN107766571A
Application granted
Publication of CN107766571B
Expired - Fee Related
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for retrieving multimedia resources. The method comprises: receiving a query request sent by a user; and searching in a multimedia resource search library according to the query request and returning a search result, wherein multi-modal information of a plurality of multimedia resources is stored in the multimedia resource search library. The invention allows multimedia resources that satisfy the search conditions to be retrieved more completely, and thus better meets the retrieval requirements for multimedia resources.

Description

Multimedia resource retrieval method and device
Technical Field
The present invention relates to the field of video retrieval, and in particular, to a method and an apparatus for retrieving multimedia resources.
Background
With the rapid development of internet technology and the great increase in network bandwidth, the volume of multimedia resources (videos) stored on the internet is growing explosively. Among these massive multimedia resources there is no lack of resources with great commercial value. How to retrieve efficiently from massive multimedia resources (videos) has become the key to using multimedia video resources efficiently and maximizing their value.
Currently, retrieval of multimedia resources (videos) mainly relies on keyword-based retrieval of their cataloging information. Different multimedia resource producers usually define the cataloging information according to their own needs, so the information contained in the cataloging information tends to be limited or one-sided. Retrieval based on cataloging information therefore cannot meet retrieval requirements well, and many useful multimedia resources may be missed.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for retrieving multimedia resources, which can more fully retrieve multimedia resources satisfying the retrieval condition, thereby better satisfying the retrieval requirement of the multimedia resources.
Based on the above purpose, the present invention provides a multimedia resource retrieval method, which includes:
receiving a query request sent by a user;
searching in a multimedia resource search library according to the query request, and returning a search result;
the multi-mode information of a plurality of multimedia resources is stored in the multimedia resource search library.
Preferably, the multimedia resource search library further stores: cataloging information for each multimedia asset.
Wherein the multimodal information of the multimedia resource comprises textual information; and
the text information is pre-stored in the multimedia resource search library as follows:
identifying text information from a video of the multimedia resource;
and storing the identified text information into the multimedia resource search library.
Wherein the multimodal information of the multimedia resource comprises voice information; the voice information is pre-stored in the multimedia resource search library in an audio compression coding form and/or a text form, as follows:
extracting audio from the multimedia resource, performing voice recognition to convert the audio into text content, and storing the converted text content into the multimedia resource search library as voice information of the multimedia resource in a text form; and/or
extracting audio from the multimedia resource, extracting features of the audio, compression-coding the extracted audio features to obtain voice information of the multimedia resource in an audio compression coding form, and storing the obtained voice information into the multimedia resource search library.
Wherein the multimodal information of the multimedia resource comprises image information; the image information is pre-stored in the multimedia resource search library in a pixel compression coding form and/or a text form, as follows:
extracting key frames from the video of the multimedia resource, performing image content description and/or image object labeling on the key frames, and storing the text content obtained by image content description and/or image object labeling into the multimedia resource search library as image information of the multimedia resource in a text form; and/or
extracting key frames from the video of the multimedia resource, extracting picture pixel features of the key frames, compression-coding them, and storing the resulting image information of the multimedia resource in a pixel compression coding form into the multimedia resource search library.
Wherein, the searching in the multimedia resource search library according to the query request comprises:
analyzing the query request to obtain a keyword set K of the query request;
expanding the keyword set K to obtain an expanded keyword set K';
and searching in the multimedia resource search library according to the expanded keyword set K'.
Or, the retrieving in the multimedia resource retrieval library according to the query request includes:
analyzing the query request to obtain an audio clip in the query request;
and according to the audio segments, searching in the audio information in the audio compression coding form in the multimedia resource search library.
Or, the retrieving in the multimedia resource retrieval library according to the query request includes:
analyzing the query request to obtain a picture in the query request;
and according to the picture, searching in the image information in a pixel compression coding mode in the multimedia resource search library.
Further, after the retrieval is performed in the multimedia resource retrieval library according to the query request, the method further includes:
for the same multimedia resource, obtaining the degrees to which its cataloging information and its information of each modality respectively match the query request;
computing a weighted average of the matching degrees of the cataloging information and of the information of each modality, and taking the weighted average as the score of the multimedia resource for the query request;
sorting in descending order according to the scores of the multimedia resources;
and taking the sequencing result of each multimedia resource as the retrieval result.
The invention also provides a multimedia resource retrieval device, comprising:
the multimedia resource search library is used for storing multi-modal information of a plurality of multimedia resources;
the query request receiving module is used for receiving a query request sent by a user;
and the retrieval module is used for retrieving in the multimedia resource retrieval library according to the query request and returning a retrieval result.
Further, the multimedia resource search library further stores: cataloging information for each multimedia asset.
Wherein the multi-modal information of the multimedia resource comprises at least one of the following information: text information, voice information, image information; the voice information is pre-stored in the multimedia resource search library in an audio compression coding mode and/or a text mode; the image information is pre-stored in the multimedia resource search library in a pixel compression coding mode and/or a text mode.
Further, the apparatus further comprises: a multimodal information storage module; and
the multi-modal information storage module comprises at least one of the following units:
the text information storage unit is used for identifying text information from the video of the multimedia resource; storing the identified text information into the multimedia resource search library;
the voice information storage unit is used for extracting audio from the multimedia resources, performing voice recognition on the audio, converting the audio into text contents, and storing the text contents obtained through conversion into the multimedia resource retrieval library as voice information of the multimedia resources in a text form; and/or extracting audio from the multimedia resource, further extracting the characteristics of the audio and performing compression coding on the extracted audio characteristics to obtain voice information of the multimedia resource in an audio compression coding form, and storing the obtained voice information of the multimedia resource in the audio compression coding form into the multimedia resource retrieval library;
the image information storage unit is used for extracting key frames from the video of the multimedia resources, carrying out image content description and/or image object labeling on the key frames, and storing the text content obtained by image content description and/or the text content obtained by image object labeling into the multimedia resource retrieval library as the image information of the text form of the multimedia resources; and/or extracting key frames from the video of the multimedia resources, extracting picture pixel characteristics of the key frames, performing compression coding, and storing image information of the multimedia resources in a pixel compression coding form into the multimedia resource search library.
In the technical scheme of the invention, the multi-mode information of the multimedia resources is stored in the multimedia resource retrieval library, retrieval is carried out in the multimedia resource retrieval library according to the query request, and retrieval can be carried out based on information richer than cataloged information, so that the multimedia resources meeting retrieval conditions can be retrieved more fully, and the retrieval requirements of the multimedia resources are better met.
Drawings
Fig. 1 is a flowchart of a multimedia resource retrieval method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for acquiring and storing text information of a multimedia resource according to an embodiment of the present invention;
Fig. 3 is a flowchart of a method for acquiring and storing voice information of a multimedia resource according to an embodiment of the present invention;
Fig. 4 is a flowchart of a method for acquiring and storing image information of a multimedia resource according to an embodiment of the present invention;
Fig. 5 is a block diagram of an internal structure of a multimedia resource retrieval device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the term "and/or" includes any one of, and all combinations of, the associated listed items.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities that share a name but are not the same, or two different parameters. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
The inventors consider that multi-modal information, such as text, speech, images, etc., is contained in a multimedia asset (video). If the information is utilized during retrieval, the multimedia resources meeting the retrieval conditions can be retrieved more fully, thereby better meeting the retrieval requirements of the multimedia resources.
The technical scheme of the invention is described in detail in the following with reference to the accompanying drawings.
Based on the above thought, in order to utilize the multimodal information of the multimedia resources during the retrieval, in the technical solution of the embodiment of the present invention, the stored multimedia resources are preprocessed, and the multimodal information is extracted from the multimedia resources and stored in the multimedia resource retrieval library. In the multimedia resource search library provided in the embodiment of the present invention, the multi-modal information of each multimedia resource may include at least one of the following information: text information, voice information, image information. The multi-modal information of the multimedia resources is pre-stored in a multimedia resource search library, wherein the voice information is pre-stored in the multimedia resource search library in an audio compression coding mode and/or a text mode; the image information is pre-stored in the multimedia resource search library in a pixel compression coding mode and/or a text mode. How to acquire and store information of multiple modalities will be described in detail later. Of course, it is preferable that the cataloging information of the multimedia assets is also stored in the multimedia asset repository.
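As an illustration of what one entry of such a search library might hold, the following is a minimal sketch in Python. The class name `ResourceRecord` and all field names are assumptions made for this example; the embodiment does not prescribe a storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceRecord:
    """One entry of the multimedia resource search library (hypothetical layout)."""
    resource_id: str
    # Cataloging information is optional ("preferably" stored, per the text).
    cataloging_info: Dict[str, str] = field(default_factory=dict)
    # Text information recognized in the video.
    text_info: List[str] = field(default_factory=list)
    # Voice information in text form and/or audio compression coding form.
    speech_text: Optional[str] = None
    speech_code: Optional[bytes] = None
    # Image information in text form and/or pixel compression coding form.
    image_text: List[str] = field(default_factory=list)
    image_codes: List[bytes] = field(default_factory=list)

    def modalities(self) -> List[str]:
        """Return the modalities for which this record holds any information."""
        present = []
        if self.text_info:
            present.append("text")
        if self.speech_text is not None or self.speech_code is not None:
            present.append("voice")
        if self.image_text or self.image_codes:
            present.append("image")
        return present


record = ResourceRecord(
    resource_id="video-001",
    cataloging_info={"title": "Cooking show"},
    text_info=["Episode 3"],
    speech_text="today we cook tomato soup",
)
print(record.modalities())  # ['text', 'voice']
```

Keeping the text form and the compressed form as separate optional fields mirrors the "and/or" wording above: a record may carry either form, both, or neither for a given modality.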
Based on the above multimedia resource search library, the multimedia resource search method provided in the embodiment of the present invention has a process as shown in fig. 1, and includes the following steps:
S101: Receive a query request sent by a user.
In this step, the received query request may include a keyword to be queried, an audio clip to be queried, or a picture to be queried.
S102: Retrieve in the multimedia resource search library according to the query request.
In this step, for a query request that includes keywords to be queried, the query request may first be analyzed to obtain a keyword set K of the query request; for example, the query request may be analyzed using techniques such as word segmentation (including Chinese word segmentation), named entity recognition, and sentiment analysis to obtain the keyword set K.
Further, expanding the keyword set K to obtain an expanded keyword set K'; for example, the keyword set K may be expanded by a method such as a knowledge graph or synonym expansion.
Then, searching in the multi-modal information of the multimedia resource search library according to the expanded keyword set K'; or, searching in the multi-modal information and cataloguing information of the multimedia resource search library according to the expanded keyword set K'.
The keyword set is expanded here to improve the completeness of the query. For example, if the user's query request contains one word for "tomato", the technical solution of the present invention can also retrieve videos whose content contains a synonym of that word (in the original Chinese, two different words that both mean "tomato"). That is, searching according to the expanded keyword set yields more search results related to the query condition in the query request.
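The expansion from K to K' can be sketched as follows. This is a minimal illustration only: the synonym table `SYNONYMS` and the function name `expand_keywords` are hypothetical, and the embodiment leaves open whether a knowledge graph or a synonym dictionary supplies the related terms.

```python
from typing import Dict, Iterable, Set

# Hypothetical synonym table; a real system might derive it from a knowledge graph.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile"},
    "automobile": {"car"},
    "film": {"movie"},
}


def expand_keywords(keywords: Iterable[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return K': the original keywords plus their directly listed synonyms."""
    expanded = set(keywords)
    for word in list(expanded):
        expanded |= synonyms.get(word, set())
    return expanded


k = {"car", "race"}
k_prime = expand_keywords(k, SYNONYMS)
print(sorted(k_prime))  # ['automobile', 'car', 'race']
```

Only one level of synonyms is added here; whether to expand transitively is a design choice the source does not address.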
Methods for searching according to the keyword set are well known to those skilled in the art, and are not described herein.
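For readers who want a concrete picture, one common keyword search technique is an inverted index. The sketch below is an assumption chosen for illustration, not the method of the embodiment; the function names and the whitespace tokenization are hypothetical simplifications.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase token to the ids of the resources whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for resource_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(resource_id)
    return index


def search(index: Dict[str, Set[str]], keywords: Iterable[str]) -> Dict[str, int]:
    """Return, per resource id, how many distinct query keywords it matches."""
    hits: Dict[str, int] = defaultdict(int)
    for keyword in set(keywords):
        for resource_id in index.get(keyword.lower(), set()):
            hits[resource_id] += 1
    return dict(hits)


# Text-form information of three hypothetical resources.
docs = {
    "v1": "tomato soup recipe",
    "v2": "car race highlights",
    "v3": "automobile race",
}
index = build_inverted_index(docs)
result = search(index, {"car", "automobile", "race"})
print(sorted(result.items()))  # [('v2', 2), ('v3', 2)]
```

With the expanded keyword set, "v3" is found even though it never contains the literal word "car".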
In this step, for a query request that includes an audio clip to be queried, the query request is first analyzed to obtain the audio clip; retrieval is then performed, according to the audio clip, in the audio information stored in audio compression coding form in the multimedia resource search library: audio features of the audio clip are extracted and compression-coded, and a clustering algorithm is used to find similar audio information in audio compression coding form in the multimedia resource search library.
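The idea of comparing compression-coded audio features can be sketched as follows. Several assumptions are made here that the source does not state: the "compression coding" is modeled as a simple binarization of the feature vector, similarity is measured by Hamming distance, and a brute-force ranking stands in for the clustering algorithm mentioned above. Feature extraction itself is not shown; the vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def compress(features: Sequence[float]) -> int:
    """Binarize a non-empty feature vector: a bit is 1 when the value exceeds the mean."""
    mean = sum(features) / len(features)
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > mean else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def most_similar(query: Sequence[float], library: Dict[str, int],
                 top_k: int = 1) -> List[Tuple[str, int]]:
    """Rank stored codes by Hamming distance to the code of the query features."""
    q = compress(query)
    ranked = sorted(
        ((rid, hamming(q, code)) for rid, code in library.items()),
        key=lambda pair: (pair[1], pair[0]),
    )
    return ranked[:top_k]


# Hypothetical pre-stored codes for two resources.
library = {
    "v1": compress([0.9, 0.1, 0.8, 0.2]),
    "v2": compress([0.1, 0.9, 0.2, 0.8]),
}
print(most_similar([0.8, 0.2, 0.7, 0.1], library))  # [('v1', 0)]
```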
In this step, for a query request that includes a picture to be queried, the query request is first analyzed to obtain the picture; retrieval is then performed, according to the picture, in the image information stored in pixel compression coding form in the multimedia resource search library: picture pixel features of the picture are extracted and compression-coded, and a clustering algorithm is used to find similar image information in pixel compression coding form in the multimedia resource search library.
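A cluster-based lookup of this kind might look like the sketch below. It is illustrative only and rests on assumptions: pixel features are modeled as block-wise mean gray levels (a lossy reduction standing in for the compression coding), the cluster centroids are taken as already computed, and images are grayscale grids whose dimensions are divisible by the block size. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def block_means(image: Sequence[Sequence[int]], block: int) -> List[float]:
    """Pixel feature: mean gray level of each block x block tile, in row-major order."""
    feats = []
    for top in range(0, len(image), block):
        for left in range(0, len(image[0]), block):
            tile = [image[r][c]
                    for r in range(top, top + block)
                    for c in range(left, left + block)]
            feats.append(sum(tile) / len(tile))
    return feats


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def build_buckets(library: Dict[str, List[float]],
                  centroids: Sequence[Sequence[float]]) -> Dict[int, List[str]]:
    """Group stored resources by the cluster their feature vector falls into."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for rid, vec in library.items():
        buckets[nearest_centroid(vec, centroids)].append(rid)
    return buckets


# Two hypothetical 2x4 key frames and assumed, precomputed centroids.
dark_left = [[0, 0, 255, 255], [0, 0, 255, 255]]
dark_right = [[255, 255, 0, 0], [255, 255, 0, 0]]
centroids = [[0.0, 255.0], [255.0, 0.0]]
library = {"v1": block_means(dark_left, 2), "v2": block_means(dark_right, 2)}
buckets = build_buckets(library, centroids)

query = [[10, 0, 250, 255], [0, 10, 255, 250]]
cluster = nearest_centroid(block_means(query, 2), centroids)
print(buckets[cluster])  # ['v1']
```

Looking only inside the query's cluster avoids comparing the query against every stored image, which is the usual motivation for clustering in similarity search.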
Further, after retrieval is performed in the multi-modal information and the cataloging information of the multimedia resource search library, the following is done for each multimedia resource: the matching degrees between the query request and, respectively, the cataloging information and the information of each modality (i.e., text information, voice information, image information) are obtained; a weighted average of these matching degrees is computed; and the weighted average is taken as the score of the multimedia resource for the query request. The multimedia resources are sorted in descending order of score, and the sorted list is taken as the retrieval result.
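The scoring and ranking just described can be written as score = (sum of w_i * m_i) / (sum of w_i), where m_i is the matching degree of source i and w_i its weight. A minimal sketch follows; the weight values, the source names, and the treatment of a missing source as a matching degree of 0 are assumptions, since the source does not specify them.

```python
from typing import Dict, List, Tuple

# Hypothetical weights; the source does not specify values.
WEIGHTS = {"catalog": 0.4, "text": 0.2, "voice": 0.2, "image": 0.2}


def score(match: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-source matching degrees; missing sources count as 0."""
    total_weight = sum(weights.values())
    weighted_sum = sum(w * match.get(name, 0.0) for name, w in weights.items())
    return weighted_sum / total_weight


def rank(matches: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score every resource and sort in descending order of score."""
    scored = [(rid, score(m, weights)) for rid, m in matches.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical matching degrees in [0, 1] for two resources.
matches = {
    "v1": {"catalog": 1.0, "text": 0.5},
    "v2": {"text": 1.0, "voice": 1.0, "image": 1.0},
}
ranked = rank(matches, WEIGHTS)
print([rid for rid, _ in ranked])  # ['v2', 'v1']
```

In this example "v2" outranks "v1" even though its cataloging information does not match at all, which is the benefit the multi-modal information is meant to provide.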
S103: Return the retrieval result.
After the retrieval result matching the query condition in the query request is obtained, it is returned to the user, so that the user learns which multimedia resources satisfy the query condition or satisfy conditions similar to it.
The multi-modal information of each multimedia resource in the multimedia resource search library is obtained and stored in advance, wherein a specific method flow for obtaining and storing text information of multimedia resources provided by the embodiment of the present invention is shown in fig. 2, and includes the following steps:
S201: Identify text information from the video of the multimedia resource.
Specifically, image frames with high mutual similarity in the multimedia resource may be deduplicated, and character recognition may then be performed on the remaining image frames of the video.
S202: Store the identified text information into the multimedia resource search library.
In this step, preferably, the identified text information may be subjected to deduplication processing, and the deduplicated text information is stored in the multimedia resource search library. The deduplication processing is beneficial to removing a large amount of redundant information, and the space of the multimedia resource search library is saved.
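The deduplication of recognized text can be sketched as below. The normalization rule (collapsing whitespace and ignoring case) is an assumption for illustration; the source does not say when two recognized strings count as duplicates.

```python
from typing import Iterable, List


def dedupe_text(lines: Iterable[str]) -> List[str]:
    """Drop empty lines and repeats, keeping the first occurrence in order."""
    seen = set()
    result = []
    for line in lines:
        # Normalize so that case and spacing differences do not hide duplicates.
        key = " ".join(line.split()).lower()
        if key and key not in seen:
            seen.add(key)
            result.append(line.strip())
    return result


# Hypothetical text recognized from successive frames of one video.
recognized = ["Breaking News", "breaking  news", "", "Weather at 9"]
print(dedupe_text(recognized))  # ['Breaking News', 'Weather at 9']
```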
The specific method flow for pre-acquiring and storing the voice information of the multimedia resource provided by the embodiment of the invention is shown in fig. 3, and comprises the following steps:
s301: audio is extracted from the multimedia asset.
S302: and performing voice recognition on the extracted audio frequency, converting the audio frequency into character content, and/or further extracting the characteristics of the audio frequency, and performing compression coding on the extracted audio frequency characteristics to obtain voice information of the multimedia resource in an audio frequency compression coding form.
S303: and storing the converted text content into the multimedia resource search library as the voice information of the multimedia resource, and/or storing the voice information of the multimedia resource in the form of audio compression coding obtained after compression coding into the multimedia resource search library.
In this step, preferably, text summary is performed on the text contents obtained by conversion, and the text contents obtained by summary are stored in the multimedia resource search library as the voice information of the multimedia resources; and/or
In this step, the voice information in the form of audio compression coding of the multimedia resource obtained after compression coding in step S302 is stored in the multimedia resource search library.
Generally, the speech content in multimedia assets is large, but only a portion of it is useful. Therefore, the text summary is made on the converted text content, and the content without practical significance is removed. And then adding the text content obtained by the abstract into a multi-mode media resource search library. Therefore, a large amount of redundant information can be removed, and the space of the multimedia resource search library is saved.
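The source does not name a summarization technique. As one possible stand-in, the sketch below uses a simple frequency-based extractive summary: sentences whose words occur most often in the transcript are kept, in their original order. The stopword list, the function names, and the sentence-splitting rule are all assumptions for illustration.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stopword list, including filler words common in speech.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "um", "uh", "well", "so"}


def content_words(sentence: str) -> List[str]:
    """Lowercase words of a sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 2) -> List[str]:
    """Keep the sentences with the highest total word frequency, in original order."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))
    by_score = sorted(
        range(len(sentences)),
        key=lambda i: (-sum(freq[w] for w in content_words(sentences[i])), i),
    )
    keep = sorted(by_score[:max_sentences])
    return [sentences[i] for i in keep]


transcript = "Um, well. The tomato soup needs ripe tomato. So. Simmer the soup slowly."
print(summarize(transcript))
# ['The tomato soup needs ripe tomato', 'Simmer the soup slowly']
```

Sentences made up only of filler words score zero and are dropped first, which matches the stated goal of removing content without practical significance.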
The specific method flow for acquiring and storing the image information of the multimedia resource in advance provided by the embodiment of the invention is shown in fig. 4, and comprises the following steps:
S401: Extract key frames from the video of the multimedia resource.
The video of a multimedia resource is in fact composed of a sequence of picture frames, and the semantic information contained in these pictures is crucial for understanding the video content. The system therefore first extracts key frames from the video.
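The source does not specify how key frames are chosen. One simple possibility, shown here as an assumption, is to keep a frame whenever it differs enough from the last kept frame. Frames are modeled as flat lists of gray levels, and the threshold value is hypothetical.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def extract_key_frames(frames: Sequence[Sequence[int]], threshold: float = 30.0) -> List[int]:
    """Return indices of frames that differ from the last kept frame by more than threshold."""
    if not frames:
        return []
    keys = [0]  # the first frame is always kept
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[keys[-1]]) > threshold:
            keys.append(i)
    return keys


# Five hypothetical 4-pixel frames: two dark, two bright, one dark again.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 200, 200, 200],
    [198, 201, 200, 199],
    [10, 10, 10, 10],
]
print(extract_key_frames(frames))  # [0, 2, 4]
```

Near-identical neighboring frames are skipped, so this also has the effect of deduplicating highly similar frames.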
S402: Perform image content description and/or image object labeling on the extracted key frames; and/or extract picture pixel features of the key frames and compression-code them.
In this step, image content description is performed on each key frame to generate text content describing the key frame, and/or image object labeling is performed on each key frame to obtain text content labeling the image objects. Specifically, the image content description of a key frame may be produced with artificial-intelligence techniques such as deep learning; image object labeling of a key frame means attaching text labels to the objects recognized in the key frame; and/or
in this step, the picture pixel features of each key frame are extracted and compression-coded to obtain image information of the multimedia resource in a pixel compression coding form.
S403: Store the text content obtained by image content description and/or by image object labeling into the multimedia resource search library as image information of the multimedia resource in a text form; and/or store the obtained image information in pixel compression coding form into the multimedia resource search library.
In this step, preferably, the text content obtained by image content description and/or by image object labeling may be deduplicated, and the deduplicated text content is stored in the multimedia resource search library as image information of the multimedia resource in a text form; and/or
in this step, the obtained image information of the multimedia resource in pixel compression coding form is stored in the multimedia resource search library.
Based on the above method, an internal block diagram of a multimedia resource retrieval device provided in an embodiment of the present invention is shown in fig. 5, and includes: a multimedia resource search library 501, a query request receiving module 502 and a search module 503.
The multimedia resource search library 501 is used for storing multi-modal information of a plurality of multimedia resources; preferably, the multimedia resource search library 501 may further have stored therein: cataloging information for each multimedia asset. Wherein the multi-modal information of the multimedia resource comprises at least one of the following information: text information, voice information, image information.
The query request receiving module 502 is used for receiving a query request sent by a user.
The retrieval module 503 is configured to perform retrieval in the multimedia resource retrieval library 501 according to the query request received by the query request receiving module 502, and return a retrieval result.
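The cooperation of the three modules can be sketched as follows. This is a simplified illustration: the class and method names are hypothetical, only text-form information and keyword queries are modeled, and whitespace tokenization stands in for real query analysis.

```python
from typing import Dict, ItemsView, List


class ResourceSearchLibrary:
    """Stands in for the multimedia resource search library 501 (text form only)."""

    def __init__(self) -> None:
        self._text: Dict[str, str] = {}

    def store(self, resource_id: str, text: str) -> None:
        self._text[resource_id] = text

    def items(self) -> ItemsView[str, str]:
        return self._text.items()


class QueryRequestReceiver:
    """Stands in for the query request receiving module 502."""

    def receive(self, raw_query: str) -> List[str]:
        # Simplified analysis: lowercase and split on whitespace.
        return raw_query.lower().split()


class RetrievalModule:
    """Stands in for the retrieval module 503."""

    def __init__(self, library: ResourceSearchLibrary) -> None:
        self.library = library

    def retrieve(self, keywords: List[str]) -> List[str]:
        """Return ids of resources whose stored text contains any keyword."""
        return sorted(
            rid for rid, text in self.library.items()
            if any(k in text.lower().split() for k in keywords)
        )


library = ResourceSearchLibrary()
library.store("v1", "Tomato soup recipe")
library.store("v2", "Car race highlights")

receiver = QueryRequestReceiver()
retrieval = RetrievalModule(library)
results = retrieval.retrieve(receiver.receive("tomato salad"))
print(results)  # ['v1']
```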
Preferably, the retrieving module 503 is configured to analyze the query request to obtain a keyword set K of the query request; expanding the keyword set K to obtain an expanded keyword set K'; and searching in the multimedia resource search library according to the expanded keyword set K'. The specific retrieving method of the retrieving module 503 may refer to the content in the step S102, and is not described herein again.
Further, after retrieving in the multi-modal information and the cataloging information of the multimedia resource search library according to the expanded keyword set K', the retrieval module 503 may, for each multimedia resource, obtain the matching degrees between the query request and, respectively, the cataloging information and the information of each modality of that resource, compute a weighted average of these matching degrees, and take the weighted average as the score of the multimedia resource for the query request. The retrieval results are then returned to the user in descending order of score.
Alternatively, the retrieval module 503 may be further configured to analyze the query request, and obtain an audio segment in the query request; and according to the audio segments, searching in the audio information in the audio compression coding form in the multimedia resource search library.
Or, the retrieval module 503 may also be configured to analyze the query request and obtain a picture in the query request; and according to the picture, searching in the image information in a pixel compression coding mode in the multimedia resource search library.
Further, the apparatus for retrieving a multimedia resource provided in an embodiment of the present invention may further include: a multimodal information storage module 504;
the multimodal information storage module 504 includes at least one of the following: a text information storage unit 511, a voice information storage unit 512, and an image information storage unit 513.
The text information storage unit 511 is used for identifying text information from the video of the multimedia resource; the recognized text information is stored in the multimedia resource search library 501. The specific method for acquiring and storing the text information of the multimedia resource by the text information storage unit 511 can refer to the above-mentioned steps shown in fig. 2, and will not be described herein again.
The voice information storage unit 512 is configured to extract audio from the multimedia resource, perform voice recognition to convert the audio into text content, and store the converted text content into the multimedia resource search library 501 as voice information of the multimedia resource in a text form; and/or to extract audio from the multimedia resource, extract features of the audio, compression-code the extracted audio features to obtain voice information of the multimedia resource in an audio compression coding form, and store the obtained voice information into the multimedia resource search library 501. The specific method by which the voice information storage unit 512 acquires and stores the voice information of the multimedia resource can refer to the steps shown in Fig. 3 above, and is not described herein again.
The image information storage unit 513 extracts a key frame from the video of the multimedia resource, performs image content description and/or image object labeling on the key frame, and stores text content obtained through image content description and/or text content obtained through image object labeling into the multimedia resource search library as image information of the text form of the multimedia resource; and/or extracting a key frame from the video of the multimedia resource, extracting picture pixel characteristics of the key frame, performing compression coding, and storing image information of the multimedia resource in a pixel compression coding form into the multimedia resource search library 501. The specific method for acquiring and storing the image information of the multimedia resource by the image information storage unit 513 can refer to the steps shown in fig. 4, which is not described herein again.
In the technical scheme of the invention, multi-modal information of the multimedia resources is stored in the multimedia resource retrieval library, and retrieval is performed in that library according to the query request. Because retrieval can draw on information richer than the cataloging information alone, multimedia resources that satisfy the retrieval conditions are found more completely, and the retrieval requirements for multimedia resources are better met.
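The ranking recited in the claims, a weighted average of the degrees of fit of the cataloging information and of each modality followed by a descending sort, can be sketched as follows. This is a minimal Python illustration rather than the patented implementation: the function names, the weight values, and the fit values are hypothetical, and how each degree of fit is computed is left outside the sketch.

```python
from typing import Dict, List, Tuple


def score_resource(fit: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the degrees of fit available for one resource."""
    total_weight = sum(weights[name] for name in fit)
    if total_weight == 0:
        return 0.0
    return sum(weights[name] * value for name, value in fit.items()) / total_weight


def rank(
    fits: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Score every resource and sort in descending order of score."""
    scored = [(rid, score_resource(fit, weights)) for rid, fit in fits.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical weights for the cataloging information and three modalities.
weights = {"catalog": 0.4, "text": 0.2, "speech": 0.2, "image": 0.2}

# Hypothetical degrees of fit between each resource and one query request.
fits = {
    "res-001": {"catalog": 0.5, "text": 1.0, "speech": 0.5, "image": 0.0},
    "res-002": {"catalog": 1.0, "text": 0.0, "speech": 0.0, "image": 0.0},
}

results = rank(fits, weights)
for resource_id, score in results:
    print(f"{resource_id}: {score:.2f}")
```

With these assumed numbers, a resource that matches moderately across several modalities outranks one that matches only its cataloging information, which is the effect the multi-modal library is meant to enable.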
Those skilled in the art will appreciate that the various operations, methods, and steps, measures, or schemes in the flows discussed in the present application may be alternated, modified, combined, or deleted. Further, other steps, measures, or schemes in the various operations, methods, and flows discussed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted. Further, steps, measures, or schemes in the prior art that correspond to those in the various operations, methods, and flows disclosed in the present invention may also be alternated, modified, rearranged, decomposed, combined, or deleted.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the invention, features of the above embodiments or of different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A method for retrieving a multimedia resource, comprising:
receiving a query request sent by a user;
retrieving in a multimedia resource retrieval library according to the query request, and, for a given multimedia resource, obtaining the degrees of fit with which the cataloging information of the multimedia resource and the information of its different modalities respectively correspond to the query request; computing a weighted average of these degrees of fit, and taking the obtained weighted average as the score with which the multimedia resource matches the query request; sorting the multimedia resources in descending order of score; and taking the sorted result of the multimedia resources as a retrieval result, and returning the retrieval result;
wherein multi-modal information of a plurality of multimedia resources is stored in the multimedia resource retrieval library.
2. The method of claim 1, wherein the multi-modal information of the multimedia resource comprises text information; and
the text information is pre-stored in the multimedia resource retrieval library by:
recognizing text information in a video of the multimedia resource; and
storing the recognized text information in the multimedia resource retrieval library.
3. The method of claim 1, wherein the multi-modal information of the multimedia resource comprises voice information; wherein the voice information is pre-stored in the multimedia resource retrieval library in audio compression coding form and/or text form by:
extracting audio from the multimedia resource, performing voice recognition on the audio to convert it into text content, and storing the converted text content in the multimedia resource retrieval library as voice information of the multimedia resource in text form; and/or
extracting audio from the multimedia resource, extracting features of the audio, and compression-coding the extracted audio features to obtain voice information of the multimedia resource in audio compression coding form.
4. The method of claim 1, wherein the multi-modal information of the multimedia resource comprises image information; wherein the image information is pre-stored in the multimedia resource retrieval library in pixel compression coding form and/or text form by:
extracting key frames from the video of the multimedia resource, performing image content description and/or image object labeling on the key frames, and storing the text content obtained through image content description and/or image object labeling in the multimedia resource retrieval library as image information of the multimedia resource in text form; and/or
extracting key frames from the video of the multimedia resource, extracting picture pixel features of the key frames, compression-coding those features, and storing the resulting image information of the multimedia resource in pixel compression coding form in the multimedia resource retrieval library.
5. The method of claim 3, wherein the retrieving in the multimedia resource retrieval library according to the query request comprises:
parsing the query request to obtain an audio clip in the query request; and
retrieving, according to the audio clip, among the voice information in audio compression coding form in the multimedia resource retrieval library.
6. The method of claim 4, wherein the retrieving in the multimedia resource retrieval library according to the query request comprises:
parsing the query request to obtain a picture in the query request; and
retrieving, according to the picture, among the image information in pixel compression coding form in the multimedia resource retrieval library.
7. The method according to claim 1, further comprising, after the retrieving in the multimedia resource retrieval library according to the query request:
for a given multimedia resource, obtaining the degrees of fit with which the cataloging information of the multimedia resource and the information of its different modalities respectively correspond to the query request;
computing a weighted average of these degrees of fit, and taking the obtained weighted average as the score with which the multimedia resource matches the query request;
sorting the multimedia resources in descending order of score; and
taking the sorted result of the multimedia resources as the retrieval result.
8. A multimedia resource retrieval apparatus, comprising:
a multimedia resource retrieval library configured to store multi-modal information of a plurality of multimedia resources;
a query request receiving module configured to receive a query request sent by a user; and
a retrieval module configured to retrieve in the multimedia resource retrieval library according to the query request, and, for a given multimedia resource, obtain the degrees of fit with which the cataloging information of the multimedia resource and the information of its different modalities respectively correspond to the query request; compute a weighted average of these degrees of fit, and take the obtained weighted average as the score with which the multimedia resource matches the query request; sort the multimedia resources in descending order of score; and take the sorted result of the multimedia resources as a retrieval result, and return the retrieval result.
9. The apparatus of claim 8, wherein the multi-modal information of the multimedia resource comprises at least one of: text information, voice information, and image information; the voice information is pre-stored in the multimedia resource retrieval library in audio compression coding form and/or text form; and the image information is pre-stored in the multimedia resource retrieval library in pixel compression coding form and/or text form.
10. The apparatus of claim 9, further comprising: a multi-modal information storage module; and
the multi-modal information storage module comprises at least one of the following units:
a text information storage unit configured to recognize text information in the video of the multimedia resource and store the recognized text information in the multimedia resource retrieval library;
a voice information storage unit configured to extract audio from the multimedia resource, perform voice recognition on the audio to convert it into text content, and store the converted text content in the multimedia resource retrieval library as voice information of the multimedia resource in text form; and/or to extract audio from the multimedia resource, extract features of the audio, compression-code the extracted audio features to obtain voice information of the multimedia resource in audio compression coding form, and store that voice information in the multimedia resource retrieval library;
an image information storage unit configured to extract key frames from the video of the multimedia resource, perform image content description and/or image object labeling on the key frames, and store the text content obtained through image content description and/or image object labeling in the multimedia resource retrieval library as image information of the multimedia resource in text form; and/or to extract key frames from the video of the multimedia resource, extract picture pixel features of the key frames, compression-code those features, and store the resulting image information of the multimedia resource in pixel compression coding form in the multimedia resource retrieval library.
CN201711108216.XA 2017-11-08 2017-11-08 Multimedia resource retrieval method and device Expired - Fee Related CN107766571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711108216.XA CN107766571B (en) 2017-11-08 2017-11-08 Multimedia resource retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711108216.XA CN107766571B (en) 2017-11-08 2017-11-08 Multimedia resource retrieval method and device

Publications (2)

Publication Number Publication Date
CN107766571A CN107766571A (en) 2018-03-06
CN107766571B true CN107766571B (en) 2021-02-09

Family

ID=61272932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711108216.XA Expired - Fee Related CN107766571B (en) 2017-11-08 2017-11-08 Multimedia resource retrieval method and device

Country Status (1)

Country Link
CN (1) CN107766571B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647245B (en) * 2018-04-13 2023-04-18 腾讯科技(深圳)有限公司 Multimedia resource matching method and device, storage medium and electronic device
CN110489594A (en) * 2018-05-14 2019-11-22 北京松果电子有限公司 Image vision mask method, device, storage medium and equipment
CN109255036B (en) * 2018-08-31 2020-02-18 北京字节跳动网络技术有限公司 Method and apparatus for outputting information
CN109446356A (en) * 2018-09-21 2019-03-08 深圳市九洲电器有限公司 A kind of multimedia document retrieval method and device
CN109684553A (en) * 2018-12-26 2019-04-26 北京百度网讯科技有限公司 For obtaining the method and device of information
CN110110099A (en) * 2019-04-12 2019-08-09 华勤通讯技术有限公司 A kind of multimedia document retrieval method and device
CN110532404B (en) * 2019-09-03 2023-08-04 北京百度网讯科技有限公司 Source multimedia determining method, device, equipment and storage medium
CN111159435B (en) * 2019-12-27 2023-09-05 新方正控股发展有限责任公司 Multimedia resource processing method, system, terminal and computer readable storage medium
CN113128285A (en) * 2019-12-31 2021-07-16 华为技术有限公司 Method and device for processing video
CN111221984B (en) * 2020-01-15 2024-03-01 北京百度网讯科技有限公司 Multi-mode content processing method, device, equipment and storage medium
CN112528053A (en) * 2020-12-23 2021-03-19 三星电子(中国)研发中心 Multimedia library classified retrieval management system
CN112818906B (en) * 2021-02-22 2023-07-11 浙江传媒学院 Intelligent cataloging method of all-media news based on multi-mode information fusion understanding
CN113507613A (en) * 2021-06-07 2021-10-15 茂名市群英网络有限公司 CDN-based video input scheduling system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021849A (en) * 2006-09-14 2007-08-22 浙江大学 Transmedia searching method based on content correlation
CN101968819A (en) * 2010-11-05 2011-02-09 中国传媒大学 Audio/video intelligent catalog information acquisition method facing to wide area network
CN107203586A (en) * 2017-04-19 2017-09-26 天津大学 A kind of automation result generation method based on multi-modal information

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7185049B1 (en) * 1999-02-01 2007-02-27 At&T Corp. Multimedia integration description scheme, method and system for MPEG-7
CN100388282C (en) * 2006-09-14 2008-05-14 浙江大学 Transmedia search method based on multi-mode information convergence analysis
CN101272397B (en) * 2008-05-05 2010-11-10 南京师范大学 Method for acquiring addressable stream media based on ASF data amalgamation technology
US20100100439A1 (en) * 2008-06-12 2010-04-22 Dawn Jutla Multi-platform system apparatus for interoperable, multimedia-accessible and convertible structured and unstructured wikis, wiki user networks, and other user-generated content repositories
US8259082B2 (en) * 2008-09-12 2012-09-04 At&T Intellectual Property I, L.P. Multimodal portable communication interface for accessing video content
CN102650993A (en) * 2011-02-25 2012-08-29 北大方正集团有限公司 Index establishing and searching methods, devices and systems for audio-video file
US9292552B2 (en) * 2012-07-26 2016-03-22 Telefonaktiebolaget L M Ericsson (Publ) Apparatus, methods, and computer program products for adaptive multimedia content indexing
US9449002B2 (en) * 2013-01-16 2016-09-20 Althea Systems and Software Pvt. Ltd System and method to retrieve relevant multimedia content for a trending topic
CN103778204A (en) * 2014-01-13 2014-05-07 北京奇虎科技有限公司 Voice analysis-based video search method, equipment and system
CN106209575B (en) * 2016-06-23 2019-09-24 厦门黑镜科技有限公司 Method for sending information, acquisition methods, device and interface system
CN106446051A (en) * 2016-08-31 2017-02-22 北京新奥特云视科技有限公司 Deep search method of Eagle media assets

Also Published As

Publication number Publication date
CN107766571A (en) 2018-03-06

Similar Documents

Publication Publication Date Title
CN107766571B (en) Multimedia resource retrieval method and device
US8396286B1 (en) Learning concepts for video annotation
US20110022394A1 (en) Visual similarity
CN112015949A (en) Video generation method and device, storage medium and electronic equipment
CN108446316B (en) association word recommendation method and device, electronic equipment and storage medium
CN111506771B (en) Video retrieval method, device, equipment and storage medium
US20100318532A1 (en) Unified inverted index for video passage retrieval
CN108334489B (en) Text core word recognition method and device
CN106980664B (en) Bilingual comparable corpus mining method and device
CN109710792B (en) Index-based rapid face retrieval system application
CN111008321A (en) Recommendation method and device based on logistic regression, computing equipment and readable storage medium
SG194442A1 (en) In-video product annotation with web information mining
CN106844571A (en) Recognize method, device and the computing device of synonym
CN107451120B (en) Content conflict detection method and system for open text information
CN113392265A (en) Multimedia processing method, device and equipment
CN111125457A (en) Deep cross-modal Hash retrieval method and device
CN111950261B (en) Method, device and computer readable storage medium for extracting text keywords
CN114218348A (en) Method, device, equipment and medium for acquiring live broadcast segments based on question and answer text
CN111353055A (en) Intelligent tag extended metadata-based cataloging method and system
CN118035489A (en) Video searching method and device, storage medium and electronic equipment
CN115618014A (en) Standard document analysis management system and method applying big data technology
EP3905060A1 (en) Artificial intelligence for content discovery
CN110413770B (en) Method and device for classifying group messages into group topics
JP4703487B2 (en) Image classification method, apparatus and program
CN110351183B (en) Resource collection method and device in instant messaging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20210209
Termination date: 20211108