CN107766571B - Multimedia resource retrieval method and device - Google Patents

Info

Publication number
CN107766571B
Authority
CN
China
Prior art keywords
multimedia resource
information
multimedia
retrieval
query request
Legal status
Expired - Fee Related
Application number
CN201711108216.XA
Other languages
Chinese (zh)
Other versions
CN107766571A (en)
Inventor
柳军飞
麻志毅
杨寒
李宏强
孙博
范红杰
Current Assignee
Peking University
Original Assignee
Peking University
Application filed by Peking University
Priority to CN201711108216.XA
Publication of CN107766571A
Application granted
Publication of CN107766571B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for retrieving multimedia resources. The method includes: receiving a query request sent by a user; performing retrieval in a multimedia resource retrieval library according to the query request, and returning a retrieval result; wherein the multimedia resource retrieval library stores multimodal information of a plurality of multimedia resources. By applying the invention, multimedia resources that satisfy the retrieval conditions can be retrieved more completely, thereby better meeting the retrieval needs for multimedia resources.


Description

Multimedia resource retrieval method and device
Technical Field
The present invention relates to the field of video retrieval, and in particular, to a method and an apparatus for retrieving multimedia resources.
Background
With the rapid development of Internet technology and the great increase in network bandwidth, the volume of multimedia resources (videos) stored on the Internet is growing explosively, and this vast collection includes many resources of great commercial value. Efficient retrieval over massive collections of multimedia resources (videos) has therefore become the key to using these video resources effectively and maximizing their value.
Currently, retrieval of multimedia resources (videos) relies mainly on keyword search over their cataloging information. Different producers usually define cataloging information according to their own needs, so the information in a catalog record tends to be limited or one-sided. Retrieval based on cataloging information alone therefore cannot adequately meet retrieval needs, and many useful multimedia resources may be missed.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for retrieving multimedia resources, which can more fully retrieve multimedia resources satisfying the retrieval condition, thereby better satisfying the retrieval requirement of the multimedia resources.
To this end, the present invention provides a multimedia resource retrieval method, comprising:
receiving a query request sent by a user;
performing retrieval in a multimedia resource retrieval library according to the query request, and returning a retrieval result;
wherein the multimedia resource retrieval library stores multimodal information of a plurality of multimedia resources.
Preferably, the multimedia resource retrieval library further stores cataloging information for each multimedia resource.
The multimodal information of the multimedia resource includes text information, which is pre-stored in the multimedia resource retrieval library by:
identifying text information from the video of the multimedia resource; and
storing the identified text information in the multimedia resource retrieval library.
The multimodal information of the multimedia resource includes speech information, which is pre-stored in the multimedia resource retrieval library in audio compression coding form and/or text form, as follows:
extracting audio from the multimedia resource, performing speech recognition to convert the audio into text content, and storing the resulting text content in the multimedia resource retrieval library as the text-form speech information of the multimedia resource; and/or
extracting audio from the multimedia resource, extracting features of the audio, and compression-coding the extracted audio features to obtain the speech information of the multimedia resource in audio compression coding form.
The multimodal information of the multimedia resource includes image information, which is pre-stored in the multimedia resource retrieval library in pixel compression coding form and/or text form, as follows:
extracting key frames from the video of the multimedia resource, performing image content description and/or image object labeling on the key frames, and storing the resulting text content in the multimedia resource retrieval library as the text-form image information of the multimedia resource; and/or
extracting key frames from the video of the multimedia resource, extracting and compression-coding the pixel features of the key frames, and storing the resulting image information of the multimedia resource, in pixel compression coding form, in the multimedia resource retrieval library.
Performing retrieval in the multimedia resource retrieval library according to the query request includes:
parsing the query request to obtain a keyword set K of the query request;
expanding the keyword set K to obtain an expanded keyword set K'; and
performing retrieval in the multimedia resource retrieval library according to the expanded keyword set K'.
Alternatively, performing retrieval in the multimedia resource retrieval library according to the query request includes:
parsing the query request to obtain the audio clip it contains; and
searching, according to the audio clip, the speech information stored in audio compression coding form in the multimedia resource retrieval library.
Alternatively, performing retrieval in the multimedia resource retrieval library according to the query request includes:
parsing the query request to obtain the picture it contains; and
searching, according to the picture, the image information stored in pixel compression coding form in the multimedia resource retrieval library.
Further, after retrieval is performed in the multimedia resource retrieval library according to the query request, the method further includes:
for each multimedia resource, obtaining the degree of fit between the query request and the cataloging information of the resource, and between the query request and the information of each modality;
computing a weighted average of these degrees of fit, and taking the weighted average as the score with which the multimedia resource matches the query request;
sorting the multimedia resources in descending order of score; and
taking the sorted list of multimedia resources as the retrieval result.
The invention also provides a multimedia resource retrieval device, comprising:
a multimedia resource retrieval library, configured to store multimodal information of a plurality of multimedia resources;
a query request receiving module, configured to receive a query request sent by a user; and
a retrieval module, configured to perform retrieval in the multimedia resource retrieval library according to the query request and return a retrieval result.
Further, the multimedia resource retrieval library also stores cataloging information for each multimedia resource.
The multimodal information of a multimedia resource includes at least one of the following: text information, speech information, and image information. The speech information is pre-stored in the multimedia resource retrieval library in audio compression coding form and/or text form; the image information is pre-stored in pixel compression coding form and/or text form.
Further, the device also comprises a multimodal information storage module, which includes at least one of the following units:
a text information storage unit, configured to identify text information from the video of the multimedia resource and store the identified text information in the multimedia resource retrieval library;
a speech information storage unit, configured to extract audio from the multimedia resource, perform speech recognition to convert the audio into text content, and store the resulting text content in the multimedia resource retrieval library as the text-form speech information of the multimedia resource; and/or to extract audio from the multimedia resource, extract and compression-code features of the audio to obtain the speech information of the multimedia resource in audio compression coding form, and store that speech information in the multimedia resource retrieval library;
an image information storage unit, configured to extract key frames from the video of the multimedia resource, perform image content description and/or image object labeling on the key frames, and store the resulting text content in the multimedia resource retrieval library as the text-form image information of the multimedia resource; and/or to extract key frames from the video of the multimedia resource, extract and compression-code the pixel features of the key frames, and store the resulting image information, in pixel compression coding form, in the multimedia resource retrieval library.
In the technical solution of the invention, multimodal information of the multimedia resources is stored in the multimedia resource retrieval library, and retrieval is performed in that library according to the query request. Because retrieval draws on information richer than cataloging information alone, multimedia resources that satisfy the retrieval conditions can be retrieved more completely, better meeting the retrieval needs for multimedia resources.
Drawings
Fig. 1 is a flowchart of a multimedia resource retrieval method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for acquiring and storing text information of a multimedia resource according to an embodiment of the present invention;
Fig. 3 is a flowchart of a method for acquiring and storing speech information of a multimedia resource according to an embodiment of the present invention;
Fig. 4 is a flowchart of a method for acquiring and storing image information of a multimedia resource according to an embodiment of the present invention;
Fig. 5 is a block diagram of the internal structure of a multimedia resource retrieval device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" includes any and all combinations of one or more of the associated listed items.
It should be noted that the terms "first" and "second" in the embodiments of the present invention are used only to distinguish two non-identical entities or parameters that share the same name. They are used merely for convenience of description, should not be construed as limiting the embodiments of the present invention, and are not explained again in the following embodiments.
The inventors observed that a multimedia resource (video) contains multimodal information such as text, speech and images. If this information is used during retrieval, multimedia resources that satisfy the retrieval conditions can be retrieved more completely, better meeting the retrieval needs for multimedia resources.
The technical solution of the invention is described in detail below with reference to the accompanying drawings.
Based on this idea, and in order to use the multimodal information of multimedia resources during retrieval, the technical solution of the embodiments preprocesses the stored multimedia resources: multimodal information is extracted from them and stored in the multimedia resource retrieval library. In the retrieval library provided by the embodiments, the multimodal information of each multimedia resource may include at least one of text information, speech information, and image information. This information is stored in the library in advance: speech information in audio compression coding form and/or text form, and image information in pixel compression coding form and/or text form. How the information of each modality is acquired and stored is described in detail below. Preferably, the cataloging information of the multimedia resources is also stored in the multimedia resource retrieval library.
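The library contents described above can be pictured as one record per resource, holding cataloging information plus each modality in its text and/or compressed form. The following Python sketch is illustrative only: the class names `ResourceRecord` and `RetrievalLibrary` and all field names are hypothetical, and the in-memory dictionary stands in for whatever storage a real system would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceRecord:
    """One entry of the retrieval library (hypothetical layout)."""
    resource_id: str
    cataloging: Dict[str, str] = field(default_factory=dict)  # producer-defined catalog fields
    text_info: List[str] = field(default_factory=list)        # text recognized in video frames
    speech_text: Optional[str] = None                         # speech information, text form
    speech_codes: List[bytes] = field(default_factory=list)   # speech information, compressed form
    image_text: List[str] = field(default_factory=list)       # key-frame descriptions / object labels
    image_codes: List[int] = field(default_factory=list)      # key-frame pixel codes (e.g. 64-bit hashes)

    def modalities(self) -> List[str]:
        """Names of the modalities that are actually populated for this resource."""
        present = []
        if self.text_info:
            present.append("text")
        if self.speech_text is not None or self.speech_codes:
            present.append("speech")
        if self.image_text or self.image_codes:
            present.append("image")
        return present


class RetrievalLibrary:
    """In-memory stand-in for the multimedia resource retrieval library."""

    def __init__(self) -> None:
        self._records: Dict[str, ResourceRecord] = {}

    def put(self, record: ResourceRecord) -> None:
        self._records[record.resource_id] = record

    def get(self, resource_id: str) -> ResourceRecord:
        return self._records[resource_id]

    def __len__(self) -> int:
        return len(self._records)
```

Because every modality field is optional, a resource whose video has no on-screen text or no speech simply leaves those fields empty, matching the "at least one of" wording.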
Based on this multimedia resource retrieval library, the multimedia resource retrieval method provided by the embodiment of the present invention follows the process shown in Fig. 1 and includes the following steps:
S101: Receive a query request sent by a user.
In this step, the received query request may include a keyword to be queried, an audio clip to be queried, or a picture to be queried.
S102: Perform retrieval in the multimedia resource retrieval library according to the query request.
In this step, a query request containing keywords is first parsed to obtain its keyword set K, for example by using word segmentation (including Chinese word segmentation), named entity recognition, and sentiment analysis.
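As a rough illustration of parsing a query into the keyword set K, the sketch below handles English text only, using regular-expression tokenization and a hypothetical stop-word list. Chinese word segmentation, named entity recognition and sentiment analysis, which the embodiment mentions, would need dedicated tools and are not shown.

```python
import re
from typing import Set

# Hypothetical stop-word list; a real system would use a curated list per language.
STOP_WORDS = {"a", "an", "the", "of", "about", "with", "video", "videos", "show", "me"}


def parse_query(query: str) -> Set[str]:
    """Return the keyword set K for an English query string.

    Only lowercase alphanumeric tokenization is done here; no segmentation
    of Chinese text, no entity recognition, no sentiment analysis.
    """
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return {t for t in tokens if t not in STOP_WORDS}
```

For example, `parse_query("Show me videos about Tomato soup!")` keeps only the content words `"tomato"` and `"soup"`.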
The keyword set K is then expanded into an expanded keyword set K', for example by means of a knowledge graph or synonym expansion.
Retrieval is then performed on the multimodal information in the multimedia resource retrieval library according to K', or on both the multimodal information and the cataloging information.
The keyword set is expanded to make the query more complete. For example, if a user's query contains one Chinese word for "tomato", the solution can also retrieve videos whose content contains a synonymous word for "tomato". Retrieving with the expanded keyword set thus yields more results relevant to the query conditions in the query request.
Methods for searching according to the keyword set are well known to those skilled in the art, and are not described herein.
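A minimal sketch of keyword expansion followed by retrieval, assuming a hypothetical hand-made synonym table in place of a knowledge graph and a simple inverted index with OR semantics over the stored text. The function names are illustrative, not from the source.

```python
from typing import Dict, Iterable, Set


def expand_keywords(k: Set[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """K' = K plus every listed synonym of each keyword (one hop only)."""
    expanded = set(k)
    for word in k:
        expanded |= synonyms.get(word, set())
    return expanded


def build_index(docs: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Inverted index: term -> ids of resources whose stored text contains the term."""
    index: Dict[str, Set[str]] = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[str]], keywords: Set[str]) -> Set[str]:
    """Return the ids of resources matching any keyword (OR semantics)."""
    hits: Set[str] = set()
    for word in keywords:
        hits |= index.get(word, set())
    return hits
```

With `synonyms = {"car": {"automobile"}}`, searching for `{"car"}` alone finds only resources indexed under "car", while the expanded set also finds those indexed under "automobile", which is the completeness gain the expansion step aims at.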
For a query request containing an audio clip, the query request is first parsed to obtain the audio clip, and the speech information stored in audio compression coding form in the library is then searched: audio features are extracted from the clip and compression-coded, and a clustering algorithm is used to find similar compression-coded speech information in the multimedia resource retrieval library.
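The source does not name the clustering algorithm. The sketch below assumes k-means over fixed-length audio feature vectors: the library vectors are clustered, the query clip's vector is assigned to its nearest centroid, and only members of that cluster are ranked by Euclidean distance. For brevity it re-clusters on every query and assumes `k` does not exceed the number of entries; a real system would cluster offline. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_search(library: Dict[str, Vector], query: Vector,
                   k: int, top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank the library entries in the query's cluster by distance to the query."""
    ids = list(library)
    centroids = kmeans([library[i] for i in ids], k)

    def cluster_of(v: Vector) -> int:
        return min(range(k), key=lambda c: dist(v, centroids[c]))

    target = cluster_of(query)
    candidates = [i for i in ids if cluster_of(library[i]) == target]
    ranked = sorted(((i, dist(library[i], query)) for i in candidates),
                    key=lambda t: t[1])
    return ranked[:top_n]
```

Restricting the ranking to one cluster is what makes clustering useful here: the query is compared only with entries in its own region of feature space instead of the whole library.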
For a query request containing a picture, the query request is first parsed to obtain the picture, and the image information stored in pixel compression coding form in the library is then searched: the pixel features of the picture are extracted and compression-coded, and a clustering algorithm is used to find similar pixel-compression-coded image information in the multimedia resource retrieval library.
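For pictures, one possible reading, assumed here rather than taken from the source, is to store each key frame as a compact binary code and cluster the codes under Hamming distance. This sketch uses a minimal k-medoids loop (medoids are actual stored codes, so no averaging of bit strings is needed) and then ranks the members of the query's cluster by Hamming distance. Names and parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def kmedoids(codes: List[int], k: int, iters: int = 10) -> List[int]:
    """Minimal k-medoids with Hamming distance; the first k codes seed the medoids."""
    medoids = codes[:k]
    for _ in range(iters):
        groups: List[List[int]] = [[] for _ in range(k)]
        for code in codes:
            nearest = min(range(k), key=lambda m: hamming(code, medoids[m]))
            groups[nearest].append(code)
        # New medoid: the member with the smallest total distance to its cluster.
        updated = [min(g, key=lambda x: sum(hamming(x, y) for y in g)) if g else medoids[i]
                   for i, g in enumerate(groups)]
        if updated == medoids:
            break
        medoids = updated
    return medoids


def image_search(library: Dict[str, int], query_code: int,
                 k: int, top_n: int = 3) -> List[Tuple[str, int]]:
    """Rank the entries in the query's cluster by Hamming distance (ties by id)."""
    medoids = kmedoids(list(library.values()), k)

    def cluster_of(code: int) -> int:
        return min(range(len(medoids)), key=lambda m: hamming(code, medoids[m]))

    target = cluster_of(query_code)
    hits = [(rid, hamming(code, query_code)) for rid, code in library.items()
            if cluster_of(code) == target]
    return sorted(hits, key=lambda t: (t[1], t[0]))[:top_n]
```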
Further, after retrieval is performed over both the multimodal information and the cataloging information in the library, the degree of fit (matching degree) between the query request and each source of information about the same multimedia resource can be obtained, namely its cataloging information and its information in each modality (text, speech, and image information). These degrees of fit are combined by a weighted average, and the weighted average is used as the score with which the multimedia resource matches the query request. The multimedia resources are then sorted in descending order of score, and the sorted list is taken as the retrieval result.
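The weighted-average scoring and descending sort can be sketched as follows. The weights are hypothetical, since the source does not specify them; each degree of fit is assumed to lie in [0, 1], and a source with no reported fit is treated as 0.

```python
from typing import Dict, List, Tuple

# Hypothetical weights: the source prescribes a weighted average but gives no values.
WEIGHTS: Dict[str, float] = {"catalog": 0.4, "text": 0.2, "speech": 0.2, "image": 0.2}


def score(fits: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-source degrees of fit; a missing source counts as 0."""
    total_w = sum(weights.values())
    return sum(w * fits.get(src, 0.0) for src, w in weights.items()) / total_w


def rank(results: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score each resource and sort in descending order of score."""
    return sorted(((rid, score(f)) for rid, f in results.items()),
                  key=lambda t: t[1], reverse=True)
```

Dividing by the total weight keeps the score in [0, 1] even if the weights are later changed so that they no longer sum to 1.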
S103: Return the retrieval result.
Once the retrieval result matching the query conditions in the query request is obtained, it is returned to the user, who can then see the multimedia resources that meet the query conditions, or that meet conditions similar to them.
The multimodal information of each multimedia resource in the retrieval library is acquired and stored in advance. The method for acquiring and storing the text information of a multimedia resource according to the embodiment of the present invention is shown in Fig. 2 and includes the following steps:
S201: Identify text information from the video of the multimedia resource.
Specifically, highly similar image frames of the multimedia resource may first be deduplicated, and character recognition is then performed on the remaining frames of the video.
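One simple way to drop highly similar frames before character recognition, assumed here for illustration, is to compare each frame with the last kept frame by mean absolute pixel difference. Frames are modeled as grayscale 2-D lists of equal size, and the threshold is an arbitrary example value; the character recognition step itself is not shown.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixels, 0-255, all frames the same size


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def dedup_frames(frames: List[Frame], threshold: float = 10.0) -> List[int]:
    """Indices of frames to keep; a frame nearly identical to the last kept one is dropped."""
    kept: List[int] = []
    for i, frame in enumerate(frames):
        if not kept or mean_abs_diff(frame, frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow fade from slipping through as a chain of small differences.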
S202: Store the identified text information in the multimedia resource retrieval library.
In this step, the identified text information is preferably deduplicated before being stored in the multimedia resource retrieval library. Deduplication removes a large amount of redundant information and saves space in the library.
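Text deduplication can be as simple as normalizing whitespace and case and keeping the first occurrence of each line. This is a hypothetical minimal version; near-duplicate detection (for example, recognition results differing by a single character) would need a fuzzier comparison.

```python
from typing import Iterable, List


def dedup_text(lines: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each line, ignoring case and extra whitespace."""
    seen = set()
    out: List[str] = []
    for line in lines:
        key = " ".join(line.split()).casefold()  # collapse whitespace, fold case
        if key and key not in seen:              # skip empty lines and repeats
            seen.add(key)
            out.append(line.strip())
    return out
```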
The method for acquiring and storing the speech information of a multimedia resource in advance according to the embodiment of the present invention is shown in Fig. 3 and includes the following steps:
S301: Extract audio from the multimedia resource.
S302: Perform speech recognition on the extracted audio to convert it into text content; and/or extract features of the audio and compression-code the extracted features to obtain the speech information of the multimedia resource in audio compression coding form.
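As a toy stand-in for audio feature extraction and compression coding (the source specifies neither), the sketch below computes per-frame RMS energy and zero-crossing rate from samples assumed to lie in [-1, 1], then quantizes each feature to one byte. Production systems would more likely use spectral features such as MFCCs; the function names here are hypothetical.

```python
import math
from typing import List, Sequence


def frame_features(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Per-frame [RMS energy, zero-crossing rate]; both lie in [0, 1] for samples in [-1, 1]."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append([rms, crossings / (frame_len - 1)])
    return feats


def quantize(feats: List[List[float]]) -> bytes:
    """Compression coding: map each feature value in [0, 1] to a single byte."""
    return bytes(min(255, max(0, round(v * 255))) for f in feats for v in f)
```

Each frame shrinks to two bytes, which is the kind of compact, comparable code that a clustering-based audio search can operate on.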
S303: Store the converted text content in the multimedia resource retrieval library as the speech information of the multimedia resource; and/or store the speech information in audio compression coding form, obtained by the compression coding, in the multimedia resource retrieval library.
In this step, preferably, the converted text content is summarized, and the summary is stored in the multimedia resource retrieval library as the speech information of the multimedia resource; and/or
in this step, the speech information in audio compression coding form obtained in step S302 is stored in the multimedia resource retrieval library.
The speech in a multimedia resource is generally voluminous, but only part of it is useful. The converted text is therefore summarized to remove content without practical meaning, and the summary is then added to the multimodal multimedia resource retrieval library. This removes a large amount of redundant information and saves space in the library.
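The source does not say how the transcript is summarized. A simple extractive approach, assumed here, scores each sentence by the average frequency of its words across the whole transcript and keeps the top sentences in their original order. Words of three letters or fewer are ignored as a crude proxy for filler and function words; the function name is hypothetical.

```python
import re
from collections import Counter


def summarize(text: str, n_sentences: int = 2) -> str:
    """Frequency-based extractive summary of an English transcript."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)  # crude filter for short function words

    def sentence_score(s: str) -> float:
        toks = re.findall(r"[a-z']+", s.lower())
        return sum(freq[t] for t in toks) / (len(toks) or 1)

    best = sorted(range(len(sentences)),
                  key=lambda i: sentence_score(sentences[i]), reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(best))  # restore original order
```

Filler such as greetings and sign-offs scores low because its words rarely recur, which is the redundancy-removal effect the paragraph describes.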
The method for acquiring and storing the image information of a multimedia resource in advance according to the embodiment of the present invention is shown in Fig. 4 and includes the following steps:
S401: Extract key frames from the video of the multimedia resource.
The video of a multimedia resource consists of a sequence of frames, and the semantic information in these pictures is crucial for understanding the video content. The system therefore first extracts key frames from the video.
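Key-frame extraction is not specified further in the source. A common heuristic, sketched here under that assumption, keeps a frame whenever its grayscale histogram differs from the last key frame's by more than a threshold (L1 distance between normalized histograms). The bin count and threshold are example values.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixels, 0-255


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized grayscale histogram with equal-width bins."""
    counts = [0] * bins
    for row in frame:
        for px in row:
            counts[px * bins // 256] += 1
    n = sum(counts)
    return [c / n for c in counts]


def key_frames(frames: List[Frame], threshold: float = 0.5) -> List[int]:
    """Keep frame 0, then every frame whose histogram differs enough from the last key frame."""
    keys = [0] if frames else []
    for i in range(1, len(frames)):
        prev, cur = histogram(frames[keys[-1]]), histogram(frames[i])
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            keys.append(i)
    return keys
```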
S402: Perform image content description and/or image object labeling on the extracted key frames; and/or extract the pixel features of the key frames and compression-code them.
In this step, image content description is performed on each key frame to generate text describing it, and/or image object labeling is performed on each key frame to obtain text labels for its objects. Specifically, image content description can use artificial intelligence techniques such as deep learning to produce the descriptive text, while image object labeling means attaching text labels to the objects recognized in the key frame. And/or
in this step, the pixel features of each key frame are extracted and compression-coded to obtain the image information of the multimedia resource in pixel compression coding form.
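One concrete form of "extract pixel features and compression-code them", assumed here for illustration, is an average hash: the frame is reduced to an 8 x 8 grid of block means, and each cell becomes one bit depending on whether it is brighter than the grid mean, yielding a 64-bit code. The sketch assumes frame dimensions are multiples of the grid size; the function name is hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixels, 0-255


def average_hash(frame: Frame, size: int = 8) -> int:
    """Compress a frame into a size*size-bit code (bit = cell brighter than mean)."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // size, w // size  # assumes h and w are multiples of size
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [frame[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    code = 0
    for v in cells:  # row-major, first cell ends up in the most significant bit
        code = (code << 1) | (1 if v > mean else 0)
    return code
```

Similar-looking frames produce codes that differ in few bits, so the Hamming distance between codes serves as a cheap image similarity measure.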
S403: Store the text content obtained by image content description and/or image object labeling in the multimedia resource retrieval library as the text-form image information of the multimedia resource; and/or store the image information of the multimedia resource in pixel compression coding form in the library.
In this step, preferably, the text content obtained by image content description and/or image object labeling is deduplicated, and the deduplicated text is stored in the multimedia resource retrieval library as the text-form image information of the multimedia resource; and/or
in this step, the image information of the multimedia resource in pixel compression coding form is stored in the multimedia resource retrieval library.
Based on the above method, the internal structure of a multimedia resource retrieval device provided by an embodiment of the present invention is shown in Fig. 5 and includes a multimedia resource retrieval library 501, a query request receiving module 502, and a retrieval module 503.
The multimedia resource retrieval library 501 stores multimodal information of a plurality of multimedia resources, including at least one of text information, speech information, and image information. Preferably, library 501 also stores the cataloging information of each multimedia resource.
The query request receiving module 502 is used for receiving a query request sent by a user.
The retrieval module 503 is configured to perform retrieval in the multimedia resource retrieval library 501 according to the query request received by the query request receiving module 502, and return a retrieval result.
Preferably, the retrieval module 503 is configured to parse the query request to obtain its keyword set K, expand K into an expanded keyword set K', and perform retrieval in the multimedia resource retrieval library according to K'. For the specific retrieval method of module 503, refer to step S102 above; it is not repeated here.
Further, after retrieval over the multimodal information and the cataloging information in the library according to K', the retrieval module 503 may, for each multimedia resource, obtain the degree of fit (matching degree) between the query request and the resource's cataloging information and between the query request and its information in each modality, compute a weighted average of these degrees of fit, and use the weighted average as the score with which the resource matches the query request. The retrieval results are then returned to the user in descending order of score.
Alternatively, the retrieval module 503 may be configured to parse the query request to obtain the audio clip it contains, and to search, according to the audio clip, the speech information stored in audio compression coding form in the multimedia resource retrieval library.
Alternatively, the retrieval module 503 may be configured to parse the query request to obtain the picture it contains, and to search, according to the picture, the image information stored in pixel compression coding form in the multimedia resource retrieval library.
Further, the apparatus for retrieving a multimedia resource provided in an embodiment of the present invention may further include: a multimodal information storage module 504;
the multimodal information storage module 504 includes at least one of the following: a text information storage unit 511, a voice information storage unit 512, and an image information storage unit 513.
The text information storage unit 511 is configured to identify text information from the video of the multimedia resource and store it in the multimedia resource retrieval library 501. For the specific method by which unit 511 acquires and stores text information, refer to the steps shown in Fig. 2; it is not repeated here.
The speech information storage unit 512 is configured to extract audio from the multimedia resource, perform speech recognition to convert the audio into text content, and store the converted text in the retrieval library as the text-form speech information of the multimedia resource; and/or to extract audio from the multimedia resource, extract and compression-code features of the audio to obtain speech information in audio compression coding form, and store that information in the multimedia resource retrieval library 501. For the specific method by which unit 512 acquires and stores speech information, refer to the steps shown in Fig. 3; it is not repeated here.
The image information storage unit 513 is configured to extract key frames from the video of the multimedia resource, perform image content description and/or image object labeling on them, and store the resulting text in the retrieval library as the text-form image information of the multimedia resource; and/or to extract key frames from the video, extract and compression-code their pixel features, and store the resulting image information in pixel compression coding form in the multimedia resource retrieval library 501. For the specific method by which unit 513 acquires and stores image information, refer to the steps shown in Fig. 4; it is not repeated here.
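The module structure of Fig. 5 can be mirrored in a small object model. In this hypothetical sketch the class names map onto reference numerals 501 to 504, the storage units 511 to 513 are plain callables that return terms (stubbed with precomputed lists instead of real character recognition, speech recognition or image description), and retrieval ranks resources by the number of matched keywords rather than by the weighted scoring described earlier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# A storage unit (stand-in for 511/512/513) maps a raw resource to terms to store.
StorageUnit = Callable[[dict], List[str]]


@dataclass
class RetrievalLibrary:
    """Stand-in for library 501: resource id -> terms from every modality."""
    entries: Dict[str, Set[str]] = field(default_factory=dict)

    def add_terms(self, resource_id: str, terms: List[str]) -> None:
        self.entries.setdefault(resource_id, set()).update(terms)


class MultimodalStorageModule:
    """Stand-in for module 504: runs every configured storage unit on a resource."""

    def __init__(self, library: RetrievalLibrary, units: List[StorageUnit]) -> None:
        self.library = library
        self.units = units

    def ingest(self, resource_id: str, resource: dict) -> None:
        for unit in self.units:
            self.library.add_terms(resource_id, unit(resource))


class QueryRequestReceiver:
    """Stand-in for module 502: turns a raw query into a keyword set."""

    def receive(self, raw_query: str) -> Set[str]:
        return set(raw_query.lower().split())


class RetrievalModule:
    """Stand-in for module 503: ranks resources by number of matched keywords."""

    def __init__(self, library: RetrievalLibrary) -> None:
        self.library = library

    def retrieve(self, keywords: Set[str]) -> List[str]:
        overlap = {rid: len(keywords & terms) for rid, terms in self.library.entries.items()}
        hits = [rid for rid, n in overlap.items() if n > 0]
        return sorted(hits, key=lambda rid: (-overlap[rid], rid))
```

Passing the storage units in as a list reflects the "at least one of the following units" wording: a deployment without, say, image description simply omits that callable.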
In the technical solution of the invention, multimodal information of the multimedia resources is stored in the multimedia resource retrieval library, and retrieval is performed in that library according to the query request. Retrieval can therefore draw on richer information than the cataloging information alone. As a result, the multimedia resources that satisfy the retrieval conditions are retrieved more completely, and retrieval requirements for multimedia resources are better met.
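To illustrate why richer information helps, the sketch below computes one degree of fit per text-form modality for a keyword query. The patent does not define the degree of fit. Here it is assumed to be the fraction of distinct query terms that occur in that modality's text. The regex tokenizer is a hypothetical placeholder; Chinese text would need a real word segmenter. In the example, the cataloging information does not match the query at all, while the recognized on-screen text, speech transcript and image description all do.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a placeholder for a real (e.g. Chinese-aware) tokenizer."""
    return re.findall(r"\w+", text.lower())


def degree_of_fit(query, modality_text):
    """Fraction of distinct query terms that occur in one modality's text (0.0 to 1.0)."""
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    present = set(tokenize(modality_text))
    return len(terms & present) / len(terms)


def fits_per_modality(query, resource):
    """One degree of fit per text-form source of a resource record (a dict of strings)."""
    return {source: degree_of_fit(query, text) for source, text in resource.items()}


resource = {
    "catalog": "Evening news 2017",
    "text": "Weather warning issued",
    "speech": "the weather tonight is rainy",
    "image": "anchor at desk map",
}
fits = fits_per_modality("weather map", resource)
```

Here `fits` is `{"catalog": 0.0, "text": 0.5, "speech": 0.5, "image": 0.5}`. A catalog-only search would have scored this resource zero.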
Those skilled in the art will appreciate that the operations, methods, and the steps, measures, and solutions within the flows discussed in this application may be interchanged, modified, rearranged, decomposed, combined, or deleted. The same applies to the steps, measures, and solutions within the operations, methods, and flows disclosed in the prior art and in the present invention.
Those of ordinary skill in the art should understand that the discussion of any embodiment above is exemplary only. It is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the concept of the invention, technical features of the above embodiment or of different embodiments may be combined and steps may be performed in any order. Many other variations of the different aspects of the invention as described above exist; they are not detailed here for brevity. Therefore, any omission, modification, substitution, improvement, or the like made without departing from the spirit and principles of the invention is intended to fall within the scope of the invention.

Claims (10)

1. A multimedia resource retrieval method, characterized by comprising:
receiving a query request issued by a user;
performing retrieval in a multimedia resource retrieval library according to the query request;
for each multimedia resource, obtaining the degree of fit to the query request of the cataloging information of that multimedia resource and of each of its different modalities of information;
computing a weighted average of these degrees of fit, and taking the weighted average as the score with which the multimedia resource matches the query request;
sorting the multimedia resources in descending order of score; and
taking the sorted result as the retrieval result, and returning the retrieval result;
wherein the multimedia resource retrieval library stores multimodal information of a plurality of multimedia resources.

2. The method according to claim 1, wherein the multimodal information of the multimedia resource comprises text information, and the text information is pre-stored in the multimedia resource retrieval library as follows:
identifying text information in the video of the multimedia resource; and
storing the identified text information in the multimedia resource retrieval library.

3. The method according to claim 1, wherein the multimodal information of the multimedia resource comprises voice information, and the voice information is pre-stored in the multimedia resource retrieval library in audio-compression-coded form and/or text form as follows:
extracting audio from the multimedia resource, converting the audio into text content through speech recognition, and storing the resulting text content in the multimedia resource retrieval library as text-form voice information of the multimedia resource; and/or
extracting audio from the multimedia resource, further extracting features of the audio, and compressing and encoding the extracted audio features to obtain voice information of the multimedia resource in audio-compression-coded form.

4. The method according to claim 1, wherein the multimodal information of the multimedia resource comprises image information, and the image information is pre-stored in the multimedia resource retrieval library in pixel-compression-coded form and/or text form as follows:
extracting key frames from the video of the multimedia resource, performing image content description and/or image object annotation on the key frames, and storing the text content so obtained in the multimedia resource retrieval library as text-form image information of the multimedia resource; and/or
extracting key frames from the video of the multimedia resource, extracting picture pixel features of the key frames, and compressing and encoding them to obtain image information of the multimedia resource in pixel-compression-coded form, and storing that image information in the multimedia resource retrieval library.

5. The method according to claim 3, wherein performing retrieval in the multimedia resource retrieval library according to the query request comprises:
analyzing the query request to obtain an audio clip contained in the query request; and
searching, according to the audio clip, the audio information stored in audio-compression-coded form in the multimedia resource retrieval library.

6. The method according to claim 4, wherein performing retrieval in the multimedia resource retrieval library according to the query request comprises:
analyzing the query request to obtain a picture contained in the query request; and
searching, according to the picture, the image information stored in pixel-compression-coded form in the multimedia resource retrieval library.

7. The method according to claim 1, further comprising, after performing retrieval in the multimedia resource retrieval library according to the query request:
for each multimedia resource, obtaining the degree of fit to the query request of the cataloging information of that multimedia resource and of each of its different modalities of information;
computing a weighted average of these degrees of fit, and taking the weighted average as the score with which the multimedia resource matches the query request;
sorting the multimedia resources in descending order of score; and
taking the sorted result as the retrieval result.

8. A multimedia resource retrieval device, comprising:
a multimedia resource retrieval library, configured to store multimodal information of a plurality of multimedia resources;
a query request receiving module, configured to receive a query request issued by a user; and
a retrieval module, configured to:
perform retrieval in the multimedia resource retrieval library according to the query request;
for each multimedia resource, obtain the degree of fit to the query request of the cataloging information of that multimedia resource and of each of its different modalities of information;
compute a weighted average of these degrees of fit and take the weighted average as the score with which the multimedia resource matches the query request;
sort the multimedia resources in descending order of score; and
take the sorted result as the retrieval result and return the retrieval result.

9. The device according to claim 8, wherein the multimodal information of the multimedia resource comprises at least one of the following: text information, voice information, and image information; wherein the voice information is pre-stored in the multimedia resource retrieval library in audio-compression-coded form and/or text form, and the image information is pre-stored in the multimedia resource retrieval library in pixel-compression-coded form and/or text form.

10. The device according to claim 9, further comprising a multimodal information storage module, wherein the multimodal information storage module comprises at least one of the following units:
a text information storage unit, configured to identify text information in the video of the multimedia resource and to store the identified text information in the multimedia resource retrieval library;
a voice information storage unit, configured to:
extract audio from the multimedia resource, convert the audio into text content through speech recognition, and store the resulting text content in the multimedia resource retrieval library as text-form voice information of the multimedia resource; and/or
extract audio from the multimedia resource, further extract features of the audio, compress and encode the extracted audio features to obtain voice information of the multimedia resource in audio-compression-coded form, and store that voice information in the multimedia resource retrieval library;
an image information storage unit, configured to:
extract key frames from the video of the multimedia resource, perform image content description and/or image object annotation on the key frames, and store the text content so obtained in the multimedia resource retrieval library as text-form image information of the multimedia resource; and/or
extract key frames from the video of the multimedia resource, extract picture pixel features of the key frames, compress and encode them to obtain image information of the multimedia resource in pixel-compression-coded form, and store that image information in the multimedia resource retrieval library.
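Claims 1, 7 and 8 score each resource by a weighted average of its degrees of fit and then sort in descending order. The sketch below shows that ranking step. Both the per-source degrees of fit and the weights are taken as given inputs, because the claims do not fix how either is chosen. The weights in the usage example (cataloging weighted twice as heavily as each other modality) are purely hypothetical, and `rank_resources` is an illustrative name.

```python
def rank_resources(fits_by_resource, weights):
    """Score = weighted average of per-source degrees of fit; return (id, score) pairs, best first."""
    total_w = sum(weights.values())
    scores = {}
    for rid, fits in fits_by_resource.items():
        # A source with no recorded fit for this resource contributes 0.
        weighted = sum(w * fits.get(src, 0.0) for src, w in weights.items())
        scores[rid] = weighted / total_w
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


weights = {"catalog": 2, "text": 1, "speech": 1, "image": 1}  # hypothetical weights
fits_by_resource = {
    "a": {"catalog": 1.0, "text": 0.0, "speech": 0.0, "image": 0.0},
    "b": {"catalog": 0.0, "text": 1.0, "speech": 1.0, "image": 0.5},
}
ranking = rank_resources(fits_by_resource, weights)
```

Resource `b` (score 0.5) ranks above `a` (score 0.4). Although `b` has no match in its cataloging information, it matches strongly in the other modalities.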
CN201711108216.XA 2017-11-08 2017-11-08 Multimedia resource retrieval method and device Expired - Fee Related CN107766571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711108216.XA CN107766571B (en) 2017-11-08 2017-11-08 Multimedia resource retrieval method and device

Publications (2)

Publication Number Publication Date
CN107766571A CN107766571A (en) 2018-03-06
CN107766571B true CN107766571B (en) 2021-02-09

Family

ID=61272932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711108216.XA Expired - Fee Related CN107766571B (en) 2017-11-08 2017-11-08 Multimedia resource retrieval method and device

Country Status (1)

Country Link
CN (1) CN107766571B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647245B (en) 2018-04-13 2023-04-18 腾讯科技(深圳)有限公司 Multimedia resource matching method and device, storage medium and electronic device
CN110489594A (en) * 2018-05-14 2019-11-22 北京松果电子有限公司 Image vision mask method, device, storage medium and equipment
CN109255036B (en) * 2018-08-31 2020-02-18 北京字节跳动网络技术有限公司 Method and apparatus for outputting information
CN109446356A (en) * 2018-09-21 2019-03-08 深圳市九洲电器有限公司 A kind of multimedia document retrieval method and device
CN109684553A (en) * 2018-12-26 2019-04-26 北京百度网讯科技有限公司 For obtaining the method and device of information
CN110110099A (en) * 2019-04-12 2019-08-09 华勤通讯技术有限公司 A kind of multimedia document retrieval method and device
CN110532404B (en) * 2019-09-03 2023-08-04 北京百度网讯科技有限公司 Source multimedia determining method, device, equipment and storage medium
CN111159435B (en) * 2019-12-27 2023-09-05 新方正控股发展有限责任公司 Multimedia resource processing method, system, terminal and computer readable storage medium
CN113128285B (en) * 2019-12-31 2025-06-17 华为技术有限公司 Method and device for processing video
CN111221984B (en) * 2020-01-15 2024-03-01 北京百度网讯科技有限公司 Multi-mode content processing method, device, equipment and storage medium
CN112528053A (en) * 2020-12-23 2021-03-19 三星电子(中国)研发中心 Multimedia library classified retrieval management system
CN112818906B (en) * 2021-02-22 2023-07-11 浙江传媒学院 An intelligent cataloging method for all-media news based on multi-modal information fusion understanding
CN113507613A (en) * 2021-06-07 2021-10-15 茂名市群英网络有限公司 CDN-based video input scheduling system and method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101021849A (en) * 2006-09-14 2007-08-22 浙江大学 Transmedia searching method based on content correlation
CN101968819A (en) * 2010-11-05 2011-02-09 中国传媒大学 Audio/video intelligent catalog information acquisition method facing to wide area network
CN107203586A (en) * 2017-04-19 2017-09-26 天津大学 A kind of automation result generation method based on multi-modal information

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US7185049B1 (en) * 1999-02-01 2007-02-27 At&T Corp. Multimedia integration description scheme, method and system for MPEG-7
CN100388282C (en) * 2006-09-14 2008-05-14 浙江大学 Cross-media retrieval method based on multimodal information fusion analysis
CN101272397B (en) * 2008-05-05 2010-11-10 南京师范大学 Method for acquiring addressable stream media based on ASF data amalgamation technology
US20100100439A1 (en) * 2008-06-12 2010-04-22 Dawn Jutla Multi-platform system apparatus for interoperable, multimedia-accessible and convertible structured and unstructured wikis, wiki user networks, and other user-generated content repositories
US8259082B2 (en) * 2008-09-12 2012-09-04 At&T Intellectual Property I, L.P. Multimodal portable communication interface for accessing video content
CN102650993A (en) * 2011-02-25 2012-08-29 北大方正集团有限公司 Index establishing and searching methods, devices and systems for audio-video file
US9292552B2 (en) * 2012-07-26 2016-03-22 Telefonaktiebolaget L M Ericsson (Publ) Apparatus, methods, and computer program products for adaptive multimedia content indexing
US9449002B2 (en) * 2013-01-16 2016-09-20 Althea Systems and Software Pvt. Ltd System and method to retrieve relevant multimedia content for a trending topic
CN103778204A (en) * 2014-01-13 2014-05-07 北京奇虎科技有限公司 Voice analysis-based video search method, equipment and system
CN106209575B (en) * 2016-06-23 2019-09-24 厦门黑镜科技有限公司 Method for sending information, acquisition methods, device and interface system
CN106446051A (en) * 2016-08-31 2017-02-22 北京新奥特云视科技有限公司 Deep search method of Eagle media assets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210209

Termination date: 20211108