WO2019184519A1 - Media retrieval method and apparatus - Google Patents

Media retrieval method and apparatus

Info

Publication number
WO2019184519A1
Authority
WO
WIPO (PCT)
Prior art keywords
media
feature
features
ranking
candidate
Prior art date
Application number
PCT/CN2018/125495
Other languages
English (en)
French (fr)
Inventor
李�根
何轶
李磊
李亦锬
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Priority to JP2019572507A (patent JP6991255B2)
Priority to US16/962,416 (patent US11874869B2)
Priority to SG11201913922QA
Publication of WO2019184519A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/438 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • The present disclosure relates to the field of media processing technologies, and in particular to a media retrieval method and apparatus.
  • Media features such as video features, audio features, or media fingerprints, and media retrieval based on those features, are widely used in today's "multimedia information society."
  • Media retrieval was first applied to song recognition: a piece of audio is input, its fingerprint features are extracted and searched, and the corresponding song is identified.
  • Media retrieval can also be applied to content monitoring, such as media deduplication, retrieval-based voice advertisement monitoring, media copyright protection, and the like.
  • Existing media retrieval methods suffer from poor accuracy and slow speed, which causes heavy consumption of computing and storage resources.
  • The media retrieval method includes the steps of: acquiring a media feature of the media to be retrieved as a first media feature, the first media feature comprising a plurality of first media feature units; performing a first ranking on a plurality of known media according to each individual first media feature unit, and, according to the result of the first ranking, taking the first k known media as a first candidate media set, where k is a positive integer; and performing a second ranking on the first candidate media set according to a plurality of sequentially arranged first media feature units, and, according to the result of the second ranking, taking the first n first candidate media as the retrieval result, where n is a positive integer.
  • The object of the present disclosure can be further achieved by the following technical measures.
  • The foregoing media retrieval method further includes: acquiring in advance the media feature of each known media as a second media feature, the second media feature including a plurality of second media feature units; and indexing the second media feature to obtain a feature index of the known media in advance.
  • In the foregoing media retrieval method, performing the first ranking on the plurality of known media according to each individual first media feature unit comprises: performing a term frequency-inverse document frequency (TF-IDF) ranking on the plurality of known media according to each individual first media feature unit.
  • In the foregoing media retrieval method, performing the TF-IDF ranking on the plurality of known media according to each individual first media feature unit comprises: matching the feature index of the known media against the first media feature units to perform the TF-IDF ranking on the known media.
  • In the foregoing media retrieval method, obtaining the feature index of the known media in advance comprises: obtaining a forward feature index and/or an inverted feature index of the known media in advance.
  • Matching the feature index of the known media against the first media feature units comprises: performing an exact match (also called an absolute match) between the feature index of the known media and the first media feature units.
  • In the foregoing media retrieval method, performing the second ranking on the media in the first candidate media set according to a plurality of sequentially arranged first media feature units comprises: obtaining a similarity matrix for each media in the first candidate media set according to the feature index of the known media and the first media feature, and ranking the media in the first candidate media set according to straight lines in the similarity matrix.
  • In the foregoing media retrieval method, acquiring the media feature of the media to be retrieved as the first media feature comprises: acquiring a plurality of types of first media features of the media to be retrieved; acquiring in advance the media feature of the known media as the second media feature comprises: acquiring a plurality of types of second media features of the known media; and obtaining the similarity matrix of the media in the first candidate media set according to the feature index of the known media and the first media feature comprises: determining the similarity matrix according to the plurality of types of second media features and the plurality of types of first media features.
  • Each type of first media feature comprises a plurality of first media feature units, and each type of second media feature comprises a plurality of second media feature units. Determining the similarity matrix according to the plurality of types of second media features and the plurality of types of first media features includes: determining, for each type, the unit similarity between a second media feature unit and a first media feature unit of the same type, to obtain a plurality of unit similarities; and determining the average or the minimum of the plurality of unit similarities, and determining the similarity matrix according to that average or minimum.
  • The foregoing media retrieval method further includes: slicing the media to be retrieved and the known media in advance according to a preset time length to obtain a plurality of to-be-retrieved sub-media segments and a plurality of known sub-media segments, and extracting media features from each of these segments separately to obtain a plurality of first sub-media features and a plurality of second sub-media features of the same length.
  • The foregoing media retrieval method further includes: before performing the first ranking, slicing the obtained first media feature of the media to be retrieved and the second media feature of the known media according to a preset length to obtain a plurality of first sub-media features and a plurality of second sub-media features of the same length.
  • The foregoing media retrieval method further includes: determining, according to a straight line in the similarity matrix, the repeated segments shared by the media to be retrieved and the media in the retrieval result.
  • The media retrieval device includes: a media feature acquisition module, configured to acquire a media feature of the media to be retrieved as a first media feature, the first media feature including a plurality of first media feature units; a first ranking module, configured to perform a first ranking on a plurality of known media according to each individual first media feature unit, and to take the first k known media as a first candidate media set according to the result of the first ranking, where k is a positive integer; and a second ranking module, configured to perform a second ranking on the first candidate media set according to a plurality of sequentially arranged first media feature units, and to take the first n first candidate media as the retrieval result according to the result of the second ranking, where n is a positive integer.
  • The object of the present disclosure can be further achieved by the following technical measures.
  • The foregoing media retrieval device further includes modules for performing the steps of any of the foregoing media retrieval methods.
  • A media retrieval hardware device comprises: a memory for storing non-transitory computer readable instructions; and a processor for executing the computer readable instructions such that the processor, when executing them, implements any of the foregoing media retrieval methods.
  • A computer readable storage medium stores non-transitory computer readable instructions which, when executed by a computer, cause the computer to perform any of the foregoing media retrieval methods.
  • A terminal device includes any one of the foregoing media retrieval devices.
  • FIG. 1 is a schematic flow chart of a media retrieval method according to an embodiment of the present disclosure.
  • FIG. 2 is a block flow diagram of a media retrieval method in accordance with an embodiment of the present disclosure.
  • FIG. 3 is a block diagram of a first ranking provided by an embodiment of the present disclosure.
  • FIG. 4 is a flow chart of a second ranking provided by an embodiment of the present disclosure.
  • FIG. 5 is a flow chart of determining a sequence similarity score by using a dynamic programming method according to an embodiment of the present disclosure.
  • FIG. 6 is a flow chart of determining a sequence similarity score by using a uniform media method according to an embodiment of the present disclosure.
  • FIG. 7 is a flow chart of determining a similarity matrix based on multiple types of first media features and second media features provided by an embodiment of the present disclosure.
  • FIG. 8 is a block diagram showing the structure of a media retrieval device according to an embodiment of the present disclosure.
  • FIG. 9 is a structural block diagram of a first ranking module according to an embodiment of the present disclosure.
  • FIG. 10 is a structural block diagram of a second ranking module according to an embodiment of the present disclosure.
  • FIG. 11 is a structural block diagram of a media retrieval apparatus for determining a similarity matrix based on a plurality of types of first media features and second media features, in accordance with an embodiment of the present disclosure.
  • FIG. 12 is a hardware block diagram of a media retrieval hardware device in accordance with an embodiment of the present disclosure.
  • FIG. 13 is a schematic illustration of a computer readable storage medium in accordance with one embodiment of the present disclosure.
  • FIG. 14 is a block diagram showing the structure of a terminal device according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic flowchart, and FIG. 2 a schematic block flow diagram, of one embodiment of the media retrieval method of the present disclosure. Referring to FIG. 1 and FIG. 2, the media retrieval method of this example mainly includes the following steps:
  • Step S10: Acquire a media feature of the media to be retrieved (query media).
  • The acquired media feature is a feature sequence including one or more media feature units, which are arranged in order within the media feature.
  • For ease of description, the media feature of the media to be retrieved is referred to as the first media feature, and the media feature units it contains are referred to as first media feature units.
  • The media referred to in the embodiments of the present disclosure may be of various types, such as audio, video, or a series of consecutive photos.
  • The media features may accordingly be audio features, video features, image features, and so on; in fact, a video object may also be retrieved by acquiring its audio features in accordance with the methods of the present disclosure.
  • Step S20: Perform a first ranking on the plurality of known media according to the first media feature, and take the first k known media as the first candidate media set according to the result of the first ranking.
  • Here k is a positive integer whose specific value is configurable.
  • The first ranking ranks the known media according to how each individual first media feature unit matches them.
  • The first ranking may be a term frequency-inverse document frequency ranking (TF-IDF ranking) of the known media according to each first media feature unit.
  • Step S30: Perform a second ranking on the first candidate media set according to the first media feature, and take the first n first candidate media in the first candidate media set as the retrieval result according to the result of the second ranking.
  • Here n is a positive integer whose specific value is configurable.
  • The second ranking ranks the media in the first candidate media set according to a plurality of sequentially arranged first media feature units.
  • The plurality of sequentially arranged first media feature units may be a continuous segment of the first media feature or the first media feature as a whole, and/or a plurality of first media feature units whose sequence numbers are equally spaced within the first media feature, such as the first media feature units numbered 1, 3, 5, 7, ...
  • By performing the first ranking and the second ranking to obtain the retrieval result, the media retrieval method proposed by the present disclosure can greatly improve the accuracy and efficiency of media retrieval.
  • The method for extracting media features and the type of media feature are not limited.
  • A binarized media feature of the media to be retrieved may be extracted, or a previously obtained media feature (of any type) may be binarized to obtain a binarized media feature.
  • Each media feature unit of a binarized media feature is a bit string composed of 0s and 1s, so the binarized media feature is a sequence of bit strings arranged in order.
  • Alternatively, a floating-point media feature of the media to be retrieved may be extracted, in which each media feature unit is a floating-point number, so the floating-point media feature is a sequence of floating-point numbers arranged in order.
  • The sequential arrangement described here means that the media feature units are arranged in chronological order within the media feature. For example, when the media feature is extracted, frames are first sampled from the media object and a media feature unit is generated for each frame, so that each media feature unit corresponds to one frame of the media object; the media feature units are then arranged in the temporal order of their frames to obtain the media feature. The media feature unit can therefore also be called a frame feature.
  • The aforementioned known media may be media in a media database.
  • The media database stores the media features of the known media, and the stored media features include features of the same type as the first media feature, obtained with the same extraction method.
  • The media retrieval method of the present disclosure further includes acquiring the media features of a plurality of known media in advance. For ease of description, the media feature of a known media is referred to as a second media feature.
  • The media feature units contained in the second media feature are referred to as second media feature units. The second media feature is indexed to obtain the feature index of the known media in advance, and the feature index is matched against the first media feature units to perform the TF-IDF ranking on the plurality of known media.
  • Obtaining the feature index of the known media in advance further includes obtaining a forward feature index and an inverted feature index of the media features of the known media in advance, to facilitate comparison and retrieval of media features.
  • The forward feature index and the inverted feature index may be stored in the media database in advance.
  • The forward feature index records the media feature of each known media, that is, which media feature units the media feature contains and in what order. The inverted feature index records in which known media's media features each media feature unit appears.
  • Both indexes may be stored as key-value pairs. In the forward feature index, a key represents a media number (also called a media ID), and the value for that key records which media feature units the media contains and their order; these may be called the forward key and the forward value.
  • In the inverted feature index, a key represents a media feature unit, and the value for that key records the numbers of the media containing that media feature unit; these may be called the inverted key and the inverted value.
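  • The key-value layout of the two indexes can be illustrated with a minimal sketch. The function names, the toy media IDs, and the use of short bit strings as feature units are assumptions made for illustration only; the disclosure does not prescribe a concrete storage format.

```python
from collections import defaultdict


def build_forward_index(media_features):
    """Forward index: media ID (key) -> ordered list of feature units (value)."""
    return {media_id: list(units) for media_id, units in media_features.items()}


def build_inverted_index(forward_index):
    """Inverted index: feature unit (key) -> set of media IDs containing it (value)."""
    inverted = defaultdict(set)
    for media_id, units in forward_index.items():
        for unit in units:
            inverted[unit].add(media_id)
    return dict(inverted)


# Hypothetical toy database; each feature unit is a short bit string.
known = {
    "m1": ["0101", "1100", "0101"],
    "m2": ["1100", "0011"],
}
forward = build_forward_index(known)
inverted = build_inverted_index(forward)
```

  • The forward value keeps order and repeats, which a sequence-based ranking needs later, while the inverted value only answers the question of which media contain a given unit.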
  • TF-IDF ranking is a technique that judges the importance of information by weighting term frequency against inverse document frequency, and ranks accordingly.
  • Term frequency is the frequency with which a word (or piece of information) appears in an article (or document); the higher the term frequency, the more important the word is to that article. Document frequency is the number of articles in the corpus in which the word appears, and inverse document frequency is the reciprocal of the document frequency (in practice the logarithm of that reciprocal is often used). The higher the inverse document frequency, the better the word discriminates between articles.
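  • Written out, with the logarithmic variant of the inverse document frequency, the two quantities take the following common form. The symbols are assumptions introduced here: $n_{u,d}$ is the number of times unit (word) $u$ occurs in document $d$, $N$ is the number of documents, and $\mathrm{df}(u)$ is the number of documents containing $u$.

```latex
\mathrm{tf}(u, d) = \frac{n_{u,d}}{\sum_{u'} n_{u',d}},
\qquad
\mathrm{idf}(u) = \log \frac{N}{\mathrm{df}(u)},
\qquad
\mathrm{tfidf}(u, d) = \mathrm{tf}(u, d) \cdot \mathrm{idf}(u)
```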
  • TF-IDF ranking ranks by the magnitude of the product of term frequency and inverse document frequency.
  • The media feature of a media can be treated as an article and each media feature unit as a word, so the known media can be ranked with the TF-IDF method.
  • Ranking every known media in the media database in the first ranking may hurt retrieval efficiency, so before the first ranking an exact match may be performed on the known media in the media database.
  • The exact match (also called absolute match) selects, as a second candidate media set, the known media that contain at least a preset number or a preset proportion of the first media feature units.
  • The first ranking is then performed on the second candidate media set to select the first candidate media set.
  • FIG. 3 is a schematic flow chart of the first ranking provided by an embodiment of the present disclosure.
  • The first ranking specifically includes the following steps:
  • Step S21: According to the inverted feature index, count in which known media's second media features each first media feature unit appears, so as to match from the media database the known media that contain at least a preset number of first media feature units, as the second candidate media set. Thereafter, the process proceeds to step S22.
  • The "number" in "preset number of first media feature units" refers to the number of distinct kinds of first media feature unit.
  • The preset number may be one, in which case the matched second candidate media set consists of the known media whose second media feature contains at least one kind of first media feature unit. The preset number may also be greater than one, say p (p is a positive integer), in which case the matched second candidate media set consists of the known media whose second media feature contains at least p kinds of first media feature unit.
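  • The exact-match filter of step S21 can be sketched as a lookup in the inverted feature index followed by a count per media. The function name, the parameter `p`, and the toy index are illustrative assumptions; repeats in the query are collapsed so that only distinct kinds of feature unit are counted.

```python
from collections import Counter


def exact_match_candidates(query_units, inverted_index, p=1):
    """Return the IDs of known media containing at least p distinct query units."""
    hits = Counter()
    for unit in set(query_units):  # count kinds of unit, not repeated occurrences
        for media_id in inverted_index.get(unit, ()):
            hits[media_id] += 1
    return {media_id for media_id, n in hits.items() if n >= p}


# Hypothetical inverted index: feature unit -> media IDs containing it.
inverted_index = {
    "0101": {"m1"},
    "1100": {"m1", "m2"},
    "0011": {"m2", "m3"},
}
query = ["1100", "0101", "1100"]
loose = exact_match_candidates(query, inverted_index, p=1)
strict = exact_match_candidates(query, inverted_index, p=2)
```

  • With p = 1 both "m1" and "m2" survive; raising p to 2 keeps only "m1", which shrinks the set that the TF-IDF ranking has to score.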
  • Step S22: Determine, according to the forward feature index, the term frequency of each first media feature unit in the second media feature of a second candidate media.
  • The term frequency is the proportion that a given first media feature unit accounts for among all media feature units contained in a second media feature.
  • Step S23: Determine the document frequency of each first media feature unit according to the inverted feature index.
  • The document frequency is the proportion, among a plurality of known media (for example, all known media in the media database), of known media whose second media feature contains the first media feature unit. Note that the document frequency of each media feature unit can be computed and stored in advance, and the precomputed values read directly during retrieval. Thereafter, the process proceeds to step S24.
  • Step S24: Determine the TF-IDF score of a second candidate media according to the term frequency of each first media feature unit in the second media feature of that second candidate media and the document frequency of each first media feature unit. Thereafter, the process proceeds to step S25.
  • Step S25: Rank the second candidate media set according to the TF-IDF score of each second candidate media to obtain the result of the first ranking, and take the first k second candidate media from that result as the first candidate media set.
  • In addition, the second media feature (forward feature index) of each first candidate media may be returned, so that the first candidate media set can be further processed on the basis of the second media features in the subsequent step S30.
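  • Steps S22 to S25 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name and toy data are hypothetical, the inverse document frequency uses the logarithmic form, and summing the per-unit products into one score per media is an assumption, since the text only says the score is determined from the term frequencies and document frequencies.

```python
import math
from collections import Counter


def first_ranking(query_units, forward_index, candidates, k):
    """Score candidates by summed TF-IDF over distinct query units; keep the top k."""
    n_media = len(forward_index)
    # Document frequency: how many known media contain each unit.
    df = Counter()
    for units in forward_index.values():
        df.update(set(units))
    scores = {}
    for media_id in candidates:
        units = forward_index[media_id]
        counts = Counter(units)
        score = 0.0
        for unit in set(query_units):
            if counts[unit] == 0:
                continue
            tf = counts[unit] / len(units)       # share of this unit in the media
            idf = math.log(n_media / df[unit])   # rarer units weigh more
            score += tf * idf
        scores[media_id] = score
    # Sort by descending score; break ties by media ID for determinism.
    ranked = sorted(scores, key=lambda m: (-scores[m], m))
    return ranked[:k], scores


# Hypothetical forward index: media ID -> ordered feature units.
forward_index = {"m1": ["a", "b", "a"], "m2": ["b", "c"], "m3": ["c", "d"]}
top, scores = first_ranking(["a", "b"], forward_index, {"m1", "m2"}, k=1)
```

  • Here "m1" outranks "m2" because it contains the rarer unit "a" twice, whereas "m2" only shares the more common unit "b" with the query.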
  • An index server may be used: the set of first media feature units of the media to be retrieved is sent as an index request, exact matching and TF-IDF ranking are performed according to the forward feature index and the inverted feature index, and the first candidate media set is recalled, with the forward feature index of each first candidate media returned at the same time.
  • The above steps can be performed with the open source Elasticsearch search engine to achieve fast retrieval.
  • The first media feature and the second media feature may be binarized in advance so that the index server can perform index recall.
  • By performing the exact match and the first ranking, the media retrieval method proposed by the present disclosure can greatly improve the accuracy and efficiency of media retrieval.
  • The second ranking ranks the media in the first candidate media set according to how a sequence composed of a plurality of sequentially arranged first media feature units appears in the media features of the first candidate media.
  • The second ranking includes: obtaining a similarity matrix for each media in the first candidate media set according to the feature index of the known media and the first media feature, and ranking the media in the first candidate media set on the basis of the similarity matrix.
  • FIG. 4 is a schematic flow chart of the second ranking provided by an embodiment of the present disclosure.
  • The second ranking specifically includes the following steps:
  • Step S31: Acquire the second media feature of a first candidate media in the first candidate media set (each first candidate media is in fact a known media).
  • The second media feature may be acquired from the feature index of the known media (for example, the forward feature index). Assume that the first media feature of the media to be retrieved includes $M_1$ first media feature units and the second media feature of the first candidate media includes $M_2$ second media feature units, where $M_1$ and $M_2$ are positive integers. Note that the first media feature and the second media feature are media features of the same type, obtained with the same extraction method. Thereafter, the process proceeds to step S32.
  • Step S32: Determine the unit similarity between each second media feature unit in the second media feature of the first candidate media and each first media feature unit, obtaining $M_1 \times M_2$ unit similarities.
  • Each unit similarity indicates how similar one first media feature unit is to one second media feature unit; the greater the unit similarity, the more similar the two units. Thereafter, the process proceeds to step S33.
  • According to the type of media feature, a distance or metric capable of measuring how similar two media feature units are may be chosen as the unit similarity.
  • For example, the unit similarity may be determined from the cosine distance (cosine similarity) between the first media feature unit and the second media feature unit; generally the cosine similarity can be used directly as the unit similarity.
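  • A cosine-based unit similarity (called monomer similarity in some translations) can be sketched as below. Treating each media feature unit as a vector of floats, and returning 0.0 for an all-zero vector, are assumptions of this sketch and are not stated in the text.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```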
  • Alternatively, the unit similarity may be determined from the Hamming distance between the first media feature unit and the second media feature unit. Specifically, first compute the Hamming distance between the two units, then compute the difference between the length (number of bits) of a media feature unit and the Hamming distance, and take the ratio of that difference to the length as the unit similarity, which represents the proportion of identical bits in the two binarized features.
  • The Hamming distance is a commonly used metric in information theory. The Hamming distance between two equal-length strings is the number of positions at which the corresponding characters differ. In practice, the two strings can be XORed and the number of 1s in the result counted; that count is the Hamming distance. Note that media feature units extracted by the same method have the same length.
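  • The XOR-and-count computation and the resulting similarity ratio can be sketched as follows, assuming each binarized feature unit is held as a Python integer together with its bit length (a representation chosen here for illustration).

```python
def hamming_similarity(a: int, b: int, n_bits: int) -> float:
    """Proportion of identical bits between two n_bits-long binary feature units."""
    distance = bin(a ^ b).count("1")  # XOR marks differing bits; count the 1s
    return (n_bits - distance) / n_bits


similarity = hamming_similarity(0b1011, 0b1001, 4)  # one of four bits differs
```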
  • More generally, any distance or metric that can determine the similarity of two media feature units can be used.
  • If each media feature unit corresponds to one frame of the media object, the unit similarity may also be called the inter-frame similarity.
  • Step S33: Determine the similarity matrix between the first candidate media and the media to be retrieved according to the unit similarities.
  • Each point in the similarity matrix corresponds to one unit similarity, so the similarity matrix records the unit similarity between every second media feature unit of the first candidate media and every first media feature unit.
  • The points of the similarity matrix are arranged horizontally in the order of the first media feature units within the first media feature of the media to be retrieved, and vertically in the order of the second media feature units within the second media feature of the first candidate media.
  • The point in the i-th row and j-th column thus represents the unit similarity between the i-th first media feature unit of the media to be retrieved and the j-th second media feature unit of the first candidate media, and the similarity matrix is an $M_1 \times M_2$ matrix.
  • In practice, it is not necessary to compute every unit similarity in step S32 before building the similarity matrix in step S33; the similarity matrix can be determined directly, with each unit similarity computed as the corresponding point is filled in.
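  • Building the matrix directly, computing each similarity as its point is filled in, might look like the sketch below. It assumes binarized feature units stored as integers and uses the Hamming-based similarity; row i follows the i-th first (query) unit and column j the j-th second (candidate) unit, matching the row and column description above. All names are illustrative.

```python
def similarity_matrix(first_units, second_units, n_bits):
    """Return an M1 x M2 list of lists of unit similarities."""
    def unit_similarity(a, b):
        # Proportion of identical bits between two n_bits-long units.
        return (n_bits - bin(a ^ b).count("1")) / n_bits

    return [[unit_similarity(q, c) for c in second_units] for q in first_units]


query_units = [0b1010, 0b0110]                # M1 = 2
candidate_units = [0b1010, 0b0000, 0b0110]    # M2 = 3
matrix = similarity_matrix(query_units, candidate_units, 4)
```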
  • Step S34 Determine a sequence similarity score of the first candidate media according to a similarity matrix of each first candidate media.
  • the sequence similarity score is used to represent the degree of similarity between the first candidate medium and the medium to be retrieved.
  • the sequence similarity score can be a score between 0 and 1. The larger the number, the more similar the two segments of media are. Thereafter, the process proceeds to step S35.
  • sequence similarity score is determined according to a straight line in the similarity matrix.
  • the similarity matrix is a finite matrix, so the so-called "straight line” is a finitely long number of points in the similarity matrix.
  • Line segment. The line has a slope that is the slope of the line connecting the plurality of points included in the line.
  • the starting point and the ending point of the straight line may be any points in the similarity matrix, and are not necessarily points located at the edge.
  • the straight line in the present disclosure includes a diagonal line in the similarity matrix, and each line segment parallel to the diagonal line, and the straight line from the upper left to the lower right in the similarity matrix has a slope of 1, and the slope is not included.
  • 1 straight line may be a straight line with a slope of approximately 1 to improve the robustness of media retrieval; it may be a straight line with a slope of 2, 3, ... or 1/2, 1/3, ..., etc. Retrieving a media object that has been throttled; it can even be a line with a negative slope (a line from the bottom left to the top right in the similarity matrix) to handle the retrieval of media objects that have undergone reverse playback.
  • the diagonal line is a line segment consisting of points at (1,1), (2,2), (3,3)... (actually a point starting from the point in the upper left corner and having a slope of 1) straight line).
  • each straight line in the similarity matrix is composed of a plurality of unit similarities arranged in order, so each straight line represents the similarity of a plurality of sequentially arranged media feature unit pairs, and can therefore represent the degree of similarity between a segment of the media to be retrieved and a segment of the known media.
  • each media feature unit pair includes a first media feature unit and a second media feature unit (that is, each line represents the degree of similarity between a plurality of sequentially arranged first media feature units and a plurality of sequentially arranged second media feature units).
  • the slope of the line and its start and end points represent the lengths and positions of the two media segments.
  • for example, consider a straight line composed of the points (1,1), (2,3), (3,5), (4,7). It contains the similarity between the first media feature unit with ordinal 1 and the second media feature unit with ordinal 1, the similarity between the first media feature unit with ordinal 2 and the second media feature unit with ordinal 3, and so on. The line therefore reflects how similar the segment of the media to be retrieved that corresponds to the first media feature units with ordinals 1, 2, 3, 4 is to the segment of the known media that corresponds to the second media feature units with ordinals 1, 3, 5, 7.
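  • the worked example above can be sketched in a few lines of Python. This is only an illustration: the helper names `line_points` and `line_similarity` are hypothetical, the matrix is indexed as `sim[first ordinal - 1][second ordinal - 1]`, and the mean is used as the line similarity (the disclosure also allows other aggregates).

```python
def line_points(start, step, count):
    """Ordinal pairs (first-unit ordinal, second-unit ordinal) along a straight line.

    `step` is the per-point increment; (1, 2) walks a line of slope 2.
    """
    i, j = start
    di, dj = step
    return [(i + t * di, j + t * dj) for t in range(count)]


def line_similarity(sim, points):
    """Mean unit similarity over the points of a line (ordinals are 1-based)."""
    return sum(sim[i - 1][j - 1] for i, j in points) / len(points)


# The line from the text: (1,1), (2,3), (3,5), (4,7).
points = line_points(start=(1, 1), step=(1, 2), count=4)
```

  • a step of (1, 1) reproduces the diagonal direction, and a step with a negative second component walks a line of negative slope, as in the reverse-playback case.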
  • the similarity between a first candidate medium and the medium to be retrieved can be determined from a straight line in the similarity matrix as follows. The average (or another overall measure) of the unit similarities included in a straight line is defined as the straight line similarity of that line; the straight line similarity reflects how similar the corresponding first media feature units are to the corresponding second media feature units. The straight line with the highest straight line similarity in the similarity matrix is determined and may be referred to as the matching straight line. The straight line similarity of the matching straight line is then taken as the sequence similarity score of the first candidate medium.
  • the straight line with the highest straight line similarity may be determined from a plurality of preset straight lines, for example, straight lines whose slope is a preset slope setting value (a fixed value such as a slope of 1). Alternatively, a number of points with the highest unit similarity may be selected from the similarity matrix and a straight line fitted to these points, so as to generate a line whose straight line similarity is comparatively the highest.
  • Step S35 Ranking the first candidate media set according to the sequence similarity score of each first candidate medium, obtaining a result of the second ranking, and taking the first n first candidate media from the second ranking result as a retrieval result.
  • the media retrieval method proposed by the present disclosure can greatly improve the accuracy and efficiency of media retrieval by performing the second ranking.
  • FIG. 5 is a schematic flow chart of media retrieval by using a dynamic programming method according to an embodiment of the present disclosure. Referring to FIG. 5, in an embodiment, step S34 includes the following specific steps:
  • Step S34-1a: define the straight lines in the similarity matrix whose slope equals a preset slope setting value as candidate straight lines, and determine the straight line similarity of each candidate straight line from the unit similarities it contains.
  • the straight line similarity of a straight line may be set as an average value of the individual similarities of the respective units included in the straight line, or may be set as the sum of the individual similarities of the respective units included in the straight line.
  • the slope setting value may be taken as 1, that is, the aforementioned alternative straight line is: a diagonal line in the similarity matrix and a straight line parallel to the diagonal line.
  • step S34-1a further includes: excluding from the candidate straight lines those lines that contain fewer unit similarities than a preset straight line length setting value. In other words, a candidate straight line must also contain at least as many unit similarities as the preset straight line length setting value. Thereafter, the processing proceeds to step S34-1b.
  • Step S34-1b: from the plurality of candidate straight lines, determine the candidate straight line with the highest straight line similarity and define it as the first matching straight line. Thereafter, the processing proceeds to step S34-1c.
  • Step S34-1c: determine the straight line similarity of the first matching straight line as the sequence similarity score.
  • there may be multiple preset slope setting values in step S34-1a, that is, a candidate straight line is any straight line whose slope equals one of the plurality of slope setting values. For example, the candidate straight lines may be straight lines with a slope of 1, -1, 2, 1/2, and so on, and in step S34-1b a first matching straight line is determined from the candidate straight lines whose slope is any one of the plurality of slope setting values.
  • the media retrieval method proposed by the present disclosure can improve the accuracy and efficiency of media retrieval by using the dynamic programming method to determine the sequence similarity score.
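  • steps S34-1a to S34-1c can be sketched as follows for the single slope setting value 1. This is a minimal illustration under stated assumptions rather than the disclosed implementation: the function name `best_diagonal` and the parameter `min_len` (standing in for the straight line length setting value) are hypothetical, indices are 0-based, `sim[i][j]` holds the unit similarity between the i-th first media feature unit and the j-th second media feature unit, and the mean is used as the straight line similarity.

```python
def best_diagonal(sim, min_len=2):
    """Return (score, start, end) for the slope-1 line with the highest mean similarity.

    Returns None when no line reaches `min_len` points.
    """
    rows, cols = len(sim), len(sim[0])
    best = None
    # offset = j - i identifies one line parallel to the main diagonal
    for offset in range(-(rows - 1), cols):
        i0 = max(0, -offset)
        j0 = i0 + offset
        length = min(rows - i0, cols - j0)
        if length < min_len:
            continue  # S34-1a: exclude candidate lines that are too short
        total = sum(sim[i0 + t][j0 + t] for t in range(length))
        score = total / length  # straight line similarity (mean)
        if best is None or score > best[0]:
            # S34-1b: keep the candidate with the highest straight line similarity
            best = (score, (i0, j0), (i0 + length - 1, j0 + length - 1))
    return best  # S34-1c: best[0] is the sequence similarity score


sim = [
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.1, 0.8, 0.1],
    [0.3, 0.0, 0.2, 1.0],
]
score, start, end = best_diagonal(sim)
```

  • supporting several slope setting values would mean repeating the scan with a different step per slope and keeping the overall best line.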
  • FIG. 6 is a schematic flow chart of media retrieval by using a uniform media method according to an embodiment of the present disclosure. Referring to FIG. 6, in an embodiment, step S34 includes the following specific steps:
  • step S34-2a a plurality of points having the largest single similarity are selected as the similarity extreme points in the similarity matrix.
  • the specific number of similarity extreme points taken may be preset. Thereafter, the processing proceeds to step S34-2b.
  • Step S34-2b based on the plurality of similarity extreme points, fitting a straight line as the second matching straight line in the similarity matrix.
  • specifically, a straight line whose slope is equal or close to a preset slope setting value is fitted to the plurality of similarity extreme points as the second matching straight line, for example, a line with a slope close to 1.
  • the random sample consensus method (RANSAC method for short) may be used to fit the straight line. The RANSAC method is a commonly used method that estimates the parameters of a mathematical model from a sample data set containing abnormal data, so as to obtain the valid sample data. Thereafter, the processing proceeds to step S34-2c.
  • Step S34-2c: determine the sequence similarity score according to the plurality of unit similarities included in the second matching straight line. Specifically, the average of the unit similarities on the second matching straight line may be determined as the sequence similarity score.
  • the media retrieval method proposed by the present disclosure can improve the accuracy and efficiency of media retrieval by using the uniform media method to determine the sequence similarity score.
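  • steps S34-2a to S34-2c can be illustrated with the sketch below. Note the substitution: instead of RANSAC proper (which samples candidate models at random), it lets every extreme point propose one slope-1 line and keeps the line supported by the most points, a deterministic consensus vote in the same spirit. The names `top_points`, `fit_slope_one` and `line_score` are hypothetical, and the fixed slope of 1 is an assumption made to keep the example short.

```python
def top_points(sim, count):
    """S34-2a: the `count` matrix positions with the highest unit similarity."""
    cells = [(sim[i][j], i, j) for i in range(len(sim)) for j in range(len(sim[0]))]
    cells.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(i, j) for _, i, j in cells[:count]]


def fit_slope_one(points):
    """S34-2b: pick the offset j - i (a slope-1 line) supported by the most points."""
    best_offset, best_inliers = None, []
    for i, j in points:  # every point proposes one candidate line
        offset = j - i
        inliers = [(a, b) for a, b in points if b - a == offset]
        if len(inliers) > len(best_inliers):
            best_offset, best_inliers = offset, inliers
    return best_offset, sorted(best_inliers)


def line_score(sim, offset, inliers):
    """S34-2c: mean unit similarity along the fitted line, first to last inlier."""
    i_start, i_end = inliers[0][0], inliers[-1][0]
    values = [sim[i][i + offset] for i in range(i_start, i_end + 1)]
    return sum(values) / len(values)


sim = [
    [0.90, 0.10, 0.10, 0.95],  # (0, 3) is a strong but isolated outlier
    [0.10, 0.80, 0.10, 0.10],
    [0.10, 0.10, 0.85, 0.10],
    [0.10, 0.10, 0.10, 0.70],
]
extremes = top_points(sim, 5)
offset, inliers = fit_slope_one(extremes)
score = line_score(sim, offset, inliers)
```

  • the isolated high value at (0, 3) is ignored because no other extreme point lies on its line, which is the behaviour that motivates a consensus-style fit over a plain least-squares fit.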
  • the similarity matrix may be obtained by comprehensive consideration of various media similarities.
  • the media retrieval method of the present disclosure further includes: acquiring a plurality of types of first media features of the media to be retrieved, and acquiring a plurality of types of second media features of the media in the first candidate media set, according to multiple types of The second media feature and the plurality of types of first media features determine a similarity matrix.
  • the second ranking is then performed using a similarity matrix based on multiple types of media features.
  • FIG. 7 is a schematic flow chart of determining a similarity matrix for media retrieval based on multiple types of first media features and second media features according to an embodiment of the present disclosure.
  • the media retrieval method of the present disclosure specifically includes:
  • Step S41 Acquire a plurality of types of first media features of the media to be retrieved, and each type of the first media feature includes a plurality of first media feature cells. Thereafter, the processing proceeds to step S42.
  • the aforementioned floating point feature and binarization feature of the medium to be retrieved are simultaneously acquired.
  • Step S42: acquire multiple types of second media features of a known medium (specifically, a medium in the foregoing first candidate media set), each type of second media feature including a plurality of second media feature units.
  • the multiple types of second media features may be indexed to obtain feature indexes based on the multiple types of media features. Thereafter, the processing proceeds to step S43.
  • the aforementioned floating point feature and binarization feature of the known media are simultaneously acquired.
  • Step S43: for each type, determine the unit similarity between the second media feature units and the first media feature units of that same type, thereby obtaining a plurality of unit similarities.
  • the second media feature unit can be obtained according to the feature index.
  • Step S44: determine the average or the minimum of the plurality of unit similarities, and determine the similarity matrix of the known medium from that average or minimum.
  • the processing then proceeds to step S34, where the sequence similarity score is determined from the similarity matrix based on the average or minimum of the plurality of unit similarities, followed by the step of determining the result of the second ranking.
  • the benefit of determining the similarity matrix from the average or minimum of multiple similarities is as follows: media retrieval that relies on the similarity obtained from a single type of media feature may produce false matches, and taking the average or the minimum of the similarities of multiple types of media features can reduce or eliminate such false matches, thereby improving the accuracy of media retrieval.
  • the various unit similarities should have a consistent range of values; for example, the range of all types of unit similarities can be set to 0 to 1 in advance.
  • in the aforementioned examples, both the unit similarity determined from the cosine distance and the unit similarity determined from the Hamming distance already have their range set to 0 to 1.
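  • step S44 amounts to an element-wise reduction over one similarity matrix per feature type. The sketch below assumes each per-type matrix has already been computed with values in the range 0 to 1; the function name `combine` and the two sample matrices are hypothetical.

```python
def combine(matrices, mode="min"):
    """Element-wise minimum or mean over per-feature-type similarity matrices."""
    if mode == "min":
        reduce_fn = min
    elif mode == "mean":
        reduce_fn = lambda values: sum(values) / len(values)
    else:
        raise ValueError("mode must be 'min' or 'mean'")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [reduce_fn([m[i][j] for m in matrices]) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical unit similarities from a floating point feature and a binarized feature.
float_sim = [[0.9, 0.8], [0.2, 0.7]]
binary_sim = [[0.7, 0.1], [0.4, 0.9]]

strict = combine([float_sim, binary_sim], mode="min")
smooth = combine([float_sim, binary_sim], mode="mean")
```

  • the minimum is the stricter choice: a pair of units scores high only if every feature type agrees, which is how a false match seen by a single feature type (0.8 against 0.1 in the sample) gets suppressed.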
  • the acquired first media feature of the media to be retrieved may further include a first credibility field indicating the degree of trust of the first media feature units, and/or the acquired second media feature of the known media may further include a second credibility field indicating the degree of trust of the second media feature units. Further, the media retrieval method may include: when determining the unit similarity or when determining the sequence similarity score, weighting with the first credibility field and/or the second credibility field, so that high credibility receives a high weight and low credibility receives a low weight, and then performing the second ranking according to the sequence similarity scores obtained after weighting.
  • the credibility field may be recorded in the media feature, or may not be included in the media feature and stored separately, and only the correspondence between the media feature and the credibility field needs to be configured.
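  • one possible reading of the credibility weighting is a weighted mean along the matching straight line. The sketch below is an assumption throughout: the disclosure does not fix how the two credibility fields are combined, so the product of the two credibilities is used as the weight purely for illustration, and the name `weighted_line_score` is hypothetical.

```python
def weighted_line_score(sim, line, cred_query, cred_known):
    """Credibility-weighted mean of the unit similarities along `line`.

    cred_query[i] and cred_known[j] are credibilities in [0, 1] for the i-th
    first media feature unit and the j-th second media feature unit.
    The weight of a point is their product (an illustrative choice).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for i, j in line:
        weight = cred_query[i] * cred_known[j]
        weighted_sum += weight * sim[i][j]
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0


sim = [[0.9, 0.0], [0.0, 0.1]]
line = [(0, 0), (1, 1)]
score = weighted_line_score(sim, line, cred_query=[1.0, 0.5], cred_known=[1.0, 0.5])
```

  • in this sample the unweighted mean would be 0.5, but the low-credibility point (1, 1) contributes only a quarter of the weight, so the score moves toward the trusted point.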
  • the media retrieval method further includes: before performing the first ranking, slicing the acquired first media feature of the media to be retrieved and the second media feature of the known media by a preset fixed length, to obtain a plurality of first sub-media features and second sub-media features of the same length (that is, containing the same number of media feature units); for example, in an embodiment that includes the step of indexing the second media feature, the slicing is performed before the indexing. And/or, before the media features are acquired, the media to be retrieved and the known media are sliced in advance by a preset fixed time length to obtain a plurality of to-be-retrieved media segments and known media segments of the same time length, and the media features of each segment are then acquired, yielding a first sub-media feature for each to-be-retrieved media segment and a second sub-media feature for each known media segment.
  • the effects of slicing the media or media features by a fixed length are: (1) the TF-IDF ranking becomes fairer; (2) the resulting unit similarities and sequence similarity scores are more accurate; (3) the uniform length facilitates the storage of the media features and feature indexes.
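  • slicing a media feature into sub-media features of equal length can be sketched as below. The function name `slice_features` is hypothetical, and dropping a trailing remainder shorter than the fixed length is an assumption of this sketch; the disclosure does not say how such a remainder is handled.

```python
def slice_features(units, length):
    """Split a sequence of media feature units into sub-features of equal length.

    A trailing remainder shorter than `length` is dropped so that every
    sub-feature contains the same number of units (illustrative choice).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [units[k:k + length] for k in range(0, len(units) - length + 1, length)]


sub_features = slice_features(["u0", "u1", "u2", "u3", "u4", "u5", "u6"], 3)
```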
  • the plurality of first media feature units of the first media feature and the plurality of second media feature units of the second media feature are arranged chronologically, for example, in order of time.
  • the media retrieval method of the present disclosure further includes determining, based on the aforementioned similarity matrix, a repeated segment of the media to be retrieved and the known media (specifically, the media in the foregoing retrieval result).
  • the start and end times of the repeated segments in the two media can be obtained from the start and end points of the straight line in the similarity matrix.
  • the start and end times of the repeated segments in the media to be retrieved and the known media may be obtained based on the start and end points of the first matching straight line or the second matching straight line described above.
  • the repeated segments may be determined from the straight line in the similarity matrix as follows: the start time of the repeated segment in the media to be retrieved is determined from the ordinal of the first media feature unit corresponding to the starting point of the straight line (its abscissa in the similarity matrix), and the start time of the repeated segment in the first candidate medium is determined from the ordinal of the second media feature unit corresponding to the starting point (its ordinate in the similarity matrix). Similarly, the end time of the repeated segment in the media to be retrieved is determined from the abscissa of the end point of the straight line, and the end time of the repeated segment in the first candidate medium is determined from the ordinate of the end point.
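  • converting the endpoints of a matching straight line into segment times can be sketched as follows. The sketch assumes 0-based ordinals and that every media feature unit covers a fixed, known duration `seconds_per_unit`; both are assumptions for illustration, as is the function name `repeated_segment`.

```python
def repeated_segment(start, end, seconds_per_unit):
    """Map line endpoints (query ordinal, known ordinal) to time spans in seconds.

    Each feature unit is assumed to cover `seconds_per_unit` seconds, so the
    unit with ordinal k spans [k * seconds_per_unit, (k + 1) * seconds_per_unit).
    """
    (i0, j0), (i1, j1) = start, end

    def span(a, b):
        lo, hi = min(a, b), max(a, b)  # min/max also covers negative slopes
        return (lo * seconds_per_unit, (hi + 1) * seconds_per_unit)

    return {"query": span(i0, i1), "known": span(j0, j1)}


segment = repeated_segment(start=(2, 5), end=(4, 7), seconds_per_unit=0.5)
```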
  • step S34 further includes: examining the beginning portion and the ending portion of the obtained first matching straight line or second matching straight line, determining whether the points (unit similarities) in the beginning portion and the ending portion reach a preset unit similarity setting value, removing the parts at the beginning and the end of the first/second matching straight line that do not reach the unit similarity setting value (that is, whose unit similarity is not high), and retaining the middle segment of the line, which is defined as the third matching straight line; and determining the sequence similarity score according to the straight line similarity of the third matching straight line, and/or determining the start and end times of the repeated segments of the known media and the media to be retrieved according to the start and end points of the third matching straight line.
  • by removing the low-similarity parts at the beginning and end of the matching straight line and keeping the middle segment with higher similarity, the accuracy of media retrieval can be improved and more accurate repeated segments can be obtained.
  • the parts at the beginning/end of the matching straight line that do not reach the unit similarity setting value may be removed as follows: check the points in turn from the start/end point of the matching straight line toward the middle to determine whether each reaches the unit similarity setting value; after finding the first point that reaches the setting value, remove the points between that point and the start/end point.
  • the unit similarity setting value may be a specific unit similarity value, in which case the check determines whether a point reaches that value; or it may be a proportional value, in which case the check determines whether a point reaches that proportion of the average or maximum value of all points included in the first matching straight line/second matching straight line.
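  • the trimming that produces the third matching straight line can be sketched as below, using an absolute unit similarity setting value. The function name `trim_line` is hypothetical, and the line is represented as an ordered list of 0-based `(i, j)` points.

```python
def trim_line(sim, line, threshold):
    """Drop leading and trailing points whose unit similarity is below `threshold`.

    Points in the middle are kept even if they fall below the threshold;
    only the two ends are scanned inward until a qualifying point is found.
    """
    values = [sim[i][j] for i, j in line]
    lo = 0
    while lo < len(values) and values[lo] < threshold:
        lo += 1
    hi = len(values)
    while hi > lo and values[hi - 1] < threshold:
        hi -= 1
    return line[lo:hi]


# A 5x5 matrix whose diagonal holds the similarities 0.2, 0.9, 0.3, 0.8, 0.1.
sim = [[0.0] * 5 for _ in range(5)]
for k, value in enumerate([0.2, 0.9, 0.3, 0.8, 0.1]):
    sim[k][k] = value

third_line = trim_line(sim, [(k, k) for k in range(5)], threshold=0.5)
```

  • for the proportional variant of the setting value, `threshold` could be computed beforehand as a ratio times the mean or maximum of the line's values.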
  • FIG. 8 is a schematic structural diagram of an embodiment of a media retrieval device 100 of the present disclosure.
  • the media retrieval device 100 of the example of the present disclosure mainly includes:
  • the media feature obtaining module 110 is configured to obtain a media feature of the media to be retrieved (Query Media) as the first media feature.
  • the first media feature includes a plurality of first media feature elements.
  • the first ranking module 120 is configured to perform a first ranking on the plurality of known media according to the first media feature, and extract the first k known media as the first candidate media set according to the result of the first ranking.
  • k is a positive integer
  • the specific value of k is configurable.
  • the first ranking module 120 is configured to rank the known media based on how each individual first media feature unit matches the known media.
  • the first ranking module 120 can be configured to perform term frequency-inverse document frequency (TF-IDF) ranking on the known media according to each first media feature unit.
  • the second ranking module 130 is configured to perform a second ranking on the first candidate media set according to the first media feature, and extract the first n first candidate media in the first candidate media set as the retrieval result according to the result of the second ranking.
  • n is a positive integer, and the specific value of n can be set.
  • the second ranking module 130 is configured to rank the media in the first candidate media set according to a plurality of sequentially arranged first media feature units.
  • the aforementioned known media may be media in a media database.
  • the media database stores media features of the known media, and the stored media features of the known media include media features of the same type as the first media feature, obtained with the same extraction method.
  • the media retrieval apparatus 100 of the present disclosure further includes a feature index acquisition module (not shown) for acquiring media features of a plurality of known media as a second media feature, the second The media feature includes a plurality of second media feature elements, and the second media feature is indexed to obtain a feature index of the known media.
  • the first ranking module 120 is specifically configured to match the feature index with the first media feature unit to perform TF-IDF ranking on a plurality of known media.
  • the feature index obtaining module is configured to acquire a forward index and an inverted index of the known media.
  • the first ranking module 120 of the present disclosure may include an absolute matching sub-module 121 for performing absolute matching (exact match) on the plurality of known media before the first ranking.
  • the absolute matching selects, as the second candidate media set, the known media that contain at least a preset number or a preset proportion of the first media feature units.
  • the first ranking is then performed on the second candidate media set to select the first candidate media set.
  • FIG. 9 is a schematic structural diagram of a first ranking module 120 according to an embodiment of the present disclosure.
  • the first ranking module 120 specifically includes:
  • the absolute matching sub-module 121 is configured to count, according to the inverted feature index, in which known media's second media features each first media feature unit appears, so as to select from the media database the known media that contain at least a preset number of first media feature units as the second candidate media set.
  • the term frequency determining sub-module 122 is configured to determine, based on the forward feature index, the term frequency of a first media feature unit in the second media feature of a second candidate medium.
  • the document frequency determining sub-module 123 is configured to determine the document frequency of a first media feature unit based on the inverted feature index.
  • the term frequency-inverse document frequency (TF-IDF) scoring sub-module 124 is configured to determine the TF-IDF score of a second candidate medium according to the term frequency of each first media feature unit in the second media feature of that second candidate medium and the document frequency of each first media feature unit.
  • the first ranking sub-module 125 is configured to rank the second candidate media set according to the obtained TF-IDF score of each second candidate medium, obtain the result of the first ranking, and extract the first k second candidate media from the first ranking result as the first candidate media set; the first ranking sub-module 125 is further configured to return the second media feature (the forward feature index) of each first candidate medium to the second ranking module 130 for subsequent processing.
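  • the interplay of sub-modules 121 to 125 can be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: feature units are treated as hashable tokens, the names `build_indexes` and `first_ranking` and the parameter `min_hits` are hypothetical, and the common textbook weighting tf * log(N / df) is used because this section does not spell out the exact scoring formula.

```python
import math
from collections import Counter, defaultdict


def build_indexes(known):
    """known: {media_id: [feature unit, ...]} -> (forward index, inverted index)."""
    forward = {media_id: Counter(units) for media_id, units in known.items()}
    inverted = defaultdict(set)
    for media_id, units in known.items():
        for unit in units:
            inverted[unit].add(media_id)
    return forward, inverted


def first_ranking(query_units, forward, inverted, min_hits=1, k=2):
    """Absolute matching followed by TF-IDF ranking; returns the top-k media ids."""
    query_set = set(query_units)
    # Sub-module 121: count how many distinct query units each known medium contains.
    hits = Counter()
    for unit in query_set:
        for media_id in inverted.get(unit, ()):
            hits[media_id] += 1
    candidates = [m for m, h in hits.items() if h >= min_hits]

    n_media = len(forward)
    scores = {}
    for media_id in candidates:
        total_units = sum(forward[media_id].values())
        score = 0.0
        for unit in query_set:
            tf = forward[media_id][unit] / total_units  # sub-module 122
            df = len(inverted.get(unit, ()))            # sub-module 123
            if df:
                score += tf * math.log(n_media / df)    # sub-module 124
        scores[media_id] = score
    # Sub-module 125: sort by descending score, break ties by media id.
    return sorted(candidates, key=lambda m: (-scores[m], m))[:k]


known = {"a": [1, 2, 3, 4], "b": [1, 1, 5, 6], "c": [7, 8, 9, 9]}
forward, inverted = build_indexes(known)
ranked = first_ranking([1, 2, 3], forward, inverted, min_hits=1, k=2)
```

  • raising `min_hits` tightens the absolute matching stage, so fewer media reach the comparatively costlier scoring stage.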
  • the second ranking ranks the media in the first candidate media set according to how a sequence composed of a plurality of sequentially arranged first media feature units appears in the media features of the first candidate media.
  • specifically, the second ranking module 130 is configured to: obtain a similarity matrix for each medium in the first candidate media set according to the feature index of the known media and the first media feature, and rank the media in the first candidate media set using the similarity matrices.
  • FIG. 10 is a schematic structural diagram of a second ranking module 130 according to an embodiment of the present disclosure.
  • the second ranking module 130 specifically includes:
  • the second media feature acquisition sub-module 131 is configured to acquire the second media feature of a first candidate medium in the first candidate media set (each first candidate medium is in fact a known medium). Specifically, the second media feature may be acquired from the feature index of the known media (for example, the forward feature index).
  • the unit similarity first determining sub-module 132 is configured to determine a unit similarity between each second media feature unit included in the second media feature of the first candidate medium and each of the first media feature units.
  • the similarity matrix first determining sub-module 133 is configured to determine a similarity matrix between the first candidate medium and the to-be-retrieved medium according to each individual similarity.
  • the sequence similarity score determining sub-module 134 is configured to determine a sequence similarity score of the first candidate media according to a similarity matrix of each first candidate medium. Specifically, the sequence similarity score determination sub-module 134 is configured to determine the sequence similarity score according to a straight line in the similarity matrix.
  • the second ranking sub-module 135 is configured to rank the first candidate media set according to the sequence similarity score of each first candidate medium, obtain the result of the second ranking, and take the first n first candidate media from the second ranking result as the retrieval result.
  • sequence similarity score determination sub-module 134 is specifically configured to determine the sequence similarity score by using various specific steps of the foregoing dynamic programming method.
  • sequence similarity score determination sub-module 134 is specifically configured to determine the sequence similarity score by using the specific steps of the foregoing uniform media method.
  • FIG. 11 is a schematic structural diagram of a media retrieval apparatus 100 for determining a similarity matrix based on a plurality of types of first media features and second media features according to an embodiment of the present disclosure.
  • the media retrieval apparatus 100 of the present disclosure further includes:
  • the multi-type first media feature obtaining module 140 is configured to acquire multiple types of first media features of the media to be retrieved, and each type of first media feature includes a plurality of first media feature cells.
  • the multi-type second media feature obtaining module 150 is configured to acquire a plurality of types of second media features of a known media (specifically, the media in the foregoing first candidate media set), each type of second The media feature includes a plurality of second media feature elements.
  • a feature index acquisition module (not shown) may be included for indexing multiple types of second media features to obtain feature indices based on a plurality of media features.
  • the unit similarity second determining sub-module 160 is configured to determine, for each type, the unit similarity between the second media feature units and the first media feature units of that same type, thereby obtaining a plurality of unit similarities.
  • the second media feature unit can be obtained based on the feature index.
  • the similarity matrix second determining sub-module 170 is configured to determine the average or the minimum of the plurality of unit similarities, and determine the similarity matrix of the known media according to that average or minimum.
  • sequence similarity score determination sub-module 134 is configured to determine the sequence similarity score according to the similarity matrix based on the average or minimum values of the plurality of single-body similarities.
  • the acquired first media feature of the media to be retrieved may further include a first credibility field indicating the degree of trust of the first media feature units, and/or the acquired second media feature of the known media may further include a second credibility field indicating the degree of trust of the second media feature units. The second ranking module 130 is further configured to: when determining the unit similarity or when determining the sequence similarity score, weight with the first credibility field and/or the second credibility field, so that high credibility receives a high weight and low credibility receives a low weight, and then perform the second ranking according to the sequence similarity scores obtained after weighting.
  • the media retrieval device 100 further includes a media slicing module (not shown).
  • the media slicing module is configured to slice the acquired first media feature of the media to be retrieved and the second media feature of the known media by a preset fixed length before the first ranking, to obtain a plurality of first sub-media features and second sub-media features of the same length (containing the same number of media feature units); and/or the media slicing module is configured to slice the media to be retrieved and the known media in advance by a preset fixed time length before the media features are acquired, to obtain a plurality of to-be-retrieved media segments and known media segments of the same time length, after which the media features of each segment are acquired, yielding a first sub-media feature for each to-be-retrieved media segment and a second sub-media feature for each known media segment.
  • the foregoing first ranking module 120 and second ranking module 130 are configured to perform the foregoing first ranking and second ranking on each first sub-media feature and second sub-media feature, to obtain a retrieval result for each sub-media feature; the retrieval result of the original media to be retrieved is then determined from the retrieval results of the respective sub-media features.
  • the media retrieval device 100 of the present disclosure further includes a repeating media segment determining module (not shown) for determining a repeated segment of the media to be retrieved and the known media according to the aforementioned similarity matrix.
  • the repeated media segment determining module is specifically configured to obtain start and end times of the repeated segments in the two media according to the start and end points of the straight line in the similarity matrix.
  • FIG. 12 is a hardware block diagram illustrating a media retrieval hardware device in accordance with an embodiment of the present disclosure.
  • the media retrieval hardware device 200 according to an embodiment of the present disclosure includes a memory 201 and a processor 202.
  • the components in the media retrieval hardware device 200 are interconnected by a bus system and/or other form of connection mechanism (not shown).
  • the memory 201 is for storing non-transitory computer readable instructions.
  • memory 201 can include one or more computer program products, which can include various forms of computer readable storage media, such as volatile memory and/or nonvolatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache or the like.
  • the nonvolatile memory may include, for example, a read only memory (ROM), a hard disk, a flash memory, or the like.
  • the processor 202 can be a central processing unit (CPU) or other form of processing unit with data processing capabilities and/or instruction execution capabilities, and can control other components in the media retrieval hardware device 200 to perform desired functions.
  • the processor 202 is configured to execute the computer readable instructions stored in the memory 201, such that the media retrieval hardware device 200 performs all or part of the steps of the media retrieval methods of the foregoing embodiments of the present disclosure.
  • FIG. 13 is a schematic diagram illustrating a computer readable storage medium in accordance with an embodiment of the present disclosure.
  • a computer readable storage medium 300 according to an embodiment of the present disclosure has stored thereon non-transitory computer readable instructions 301.
  • when the non-transitory computer readable instructions 301 are executed by a processor, all or part of the steps of the media retrieval method of the various embodiments of the present disclosure described above are performed.
  • FIG. 14 is a diagram showing a hardware configuration of a terminal device according to an embodiment of the present disclosure.
  • the terminal device may be implemented in various forms. The terminal device in the present disclosure may include, but is not limited to, mobile terminal devices such as mobile phones, smart phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), navigation devices, in-vehicle terminal devices, in-vehicle display terminals, and in-vehicle electronic rearview mirrors, as well as fixed terminal devices such as digital TVs and desktop computers.
  • the terminal device 1100 may include a wireless communication unit 1110, an A/V (audio/video) input unit 1120, a user input unit 1130, a sensing unit 1140, an output unit 1150, a memory 1160, an interface unit 1170, a controller 1180, a power supply unit 1190, and the like.
  • Figure 14 illustrates a terminal device having various components, but it should be understood that not all illustrated components are required to be implemented. More or fewer components can be implemented instead.
  • The wireless communication unit 1110 allows radio communication between the terminal device 1100 and a wireless communication system or network.
  • The A/V input unit 1120 receives audio or video signals.
  • The user input unit 1130 can generate key input data according to commands entered by the user to control various operations of the terminal device.
  • The sensing unit 1140 detects the current state of the terminal device 1100, its location, the presence or absence of a user's touch input, its orientation, its acceleration or deceleration and direction of movement, and the like, and generates a command or signal for controlling the operation of the terminal device 1100.
  • The interface unit 1170 serves as an interface through which at least one external device can connect to the terminal device 1100.
  • The output unit 1150 is configured to provide output signals in a visual, audio, and/or tactile manner.
  • The memory 1160 may store software programs for the processing and control operations performed by the controller 1180, or may temporarily store data that has been output or is to be output.
  • The memory 1160 can include at least one type of storage medium.
  • The terminal device 1100 can cooperate with a network storage device that performs the storage function of the memory 1160 over a network connection.
  • The controller 1180 typically controls the overall operation of the terminal device. Additionally, the controller 1180 can include a multimedia module for reproducing or playing back multimedia data.
  • The controller 1180 can perform pattern recognition processing to recognize handwriting input or picture-drawing input performed on the touch screen as characters or images.
  • The power supply unit 1190 receives external or internal power under the control of the controller 1180 and provides the power required to operate the various elements and components.
  • Various embodiments of the media retrieval method proposed by the present disclosure can be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof.
  • For a hardware implementation, various embodiments of the media retrieval method proposed by the present disclosure may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, and an electronic unit designed to perform the functions described herein. In some cases, they can be implemented in the controller 1180.
  • For a software implementation, various embodiments of the media retrieval method proposed by the present disclosure can be implemented with separate software modules, each allowing at least one function or operation to be performed.
  • The software code can be implemented by a software application (or program) written in any suitable programming language, and can be stored in the memory 1160 and executed by the controller 1180.
  • The media retrieval method, apparatus, hardware device, computer-readable storage medium, and terminal device obtain the retrieval result by performing a first ranking based on each individual media feature element in the media feature of the media to be retrieved and a second ranking based on a plurality of sequentially arranged media feature elements in that media feature, which can greatly improve the accuracy and efficiency of media retrieval.
  • The word "exemplary" does not mean that the described example is preferred or better than other examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

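A media retrieval method and apparatus. The method includes: obtaining a media feature of the media to be retrieved as a first media feature, the first media feature containing a plurality of first media feature elements (S10); performing a first ranking of a plurality of known media according to each individual first media feature element and, according to the result of the first ranking, taking the top k known media as a first candidate media set (S20), where k is a positive integer; and performing a second ranking of the first candidate media set according to a plurality of sequentially arranged first media feature elements and, according to the result of the second ranking, taking the top n first candidate media as the retrieval result (S30), where n is a positive integer.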

Description

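Media retrieval method and apparatus
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201810272795.X, filed on March 29, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of media processing, and in particular to a media retrieval method and apparatus.
Background
Media features such as video features and audio features (also called media fingerprints), and feature-based media retrieval, are widely used in today's "multimedia information society". Media retrieval was first applied to song recognition: given an input audio clip, the corresponding song is identified by extracting and looking up the clip's fingerprint features. Media retrieval can also be applied to content monitoring, such as media deduplication, retrieval-based voice advertisement monitoring, and media copyright.
Existing media retrieval methods suffer from poor accuracy and low speed, which consumes large amounts of computing and storage resources.
Summary
An object of the present disclosure is to provide a new media retrieval method and apparatus.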
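The object of the present disclosure is achieved by the following technical solution. The media retrieval method proposed by the present disclosure includes the following steps: obtaining a media feature of the media to be retrieved as a first media feature, the first media feature containing a plurality of first media feature elements; performing a first ranking of a plurality of known media according to each individual first media feature element and, according to the result of the first ranking, taking the top k known media as a first candidate media set, where k is a positive integer; and performing a second ranking of the first candidate media set according to a plurality of sequentially arranged first media feature elements and, according to the result of the second ranking, taking the top n first candidate media as the retrieval result, where n is a positive integer.
The object of the present disclosure can be further achieved by the following technical measures.
The foregoing media retrieval method further includes: obtaining in advance a media feature of the known media as a second media feature, the second media feature containing a plurality of second media feature elements; and indexing the second media feature to obtain in advance a feature index of the known media.
In the foregoing media retrieval method, performing the first ranking of the plurality of known media according to each individual first media feature element includes: performing a term frequency-inverse document frequency (TF-IDF) ranking of the plurality of known media according to each individual first media feature element.
In the foregoing media retrieval method, performing the TF-IDF ranking of the plurality of known media according to each individual first media feature element includes: matching the feature index of the known media against the first media feature elements to perform the TF-IDF ranking of the known media.
In the foregoing media retrieval method, obtaining in advance the feature index of the known media includes: obtaining in advance a forward feature index and/or an inverted feature index of the known media.
In the foregoing media retrieval method, matching the feature index of the known media against the first media feature elements includes: performing an exact match between the feature index of the known media and the first media feature elements.
In the foregoing media retrieval method, performing the second ranking of the media in the first candidate media set according to the plurality of sequentially arranged first media feature elements includes: obtaining a similarity matrix of the media in the first candidate media set from the feature index of the known media and the first media feature, and ranking the media in the first candidate media set according to straight lines in the similarity matrix.
In the foregoing media retrieval method, obtaining the media feature of the media to be retrieved as the first media feature includes obtaining multiple types of first media features of the media to be retrieved; obtaining in advance the media feature of the known media as the second media feature includes obtaining multiple types of second media features of the known media; and obtaining the similarity matrix of the media in the first candidate media set from the feature index of the known media and the first media feature includes determining the similarity matrix according to the multiple types of second media features and the multiple types of first media features.
In the foregoing media retrieval method, each type of first media feature contains a plurality of first media feature elements, and each type of second media feature contains a plurality of second media feature elements; determining the similarity matrix according to the multiple types of second media features and the multiple types of first media features includes: determining, for each type, the element similarity between the second media feature elements and the first media feature elements of that type, to obtain multiple kinds of element similarity; and determining the average or minimum of the multiple kinds of element similarity and determining the similarity matrix from that average or minimum.
The foregoing media retrieval method further includes: slicing the media to be retrieved and the known media in advance according to a preset time length to obtain multiple segments of sub-media to be retrieved and multiple segments of known sub-media, and extracting media features from each of them to obtain a plurality of first sub-media features and a plurality of second sub-media features of the same length.
The foregoing media retrieval method further includes: before the first ranking, slicing the obtained first media feature of the media to be retrieved and the second media feature of the known media according to a preset length to obtain a plurality of first sub-media features and a plurality of second sub-media features of the same length.
In the foregoing media retrieval method, the plurality of first media feature elements are arranged in chronological order in the first media feature, and the plurality of second media feature elements are arranged in chronological order in the second media feature.
The foregoing media retrieval method further includes: determining, according to straight lines in the similarity matrix, the repeated segments between the media to be retrieved and the media in the retrieval result.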
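The object of the present disclosure is also achieved by the following technical solution. The media retrieval apparatus proposed by the present disclosure includes: a media feature obtaining module, configured to obtain a media feature of the media to be retrieved as a first media feature, the first media feature containing a plurality of first media feature elements; a first ranking module, configured to perform a first ranking of a plurality of known media according to each individual first media feature element and, according to the result of the first ranking, take the top k known media as a first candidate media set, where k is a positive integer; and a second ranking module, configured to perform a second ranking of the first candidate media set according to a plurality of sequentially arranged first media feature elements and, according to the result of the second ranking, take the top n first candidate media as the retrieval result, where n is a positive integer.
The object of the present disclosure can be further achieved by the following technical measures.
The foregoing media retrieval apparatus further includes modules for performing the steps of any of the foregoing media retrieval methods.
The object of the present disclosure is also achieved by the following technical solution. A media retrieval hardware device proposed by the present disclosure includes: a memory, configured to store non-transitory computer-readable instructions; and a processor, configured to execute the computer-readable instructions so that, when they are executed, the processor implements any of the foregoing media retrieval methods.
The object of the present disclosure is also achieved by the following technical solution. A computer-readable storage medium proposed by the present disclosure stores non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform any of the foregoing media retrieval methods.
The object of the present disclosure is also achieved by the following technical solution. A terminal device proposed by the present disclosure includes any of the foregoing media retrieval apparatuses.
The above is only an overview of the technical solutions of the present disclosure. So that the technical means of the present disclosure can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the present disclosure are easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.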
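Brief description of the drawings
FIG. 1 is a schematic flowchart of a media retrieval method according to an embodiment of the present disclosure.
FIG. 2 is a flow block diagram of a media retrieval method according to an embodiment of the present disclosure.
FIG. 3 is a flow block diagram of the first ranking provided by an embodiment of the present disclosure.
FIG. 4 is a flow block diagram of the second ranking provided by an embodiment of the present disclosure.
FIG. 5 is a flow block diagram of determining the sequence similarity score using the dynamic programming method, provided by an embodiment of the present disclosure.
FIG. 6 is a flow block diagram of determining the sequence similarity score using the constant-speed media method, provided by an embodiment of the present disclosure.
FIG. 7 is a flow block diagram of determining the similarity matrix based on multiple types of first and second media features, provided by an embodiment of the present disclosure.
FIG. 8 is a structural block diagram of a media retrieval apparatus according to an embodiment of the present disclosure.
FIG. 9 is a structural block diagram of the first ranking module provided by an embodiment of the present disclosure.
FIG. 10 is a structural block diagram of the second ranking module provided by an embodiment of the present disclosure.
FIG. 11 is a structural block diagram of a media retrieval apparatus that determines the similarity matrix based on multiple types of first and second media features, according to an embodiment of the present disclosure.
FIG. 12 is a hardware block diagram of a media retrieval hardware device according to an embodiment of the present disclosure.
FIG. 13 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present disclosure.
FIG. 14 is a structural block diagram of a terminal device according to an embodiment of the present disclosure.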
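Detailed description
To further explain the technical means adopted by the present disclosure to achieve its intended purpose, and their effects, the specific implementations, structures, features, and effects of the media retrieval method and apparatus proposed by the present disclosure are described in detail below with reference to the accompanying drawings and preferred embodiments.
FIG. 1 is a schematic flowchart of an embodiment of the media retrieval method of the present disclosure, and FIG. 2 is a schematic flow block diagram of an embodiment of the method. Referring to FIG. 1 and FIG. 2, the media retrieval method of this example mainly includes the following steps:
Step S10: obtain the media feature of the media to be retrieved (the query media).
Specifically, the obtained media feature is a feature sequence containing one or more media feature elements, which are arranged in order within the media feature. For ease of description, the media feature of the media to be retrieved is called the first media feature, and the media feature elements it contains are called first media feature elements. Processing then proceeds to step S20.
It should be noted that the media in the embodiments of the present disclosure may be of various types, such as audio, video, or a burst of consecutively shot photos. The media feature may be an audio feature, a video feature, an image feature, and so on; in fact, a video object can be retrieved with the method of the present disclosure by obtaining the audio feature of the video object.
Step S20: perform a first ranking of a plurality of known media according to the first media feature and, according to the result of the first ranking, take the top k known media as the first candidate media set. Here k is a positive integer whose value is configurable. Specifically, the first ranking ranks the known media by how well each individual first media feature element matches them. Further, the first ranking may be a term frequency-inverse document frequency ranking (TF-IDF ranking) of the known media according to the individual first media feature elements. Processing then proceeds to step S30.
Step S30: perform a second ranking of the first candidate media set according to the first media feature and, according to the result of the second ranking, take the top n first candidate media in the set as the retrieval result. Here n is a positive integer whose value is configurable. Specifically, the second ranking ranks the media in the first candidate media set according to a plurality of sequentially arranged first media feature elements. For example, the plurality of sequentially arranged first media feature elements may be a contiguous part of the first media feature or the first media feature as a whole, and/or may be first media feature elements whose ordinals are evenly spaced, such as those with ordinals 1, 3, 5, 7, and so on.
By performing the first ranking and the second ranking to obtain the retrieval result, the media retrieval method proposed by the present disclosure can greatly improve the accuracy and efficiency of media retrieval.
Each of the above steps is described in detail below.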
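I. Regarding step S10.
It should be noted that neither the method of extracting media features nor the type of media feature is limited. In one example of the present disclosure, a binary media feature of the media to be retrieved can be extracted, or a previously obtained media feature (of any type) can be binarized to obtain a binary media feature. Each media feature element in a binary media feature is a bit string of 0s and 1s, so the binary media feature is a sequence of bit strings arranged in order. In another example, a floating-point media feature of the media to be retrieved can be extracted; each media feature element in a floating-point media feature is a floating-point number, so the floating-point media feature is a sequence of floating-point numbers arranged in order.
In some embodiments, the ordering referred to here means that the media feature elements in a media feature are arranged in chronological order. For example, when the media feature is extracted in advance, frames are first sampled from the media object and one media feature element is generated from each frame, so that the media feature elements correspond to the frames of the media object; the media feature elements are then arranged in the chronological order of the frames in the media object to obtain the media feature. The media feature element may therefore also be called a frame feature.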
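II. Regarding step S20.
The known media may be media in a media database. The media database stores the media features of a large number of known media, and the stored media features include features of the same type as the first media feature, obtained with the same extraction method.
In some embodiments of the present disclosure, the media retrieval method further includes: obtaining in advance the media features of a plurality of known media (for ease of description, the media feature of a known media is called the second media feature, and the media feature elements it contains are called second media feature elements); indexing the second media features to obtain in advance a feature index of the known media; and matching the feature index against the first media feature elements to perform the TF-IDF ranking of the plurality of known media.
Specifically, obtaining in advance the feature index of the known media further includes obtaining in advance a forward feature index (forward index) and an inverted feature index (inverted index) of the media features of the known media, to facilitate comparing and retrieving media features. The forward and inverted feature indexes may be stored in advance in the media database. The forward feature index records the media feature of each known media, that is, which media feature elements the media feature of each known media contains and in what order. The inverted feature index records in which known media's media features each media feature element appears. Specifically, both indexes can be stored as key-value pairs. In the forward feature index, a key represents the number of a media (also called the media ID), and the corresponding value records which media feature elements the media contains and their order; these are called the forward key and the forward value. In the inverted feature index, a key represents a media feature element, and the corresponding value records the numbers of the media that contain that element; these are called the inverted key and the inverted value.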
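The key-value layout of the two indexes can be illustrated with a minimal sketch. It assumes media feature elements are hashable values such as bit strings; the function name `build_indexes` and the sample media IDs are hypothetical and not taken from the disclosure.

```python
from collections import defaultdict


def build_indexes(known_media):
    """Build a forward and an inverted feature index.

    known_media: dict mapping a media ID to its ordered list of
    media feature elements (assumed hashable, e.g. bit strings).
    """
    forward_index = {}                 # media ID -> ordered elements
    inverted_index = defaultdict(set)  # element  -> IDs of media containing it
    for media_id, elements in known_media.items():
        forward_index[media_id] = list(elements)
        for element in elements:
            inverted_index[element].add(media_id)
    return forward_index, dict(inverted_index)


# Hypothetical sample data: two known media with 4-bit elements.
known = {
    "m1": ["0101", "1100", "0101"],
    "m2": ["1100", "1111"],
}
forward_index, inverted_index = build_indexes(known)
```

The forward value keeps duplicates and order, which the later term-frequency and sequence comparisons need; the inverted value is a set because only membership matters there.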
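TF-IDF ranking is a class of techniques that judge the importance of information by weighting it with term frequency and inverse document frequency, and rank accordingly. Term frequency is how often a term (or a piece of information) appears in a given article (or document); the higher the term frequency, the more important the term is to that article. Document frequency is the number of articles in the corpus in which a term appears, and inverse document frequency is the reciprocal of the document frequency (in practice, the logarithm may also be taken, or the inverse document frequency may be defined as the logarithm of the reciprocal of the document frequency); the higher the inverse document frequency, the better the term discriminates. TF-IDF ranking therefore ranks by the product of term frequency and inverse document frequency. In fact, the media feature of a media can be treated as an article and each media feature element as a term, so that known media can be ranked by TF-IDF.
In addition, performing the first ranking on all known media in the media database may hurt retrieval efficiency, so an exact match can first be performed on the known media in the database before the first ranking. The exact match selects, as a second candidate media set, the known media that contain at least a preset number or a preset proportion of the first media feature elements. The first ranking is then performed on the second candidate media set to select the first candidate media set.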
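FIG. 3 is a schematic flow block diagram of the first ranking provided by an embodiment of the present disclosure. Referring to FIG. 3, in an embodiment of the present disclosure, the first ranking specifically includes the following steps:
Step S21: according to the inverted feature index, count in which known media's second media features each first media feature element appears, so as to match from the media database the known media that contain at least a preset number of first media feature elements, as the second candidate media set. Processing then proceeds to step S22.
Note that the "number" in "at least a preset number of first media feature elements" refers to the number of distinct kinds of first media feature elements. Specifically, the preset number may be one, in which case the second candidate media set consists of the known media whose second media features contain at least one kind of first media feature element; the preset number may also be more than one, say p (p being a positive integer), in which case the second candidate media set consists of the known media whose second media features contain at least p kinds of first media feature elements.
Step S22: based on the forward feature index, determine the term frequency of a first media feature element in the second media feature of a second candidate media. The term frequency is the proportion that the first media feature element makes up among all the media feature elements contained in the second media feature. Processing then proceeds to step S23.
Step S23: based on the inverted feature index, determine the document frequency of a first media feature element. The document frequency is, among a plurality of known media (for example, all known media in the media database), the ratio of the number of known media whose second media features contain that first media feature element to the total number of known media. Note that the document frequency of each media feature element can be computed and stored in advance and read directly at retrieval time. Processing then proceeds to step S24.
Step S24: determine the TF-IDF score of a second candidate media from the term frequency of each first media feature element in its second media feature and the document frequency of each first media feature element. Processing then proceeds to step S25.
Step S25: rank the second candidate media set by the TF-IDF scores of the second candidate media to obtain the result of the first ranking, and take the top k second candidate media from that result as the first candidate media set. The second media features (forward feature index) of the first candidate media may also be returned, for use in further processing the first candidate media set in the subsequent step S30.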
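Steps S21 to S25 can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the score of a candidate is taken as the sum over distinct query elements of term frequency times the logarithm of the reciprocal document frequency (the text only says the score is determined from these quantities), ties are broken by media ID, and all names and sample data are hypothetical.

```python
import math
from collections import Counter


def first_ranking(query_elements, forward_index, inverted_index, k, p=1):
    """Exact match (S21) followed by TF-IDF ranking (S22-S25)."""
    total_media = len(forward_index)
    distinct = set(query_elements)

    # S21: count how many distinct query elements each known media contains.
    hits = Counter()
    for element in distinct:
        for media_id in inverted_index.get(element, ()):
            hits[media_id] += 1
    second_candidates = [m for m, count in hits.items() if count >= p]

    scores = {}
    for media_id in second_candidates:
        elements = forward_index[media_id]
        counts = Counter(elements)
        score = 0.0
        for element in distinct:
            tf = counts[element] / len(elements)                    # S22
            if tf == 0.0:
                continue
            df = len(inverted_index.get(element, ())) / total_media  # S23
            score += tf * math.log(1.0 / df)                         # S24 (assumed sum)
        scores[media_id] = score

    # S25: sort by descending score, media ID as a deterministic tie-break.
    ranked = sorted(second_candidates, key=lambda m: (-scores[m], m))
    return ranked[:k], scores


# Hypothetical indexes for three known media.
forward = {"a": ["x", "y", "x"], "b": ["y", "z"], "c": ["z", "z"]}
inverted = {"x": {"a"}, "y": {"a", "b"}, "z": {"b", "c"}}
top, scores = first_ranking(["x", "y"], forward, inverted, k=1)
```

In this sample, media "c" shares no element with the query and is dropped by the exact match before any scoring happens, which is the efficiency argument made above.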
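In this embodiment, an index server can be used: the set of first media feature elements of the media to be retrieved serves as the index request, and the exact match and TF-IDF ranking are performed according to the forward and inverted feature indexes, so as to recall the first candidate media set and at the same time return the forward feature index of each first candidate media. Specifically, the open-source Elasticsearch search engine can be used to perform the above steps for fast retrieval. In addition, the first and second media features can be binarized in advance to facilitate index recall by the index server.
It is worth noting that the exact match and the first ranking focus on which known media each first media feature element appears in and on the retrieval of the individual first media feature elements themselves. They do not consider how the order of the first media feature elements within the first media feature affects retrieval; in other words, they do not consider retrieval of the media feature as a whole or of multiple consecutive media feature elements.
By performing the exact match and the first ranking, the media retrieval method proposed by the present disclosure can greatly improve the accuracy and efficiency of media retrieval.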
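III. Regarding step S30.
In some embodiments of the present disclosure, the second ranking ranks the media in the first candidate media set according to how an ordered sequence composed of a plurality of sequentially arranged first media feature elements appears in the media features of the first candidate media. Specifically, the second ranking includes: obtaining a similarity matrix of the media in the first candidate media set from the feature index of the known media and the first media feature, and ranking the media in the first candidate media set based on the similarity matrix.
FIG. 4 is a schematic flow block diagram of the second ranking provided by an embodiment of the present disclosure. Referring to FIG. 4, in an embodiment of the present disclosure, the second ranking specifically includes the following steps:
Step S31: obtain the second media feature of a first candidate media in the first candidate media set (every first candidate media is in fact a known media). Specifically, the second media feature can be obtained from the feature index of the known media (for example, the forward feature index). Suppose the first media feature of the media to be retrieved contains M1 first media feature elements and the second media feature of the first candidate media contains M2 second media feature elements, where M1 and M2 are positive integers. Note that the first media feature and the second media feature are of the same type, obtained with the same extraction method. Processing then proceeds to step S32.
Step S32: determine the element similarity between each second media feature element contained in the second media feature of the first candidate media and each first media feature element, obtaining M1×M2 element similarities. Each element similarity indicates how similar one first media feature element is to one second media feature element; specifically, a larger element similarity may indicate greater similarity. Processing then proceeds to step S33.
In embodiments of the present disclosure, a distance or metric capable of judging how similar two media feature elements are can be chosen as the element similarity according to the type of media feature.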
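Specifically, when the first and second media feature elements are both floating-point features, the element similarity can be determined from the cosine distance (cosine similarity) between them; in general the cosine distance can be used directly as the element similarity.
When the first and second media feature elements are both binarized features, the element similarity can be determined from the Hamming distance between them. Specifically, first compute the Hamming distance between the first and second media feature elements, then compute the difference between the length (number of bits) of a media feature element and that Hamming distance, and take the ratio of this difference to the element length as the element similarity, which represents the proportion of identical bits in the two binarized features. The Hamming distance is a metric commonly used in information theory: the Hamming distance between two strings of equal length is the number of positions at which the corresponding characters differ. In practice, it can be computed by XORing the two strings and counting the number of 1s in the result. Note that media feature elements extracted with the same method have the same length.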
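The Hamming-based element similarity described above can be written in a few lines. The sketch assumes each binarized media feature element is held as a non-negative Python integer of a known bit length; the function name is hypothetical.

```python
def hamming_similarity(a: int, b: int, n_bits: int) -> float:
    """Proportion of identical bits between two n_bits-long bit strings."""
    # XOR sets a 1 wherever the two elements differ; counting the 1s
    # gives the Hamming distance.
    distance = bin(a ^ b).count("1")
    return (n_bits - distance) / n_bits


print(hamming_similarity(0b1011, 0b1001, 4))  # one differing bit out of four
```

Because the result is a proportion of matching bits, it always lies between 0 and 1, which matters later when several kinds of element similarity are combined.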
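It is worth noting that the element similarity is not limited to cosine distance or Hamming distance; any distance or metric that can judge how similar two media feature elements are may be used.
Note that if the media feature elements correspond to the frames of the media object, the element similarity may also be called the inter-frame similarity.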
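Step S33: determine the similarity matrix between the first candidate media and the media to be retrieved from the element similarities.
Specifically, each point in the similarity matrix corresponds to one element similarity, so that the matrix records the element similarity between each second media feature element of a first candidate media and each first media feature element. The points of the matrix are arranged horizontally in the order of the first media feature elements of the media to be retrieved within the first media feature, and vertically in the order of the second media feature elements of the first candidate media within the second media feature. Thus the point in row i, column j represents the element similarity between the i-th first media feature element of the media to be retrieved and the j-th second media feature element of the first candidate media, and the similarity matrix is an M1×M2 matrix. Processing then proceeds to step S34.
Note that in practice it is not necessary to first compute the element similarities in step S32 and then determine the similarity matrix in step S33; the similarity matrix can be determined directly, with each element similarity computed as the corresponding point of the matrix is determined.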
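A direct construction of the M1×M2 similarity matrix, computing each element similarity as the corresponding point is filled in, might look like the sketch below. It assumes binarized elements stored as integers and uses the Hamming-based similarity; the function name and sample values are hypothetical.

```python
def similarity_matrix(query_elements, candidate_elements, n_bits):
    """Entry [i][j] compares the i-th first element with the j-th second element."""
    def element_similarity(a, b):
        # Share of identical bits, derived from the Hamming distance.
        return (n_bits - bin(a ^ b).count("1")) / n_bits

    return [
        [element_similarity(q, c) for c in candidate_elements]
        for q in query_elements
    ]


# Two query elements against three candidate elements (4-bit values).
matrix = similarity_matrix([0b1111, 0b0000], [0b1111, 0b1100, 0b0000], 4)
```

Rows follow the order of the query's elements and columns follow the order of the candidate's elements, so a run of high values along a sloped line corresponds to two segments that play out in the same order.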
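Step S34: determine the sequence similarity score of each first candidate media from its similarity matrix. The sequence similarity score expresses how similar the first candidate media is to the media to be retrieved. It may be a number between 0 and 1, with a larger number indicating that the two media are more similar. Processing then proceeds to step S35.
Specifically, the sequence similarity score is determined from straight lines in the similarity matrix.
Note that because a media feature generally contains a finite number of media feature elements, the similarity matrix is finite, so a "straight line" is actually a line segment of finite length composed of multiple points in the similarity matrix. The line has a slope, which is the slope of the line connecting the points it contains. The start and end points of the line may be any points in the similarity matrix and need not lie on its edges.
The straight lines referred to in the present disclosure include the diagonal of the similarity matrix and the segments parallel to it, that is, lines of slope 1 running from upper left to lower right, as well as lines whose slope is not 1. For example, a line may have a slope approximately equal to 1, to improve the robustness of retrieval; a slope of 2, 3, ... or 1/2, 1/3, ..., to handle retrieval of media objects whose speed has been changed; or even a negative slope (a line running from lower left to upper right in the similarity matrix), to handle retrieval of media objects that have been reversed. The diagonal is the segment composed of the points at (1,1), (2,2), (3,3), ... (in fact, the line of slope 1 starting from the upper-left point).
In fact, every straight line in the similarity matrix consists of multiple sequentially arranged element similarities. Each line therefore expresses the similarity of multiple sequentially arranged pairs of media feature elements, and so can express how similar a segment of the media to be retrieved is to a segment of a known media. Each pair of media feature elements includes one first media feature element and one second media feature element (that is, each line expresses the similarity between multiple sequentially arranged first media feature elements and multiple sequentially arranged second media feature elements). The slope and the start and end points of the line express the lengths and positions of the two media segments. For example, the line formed by (1,1), (2,3), (3,5), (4,7) expresses the similarity between the first media feature element with ordinal 1 and the second media feature element with ordinal 1, between the first media feature element with ordinal 2 and the second media feature element with ordinal 3, and so on; the line thus reflects the similarity between the segment of the media to be retrieved that corresponds to the first media feature elements with ordinals 1, 2, 3, 4 and the segment of the known media that corresponds to the second media feature elements with ordinals 1, 3, 5, 7.
Therefore, the similarity between a first candidate media and the media to be retrieved can be determined from straight lines in the similarity matrix: define the average (or overall) level of the element similarities contained in a line as the line similarity of that line, which reflects the similarity between the corresponding first and second media feature elements; find the line with the highest line similarity in the similarity matrix, called the matching line; and take the line similarity of the matching line as the sequence similarity score of the first candidate media.
Note that, in determining the matching line, the line with the highest line similarity may be chosen from a preset set of lines, for example all lines whose slope equals a preset slope value (such as 1). Alternatively, the points with the highest-ranked element similarities may first be selected from the similarity matrix and a line fitted to them, producing a line whose line similarity is relatively the highest.
Step S35: rank the first candidate media set by the sequence similarity scores of the first candidate media to obtain the result of the second ranking, and take the top n first candidate media from that result as the retrieval result.
By performing the second ranking, the media retrieval method proposed by the present disclosure can greatly improve the accuracy and efficiency of media retrieval.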
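In a specific embodiment of the present disclosure, the dynamic programming method can be used to determine the sequence similarity score from the similarity matrix. FIG. 5 is a schematic flow block diagram of media retrieval using the dynamic programming method, provided by an embodiment of the present disclosure. Referring to FIG. 5, in one embodiment, step S34 includes the following specific steps:
Step S34-1a: define the lines in the similarity matrix whose slope equals a preset slope value as candidate lines, and determine the line similarity of each candidate line from the element similarities it contains. Specifically, the line similarity of a line may be set to the average of the element similarities it contains, or to their sum. In one specific example, the slope value may be 1, so that the candidate lines are the diagonal of the similarity matrix and the lines parallel to it. Processing then proceeds to step S34-1b.
Note that in one embodiment of the present disclosure, step S34-1a further includes first excluding from the candidate lines those that contain fewer element similarities than a preset line-length value, and then proceeding to step S34-1b. In other words, in this embodiment a candidate line must also contain at least the preset line-length number of element similarities. Excluding lines with too few element similarities avoids the loss of accuracy in the final sequence similarity score that such lines would cause.
Step S34-1b: from the candidate lines, determine the one with the largest line similarity and define it as the first matching line. Processing then proceeds to step S34-1c.
Step S34-1c: take the line similarity of the first matching line as the sequence similarity score.
Note that in some embodiments of the present disclosure there may be multiple preset slope values in step S34-1a, so that the candidate lines are the lines whose slope equals any one of them, for example lines with slope 1, -1, 2, 1/2, and so on; in step S34-1b, one first matching line is then determined from the candidate lines of any of these slopes.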
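Steps S34-1a to S34-1c can be illustrated for the simplest case, a single slope value of 1. The sketch below is a simplification under stated assumptions: it only considers lines that span a whole diagonal of the matrix (the text allows arbitrary start and end points), uses the average as the line similarity, and applies a minimum line length. The function name and the sample matrix are hypothetical.

```python
def best_diagonal(matrix, min_len=2):
    """Return (line_similarity, start_point, end_point) of the best slope-1 line.

    Points are (row, column) pairs with zero-based indices.
    Returns None if no candidate line is long enough.
    """
    rows, cols = len(matrix), len(matrix[0])
    best = None
    # Each offset selects the diagonal made of the points (i, i + offset).
    for offset in range(-(rows - 1), cols):
        points = [(i, i + offset) for i in range(rows) if 0 <= i + offset < cols]
        if len(points) < min_len:
            continue  # S34-1a: drop lines with too few element similarities
        line_similarity = sum(matrix[i][j] for i, j in points) / len(points)
        if best is None or line_similarity > best[0]:
            best = (line_similarity, points[0], points[-1])  # S34-1b
    return best  # S34-1c: best[0] is the sequence similarity score


matrix = [
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.1, 0.8, 0.3],
    [0.2, 0.0, 0.1, 1.0],
]
score, start, end = best_diagonal(matrix)
```

Here the strongest line runs from (0, 1) to (2, 3), meaning the query matches the candidate shifted by one element. Supporting further slope values would mean enumerating the corresponding point sets in the same loop.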
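By using the dynamic programming method to determine the sequence similarity score, the media retrieval method proposed by the present disclosure can improve the accuracy and efficiency of media retrieval.
In another specific embodiment of the present disclosure, the constant-speed media method can be used to determine the sequence similarity score from the similarity matrix. FIG. 6 is a schematic flow block diagram of media retrieval using the constant-speed media method, provided by an embodiment of the present disclosure. Referring to FIG. 6, in one embodiment, step S34 includes the following specific steps:
Step S34-2a: select the points with the largest element similarities in the similarity matrix as similarity extreme points. The number of similarity extreme points taken may be preset. Processing then proceeds to step S34-2b.
Step S34-2b: based on the similarity extreme points, fit a straight line in the similarity matrix as the second matching line. In some specific examples, a line whose slope equals or is close to a preset slope value is fitted from the similarity extreme points as the second matching line, for example a line with slope close to 1. Specifically, the random sample consensus method (RANSAC) can be used to fit, in the similarity matrix, a line whose slope is close to the slope value. RANSAC is a commonly used method that estimates the parameters of a mathematical model from a sample data set containing outliers, so as to obtain the valid sample data. Processing then proceeds to step S34-2c.
Step S34-2c: determine the sequence similarity score from the element similarities contained in the second matching line. Specifically, the average of the element similarities on the second matching line may be taken as the sequence similarity score.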
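A small RANSAC-style sketch of steps S34-2a to S34-2c follows. All parameter names, tolerances, and defaults are assumptions made for illustration, and the score is taken as the average similarity of the extreme points that agree with the fitted line, which is one possible reading of "the element similarities on the second matching line".

```python
import random


def fit_matching_line(matrix, n_points=6, target_slope=1.0, slope_tol=0.2,
                      inlier_tol=0.5, iterations=200, seed=0):
    """Fit column = slope * row + intercept to the highest-similarity points.

    Returns ((slope, intercept), score), or None if no acceptable line is found.
    """
    rng = random.Random(seed)
    cells = [(matrix[i][j], i, j)
             for i in range(len(matrix)) for j in range(len(matrix[0]))]
    extremes = sorted(cells, reverse=True)[:n_points]  # S34-2a
    if len(extremes) < 2:
        return None

    best_line, best_inliers = None, []
    for _ in range(iterations):  # S34-2b: hypothesize a line, then verify it
        (s1, i1, j1), (s2, i2, j2) = rng.sample(extremes, 2)
        if i1 == i2:
            continue  # the two points do not define a usable slope
        slope = (j2 - j1) / (i2 - i1)
        if abs(slope - target_slope) > slope_tol:
            continue  # keep only lines close to the preset slope value
        intercept = j1 - slope * i1
        inliers = [(s, i, j) for s, i, j in extremes
                   if abs(j - (slope * i + intercept)) <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (slope, intercept), inliers

    if best_line is None:
        return None
    score = sum(s for s, _, _ in best_inliers) / len(best_inliers)  # S34-2c
    return best_line, score


# Hypothetical 6x6 matrix: a strong shifted diagonal plus one outlier.
matrix = [[0.1] * 6 for _ in range(6)]
for i, s in enumerate([0.9, 0.8, 1.0, 0.9, 0.9]):
    matrix[i][i + 1] = s
matrix[5][0] = 0.95
line, score = fit_matching_line(matrix)
```

The isolated high value at (5, 0) is ignored because no line of slope near 1 passes through it and another extreme point, which is the kind of outlier rejection RANSAC is meant to provide.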
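By using the constant-speed media method to determine the sequence similarity score, the media retrieval method proposed by the present disclosure can improve the accuracy and efficiency of media retrieval.
Further, the similarity matrix may be obtained by jointly considering multiple kinds of media similarity. Specifically, the media retrieval method of the present disclosure further includes: obtaining multiple types of first media features of the media to be retrieved, obtaining multiple types of second media features of the media in the first candidate media set, and determining the similarity matrix from the multiple types of second media features and the multiple types of first media features. The second ranking is then performed using the similarity matrix based on multiple types of media features.
FIG. 7 is a schematic flow block diagram of determining the similarity matrix based on multiple types of first and second media features for media retrieval, according to an embodiment of the present disclosure. Referring to FIG. 7, in an embodiment of the present disclosure, the media retrieval method specifically includes:
Step S41: obtain multiple types of first media features of the media to be retrieved, each type containing a plurality of first media feature elements. Processing then proceeds to step S42.
For example, obtain both the floating-point feature and the binarized feature of the media to be retrieved.
Step S42: obtain multiple types of second media features of a known media (specifically, a media in the first candidate media set), each type containing a plurality of second media feature elements. Index the multiple types of second media features to obtain a feature index based on multiple media features. Processing then proceeds to step S43.
For example, obtain both the floating-point feature and the binarized feature of the known media.
Step S43: for each type, determine the element similarity between the second media feature elements and the first media feature elements of that type, so that multiple kinds of element similarity are obtained, one per type of media feature. The second media feature elements can be obtained from the feature index. Processing then proceeds to step S44.
Step S44: determine the average or minimum of the multiple kinds of element similarity, and determine the similarity matrix of the known media from that average or minimum.
Processing then proceeds to step S34 of the foregoing examples, where the sequence similarity score and the result of the second ranking are determined from the similarity matrix based on the average or minimum of the multiple kinds of element similarity.
The benefit of determining the similarity matrix from the average or minimum of multiple similarities is as follows: retrieval using the similarity obtained from a single kind of media feature may produce false matches; taking the average or minimum of the similarities of multiple kinds of media features reduces or eliminates such false matches and thus improves the accuracy of media retrieval.
Note that before taking the average or minimum of the multiple kinds of element similarity, it must be ensured that all of them have the same value range, for example by setting the value range of every kind of element similarity to 0 to 1 in advance. In fact, both the cosine-distance example and the Hamming-distance example above already set the value range of the element similarity to 0 to 1.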
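Step S44 can be sketched as an element-wise combination of per-type similarity matrices. The sketch assumes every input matrix has the same shape and already uses the 0-to-1 value range noted above; the function name and sample values are hypothetical.

```python
def combine_similarity_matrices(matrices, mode="mean"):
    """Combine several same-shaped similarity matrices point by point."""
    if mode == "mean":
        combine = lambda values: sum(values) / len(values)
    elif mode == "min":
        combine = min
    else:
        raise ValueError("mode must be 'mean' or 'min'")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [combine([m[i][j] for m in matrices]) for j in range(cols)]
        for i in range(rows)
    ]


# Two hypothetical matrices, e.g. one per feature type.
a = [[1.0, 0.25], [0.5, 0.75]]
b = [[0.5, 0.75], [0.5, 0.25]]
mean_matrix = combine_similarity_matrices([a, b], mode="mean")
min_matrix = combine_similarity_matrices([a, b], mode="min")
```

Taking the minimum is the stricter choice: a point stays high only if every feature type agrees, whereas the mean tolerates one weak feature type.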
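In some embodiments of the present disclosure, the obtained first media feature of the media to be retrieved further contains a first confidence field indicating how trustworthy the first media feature elements are, and/or the obtained second media feature of the known media further contains a second confidence field indicating how trustworthy the second media feature elements are. The media retrieval method may then further include: when determining the element similarity or the sequence similarity score, weighting with the first confidence field and/or the second confidence field, giving high weight to high confidence and low weight to low confidence, and then performing the second ranking according to the weighted sequence similarity score. Note that the confidence field may be recorded in the media feature, or stored separately outside it, as long as the correspondence between media features and confidence fields is configured.
In some embodiments of the present disclosure, the media retrieval method further includes: before the first ranking, slicing the obtained first media feature of the media to be retrieved and the second media feature of the known media by a preset fixed length to obtain multiple first sub-media features and second sub-media features of the same length (containing the same number of media feature elements) (for example, in embodiments that include the step of indexing the second media features, slicing is done before indexing); and/or, before obtaining the media features, slicing the media to be retrieved and the known media by a preset fixed time length to obtain multiple segments of media to be retrieved and of known media of equal duration, and then obtaining the media features of each segment, yielding the first sub-media feature of each segment of media to be retrieved and the second sub-media feature of each known media segment. The first ranking and second ranking steps are then performed for each first and second sub-media feature to obtain a retrieval result for each sub-media feature, and the retrieval result for the original media to be retrieved is determined from these.
Slicing media or media features by a fixed length has the following benefits: (1) the TF-IDF ranking is fairer; (2) the element similarities and sequence similarity scores obtained are more accurate; (3) a uniform length makes media features and feature indexes easier to store.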
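Slicing a media feature into sub-media features of equal length can be sketched as below. The text does not say what happens to a trailing remainder shorter than the preset length; dropping it is an assumption of this sketch, as is the function name.

```python
def slice_feature(elements, length):
    """Split an ordered list of media feature elements into equal-length slices.

    A trailing remainder shorter than `length` is dropped (assumption).
    """
    return [
        elements[i:i + length]
        for i in range(0, len(elements) - length + 1, length)
    ]


sub_features = slice_feature(list(range(7)), 3)
```

With seven elements and a length of three, two full slices are produced and the last element is left out, so every sub-feature contributes the same number of elements to the TF-IDF statistics.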
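In some embodiments of the present disclosure, the first media feature elements in the first media feature and the second media feature elements in the second media feature are arranged temporally, for example in chronological order. In this case, the media retrieval method of the present disclosure further includes determining the repeated segments between the media to be retrieved and a known media (specifically, a media in the retrieval result) from the similarity matrix. Specifically, the start and end times of the repeated segments in the two media can be obtained from the start and end points of a line in the similarity matrix, for example from the start and end points of the first matching line or the second matching line.
A specific way of determining the repeated segments from a line in the similarity matrix (for example, the matching line) is as follows: determine the start time of the repeated segment in the media to be retrieved from the ordinal of the first media feature element corresponding to the start point of the line (that is, its horizontal coordinate in the similarity matrix), and determine the start time of the repeated segment in the first candidate media from the ordinal of the second media feature element corresponding to that start point (that is, its vertical coordinate in the similarity matrix); similarly, determine the end time of the repeated segment in the media to be retrieved from the horizontal coordinate of the end point of the line, and the end time of the repeated segment in the first candidate media from the vertical coordinate of that end point.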
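Turning the end points of a matching line into start and end times requires knowing how much time one media feature element covers, which the text does not specify. The sketch below assumes zero-based ordinals, a fixed duration per element (for example the frame sampling interval), and a line of positive slope; the parameter `seconds_per_element` and the function name are hypothetical.

```python
def repeated_segment(start, end, seconds_per_element=1.0):
    """Map a line's (query ordinal, candidate ordinal) end points to time spans.

    Returns ((query_start, query_end), (candidate_start, candidate_end)) in seconds.
    """
    (q_start, c_start), (q_end, c_end) = start, end
    # The end time is taken at the end of the last matched element.
    query_span = (q_start * seconds_per_element, (q_end + 1) * seconds_per_element)
    candidate_span = (c_start * seconds_per_element, (c_end + 1) * seconds_per_element)
    return query_span, candidate_span


# A line from (0, 1) to (2, 3) with one element every 0.5 s.
query_span, candidate_span = repeated_segment((0, 1), (2, 3), 0.5)
```

For a line of negative slope (reversed media) the candidate ordinals decrease along the line, so the two candidate coordinates would need to be swapped before converting.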
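In some embodiments of the present disclosure (for example, the embodiments shown in FIG. 5 and FIG. 6), step S34 further includes: examining the beginning and end portions of the first matching line or second matching line obtained, judging whether the points (element similarities) there reach a preset element-similarity threshold, removing the portions at the beginning and end of the matching line that do not reach the threshold (that is, whose element similarity is not high), and keeping the middle segment, which is defined as the third matching line; and determining the sequence similarity score from the line similarity of the third matching line, and/or determining the start and end times of the repeated segments between the known media and the media to be retrieved from the start and end points of the third matching line. Removing the low-similarity portions at the beginning and end of the matching line and keeping the middle segment of higher similarity before determining the similarity between the known media and the media to be retrieved improves the accuracy of media retrieval and yields more accurate repeated segments.
A specific way of removing the portions at the beginning or end of the matching line that do not reach the threshold is: check the points one by one from the start or end point of the matching line toward the middle, judging whether each reaches the threshold; once the first point that reaches the threshold is found, remove the points between it and the start or end point.
Note that the element-similarity threshold may be a specific element-similarity value, in which case the check is whether a point reaches that value; or it may be a ratio, in which case the check is whether a point reaches that ratio relative to the average or maximum of all points contained in the first or second matching line.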
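The trimming of a matching line can be sketched on the list of element similarities along the line. This sketch uses the absolute-value form of the threshold; the function name and sample values are hypothetical.

```python
def trim_line(similarities, threshold):
    """Return (start, end) so that similarities[start:end] is the third matching line.

    Points are dropped from both ends until one reaches the threshold;
    points in the middle are kept even if they fall below it.
    """
    start, end = 0, len(similarities)
    while start < end and similarities[start] < threshold:
        start += 1   # scan from the start point toward the middle
    while end > start and similarities[end - 1] < threshold:
        end -= 1     # scan from the end point toward the middle
    return start, end


start, end = trim_line([0.2, 0.3, 0.9, 0.4, 0.95, 0.1], 0.8)
```

In the sample, the two leading points and the last point are removed, while the 0.4 in the middle stays because only the ends are examined. For the ratio form of the threshold, the caller could pass, for example, the ratio multiplied by the average or maximum of the list.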
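FIG. 8 is a schematic structural diagram of an embodiment of the media retrieval apparatus 100 of the present disclosure. Referring to FIG. 8, the media retrieval apparatus 100 of this example mainly includes:
A media feature obtaining module 110, configured to obtain the media feature of the media to be retrieved (the query media) as the first media feature. The first media feature contains a plurality of first media feature elements.
A first ranking module 120, configured to perform a first ranking of a plurality of known media according to the first media feature and, according to the result of the first ranking, take the top k known media as the first candidate media set. Here k is a positive integer whose value is configurable. Specifically, the first ranking module 120 ranks according to how well each individual first media feature element matches the known media. Further, the first ranking module 120 may perform a term frequency-inverse document frequency (TF-IDF) ranking of the known media according to the individual first media feature elements.
A second ranking module 130, configured to perform a second ranking of the first candidate media set according to the first media feature and, according to the result of the second ranking, take the top n first candidate media in the set as the retrieval result. Here n is a positive integer whose value is configurable. Specifically, the second ranking module 130 ranks the media in the first candidate media set according to a plurality of sequentially arranged first media feature elements.
The known media may be media in a media database. The media database stores the media features of a large number of known media, and the stored media features include features of the same type as the first media feature, obtained with the same extraction method.
In some embodiments of the present disclosure, the media retrieval apparatus 100 further includes a feature index obtaining module (not shown), configured to obtain the media features of a plurality of known media as second media features, each containing a plurality of second media feature elements, and to index the second media features to obtain the feature index of the known media. The first ranking module 120 is specifically configured to match the feature index against the first media feature elements to perform the TF-IDF ranking of the plurality of known media.
Further, the feature index obtaining module is configured to obtain the forward feature index (forward index) and the inverted feature index (inverted index) of the known media.
In addition, performing the first ranking on all known media in the media database may hurt retrieval efficiency, so the first ranking module 120 of the present disclosure may include an exact match submodule 121, configured to perform an exact match on the plurality of known media before the first ranking. The exact match selects, as a second candidate media set, the known media that contain at least a preset number or a preset proportion of the first media feature elements. The first ranking is then performed on the second candidate media set to select the first candidate media set.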
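FIG. 9 is a schematic structural diagram of the first ranking module 120 provided by an embodiment of the present disclosure. Referring to FIG. 9, in an embodiment of the present disclosure, the first ranking module 120 specifically includes:
An exact match submodule 121, configured to count, according to the inverted feature index, in which known media's second media features each first media feature element appears, so as to select from the media database the known media that contain at least a preset number of first media feature elements, as the second candidate media set.
A term frequency determination submodule 122, configured to determine, based on the forward feature index, the term frequency of a first media feature element in the second media feature of a second candidate media.
A document frequency determination submodule 123, configured to determine, based on the inverted feature index, the document frequency of a first media feature element.
A TF-IDF scoring submodule 124, configured to determine the TF-IDF score of a second candidate media from the term frequency of each first media feature element in its second media feature and the document frequency of each first media feature element.
A first ranking submodule 125, configured to rank the second candidate media set by the TF-IDF scores of the second candidate media to obtain the result of the first ranking, and to take the top k second candidate media from that result as the first candidate media set. The first ranking submodule 125 may also return the second media features (forward feature index) of the first candidate media to the second ranking module 130 for subsequent processing.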
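In some embodiments of the present disclosure, the second ranking ranks the media in the first candidate media set according to how an ordered sequence composed of a plurality of sequentially arranged first media feature elements appears in the media features of the first candidate media. Specifically, the second ranking module 130 is configured to obtain a similarity matrix of the media in the first candidate media set from the feature index of the known media and the first media feature, and to rank the media in the first candidate media set based on the similarity matrix.
FIG. 10 is a schematic structural diagram of the second ranking module 130 provided by an embodiment of the present disclosure. Referring to FIG. 10, in an embodiment of the present disclosure, the second ranking module 130 specifically includes:
A second media feature obtaining submodule 131, configured to obtain the second media feature of a first candidate media in the first candidate media set (every first candidate media is in fact a known media). Specifically, the second media feature can be obtained from the feature index of the known media (for example, the forward feature index).
A first element similarity determination submodule 132, configured to determine the element similarity between each second media feature element contained in the second media feature of the first candidate media and each first media feature element.
A first similarity matrix determination submodule 133, configured to determine the similarity matrix between the first candidate media and the media to be retrieved from the element similarities.
A sequence similarity score determination submodule 134, configured to determine the sequence similarity score of each first candidate media from its similarity matrix. Specifically, the sequence similarity score determination submodule 134 determines the sequence similarity score from straight lines in the similarity matrix.
A second ranking submodule 135, configured to rank the first candidate media set by the sequence similarity scores of the first candidate media to obtain the result of the second ranking, and to take the top n first candidate media from that result as the retrieval result.
In one embodiment of the present disclosure, the sequence similarity score determination submodule 134 is specifically configured to determine the sequence similarity score using the steps of the dynamic programming method described above.
In one embodiment of the present disclosure, the sequence similarity score determination submodule 134 is specifically configured to determine the sequence similarity score using the steps of the constant-speed media method described above.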
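Further, the similarity matrix may be obtained by jointly considering multiple kinds of media similarity. FIG. 11 is a schematic structural diagram of a media retrieval apparatus 100 that determines the similarity matrix based on multiple types of first and second media features, provided by an embodiment of the present disclosure. Referring to FIG. 11, in an embodiment of the present disclosure, the media retrieval apparatus 100 further includes:
A multi-type first media feature obtaining module 140, configured to obtain multiple types of first media features of the media to be retrieved, each type containing a plurality of first media feature elements.
A multi-type second media feature obtaining module 150, configured to obtain multiple types of second media features of a known media (specifically, a media in the first candidate media set), each type containing a plurality of second media feature elements. In some examples, a feature index obtaining module (not shown) may be included, configured to index the multiple types of second media features to obtain a feature index based on multiple media features.
A second element similarity determination submodule 160, configured to determine, for each type, the element similarity between the second media feature elements and the first media feature elements of that type, thereby obtaining multiple kinds of element similarity. The second media feature elements can be obtained from the feature index.
A second similarity matrix determination submodule 170, configured to determine the average or minimum of the multiple kinds of element similarity and to determine the similarity matrix of the known media from that average or minimum.
The sequence similarity score determination submodule 134 then determines the sequence similarity score from the similarity matrix based on the average or minimum of the multiple kinds of element similarity.
In some embodiments of the present disclosure, the obtained first media feature of the media to be retrieved further contains a first confidence field indicating how trustworthy the first media feature elements are, and/or the obtained second media feature of the known media further contains a second confidence field indicating how trustworthy the second media feature elements are. The second ranking module 130 is further configured to weight with the first confidence field and/or the second confidence field when determining the element similarity or the sequence similarity score, giving high weight to high confidence and low weight to low confidence, and then to perform the second ranking according to the weighted sequence similarity score.
In some embodiments of the present disclosure, the media retrieval apparatus 100 further includes a media slicing module (not shown). The media slicing module is configured to slice, before the first ranking, the obtained first media feature of the media to be retrieved and the second media feature of the known media by a preset fixed length to obtain multiple first sub-media features and second sub-media features of the same length (containing the same number of media feature elements); and/or to slice, before the media features are obtained, the media to be retrieved and the known media by a preset fixed time length to obtain multiple segments of media to be retrieved and of known media of equal duration, after which the media features of each segment are obtained, yielding the first sub-media feature of each segment of media to be retrieved and the second sub-media feature of each known media segment. The first ranking module 120 and the second ranking module 130 then perform the first ranking and second ranking steps for each first and second sub-media feature to obtain a retrieval result for each sub-media feature, and the retrieval result for the original media to be retrieved is determined from these.
In some embodiments of the present disclosure, the first media feature elements in the first media feature and the second media feature elements in the second media feature are arranged temporally. In this case, the media retrieval apparatus 100 further includes a repeated media segment determination module (not shown), configured to determine the repeated segments between the media to be retrieved and a known media from the similarity matrix. Specifically, the repeated media segment determination module obtains the start and end times of the repeated segments in the two media from the start and end points of a line in the similarity matrix.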
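FIG. 12 is a hardware block diagram illustrating a media retrieval hardware device according to an embodiment of the present disclosure. As shown in FIG. 12, the media retrieval hardware device 200 according to an embodiment of the present disclosure includes a memory 201 and a processor 202. The components of the media retrieval hardware device 200 are interconnected by a bus system and/or another form of connection mechanism (not shown).
The memory 201 is configured to store non-transitory computer-readable instructions. Specifically, the memory 201 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like.
The processor 202 may be a central processing unit (CPU) or another form of processing unit with data processing and/or instruction execution capabilities, and may control other components in the media retrieval hardware device 200 to perform desired functions. In one embodiment of the present disclosure, the processor 202 is configured to execute the computer-readable instructions stored in the memory 201 so that the media retrieval hardware device 200 performs all or part of the steps of the media retrieval methods of the foregoing embodiments of the present disclosure.
FIG. 13 is a schematic diagram illustrating a computer-readable storage medium according to an embodiment of the present disclosure. As shown in FIG. 13, a computer-readable storage medium 300 according to an embodiment of the present disclosure has non-transitory computer-readable instructions 301 stored thereon. When the non-transitory computer-readable instructions 301 are executed by a processor, all or part of the steps of the media retrieval methods of the foregoing embodiments of the present disclosure are performed.
FIG. 14 is a schematic diagram illustrating the hardware structure of a terminal device according to an embodiment of the present disclosure. The terminal device may be implemented in various forms. The terminal device in the present disclosure may include, but is not limited to, mobile terminal devices such as a mobile phone, a smartphone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, an in-vehicle terminal device, an in-vehicle display terminal, and an in-vehicle electronic rearview mirror, as well as fixed terminal devices such as a digital TV and a desktop computer.
As shown in FIG. 14, the terminal device 1100 may include a wireless communication unit 1110, an A/V (audio/video) input unit 1120, a user input unit 1130, a sensing unit 1140, an output unit 1150, a memory 1160, an interface unit 1170, a controller 1180, a power supply unit 1190, and the like. FIG. 14 shows a terminal device having various components, but it should be understood that not all illustrated components are required. More or fewer components may be implemented instead.
The wireless communication unit 1110 allows radio communication between the terminal device 1100 and a wireless communication system or network. The A/V input unit 1120 receives audio or video signals. The user input unit 1130 can generate key input data according to commands entered by the user to control various operations of the terminal device. The sensing unit 1140 detects the current state of the terminal device 1100, its location, the presence or absence of a user's touch input, its orientation, its acceleration or deceleration and direction of movement, and the like, and generates a command or signal for controlling the operation of the terminal device 1100. The interface unit 1170 serves as an interface through which at least one external device can connect to the terminal device 1100. The output unit 1150 is configured to provide output signals in a visual, audio, and/or tactile manner. The memory 1160 may store software programs for the processing and control operations performed by the controller 1180, or may temporarily store data that has been output or is to be output. The memory 1160 may include at least one type of storage medium. The terminal device 1100 may also cooperate with a network storage device that performs the storage function of the memory 1160 over a network connection. The controller 1180 typically controls the overall operation of the terminal device. In addition, the controller 1180 may include a multimedia module for reproducing or playing back multimedia data. The controller 1180 may perform pattern recognition processing to recognize handwriting input or picture-drawing input performed on the touch screen as characters or images. The power supply unit 1190 receives external or internal power under the control of the controller 1180 and provides the power required to operate the various elements and components.
Various embodiments of the media retrieval method proposed by the present disclosure may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, they may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, and an electronic unit designed to perform the functions described herein; in some cases, they may be implemented in the controller 1180. For a software implementation, they may be implemented with separate software modules, each allowing at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 1160 and executed by the controller 1180.
In summary, the media retrieval method, apparatus, hardware device, computer-readable storage medium, and terminal device according to the embodiments of the present disclosure obtain the retrieval result by performing a first ranking based on each individual media feature element in the media feature of the media to be retrieved and a second ranking based on a plurality of sequentially arranged media feature elements in that media feature, which can greatly improve the accuracy and efficiency of media retrieval.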
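The basic principles of the present disclosure have been described above with reference to specific embodiments. It should be pointed out, however, that the advantages, benefits, and effects mentioned in the present disclosure are only examples and not limitations, and they are not to be regarded as required by every embodiment of the present disclosure. In addition, the specific details disclosed above are provided only for illustration and ease of understanding, not as limitations; the present disclosure need not be implemented with those specific details.
The block diagrams of components, apparatuses, devices, and systems in the present disclosure are only illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must follow the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems may be connected, arranged, or configured in any manner. Words such as "including", "comprising", and "having" are open-ended terms that mean "including but not limited to" and may be used interchangeably with that phrase. The words "or" and "and" as used here mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The term "such as" as used here means "such as but not limited to" and may be used interchangeably with that phrase.
In addition, as used here, "or" in a list of items beginning with "at least one of" indicates a disjunctive list, so that, for example, a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (that is, A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.
It should also be pointed out that, in the systems and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.
Various changes, substitutions, and alterations may be made to the techniques described here without departing from the techniques taught by the appended claims. Moreover, the scope of the claims of the present disclosure is not limited to the specific aspects of the processes, machines, manufacture, compositions of matter, means, methods, and actions described above. Processes, machines, manufacture, compositions of matter, means, methods, or actions that currently exist or are later developed and that perform substantially the same function or achieve substantially the same result as the corresponding aspects described here may be used. The appended claims therefore include such processes, machines, manufacture, compositions of matter, means, methods, or actions within their scope.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined here may be applied to other aspects without departing from the scope of the present disclosure. The present disclosure is therefore not intended to be limited to the aspects shown here, but is to be accorded the widest scope consistent with the principles and novel features disclosed here.
The above description has been given for purposes of illustration and description. It is not intended to limit the embodiments of the present disclosure to the forms disclosed here. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.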

Claims (12)

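  7. obtaining a similarity matrix of the media in the first candidate media set from the feature index of the known media and the first media feature, and ranking the media in the first candidate media set according to straight lines in the similarity matrix.
  8. The media retrieval method according to claim 7, wherein
    obtaining the media feature of the media to be retrieved as the first media feature includes obtaining multiple types of first media features of the media to be retrieved;
    obtaining in advance the media feature of the known media as the second media feature includes obtaining multiple types of second media features of the known media;
    obtaining the similarity matrix of the media in the first candidate media set from the feature index of the known media and the first media feature includes determining the similarity matrix according to the multiple types of second media features and the multiple types of first media features.
  9. The media retrieval method according to claim 8, wherein
    each type of first media feature contains a plurality of first media feature elements, and each type of second media feature contains a plurality of second media feature elements;
    determining the similarity matrix according to the multiple types of second media features and the multiple types of first media features includes:
    determining, for each type, the element similarity between the second media feature elements and the first media feature elements of that type, to obtain multiple kinds of element similarity; and determining the average or minimum of the multiple kinds of element similarity and determining the similarity matrix from that average or minimum.
  10. The media retrieval method according to claim 2, further including:
    slicing the media to be retrieved and the known media in advance according to a preset time length to obtain multiple segments of sub-media to be retrieved and multiple segments of known sub-media, and extracting media features from each of them to obtain a plurality of first sub-media features and a plurality of second sub-media features of the same length.
  11. The media retrieval method according to claim 2, further including:
    before the first ranking, slicing the obtained first media feature of the media to be retrieved and the second media feature of the known media according to a preset length to obtain a plurality of first sub-media features and a plurality of second sub-media features of the same length.
  12. The media retrieval method according to claim 7, wherein the plurality of first media feature elements are arranged in chronological order in the first media feature, and the plurality of second media feature elements are arranged in chronological order in the second media feature.
  13. The media retrieval method according to claim 12, further including:
    determining, according to straight lines in the similarity matrix, the repeated segments between the media to be retrieved and the media in the retrieval result.
  14. A media retrieval apparatus, the apparatus including:
    a media feature obtaining module, configured to obtain a media feature of the media to be retrieved as a first media feature, the first media feature containing a plurality of first media feature elements;
    a first ranking module, configured to perform a first ranking of a plurality of known media according to each individual first media feature element and, according to the result of the first ranking, take the top k known media as a first candidate media set, where k is a positive integer;
    a second ranking module, configured to perform a second ranking of the first candidate media set according to a plurality of sequentially arranged first media feature elements and, according to the result of the second ranking, take the top n first candidate media as the retrieval result, where n is a positive integer.
  15. The media retrieval apparatus according to claim 14, further including modules for performing the steps of any one of claims 2 to 13.
  16. A media retrieval hardware device, including:
    a memory, configured to store non-transitory computer-readable instructions; and
    a processor, configured to execute the computer-readable instructions so that, when executed by the processor, the computer-readable instructions implement the media retrieval method according to any one of claims 1 to 13.
  17. A computer-readable storage medium storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform the media retrieval method according to any one of claims 1 to 13.
  18. A terminal device, including the media retrieval apparatus according to claim 14 or 15.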
PCT/CN2018/125495 2018-03-29 2018-12-29 一种媒体检索方法及装置 WO2019184519A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2019572507A JP6991255B2 (ja) 2018-03-29 2018-12-29 メディア検索方法及び装置
US16/962,416 US11874869B2 (en) 2018-03-29 2018-12-29 Media retrieval method and apparatus
SG11201913922QA SG11201913922QA (en) 2018-03-29 2018-12-29 Media retrieval method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810272795.X 2018-03-29
CN201810272795.XA CN110555114A (zh) 2018-03-29 2018-03-29 一种媒体检索方法及装置

Publications (1)

Publication Number Publication Date
WO2019184519A1 true WO2019184519A1 (zh) 2019-10-03

Family

ID=68062463

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125495 WO2019184519A1 (zh) 2018-03-29 2018-12-29 一种媒体检索方法及装置

Country Status (5)

Country Link
US (1) US11874869B2 (zh)
JP (1) JP6991255B2 (zh)
CN (1) CN110555114A (zh)
SG (1) SG11201913922QA (zh)
WO (1) WO2019184519A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569373B (zh) * 2018-03-29 2022-05-13 北京字节跳动网络技术有限公司 一种媒体特征的比对方法及装置
CN112749334B (zh) * 2020-08-21 2023-12-12 深圳市雅阅科技有限公司 信息推荐方法、装置、电子设备及计算机可读存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778276A (zh) * 2015-04-29 2015-07-15 北京航空航天大学 一种基于改进tf-idf的多索引合并排序算法
US20160140231A1 (en) * 2014-11-18 2016-05-19 Oracle International Corporation Term selection from a document to find similar content
CN107402965A (zh) * 2017-06-22 2017-11-28 中国农业大学 一种音频检索方法
CN107577773A (zh) * 2017-09-08 2018-01-12 科大讯飞股份有限公司 一种音频匹配方法与装置、电子设备

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3571162B2 (ja) * 1997-03-03 2004-09-29 日本電信電話株式会社 類似オブジェクト検索方法および装置
JP2001134584A (ja) 1999-11-04 2001-05-18 Nippon Telegr & Teleph Corp <Ntt> 類似データの検索方法,検索装置および類似データ検索プログラム記録媒体
US20070162497A1 (en) 2003-12-08 2007-07-12 Koninklijke Philips Electronic, N.V. Searching in a melody database
US7433895B2 (en) * 2005-06-24 2008-10-07 Microsoft Corporation Adding dominant media elements to search results
US20090112830A1 (en) 2007-10-25 2009-04-30 Fuji Xerox Co., Ltd. System and methods for searching images in presentations
JP2011128903A (ja) 2009-12-17 2011-06-30 Toyohashi Univ Of Technology 系列信号検索装置および系列信号検索方法
US8861844B2 (en) * 2010-03-29 2014-10-14 Ebay Inc. Pre-computing digests for image similarity searching of image-based listings in a network-based publication system
US10331785B2 (en) * 2012-02-17 2019-06-25 Tivo Solutions Inc. Identifying multimedia asset similarity using blended semantic and latent feature analysis
US10187674B2 (en) * 2013-06-12 2019-01-22 Netflix, Inc. Targeted promotion of original titles
CN103440313B (zh) * 2013-08-27 2018-10-16 复旦大学 基于音频指纹特征的音乐检索系统
KR101627398B1 (ko) * 2013-12-27 2016-06-13 삼성전자주식회사 내용기반의 검색엔진을 이용한 개인 콘텐츠 저작권 관리 시스템 및 방법
CN107666638B (zh) * 2016-07-29 2019-02-05 腾讯科技(深圳)有限公司 一种估计录音延迟的方法及终端设备
CN106649440B (zh) 2016-09-13 2019-10-25 西安理工大学 融合全局r特征的近似重复视频检索方法

Also Published As

Publication number Publication date
JP2020525949A (ja) 2020-08-27
US20210073262A1 (en) 2021-03-11
JP6991255B2 (ja) 2022-01-12
SG11201913922QA (en) 2020-01-30
CN110555114A (zh) 2019-12-10
US11874869B2 (en) 2024-01-16

Similar Documents

Publication Publication Date Title
WO2019184522A1 (zh) 一种重复视频的判断方法及装置
WO2019184518A1 (zh) 一种音频检索识别方法及装置
US11048966B2 (en) Method and device for comparing similarities of high dimensional features of images
EP3477506B1 (en) Video detection method, server and storage medium
CN105917359B (zh) 移动视频搜索
WO2017045443A1 (zh) 一种图像检索方法及系统
WO2015184992A1 (zh) 一种识别重复图片的方法、图片搜索去重方法及其装置
CN110162665B (zh) 视频搜索方法、计算机设备及存储介质
CN111859004B (zh) 检索图像的获取方法、装置、设备及可读存储介质
WO2015196964A1 (zh) 搜索匹配图片的方法、图片搜索方法及装置
WO2017156963A1 (zh) 一种指纹解锁的方法及终端
CN110826365A (zh) 一种视频指纹生成方法和装置
WO2019184519A1 (zh) 一种媒体检索方法及装置
CN106033417B (zh) 视频搜索系列剧的排序方法和装置
Zhang et al. Large‐scale video retrieval via deep local convolutional features
EP3477505B1 (en) Fingerprint clustering for content-based audio recogntion
US11429660B2 (en) Photo processing method, device and computer equipment
US11593582B2 (en) Method and device for comparing media features
JP6017277B2 (ja) 特徴ベクトルの集合で表されるコンテンツ間の類似度を算出するプログラム、装置及び方法
Karlsson et al. Mobile photo album management with multiscale timeline
CN113505257A (zh) 图像检索方法、商标检索方法、电子设备以及存储介质
CN112784106A (zh) 内容数据的处理方法、报告数据的处理方法、计算机设备、存储介质
Paisitkriangkrai et al. Clip-based hierarchical representation for near-duplicate video detection
CN113987234A (zh) 图像处理、检索方法、装置、设备和存储介质
Aboobacker et al. Instant Video Search

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18912271

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019572507

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18912271

Country of ref document: EP

Kind code of ref document: A1