WO2018205736A1 - Multimedia information retrieval method, device and storage medium - Google Patents

Multimedia information retrieval method, device and storage medium

Info

Publication number
WO2018205736A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
multimedia information
sample multimedia
sample
fingerprint
Prior art date
Application number
PCT/CN2018/078759
Other languages
English (en)
French (fr)
Inventor
江佳伟
崔斌
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018205736A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • The invention relates to electric digital data processing technology, and in particular to a multimedia information retrieval method, device, and storage medium.
  • The embodiments of the present invention are intended to provide a multimedia information retrieval method, apparatus, and storage medium that can support diverse multimedia information retrieval services.
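  • An embodiment of the present invention provides a multimedia information retrieval method, where the method includes:
  • receiving a multimedia information retrieval request, the request carrying a sample multimedia information retrieval identifier, and acquiring attribute information of the sample multimedia information based on the request;
  • determining fingerprint information of the sample multimedia information based on the attribute information, where the fingerprint information includes first fingerprint information used to characterize a text feature of the sample multimedia information and second fingerprint information used to characterize a visual feature of the sample multimedia information;
  • matching the fingerprint information of the sample multimedia information with fingerprint information in a preset fingerprint database to acquire the multimedia information associated with the sample multimedia information.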
  • an embodiment of the present invention provides a multimedia information retrieval device, where the device includes: an acquisition module, a processing module, and a matching module;
  • the acquiring module is configured to receive a multimedia information retrieval request; the multimedia information retrieval request carries a sample multimedia information retrieval identifier;
  • the processing module is configured to determine fingerprint information of the sample multimedia information based on attribute information of the sample multimedia information;
  • the fingerprint information of the sample multimedia information includes: first fingerprint information used to characterize a text feature of the sample multimedia information, and second fingerprint information used to characterize a visual feature of the sample multimedia information;
  • the matching module is configured to match the fingerprint information of the sample multimedia information with fingerprint information in a preset fingerprint database to obtain a matching result,
  • and, based on the matching result, to acquire the multimedia information associated with the sample multimedia information.
  • an embodiment of the present invention provides a multimedia information retrieval apparatus, where the apparatus includes:
  • a processor configured to execute the multimedia information retrieval method described above when running a computer program.
  • an embodiment of the present invention provides a computer storage medium, where the computer storage medium stores computer executable instructions, and the computer executable instructions are used to execute the multimedia information retrieval method.
  • An embodiment of the present invention provides a multimedia information retrieval method performed by a server, where the server includes one or more processors, a memory, and one or more programs; the one or more programs are stored in the memory and may include one or more units, each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions. The method comprises:
  • receiving a multimedia information retrieval request, the request carrying a sample multimedia information retrieval identifier, and acquiring attribute information of the sample multimedia information based on the request;
  • determining fingerprint information of the sample multimedia information based on the attribute information, where the fingerprint information includes first fingerprint information used to characterize a text feature of the sample multimedia information and second fingerprint information used to characterize a visual feature of the sample multimedia information;
  • matching the fingerprint information of the sample multimedia information with fingerprint information in a preset fingerprint database to acquire the multimedia information associated with the sample multimedia information.
  • With the multimedia information retrieval method, device, and storage medium provided by the embodiments of the present invention, the fingerprint information of the sample multimedia information includes both the first fingerprint information and the second fingerprint information. The text feature corresponding to the first fingerprint information is faster to process and therefore suits delay-sensitive multimedia information retrieval services.
  • The visual feature corresponding to the second fingerprint information represents the multimedia content more fully and therefore suits multimedia information retrieval services that require high retrieval accuracy.
  • When retrieval is performed based on the sample multimedia information, the fingerprint information of the sample multimedia information can be matched against the fingerprint information in the fingerprint database whether the service is delay-sensitive or requires high retrieval accuracy.
  • Therefore, the above multimedia information retrieval scheme can adapt to the retrieval requirements of different multimedia information retrieval services and is highly versatile.
  • FIG. 1 is a schematic diagram of the hardware entities of the parties performing information interaction in an embodiment of the present invention.
  • FIG. 2 is a first schematic flowchart of a multimedia information retrieval method according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a Storm Topology.
  • FIG. 4 is a schematic diagram of data processing between components based on a preset data stream transmission protocol according to an embodiment of the present invention.
  • FIG. 5 is a second schematic flowchart of a multimedia information retrieval method according to an embodiment of the present invention.
  • FIG. 6 is a first schematic diagram of an application scenario of a video retrieval method according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of video frame division of a video file according to an embodiment of the present invention.
  • FIG. 8 is a third schematic flowchart of a multimedia information retrieval method according to an embodiment of the present invention.
  • FIG. 9 is a second schematic diagram of an application scenario of a video retrieval method according to an embodiment of the present invention.
  • FIG. 10A to FIG. 10C are third schematic diagrams of an application scenario of a video retrieval method according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a multimedia information retrieval apparatus according to an embodiment of the present invention.
  • FIG. 12 is a diagram showing an example of a multimedia information retrieval apparatus as a hardware entity according to an embodiment of the present invention.
  • Video fingerprinting is a technique in which software identifies, extracts, and compresses the characteristic components of a video, so that the unique "fingerprint" generated during processing can represent the video file. Video fingerprinting can be based on any visual video feature, including, but not limited to, key frame sequence analysis and color and motion changes in the video stream.
  • FIG. 1 is a schematic diagram of hardware entities of each party performing information interaction according to an embodiment of the present invention.
  • FIG. 1 includes servers 11...1n and terminal devices 21-24. The terminal devices 21-24 exchange information with the servers through a wired or wireless network.
  • The terminal devices include mobile phones, desktop computers, PCs, all-in-one machines, and the like.
  • The user uploads a sample video file (i.e., a video file to be retrieved, such as a movie clip or a music video) to a video website through the terminal device; the server performs a video search based on the uploaded video file and returns the video search result.
  • the embodiment of the present invention provides a multimedia information retrieval method.
  • the multimedia information retrieval method in the embodiment of the present invention includes:
  • Step 101: Receive a multimedia information retrieval request; the request carries a sample multimedia information retrieval identifier.
  • the multimedia information includes video information, audio information, and the like;
  • the multimedia information retrieval request is used to request multimedia information retrieval, such as video retrieval;
  • the sample multimedia information retrieval identifier is used to identify the sample multimedia information; for example, when the sample multimedia information is a video, the retrieval identifier may be a video identifier (ID) or the title of the sample multimedia file.
  • Step 102: Acquire attribute information of the sample multimedia information based on the multimedia information retrieval request.
  • Multimedia information retrieval relies on the attribute information of the sample multimedia information provided by the user. Therefore, the sample multimedia information retrieval identifier carried by the user's request is obtained first and is then used to acquire the attribute information of the sample multimedia information.
  • A preset multimedia information library may be queried based on the sample multimedia information retrieval identifier to obtain the attribute information of the sample multimedia information (including the retrieval identifier, the title, the text description, and the sample multimedia file).
  • The preset multimedia information library stores a mapping between at least one sample multimedia information retrieval identifier and the attribute information of the corresponding sample multimedia information, so that the attribute information can be looked up by retrieval identifier. For example, a preset video database is queried according to the video ID of the sample video to obtain the metadata of the sample video (video ID, video title, video text description, video file, etc.).
  • The attribute information of the sample multimedia information is presented in the form of a data stream message, for example, a Stream Shared Message (SSM).
  • That is, all subsequent processing based on the attribute information of the sample multimedia information is real-time stream processing (processing of a data stream), for example, by calling N pre-programmed components, where N is a positive integer not less than 2. On the one hand, this satisfies the real-time end-to-end delay requirement of delay-sensitive multimedia information retrieval services; on the other hand, it scales better to handle the ever-increasing amount of multimedia data and increasingly complex user retrieval needs.
  • The attribute information of the sample multimedia information is encapsulated as key-value data pairs based on a preset data stream transmission protocol to obtain an SSM corresponding to the attribute information of the sample multimedia information. For example, the metadata of the sample video is encapsulated as key-value data pairs to obtain an SSM corresponding to the metadata of the sample video.
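The key-value encapsulation described above can be illustrated with a minimal sketch. The patent does not publish the SSM wire format or any class names, so `SsmSketch`, its methods, and the field names `video_id` and `video_title` are hypothetical; the sketch assumes a flat JSON object of string pairs and handles only quote and backslash escapes, where a real system would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the patent names no concrete SSM class or wire format.
final class SsmSketch {
    // Encapsulate: serialize string key-value pairs as a flat JSON object.
    static String encapsulate(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Decapsulate: parse the flat JSON object produced by encapsulate().
    static Map<String, String> decapsulate(String ssm) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int[] pos = {1}; // skip the opening '{'
        while (ssm.charAt(pos[0]) != '}') {
            if (ssm.charAt(pos[0]) == ',') pos[0]++;
            String key = readString(ssm, pos);
            pos[0]++; // skip ':'
            pairs.put(key, readString(ssm, pos));
        }
        return pairs;
    }

    // Wrap a string in quotes, escaping backslashes and quotes only.
    private static String quote(String s) {
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }

    // Read one quoted string starting at pos[0]; leave pos[0] after the closing quote.
    private static String readString(String s, int[] pos) {
        StringBuilder out = new StringBuilder();
        int i = pos[0] + 1; // skip the opening quote
        while (s.charAt(i) != '"') {
            if (s.charAt(i) == '\\') i++; // take the escaped character literally
            out.append(s.charAt(i++));
        }
        pos[0] = i + 1;
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("video_id", "v42");
        meta.put("video_title", "Sample \"clip\"");
        String ssm = encapsulate(meta);
        System.out.println(ssm);
        System.out.println(decapsulate(ssm).equals(meta));
    }
}
```

Keeping every field as a string mirrors the description's later note that the pairs are string-typed and must be converted back into real data by the receiving component.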
  • Step 103: Determine fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information.
  • The fingerprint information of the sample multimedia information includes: first fingerprint information used to represent a text feature of the sample multimedia information, and second fingerprint information used to represent a visual feature of the sample multimedia information.
  • Performing this step requires first decapsulating, according to the preset data stream transmission protocol, the data stream message corresponding to the attribute information of the sample multimedia information to obtain the key-value data pairs storing the attribute information, and then performing string conversion on the key-value data pairs to obtain the attribute information of the sample multimedia information.
  • Step 103 may include: performing feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information, and then generating the fingerprint information of the sample multimedia information based on that feature information;
  • the feature information of the sample multimedia information includes: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
  • for example, feature extraction is performed on the attribute information of the sample video to obtain the video features of the sample video, and the fingerprint of the sample video is then generated based on those video features, where the video features include video text features and video visual features of the sample video.
  • Feature extraction on the attribute information of the sample video may be performed by dividing the sample video into multiple video frames and, based on a predetermined feature extraction algorithm, extracting from them a plurality of feature frames that represent the specific feature information of the video frames; the extracted feature frames are then encoded to obtain the video fingerprint of the sample video.
  • The fingerprint information of the sample multimedia information is encapsulated as key-value data pairs based on the preset data stream transmission protocol to obtain a data stream message (such as an SSM) corresponding to the fingerprint information of the sample multimedia information.
  • Step 104: Match the fingerprint information of the sample multimedia information with the fingerprint information in the preset fingerprint database to obtain multimedia information associated with the sample multimedia information.
  • A fingerprint database may be established in advance from the stored multimedia information: feature extraction and fingerprint generation are performed on a plurality of multimedia files to obtain their fingerprints, and the obtained fingerprints are then stored in the fingerprint database.
  • Matching the fingerprint information of the sample multimedia information with the fingerprint information in the preset fingerprint database may be performed by comparing fingerprint similarity: when the similarity exceeds a preset similarity threshold, the match is determined to be successful; otherwise, the match is determined to have failed.
  • The similarity threshold may be set according to actual conditions.
  • When the match succeeds, the multimedia information corresponding to the fingerprint that matches the fingerprint information of the sample multimedia information is acquired and used as the multimedia information associated with the sample multimedia information; when the match fails, no multimedia information associated with the sample multimedia information exists.
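Threshold-based fingerprint matching of this kind could look like the following sketch. The patent does not specify a similarity measure; cosine similarity over numeric vectors is an assumption made here for illustration, and the class name, method names, and database layout (`Map` from a file ID to its fingerprint vector) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: cosine similarity is an assumed metric, not the patent's.
final class FingerprintMatchSketch {
    // Cosine similarity between two equal-length fingerprint vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Return the IDs of stored fingerprints whose similarity exceeds the threshold.
    // An empty list corresponds to a failed match.
    static List<String> match(double[] sample, Map<String, double[]> db, double threshold) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> e : db.entrySet()) {
            if (cosine(sample, e.getValue()) > threshold) hits.add(e.getKey());
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, double[]> db = new LinkedHashMap<>();
        db.put("video-a", new double[] {1, 0, 1});
        db.put("video-b", new double[] {0, 1, 0});
        System.out.println(match(new double[] {1, 0, 1}, db, 0.8)); // prints [video-a]
    }
}
```

A linear scan is used only to keep the example short; a large fingerprint database would normally need an index, which the text does not describe.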
  • For example, the video fingerprint of the sample video is matched with the fingerprints in the preset fingerprint database; the one or more video files corresponding to the one or more fingerprints that match the video fingerprint of the sample video are the video files associated with the sample video (that is, one or more video files similar to the sample video file).
  • The foregoing multimedia information retrieval method may be implemented on a Storm platform (the method is not limited to this platform; Flink or Spark Streaming may also be used) by calling pre-programmed components, with SSMs transmitted between the components.
  • The Storm platform is an open source distributed real-time stream computing platform in which data streams are transmitted between logical nodes in real time.
  • FIG. 3 shows the structure of a Storm Topology. A Topology is the logical topology of the tasks in the Storm platform and can be executed on the Storm engine. A Spout is a logical node that produces a data stream in the Storm Topology (a data stream production node), shown as reference numeral 1 in FIG. 3; a Bolt is a logical node that consumes a data stream in the Storm Topology (a data stream consumption node), shown as reference numeral 2 in FIG. 3.
  • TopologyBuilder builder = new TopologyBuilder();
  • GetTaskSpout, TextFeaBolt, TextSigBolt, etc. are pre-programmed components, and components with data flow between them are connected by after(). After completing the definition of Topology, submit the topology to the cluster through the Storm interface.
  • When a downstream component receives SSM 1 from its upstream component, it decapsulates SSM 1 to obtain the key-value data pairs it carries (string type, stored in JSON format), converts the key-value data pairs into the actual data, and generates new data from the converted data (that is, it performs the corresponding processing, such as feature extraction or fingerprint generation). It then uses a JSON tool to encapsulate the new and existing data into SSM 2 and outputs SSM 2 to its own downstream components.
  • The first to fourth components are pre-programmed components on the Storm platform. The first component generates an SSM corresponding to the attribute information of the sample multimedia information and outputs it to the second component. Based on the preset data stream transmission protocol, the second component decapsulates the received SSM and performs string conversion to obtain the attribute information, performs feature extraction on it to obtain the feature information of the sample multimedia information, encapsulates the feature information into an SSM, and outputs that SSM to the third component.
  • The third component decapsulates the received SSM based on the preset data stream transmission protocol, performs string conversion, generates the fingerprint information of the sample multimedia information from the feature information, encapsulates the fingerprint information into an SSM, and outputs that SSM to the fourth component. After decapsulation and string conversion, the fourth component matches the fingerprint information, thereby obtaining the multimedia information associated with the sample multimedia information.
  • For example, when the first component obtains the SSM corresponding to the attribute information of the sample video, it outputs the SSM to the second component. Based on the preset data stream transmission protocol, the second component performs feature extraction on the attribute information of the sample video to obtain the feature information of the sample video (low-level video features, represented as a high-dimensional vector) and outputs an SSM corresponding to that feature information to the third component.
  • The third component generates the fingerprint information of the sample video (feature information that uniquely identifies the video signal, generally a vector) from the feature information of the sample video and outputs an SSM corresponding to the fingerprint information to the fourth component, which performs fingerprint matching to obtain the videos associated with the sample video.
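The hand-off between components can be sketched without a streaming platform. In the illustration below a `Map` stands in for a decapsulated SSM, and each component returns a new message that keeps the existing pairs and adds its own output. The class name, the key names, and the placeholder "feature" and "fingerprint" functions (lowercasing and a hash) are all hypothetical; this does not use the Storm API and does not reproduce the patent's extraction algorithms.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for a chain of streaming components.
final class ComponentChainSketch {
    // Build a component that reads inKey and writes outKey,
    // keeping all pairs already present in the incoming message.
    static UnaryOperator<Map<String, String>> component(
            String outKey, UnaryOperator<String> fn, String inKey) {
        return in -> {
            Map<String, String> out = new LinkedHashMap<>(in);
            out.put(outKey, fn.apply(in.get(inKey)));
            return out;
        };
    }

    // Pass the source message through each component in order.
    static Map<String, String> run(
            Map<String, String> source, List<UnaryOperator<Map<String, String>>> chain) {
        Map<String, String> msg = source;
        for (UnaryOperator<Map<String, String>> c : chain) {
            msg = c.apply(msg);
        }
        return msg;
    }

    public static void main(String[] args) {
        Map<String, String> ssm1 = new LinkedHashMap<>();
        ssm1.put("video_title", "Sample Clip");
        List<UnaryOperator<Map<String, String>>> chain = Arrays.asList(
            // Placeholder feature extraction: lowercase the title.
            component("text_feature", s -> s.toLowerCase(), "video_title"),
            // Placeholder fingerprint generation: hash of the feature.
            component("text_fingerprint", s -> Integer.toHexString(s.hashCode()), "text_feature"));
        System.out.println(run(ssm1, chain));
    }
}
```

Because every component copies the incoming pairs before adding its own, a downstream stage such as fingerprint matching still has access to the original metadata.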
  • Because the fingerprint information of the sample multimedia information used for matching includes the first fingerprint information and the second fingerprint information, the scheme serves both kinds of service: the text feature corresponding to the first fingerprint information is faster to process and suits delay-sensitive multimedia information retrieval services, while the visual feature corresponding to the second fingerprint information represents the multimedia content more fully and suits services that require high retrieval accuracy. The above multimedia information retrieval scheme can therefore adapt to the retrieval requirements of different multimedia information retrieval services at the same time.
  • Because the whole multimedia information retrieval scheme can be carried out in the form of SSMs on the Storm platform, multimedia information retrieval is processed efficiently and in real time; and because the scheme can be implemented by calling pre-programmed components, it improves the development efficiency of multimedia information retrieval services and saves development, deployment, and maintenance costs.
  • the embodiment of the present invention provides a multimedia information retrieval method, for example, a video retrieval method, which is applied to a Storm platform.
  • The video retrieval method in this embodiment can be implemented by calling pre-programmed general components corresponding to different processing layers. As shown in FIG. 5, these include: a first component corresponding to the video source layer (reference numeral 51), a second component corresponding to the video feature layer (reference numeral 52), a third component corresponding to the video fingerprint layer (reference numeral 53), a fourth component corresponding to the fingerprint comparison layer (reference numeral 54), and a fifth component corresponding to the output layer (reference numeral 55).
  • FIG. 6 is a schematic diagram of an application scenario of the video retrieval method of this embodiment. The first to fifth components shown in FIG. 5, which implement the video retrieval method, are deployed on the server 61 of FIG. 6. In step 1, the server 61 performs video retrieval processing based on the video retrieval request sent by the client 62; in step 2, it returns the retrieval result to the client 62.
  • the video search method in the embodiment of the present invention includes:
  • Step 201: The first component acquires the corresponding video metadata from the video database based on the video retrieval request input by the user, and inputs the video metadata to the second component.
  • The video retrieval request may carry the video ID of a sample video (a video to be retrieved); the video metadata corresponding to the sample video is acquired based on the video ID and includes the video ID, video title, video text description, video file, and so on.
  • Data is transmitted between the general components in this embodiment in the form of SSMs based on a preset data stream transmission protocol. That is, after the first component acquires the video metadata corresponding to the sample video, it encapsulates the video metadata as an SSM and then inputs the SSM to the second component.
  • The second component decapsulates the received SSM and performs string conversion to obtain the video metadata, and then processes the video metadata.
  • Step 202: The second component performs video feature extraction on the video metadata to obtain video text features and video visual features of the sample video.
  • When video feature extraction is performed, the sample video file may be divided into multiple video frames. As shown in FIG. 7, the video stream may be divided into multiple scenes, each scene may be further divided into multiple shots, and each shot may be divided into multiple frames. Based on a predetermined feature extraction algorithm, feature frames representing text feature information and visual feature information are then extracted from the video frames.
  • Step 203: The third component generates a video text fingerprint and a video visual fingerprint corresponding to the sample video based on the video text feature and the video visual feature of the sample video.
  • The third component encodes the extracted text feature information and visual feature information and generates the video text fingerprint and video visual fingerprint corresponding to the sample video, which uniquely identify the sample video.
  • Step 204: The fourth component matches the video text fingerprint and the video visual fingerprint of the sample video with the fingerprints in the preset fingerprint database to obtain a matching result.
  • The fourth component performs similarity matching between the fingerprints of the sample video and the fingerprints in the fingerprint database (the video text fingerprint and the video visual fingerprint are matched separately). When the similarity reaches a preset similarity threshold (that is, when the similarities of the video text fingerprint and the video visual fingerprint reach the preset similarity threshold), the match is determined to be successful. One or more fingerprints may match the fingerprints of the sample video, and the successfully matched fingerprints yield one or more video files that are similar to the sample video.
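Matching the two fingerprints separately might be sketched as follows. Several points are assumptions for illustration only: each fingerprint is encoded as a 64-bit value, similarity is the fraction of equal bits, and a match requires both the text and the visual similarity to reach the threshold. The text does not fix the encoding or say how the two results are combined, and the class and method names are hypothetical.

```java
// Hypothetical sketch: 64-bit fingerprints and the "both must pass" rule are assumptions.
final class DualFingerprintSketch {
    // Fraction of equal bits between two 64-bit fingerprints.
    static double similarity(long a, long b) {
        return 1.0 - Long.bitCount(a ^ b) / 64.0;
    }

    // The match succeeds only if the text and the visual similarities
    // both reach the threshold.
    static boolean matches(long textA, long visualA, long textB, long visualB,
                           double threshold) {
        return similarity(textA, textB) >= threshold
            && similarity(visualA, visualB) >= threshold;
    }

    public static void main(String[] args) {
        long text = 0xF0F0F0F0F0F0F0F0L;
        long visual = 0x0123456789ABCDEFL;
        // The stored visual fingerprint differs from the sample in 8 of 64 bits.
        System.out.println(matches(text, visual, text, visual ^ 0xFFL, 0.8)); // prints true
    }
}
```

A delay-sensitive service could evaluate only the text comparison and skip the visual one; that variation is a design option suggested by the description's discussion of the two fingerprint types, not something this sketch implements.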
  • Step 205: Output a video retrieval result based on the matching result.
  • The output video retrieval result may include one or more video files similar to the sample video. When the video retrieval result includes a plurality of video files, outputting the video retrieval result based on the matching result may be done as follows: the retrieved video files are sorted according to a preset sorting strategy (such as video file size or video creation time) and the sorted result is output, so that the user can choose according to actual needs.
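A preset sorting strategy maps naturally onto a comparator. The sketch below shows two strategies mentioned in the text, file size and creation time; the `VideoResult` record, its field names, and the sort directions (smallest first, newest first) are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of sorting retrieval results by a preset strategy.
final class ResultSortSketch {
    // Illustrative result record; the field names are assumptions.
    static final class VideoResult {
        final String id;
        final long sizeBytes;
        final long createdEpochSec;

        VideoResult(String id, long sizeBytes, long createdEpochSec) {
            this.id = id;
            this.sizeBytes = sizeBytes;
            this.createdEpochSec = createdEpochSec;
        }
    }

    // Strategy 1: smallest file first.
    static final Comparator<VideoResult> BY_SIZE =
        Comparator.comparingLong((VideoResult v) -> v.sizeBytes);

    // Strategy 2: most recently created first.
    static final Comparator<VideoResult> NEWEST_FIRST =
        Comparator.comparingLong((VideoResult v) -> v.createdEpochSec).reversed();

    // Sort a copy of the results with the chosen strategy and return the IDs in order.
    static List<String> sortedIds(List<VideoResult> results, Comparator<VideoResult> strategy) {
        List<VideoResult> copy = new ArrayList<>(results);
        copy.sort(strategy);
        List<String> ids = new ArrayList<>();
        for (VideoResult v : copy) {
            ids.add(v.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<VideoResult> results = Arrays.asList(
            new VideoResult("a", 300, 10),
            new VideoResult("b", 100, 20),
            new VideoResult("c", 200, 30));
        System.out.println(sortedIds(results, BY_SIZE));      // prints [b, c, a]
        System.out.println(sortedIds(results, NEWEST_FIRST)); // prints [c, b, a]
    }
}
```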
  • the embodiment of the present invention provides a multimedia information retrieval method, for example, a video retrieval method, which is applied to a Storm platform.
  • the video retrieval method in this embodiment can call a pre-programmed general component based on different processing layers.
  • FIG. 8 is a schematic flowchart of the video retrieval method in this embodiment, and FIG. 9 is a schematic diagram of an application scenario of the video retrieval method according to this embodiment. With reference to FIG. 8 and FIG. 9, the video retrieval method in the embodiment of the present invention includes:
  • Step 301: The user uploads a sample video file through a video application (APP) on the terminal.
  • The sample video file uploaded by the user may be a video file recorded by the user, a movie clip captured by the user, a video file sent by another user, and the like.
  • Step 302: The server invokes the video retrieval device to perform video retrieval on the received sample video file to obtain a retrieval result.
  • The video retrieval device performs feature extraction (video text features and video visual features), fingerprint generation (video text fingerprint and video visual fingerprint), and fingerprint matching on the sample video file to determine whether a video file similar to the sample video exists in the video database corresponding to the video application.
  • The video retrieval processing performed by the video retrieval device on the sample video may be stream processing based on a preset data stream transmission protocol, which satisfies the real-time requirement of the video retrieval service.
  • Step 303: Return a video upload indication to the terminal based on the retrieval result.
  • The video upload indication indicates whether the user is allowed to upload the sample video file.
  • When the retrieval result indicates that a video file similar to the sample video exists in the video database corresponding to the video application, the user's upload of the sample video is rejected; when the retrieval result indicates that no such video file exists in the video database corresponding to the video application, the user is allowed to upload the sample video. A video file similar to the sample video is one whose video fingerprint similarity with the sample video exceeds a preset similarity threshold (e.g., 80%, which can be set according to actual needs).
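The upload decision reduces to a threshold check over the similarities found during retrieval. In this sketch the class name, the `Indication` enum, and the input shape (an array of similarity scores between the sample and stored videos) are hypothetical; the 0.80 value mirrors the 80% example in the text.

```java
// Hypothetical sketch of turning a retrieval result into an upload indication.
final class UploadCheckSketch {
    enum Indication { ALLOW, REJECT }

    // Reject the upload when any stored video's fingerprint similarity
    // with the sample exceeds the threshold; otherwise allow it.
    static Indication decide(double[] similarities, double threshold) {
        for (double s : similarities) {
            if (s > threshold) {
                return Indication.REJECT;
            }
        }
        return Indication.ALLOW;
    }

    public static void main(String[] args) {
        System.out.println(decide(new double[] {0.35, 0.92}, 0.80)); // prints REJECT
        System.out.println(decide(new double[] {0.35, 0.60}, 0.80)); // prints ALLOW
    }
}
```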
  • This helps avoid copyright infringement. Since the video file uploaded by the user is processed by real-time stream processing, video retrieval efficiency is improved; at the same time, retrieval for the uploaded video file is based on its video text fingerprint and its video visual fingerprint, which satisfies the different retrieval needs of video retrieval services.
  • An embodiment of the present invention provides a multimedia information retrieval method, for example a video retrieval method. FIG. 10A to FIG. 10C are schematic diagrams of application scenarios of the video retrieval method of this embodiment.
  • As shown in FIG. 10A, user A receives, through an instant messaging APP (such as WeChat or QQ) on terminal 1, a video file X sent by user B through the instant messaging APP on terminal 2.
  • User A wants to view more video files related to video file X and sends a video retrieval request (carrying the identifier of the instant messaging APP and the identifier of video file X) to the server. In practice, this may be done by long-pressing video file X and tapping the video search option, as shown in FIG. 10B.
  • The server invokes the video retrieval device to perform video retrieval on video file X to determine whether a video file similar to video file X exists in the video database corresponding to the instant messaging APP, and then returns the video retrieval result to the conversation window of user A and user B.
  • For example, if no video file similar to video file X is retrieved, a message such as "no related video file was retrieved" is returned to the conversation window of user A and user B; if a similar video file is retrieved, the retrieved video file, its URL, or a hyperlink to it is returned to the conversation window, as shown in FIG. 10C.
  • By applying this embodiment, when a user chatting with other users through an instant messaging APP finds a video of interest and wants to view more similar video files, the user can use the video retrieval function to obtain more related video files of interest, which enriches the user's choices and improves the user experience.
  • the embodiment of the present invention provides a multimedia information retrieval device.
  • the multimedia information retrieval device in the embodiment of the present invention includes: an acquisition module 31, a processing module 32, and a matching module 33;
  • the acquisition module 31 is configured to receive a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier, and to acquire attribute information of the sample multimedia information based on the multimedia information retrieval request;
  • the processing module 32 is configured to determine fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information; the fingerprint information of the sample multimedia information includes: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
  • the matching module 33 is configured to match the fingerprint information of the sample multimedia information with the fingerprint information in the preset fingerprint database to obtain multimedia information associated with the sample multimedia information.
  • In an embodiment, the acquisition module 31 is further configured to query a preset multimedia information database according to the sample multimedia information retrieval identifier carried by the multimedia information retrieval request, obtain the attribute information of the sample multimedia information, and present that attribute information in the form of a data stream message.
  • In an embodiment, the acquisition module 31 is further configured to encapsulate the attribute information of the sample multimedia information in the form of key-value data pairs based on a preset data stream transmission protocol, to obtain a data stream message (such as an SSM) corresponding to the attribute information of the sample multimedia information.
  • In an embodiment, the processing module 32 is further configured to: decapsulate, according to the preset data stream transmission protocol, the data stream message corresponding to the attribute information of the sample multimedia information to obtain the key-value data pairs that store the attribute information; perform string conversion on the obtained key-value data pairs to obtain the attribute information of the sample multimedia information; generate the fingerprint information of the sample multimedia information according to the attribute information; and, based on the preset data stream transmission protocol, encapsulate the fingerprint information in the form of key-value data pairs to obtain a data stream message corresponding to the fingerprint information of the sample multimedia information.
  • the processing module 32 includes:
  • a feature extraction unit, configured to perform feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information; the feature information of the sample multimedia information includes: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
  • the fingerprint generating unit is configured to generate fingerprint information of the sample multimedia information based on the feature information of the sample multimedia information.
  • In the embodiments of the present invention, the acquisition module 31, the processing module 32, and the matching module 33 in the multimedia information retrieval device may each be implemented by a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or an application specific integrated circuit (ASIC) in the terminal.
  • An embodiment of the present invention provides a multimedia information retrieval apparatus, where the apparatus includes: a multimedia information input component, a feature processing component, and a matching component; the multimedia information input component, the feature processing component, and the matching component support being invoked based on an instruction input by a user, and a logical connection relationship is established among the multimedia information input component, the feature processing component, and the matching component; wherein
  • the multimedia information input component is configured to receive a multimedia information retrieval request based on a user input; the multimedia information retrieval request carries a sample multimedia information retrieval identifier; and acquiring attribute information of the sample multimedia information based on the multimedia information retrieval request ;
  • the feature processing component is configured to obtain attribute information of the sample multimedia information based on a logical connection relationship with the multimedia information input component, and determine fingerprint information of the sample multimedia information based on attribute information of the sample multimedia information.
  • the fingerprint information of the sample multimedia information includes: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
  • the matching component is configured to obtain, based on a logical connection relationship with the feature processing component, fingerprint information including the first fingerprint information and the second fingerprint information, and to match that fingerprint information with the fingerprint information in a preset fingerprint database to obtain multimedia information associated with the sample multimedia information.
  • the feature processing component includes: a feature extraction component and a fingerprint generation component; wherein
  • the feature extraction component is configured to perform feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information; the feature information of the sample multimedia information includes: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
  • the fingerprint generating component is configured to generate fingerprint information of the sample multimedia information based on feature information of the sample multimedia information.
  • the apparatus further includes an output component configured to output multimedia information associated with the sample multimedia information.
  • In this embodiment, the multimedia information input component, the feature extraction component, the fingerprint generation component, the matching component, and the output component correspond respectively to the video source layer, the video feature layer, the video fingerprint layer, the video comparison layer, and the output layer in FIG. 5, and perform the function of the corresponding processing layer when invoked.
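  • An embodiment of the present invention further provides a multimedia information retrieval apparatus, the apparatus comprising: a processor and a memory configured to store a computer program executable on the processor,
  • wherein the processor is configured to perform, when running the computer program:
  • receiving a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier;
  • acquiring attribute information of the sample multimedia information based on the multimedia information retrieval request;
  • determining fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information; the fingerprint information of the sample multimedia information includes: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
  • matching the fingerprint information of the sample multimedia information with the fingerprint information in a preset fingerprint database to obtain multimedia information associated with the sample multimedia information.
  • The processor is further configured to perform, when running the computer program:
  • querying a preset multimedia information database based on the sample multimedia information retrieval identifier carried by the multimedia information retrieval request, obtaining the attribute information of the sample multimedia information, and presenting that attribute information in the form of a data stream message.
  • The processor is further configured to perform, when running the computer program:
  • encapsulating the attribute information of the sample multimedia information in the form of key-value data pairs based on a preset data stream transmission protocol, to obtain a data stream message corresponding to the attribute information of the sample multimedia information.
  • The processor is further configured to perform, when running the computer program:
  • decapsulating, according to the preset data stream transmission protocol, the data stream message corresponding to the attribute information of the sample multimedia information to obtain the key-value data pairs that store the attribute information;
  • performing string conversion on the obtained key-value data pairs to obtain the attribute information of the sample multimedia information;
  • generating the fingerprint information of the sample multimedia information according to the attribute information of the sample multimedia information;
  • encapsulating the fingerprint information of the sample multimedia information in the form of key-value data pairs based on the preset data stream transmission protocol, to obtain a data stream message corresponding to the fingerprint information of the sample multimedia information.
  • The processor is further configured to perform, when running the computer program:
  • performing feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information; the feature information of the sample multimedia information includes: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
  • generating the fingerprint information of the sample multimedia information based on the feature information of the sample multimedia information.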
  • An embodiment of the present invention further provides a storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the following steps:
  • receiving a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier;
  • acquiring attribute information of the sample multimedia information based on the multimedia information retrieval request;
  • determining fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information; the fingerprint information of the sample multimedia information includes: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
  • matching the fingerprint information of the sample multimedia information with the fingerprint information in a preset fingerprint database to obtain multimedia information associated with the sample multimedia information.
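  • In the embodiments of the present invention, an example of the multimedia information retrieval apparatus as a hardware entity is shown in FIG. 12. The apparatus includes a processor 71, a storage medium 72, and at least one external communication interface 73; the storage medium 72 stores an executable program 721; the processor 71, the storage medium 72, and the external communication interface 73 are all connected through a bus 74.
  • Those skilled in the art can understand that all or part of the steps of the foregoing method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disk.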
  • Alternatively, if the above integrated unit of the present invention is implemented in the form of a software function module and sold or used as a standalone product, it may also be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes various media that can store program code, such as a removable storage device, a RAM, a ROM, a magnetic disk, or an optical disk.

Abstract

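Embodiments of the present invention disclose a multimedia information retrieval method, device, and storage medium. The method includes: receiving a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier; acquiring attribute information of the sample multimedia information based on the multimedia information retrieval request; determining fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information, the fingerprint information including first fingerprint information for characterizing a text feature of the sample multimedia information and second fingerprint information for characterizing a visual feature of the sample multimedia information; and matching the fingerprint information of the sample multimedia information with fingerprint information in a preset fingerprint database to obtain multimedia information associated with the sample multimedia information.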

Description

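Multimedia information retrieval method, device, and storage medium
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 201710326718.3, filed on May 10, 2017, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to electrical digital data processing technology, and in particular to a multimedia information retrieval method, device, and storage medium.
Background
With the rapid development of network media technology, more and more information is presented on the Internet in multimedia forms such as video and audio, and the volume of multimedia data is growing explosively. While massive amounts of multimedia video enrich people's lives, managing this video data poses a major technical problem.
In actual video retrieval services, differing retrieval requirements make the services rich and complex; however, the related art offers no video retrieval solution that can support such diverse video retrieval services.
Summary of the invention
Embodiments of the present invention aim to provide a multimedia information retrieval method, device, and storage medium that can support diverse multimedia information retrieval services.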
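The technical solutions of the embodiments of the present invention are implemented as follows:
In a first aspect, an embodiment of the present invention provides a multimedia information retrieval method, the method including:
receiving a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier;
acquiring attribute information of the sample multimedia information based on the multimedia information retrieval request;
determining fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information, the fingerprint information including first fingerprint information for characterizing a text feature of the sample multimedia information and second fingerprint information for characterizing a visual feature of the sample multimedia information;
matching the fingerprint information of the sample multimedia information with fingerprint information in a preset fingerprint database to obtain a matching result;
when the matching result indicates a successful match, obtaining multimedia information associated with the sample multimedia information.
In a second aspect, an embodiment of the present invention provides a multimedia information retrieval device, the device including an acquisition module, a processing module, and a matching module, wherein:
the acquisition module is configured to receive a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier,
and to acquire attribute information of the sample multimedia information based on the multimedia information retrieval request;
the processing module is configured to determine fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information, the fingerprint information including first fingerprint information for characterizing a text feature of the sample multimedia information and second fingerprint information for characterizing a visual feature of the sample multimedia information;
the matching module is configured to match the fingerprint information of the sample multimedia information with fingerprint information in a preset fingerprint database to obtain a matching result,
and, when the matching result indicates a successful match, to obtain multimedia information associated with the sample multimedia information.
In a third aspect, an embodiment of the present invention provides a multimedia information retrieval device, the device including:
a processor and a memory for storing a computer program executable on the processor, wherein
the processor is configured to perform the above multimedia information retrieval method when running the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing computer-executable instructions for performing the above multimedia information retrieval method.
In a fifth aspect, an embodiment of the present invention provides a multimedia information retrieval method performed by a server. The server includes one or more processors, a memory, and one or more programs, where the one or more programs are stored in the memory, a program may include one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions. The method includes:
receiving a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier;
acquiring attribute information of the sample multimedia information based on the multimedia information retrieval request;
determining fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information, the fingerprint information including first fingerprint information for characterizing a text feature of the sample multimedia information and second fingerprint information for characterizing a visual feature of the sample multimedia information;
matching the fingerprint information of the sample multimedia information with fingerprint information in a preset fingerprint database to obtain a matching result;
when the matching result indicates a successful match, obtaining multimedia information associated with the sample multimedia information.
In the multimedia information retrieval method, device, and storage medium provided by the embodiments of the present invention, the fingerprint information of the sample multimedia information contains both the first fingerprint information and the second fingerprint information. The text feature corresponding to the first fingerprint information is fast to process and suits latency-sensitive multimedia information retrieval services, while the visual feature corresponding to the second fingerprint information is highly representative of the multimedia content and suits services that require high retrieval accuracy. When retrieval is performed based on sample multimedia information, the fingerprint information of the sample can be matched against the fingerprint information in the fingerprint database whether the service is latency-sensitive or accuracy-demanding. The solution therefore adapts to the retrieval requirements of different multimedia information retrieval services and is highly general.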
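Brief description of the drawings
FIG. 1 is a schematic diagram of the hardware entities that exchange information in an embodiment of the present invention;
FIG. 2 is a first schematic flowchart of the multimedia information retrieval method in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a Storm Topology;
FIG. 4 is a schematic diagram of data processing between components based on a preset data stream transmission protocol in an embodiment of the present invention;
FIG. 5 is a second schematic flowchart of the multimedia information retrieval method in an embodiment of the present invention;
FIG. 6 is a first schematic diagram of an application scenario of the video retrieval method in an embodiment of the present invention;
FIG. 7 is a schematic diagram of dividing a video file into video frames in an embodiment of the present invention;
FIG. 8 is a third schematic flowchart of the multimedia information retrieval method in an embodiment of the present invention;
FIG. 9 is a second schematic diagram of an application scenario of the video retrieval method in an embodiment of the present invention;
FIG. 10A to FIG. 10C are third schematic diagrams of an application scenario of the video retrieval method in an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of the multimedia information retrieval device in an embodiment of the present invention;
FIG. 12 is an example diagram of the multimedia information retrieval device as a hardware entity in an embodiment of the present invention.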
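Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
Video fingerprinting is a technique in which software identifies, extracts, and compresses video, so that the unique "fingerprint" produced during processing can represent a video file. Video fingerprint analysis can be based on any visual video feature, including but not limited to key-frame sequence analysis of the video stream and changes in color and motion.
FIG. 1 is a schematic diagram of the hardware entities that exchange information in an embodiment of the present invention. FIG. 1 includes servers 11 to 1n and terminal devices 21-24. The terminal devices 21-24 exchange information with the servers over a wired or wireless network; the terminal devices include mobile phones, desktop computers, PCs, all-in-one machines, and the like. In one example, a user uploads a sample video file (that is, a video file to be retrieved, such as a movie clip or a music video) to a video website through a terminal device, and the server performs video retrieval based on the uploaded video file and returns the video retrieval result.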
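An embodiment of the present invention provides a multimedia information retrieval method. As shown in FIG. 2, the method includes:
Step 101: Receive a multimedia information retrieval request; the multimedia information retrieval request carries a sample multimedia information retrieval identifier.
Here, multimedia information includes video information, audio information, and the like. The multimedia information retrieval request is used to request multimedia information retrieval, such as video retrieval. The sample multimedia information retrieval identifier identifies the sample multimedia information; for example, when the sample multimedia information is sample video (also called retrieval video) information, the retrieval identifier may be a video identifier (ID), or it may be the title of the sample multimedia file.
Step 102: Acquire attribute information of the sample multimedia information based on the multimedia information retrieval request.
In practice, multimedia information retrieval relies on the attribute information of the sample multimedia information provided by the user, so that attribute information needs to be acquired according to the sample multimedia information retrieval identifier carried by the user's retrieval request. In an embodiment, a preset multimedia information database may be queried based on the retrieval identifier to obtain the attribute information of the sample multimedia information (including its retrieval identifier, title, text description, and multimedia file), such as its metadata. The preset multimedia information database stores at least one mapping between a sample multimedia information retrieval identifier and the attribute information of the sample multimedia information, so the corresponding attribute information can be looked up by retrieval identifier. For example, a preset video database is queried according to the video ID of the sample video to obtain the metadata of the sample video (video ID, video title, video text description, video file, and so on).
In an embodiment, after the attribute information of the sample multimedia information is acquired, it is presented in the form of a data stream message, for example a Stream Shared Message (SSM). In other words, all subsequent processing based on the attribute information is real-time streaming processing (processing of data streams). For example, the operation steps of the method of this embodiment may be implemented by invoking N preprogrammed components, where N is a positive integer not less than 2. This satisfies the real-time end-to-end latency requirement of latency-sensitive multimedia information retrieval services, and it also provides good scalability for handling growing volumes of multimedia data and increasingly complex user retrieval requirements.
In an embodiment, based on a preset data stream transmission protocol, the attribute information of the sample multimedia information is encapsulated in the form of key-value data pairs to obtain an SSM corresponding to the attribute information. For example, the metadata of the sample video is encapsulated as key-value data pairs based on the preset data stream transmission protocol to obtain an SSM corresponding to the metadata of the sample video.
Step 103: Determine fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information. In this embodiment, the fingerprint information of the sample multimedia information includes: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information.
In an embodiment, because the attribute information of the sample multimedia information is presented as an SSM, performing this step requires first decapsulating, according to the preset data stream transmission protocol, the data stream message corresponding to the attribute information to obtain the key-value data pairs that store the attribute information, and then performing string conversion on those key-value data pairs to obtain the attribute information of the sample multimedia information.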
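The encapsulation and decapsulation of a data stream message described above can be sketched as follows. This is a minimal illustration, assuming a flat map of string keys to string values and a hand-rolled JSON-style encoding; `SsmCodec` and its method names are hypothetical and belong neither to the patent nor to Storm, and a production system would more likely rely on a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: packs key-value data pairs into one message string and back.
final class SsmCodec {
    private SsmCodec() {}

    // Encapsulate: key-value data pairs -> one JSON-style string.
    static String encapsulate(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Decapsulate: JSON-style string -> key-value data pairs (flat, string values only).
    static Map<String, String> decapsulate(String message) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int[] pos = {0};
        expect(message, pos, '{');
        if (message.charAt(pos[0]) == '}') return pairs;
        while (true) {
            String key = readString(message, pos);
            expect(message, pos, ':');
            pairs.put(key, readString(message, pos));
            char c = message.charAt(pos[0]++);
            if (c == '}') return pairs;
            if (c != ',') throw new IllegalArgumentException("expected ',' or '}'");
        }
    }

    // Wraps a string in double quotes, escaping backslashes and quotes.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    private static void expect(String m, int[] pos, char c) {
        if (pos[0] >= m.length() || m.charAt(pos[0]) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos[0]);
        }
        pos[0]++;
    }

    // Reads one quoted string, undoing the escapes produced by quote().
    private static String readString(String m, int[] pos) {
        expect(m, pos, '"');
        StringBuilder sb = new StringBuilder();
        while (true) {
            char c = m.charAt(pos[0]++);
            if (c == '\\') sb.append(m.charAt(pos[0]++));
            else if (c == '"') return sb.toString();
            else sb.append(c);
        }
    }
}
```

The string conversion step then turns individual values back into typed data, for example `Long.parseLong(pairs.get("durationMs"))` for a hypothetical `durationMs` key.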
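In practice, step 103 may include: performing feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information, and then generating the fingerprint information of the sample multimedia information based on that feature information; the feature information includes a text feature of the sample multimedia information and a visual feature of the sample multimedia information.
For example, feature extraction is performed on the attribute information of the sample video to obtain video features of the sample video, and the fingerprint of the sample video is then generated from those video features; the video features include a video text feature and a video visual feature of the sample video.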
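The document does not name a specific algorithm for turning a text feature into the first fingerprint. As one possible illustration, the sketch below uses SimHash over whitespace-separated tokens of a title or description, with FNV-1a as the token hash; both choices, and the `TextFingerprint` class itself, are assumptions made for this example and are not taken from the patent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical text-fingerprint helper: 64-bit SimHash over lowercased tokens.
final class TextFingerprint {
    private TextFingerprint() {}

    // 64-bit FNV-1a hash of one token.
    static long fnv1a64(String token) {
        long h = 0xcbf29ce484222325L;          // FNV offset basis
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;               // FNV prime
        }
        return h;
    }

    // Each token votes +1/-1 on every bit position; the sign of the sum decides the bit.
    static long simHash(String text) {
        int[] votes = new int[64];
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            long h = fnv1a64(token);
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) fingerprint |= 1L << bit;
        }
        return fingerprint;
    }
}
```

Texts that share most tokens produce fingerprints that differ in few bits, which is what makes a similarity comparison between fingerprints meaningful. Whitespace tokenization does not suit Chinese text; a word segmenter would be needed there.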
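In an embodiment, the feature extraction on the attribute information of the sample video may proceed as follows: the sample video is divided into multiple video frames; based on a predetermined feature extraction algorithm, several feature frames that represent specific feature information (text feature information and visual feature information) of those video frames are extracted from them; and the extracted feature frames are encoded to obtain the video fingerprint of the sample video.
In an embodiment, after the fingerprint information of the sample multimedia information is generated, it is encapsulated in the form of key-value data pairs based on the preset data stream transmission protocol to obtain a data stream message (such as an SSM) corresponding to the fingerprint information of the sample multimedia information.
Step 104: Match the fingerprint information of the sample multimedia information with the fingerprint information in a preset fingerprint database to obtain multimedia information associated with the sample multimedia information.
In an embodiment, before step 104, a fingerprint database may be built for the stored multimedia information: feature extraction and fingerprint generation are performed on multiple stored multimedia files to obtain their fingerprints, which are then stored in the fingerprint database.
In an embodiment, matching the fingerprint information of the sample multimedia information with the fingerprint information in the preset fingerprint database may be performed by comparing fingerprint similarity: when the similarity exceeds a preset similarity threshold, the match is determined to be successful; otherwise it is determined to have failed. The similarity threshold may be set according to actual conditions.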
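The threshold rule above can be sketched as follows. The document does not specify how similarity is computed; this example assumes fingerprints are numeric vectors compared by cosine similarity, and the class and method names are hypothetical.

```java
// Hypothetical matcher: cosine similarity between two fingerprint vectors.
final class VectorMatch {
    private VectorMatch() {}

    // Returns a value in [-1, 1]; 0.0 when either vector has zero length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("fingerprints must have equal dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / Math.sqrt(normA * normB);
    }

    // The match succeeds only when the similarity exceeds the preset threshold.
    static boolean isMatch(double[] sample, double[] candidate, double threshold) {
        return cosine(sample, candidate) > threshold;
    }
}
```

With a threshold of 0.8, for instance, `isMatch(new double[]{1, 1}, new double[]{1, 0}, 0.8)` is false because the cosine similarity of those vectors is about 0.707.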
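When the match succeeds, the multimedia information corresponding to the fingerprint that matches the fingerprint information of the sample multimedia information is obtained and taken as the multimedia information associated with the sample multimedia information; when the match fails, no multimedia information associated with the sample multimedia information exists.
For example, the video fingerprint of the sample video is matched against the fingerprints in the preset fingerprint database; when the match succeeds, the one or more video files corresponding to the one or more matching fingerprints are the video files associated with the sample video (which may be one or more video files similar to the sample video file).
In an embodiment, the above multimedia information retrieval method may be implemented on the Storm platform (other platforms such as Flink or Spark Streaming may also be used) by invoking preprogrammed components, with SSMs transmitted between the components. Storm is an open-source distributed real-time stream computing platform in which data streams are transmitted in real time between logical nodes. FIG. 3 is a schematic structural diagram of a Storm Topology. A Topology is the logical topology of a task on the Storm platform and can be executed on the platform's engine. A Spout is a logical node in a Storm Topology that produces data streams, that is, a data stream producer node, shown as label 1 in FIG. 3. A Bolt is a logical node in a Storm Topology that consumes data streams, that is, a data stream consumer node, shown as label 2 in FIG. 3.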
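For a specific video retrieval service, the user selects the required reusable components in a Java IDE and connects them into a Storm Topology:
// after() is the connection call named in this document; stock Apache Storm
// declares the upstream link with a grouping such as shuffleGrouping("getTask").
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("getTask", new GetTaskSpout());
builder.setBolt("textFea", new TextFeaBolt()).after("getTask");
builder.setBolt("textSig", new TextSigBolt()).after("textFea");
builder.setBolt("textSim", new TextSimBolt()).after("textSig");
builder.setBolt("visualFea", new VisualFeaBolt()).after("textSim");
builder.setBolt("visualSig", new VisualSigBolt()).after("visualFea");
builder.setBolt("visualSim", new VisualSimBolt()).after("visualSig");
builder.setBolt("output", new OutputBolt()).after("visualSim");
StormSubmitter.submitTopology(builder);
Here, GetTaskSpout, TextFeaBolt, TextSigBolt, and so on are preprogrammed components, and components that have a data stream between them are connected through after(). After the Topology is defined, it is submitted through Storm's interface to run on the cluster.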
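As an example, FIG. 4 shows how components process data based on the preset data stream transmission protocol. A downstream component first receives SSM 1 from its upstream component and decapsulates it to obtain the key-value data pairs in SSM 1 (strings stored in JSON format). It then converts those strings into the actual data, produces new data from the converted data (that is, applies the corresponding processing, such as feature extraction or fingerprint generation), uses a JSON tool to encapsulate the new data together with the existing data into SSM 2, and outputs SSM 2 to its own downstream component.
For example, the first to fourth components are all preprogrammed components on the Storm platform. After the first component generates the SSM corresponding to the attribute information of the sample multimedia information, it outputs the SSM to the second component. The second component decapsulates the received SSM and performs string conversion based on the preset data stream transmission protocol to obtain the attribute information of the sample multimedia information, performs feature extraction on the attribute information to obtain the feature information of the sample multimedia information, encapsulates the feature information into a corresponding SSM, and outputs it to the third component. The third component decapsulates the received SSM and performs string conversion based on the preset data stream transmission protocol, generates the fingerprint information of the sample multimedia information from the feature information, encapsulates the fingerprint information into a corresponding SSM, and outputs it to the fourth component. The fourth component decapsulates the SSM, performs string conversion, and then performs fingerprint matching to obtain the multimedia information associated with the sample multimedia information.
For example, in video retrieval, after the first component obtains the SSM corresponding to the attribute information of the sample video, it outputs the SSM to the second component. The second component performs feature extraction on the attribute information of the sample video based on the preset data stream transmission protocol to obtain the feature information of the sample video (low-level video features, represented as high-dimensional vectors), and outputs the SSM corresponding to the feature information to the third component. The third component generates the fingerprint information of the sample video from its feature information (feature information that uniquely identifies the video signal, generally a vector), and outputs the SSM corresponding to the fingerprint information to the fourth component, so that the fourth component performs fingerprint matching and obtains the videos associated with the sample video.
By applying the embodiments of the present invention, the fingerprint information of the sample multimedia information used for matching contains both the first fingerprint information and the second fingerprint information. The text feature corresponding to the first fingerprint information is fast to process and suits latency-sensitive multimedia information retrieval services, while the visual feature corresponding to the second fingerprint information is highly representative of the multimedia content and suits services that require high retrieval accuracy; the solution can therefore serve the retrieval requirements of different multimedia information retrieval services at the same time. Because the whole solution can be carried out on the Storm platform using SSMs, multimedia information retrieval is processed in real time and efficiently; and because the solution can be implemented by invoking preprogrammed components, development efficiency is improved and the costs of development, deployment, and maintenance are reduced.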
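An embodiment of the present invention provides a multimedia information retrieval method, for example a video retrieval method, applied to the Storm platform. The video retrieval method of this embodiment may be implemented by invoking preprogrammed general-purpose components organized into different processing layers. As shown in FIG. 5, these are: a first component corresponding to the video source layer (label 51), a second component corresponding to the video feature layer (label 52), a third component corresponding to the video fingerprint layer (label 53), a fourth component corresponding to the fingerprint comparison layer (label 54), and a fifth component corresponding to the output layer (label 55). FIG. 6 is a schematic diagram of an application scenario of the video retrieval method of this embodiment. In FIG. 6, the server hosts the first to fifth components of FIG. 5 that implement the video retrieval method; the server 61 performs video retrieval based on the video retrieval request sent by the client 62 in step 1 and then, in step 2, returns the retrieval result to the client 62. With reference to FIG. 5 and FIG. 6, the video retrieval method in this embodiment includes:
Step 201: The first component obtains the corresponding video metadata from the video database based on the video retrieval request input by the user and passes the video metadata to the second component.
In an embodiment, the video retrieval request may carry the video ID of the sample video (the video to be retrieved), and the video metadata corresponding to the sample video is obtained based on that video ID, including the video ID, video title, video text description, video file, and so on.
It should be noted that data passed between the general-purpose components in this embodiment is transmitted in the form of SSMs based on the preset data stream transmission protocol. That is, after the first component obtains the video metadata of the sample video, it encapsulates the metadata as an SSM and passes it to the second component; the second component decapsulates the received SSM and performs string conversion to recover the video metadata before processing it. FIG. 4 shows the process of data processing between components based on the preset data stream transmission protocol.
Step 202: The second component performs video feature extraction on the video metadata to obtain the video text feature and video visual feature of the sample video.
In an embodiment, during this feature extraction the sample video file may be divided into multiple video frames (FIG. 7 illustrates dividing a video file into video frames: a video stream can be divided into multiple scenes, each scene into multiple shots, each shot into multiple frames, and so on), and several feature frames representing the text feature information and visual feature information of those video frames are extracted from them based on a predetermined feature extraction algorithm.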
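The document leaves the "predetermined feature extraction algorithm" open. Purely as an illustration of deriving a visual fingerprint from sampled frames, the sketch below applies an average hash to 8x8 grayscale thumbnails; the average-hash technique, the fixed-step frame sampling, and the `VisualFingerprint` class are assumptions for this example, not the patent's method.

```java
// Hypothetical visual-fingerprint helper based on an 8x8 average hash per sampled frame.
final class VisualFingerprint {
    private VisualFingerprint() {}

    // One bit per pixel: set when the pixel is brighter than the frame's mean brightness.
    static long averageHash(int[][] gray8x8) {
        if (gray8x8.length != 8) throw new IllegalArgumentException("expected 8 rows");
        long sum = 0;
        for (int[] row : gray8x8) {
            if (row.length != 8) throw new IllegalArgumentException("expected 8 columns");
            for (int v : row) sum += v;
        }
        double mean = sum / 64.0;
        long fingerprint = 0L;
        for (int r = 0; r < 8; r++) {
            for (int c = 0; c < 8; c++) {
                if (gray8x8[r][c] > mean) fingerprint |= 1L << (r * 8 + c);
            }
        }
        return fingerprint;
    }

    // Samples every step-th frame (a stand-in for feature-frame selection) and hashes each one.
    static long[] videoFingerprint(int[][][] frames, int step) {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        int count = (frames.length + step - 1) / step;
        long[] hashes = new long[count];
        for (int i = 0; i < count; i++) {
            hashes[i] = averageHash(frames[i * step]);
        }
        return hashes;
    }
}
```

A real pipeline would first decode the video, pick key frames per shot rather than at a fixed step, and downscale each frame to the 8x8 grayscale thumbnail assumed here.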
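Step 203: The third component generates the video text fingerprint and video visual fingerprint of the sample video based on the video text feature and video visual feature of the sample video.
In practice, the feature frames extracted in step 202, which carry the text feature information and visual feature information, are encoded to generate the video text fingerprint and video visual fingerprint of the sample video, which uniquely identify the sample video.
Step 204: The fourth component matches the video text fingerprint and video visual fingerprint of the sample video with the fingerprints in the preset fingerprint database to obtain a matching result.
In an embodiment, the fourth component performs similarity matching between the fingerprints of the sample video and the fingerprints in the fingerprint database (the video text fingerprint and the video visual fingerprint are matched separately). When the similarity reaches the preset similarity threshold (that is, the similarities of both the video text fingerprint and the video visual fingerprint reach the preset similarity threshold), the match is determined to be successful. One or more fingerprints may match the fingerprints of the sample video, and one or more video files similar to the sample video can be obtained through the matching fingerprints.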
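The rule that both fingerprints must reach the threshold can be sketched as a lookup over a fingerprint database. This is a minimal illustration, assuming 64-bit fingerprints compared by the fraction of equal bits and an in-memory list as the database; `DualFingerprintSearch`, `Entry`, and the single shared threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lookup that requires both the text and the visual fingerprint to match.
final class DualFingerprintSearch {
    private DualFingerprintSearch() {}

    // Hypothetical database row: one stored video with its two 64-bit fingerprints.
    static final class Entry {
        final String videoId;
        final long textFp;
        final long visualFp;

        Entry(String videoId, long textFp, long visualFp) {
            this.videoId = videoId;
            this.textFp = textFp;
            this.visualFp = visualFp;
        }
    }

    // Fraction of identical bits between two 64-bit fingerprints, in [0, 1].
    static double similarity(long a, long b) {
        return 1.0 - Long.bitCount(a ^ b) / 64.0;
    }

    // Returns the IDs of stored videos whose text AND visual similarity both reach the threshold.
    static List<String> search(long sampleTextFp, long sampleVisualFp,
                               List<Entry> database, double threshold) {
        List<String> hits = new ArrayList<>();
        for (Entry e : database) {
            boolean textOk = similarity(sampleTextFp, e.textFp) >= threshold;
            boolean visualOk = similarity(sampleVisualFp, e.visualFp) >= threshold;
            if (textOk && visualOk) hits.add(e.videoId);
        }
        return hits;
    }
}
```

A linear scan is only workable for small databases; an index over fingerprint bits would be a natural next step, which the document does not discuss.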
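Step 205: Output the video retrieval result based on the matching result.
In an embodiment, the output video retrieval result may include one or more video files similar to the sample video. When it includes multiple video files, the result may be output as follows: the retrieved video files are sorted according to a preset sorting policy (such as video file size or video creation time), and the sorted result is output so that the user can choose according to actual needs.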
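Sorting the retrieved files by a preset policy can be sketched as follows. The two policies mirror the examples given above (file size and creation time); the sort directions, the `ResultRanking` class, and its field names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical ranking step for the output layer.
final class ResultRanking {
    private ResultRanking() {}

    // Hypothetical description of one retrieved video file.
    static final class VideoFile {
        final String id;
        final long sizeBytes;
        final long createdEpochSec;

        VideoFile(String id, long sizeBytes, long createdEpochSec) {
            this.id = id;
            this.sizeBytes = sizeBytes;
            this.createdEpochSec = createdEpochSec;
        }
    }

    enum Policy { BY_SIZE_DESC, BY_NEWEST_FIRST }

    // Returns the IDs of the hits ordered by the chosen policy; the input list is not modified.
    static List<String> rank(List<VideoFile> hits, Policy policy) {
        Comparator<VideoFile> order = policy == Policy.BY_SIZE_DESC
                ? Comparator.comparingLong((VideoFile v) -> v.sizeBytes).reversed()
                : Comparator.comparingLong((VideoFile v) -> v.createdEpochSec).reversed();
        List<VideoFile> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        List<String> ids = new ArrayList<>();
        for (VideoFile v : sorted) ids.add(v.id);
        return ids;
    }
}
```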
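An embodiment of the present invention provides a multimedia information retrieval method, for example a video retrieval method, applied to the Storm platform. The video retrieval method of this embodiment may be implemented by invoking preprogrammed general-purpose components organized into different processing layers. FIG. 8 is a schematic flowchart of the video retrieval method of this embodiment, and FIG. 9 is a schematic diagram of an application scenario of the method. With reference to FIG. 8 and FIG. 9, the video retrieval method in this embodiment includes:
Step 301: The user uploads a sample video file through a video application (APP) on the terminal.
For example, the sample video file uploaded by the user may be a video file recorded by the user, a movie clip captured by the user, a video file received from another user, and so on.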
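Step 302: The server invokes the video retrieval device to perform video retrieval on the received sample video file and obtains a retrieval result.
In an embodiment, the video retrieval device performs feature extraction (video text features and video visual features), fingerprint generation (video text fingerprint and video visual fingerprint), and fingerprint matching on the sample video file to determine whether a video file similar to the sample video exists in the video database corresponding to the video application.
It should be noted that the video retrieval processing performed by the video retrieval device on the sample video may be a streaming process based on a preset data stream transmission protocol, so that the real-time requirement of the video retrieval service can be met.
Step 303: Return a video upload indication to the terminal based on the retrieval result.
The video upload indication indicates whether the user is allowed to upload the sample video file.
For example, when the retrieval result indicates that a video file similar to the sample video exists in the video database corresponding to the video application, the user's upload of the sample video is rejected; when the retrieval result indicates that no such video file exists in that database, the user is allowed to upload the sample video. Here, a video file similar to the sample video is one whose video fingerprint similarity exceeds a preset similarity threshold (e.g., 80%, which can be set according to actual needs).
By applying this embodiment, video retrieval is performed on a video file that a user uploads to a video website to determine whether a similar video already exists in the video database of that website, and whether the upload is allowed is then decided based on the retrieval result, which prevents users from committing copyright infringement. Because the uploaded video file is processed as a real-time stream, retrieval efficiency is improved; and because the retrieval is based on both the video text fingerprint and the video visual fingerprint of the video file, different retrieval requirements of video retrieval services can be met at the same time.
An embodiment of the present invention provides a multimedia information retrieval method, for example a video retrieval method. FIG. 10A to FIG. 10C are schematic diagrams of application scenarios of the video retrieval method of this embodiment. As shown in FIG. 10A, user A receives, through an instant messaging APP (such as WeChat or QQ) on terminal 1, a video file X sent by user B through the instant messaging APP on terminal 2. User A wants to view more video files related to video file X and sends a video retrieval request (carrying the identifier of the instant messaging APP and the identifier of video file X) to the server; in practice, this may be done by long-pressing video file X and tapping the video search option, as shown in FIG. 10B. The server invokes the video retrieval device to perform video retrieval on video file X to determine whether a video file similar to video file X exists in the video database corresponding to the instant messaging APP, and then returns the video retrieval result to the conversation window of user A and user B. For example, if no video file similar to video file X is retrieved, a message such as "no related video file was retrieved" is returned to the conversation window; if a similar video file is retrieved, the retrieved video file, its URL, or a hyperlink to it is returned to the conversation window, as shown in FIG. 10C.
By applying this embodiment, when a user chatting with other users through an instant messaging APP finds a video of interest and wants to view more similar video files, the user can use the video retrieval function to obtain more related video files of interest, which enriches the user's choices and improves the user experience.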
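An embodiment of the present invention provides a multimedia information retrieval device. As shown in FIG. 11, the device includes an acquisition module 31, a processing module 32, and a matching module 33, wherein:
the acquisition module 31 is configured to receive a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier, and to acquire attribute information of the sample multimedia information based on the multimedia information retrieval request;
the processing module 32 is configured to determine fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information; the fingerprint information includes: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
the matching module 33 is configured to match the fingerprint information of the sample multimedia information with the fingerprint information in a preset fingerprint database to obtain multimedia information associated with the sample multimedia information.
In an embodiment, the acquisition module 31 is further configured to query a preset multimedia information database based on the sample multimedia information retrieval identifier carried by the multimedia information retrieval request, obtain the attribute information of the sample multimedia information, and present that attribute information in the form of a data stream message.
In an embodiment, the acquisition module 31 is further configured to encapsulate the attribute information of the sample multimedia information in the form of key-value data pairs based on a preset data stream transmission protocol, to obtain a data stream message (such as an SSM) corresponding to the attribute information of the sample multimedia information.
In an embodiment, the processing module 32 is further configured to: decapsulate, according to the preset data stream transmission protocol, the data stream message corresponding to the attribute information of the sample multimedia information to obtain the key-value data pairs that store the attribute information; perform string conversion on the obtained key-value data pairs to obtain the attribute information of the sample multimedia information; generate the fingerprint information of the sample multimedia information according to the attribute information; and, based on the preset data stream transmission protocol, encapsulate the fingerprint information in the form of key-value data pairs to obtain a data stream message corresponding to the fingerprint information of the sample multimedia information.
In an embodiment, the processing module 32 includes:
a feature extraction unit, configured to perform feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information; the feature information includes: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
a fingerprint generation unit, configured to generate the fingerprint information of the sample multimedia information based on the feature information of the sample multimedia information.
In the embodiments of the present invention, the acquisition module 31, the processing module 32, and the matching module 33 in the multimedia information retrieval device may each be implemented by a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or an application specific integrated circuit (ASIC) in the terminal.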
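An embodiment of the present invention provides a multimedia information retrieval apparatus, the apparatus including: a multimedia information input component, a feature processing component, and a matching component. The multimedia information input component, the feature processing component, and the matching component support being invoked based on an instruction input by a user, and a logical connection relationship is established among them; wherein:
the multimedia information input component is configured to receive a multimedia information retrieval request based on a user's input, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier, and to acquire attribute information of the sample multimedia information based on the multimedia information retrieval request;
the feature processing component is configured to obtain the attribute information of the sample multimedia information based on its logical connection relationship with the multimedia information input component, and to determine fingerprint information of the sample multimedia information based on the attribute information; the fingerprint information includes: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
the matching component is configured to obtain, based on its logical connection relationship with the feature processing component, fingerprint information including the first fingerprint information and the second fingerprint information, and to match that fingerprint information with the fingerprint information in a preset fingerprint database to obtain multimedia information associated with the sample multimedia information.
In an embodiment, the feature processing component includes a feature extraction component and a fingerprint generation component, wherein:
the feature extraction component is configured to perform feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information; the feature information includes: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
the fingerprint generation component is configured to generate the fingerprint information of the sample multimedia information based on the feature information of the sample multimedia information.
In an embodiment, the apparatus further includes an output component configured to output the multimedia information associated with the sample multimedia information.
In this embodiment, the multimedia information input component, the feature extraction component, the fingerprint generation component, the matching component, and the output component correspond respectively to the video source layer, the video feature layer, the video fingerprint layer, the video comparison layer, and the output layer in FIG. 5, and perform the function of the corresponding processing layer when invoked.
It should be noted that when the multimedia information retrieval device provided by the above embodiments performs multimedia information retrieval, the division into the program modules described above is only an example. In practice, the processing may be assigned to different program modules as needed; that is, the internal structure of the device may be divided into different program modules to complete all or part of the processing described above. In addition, for technical details not disclosed for the multimedia information retrieval device in the above embodiments, refer to the description of FIG. 2.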
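An embodiment of the present invention further provides a multimedia information retrieval apparatus, the apparatus including: a processor and a memory configured to store a computer program executable on the processor,
wherein the processor is configured to perform, when running the computer program:
receiving a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier;
acquiring attribute information of the sample multimedia information based on the multimedia information retrieval request;
determining fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information; the fingerprint information includes: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
matching the fingerprint information of the sample multimedia information with the fingerprint information in a preset fingerprint database to obtain multimedia information associated with the sample multimedia information.
The processor is further configured to perform, when running the computer program:
querying a preset multimedia information database based on the sample multimedia information retrieval identifier carried by the multimedia information retrieval request, obtaining the attribute information of the sample multimedia information, and presenting that attribute information in the form of a data stream message.
The processor is further configured to perform, when running the computer program:
encapsulating the attribute information of the sample multimedia information in the form of key-value data pairs based on a preset data stream transmission protocol, to obtain a data stream message corresponding to the attribute information of the sample multimedia information.
The processor is further configured to perform, when running the computer program:
decapsulating, according to the preset data stream transmission protocol, the data stream message corresponding to the attribute information of the sample multimedia information to obtain the key-value data pairs that store the attribute information;
performing string conversion on the obtained key-value data pairs to obtain the attribute information of the sample multimedia information;
generating the fingerprint information of the sample multimedia information according to the attribute information of the sample multimedia information;
encapsulating the fingerprint information of the sample multimedia information in the form of key-value data pairs based on the preset data stream transmission protocol, to obtain a data stream message corresponding to the fingerprint information of the sample multimedia information.
The processor is further configured to perform, when running the computer program:
performing feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information; the feature information includes: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
generating the fingerprint information of the sample multimedia information based on the feature information of the sample multimedia information.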
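An embodiment of the present invention further provides a storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the following steps:
receiving a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier;
acquiring attribute information of the sample multimedia information based on the multimedia information retrieval request;
determining fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information; the fingerprint information includes: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
matching the fingerprint information of the sample multimedia information with the fingerprint information in a preset fingerprint database to obtain multimedia information associated with the sample multimedia information.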
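In the embodiments of the present invention, an example of the multimedia information retrieval apparatus as a hardware entity is shown in FIG. 12. The apparatus includes a processor 71, a storage medium 72, and at least one external communication interface 73; the storage medium 72 stores an executable program 721; the processor 71, the storage medium 72, and the external communication interface 73 are all connected through a bus 74.
Those skilled in the art can understand that all or part of the steps of the foregoing method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if the above integrated unit of the present invention is implemented in the form of a software function module and sold or used as a standalone product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a RAM, a ROM, a magnetic disk, or an optical disk.
The foregoing is merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.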

Claims (23)

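  1. A multimedia information retrieval method, comprising:
    receiving a multimedia information retrieval request;
    acquiring attribute information of the sample multimedia information based on the multimedia information retrieval request;
    determining fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information, the fingerprint information of the sample multimedia information including at least: first fingerprint information for characterizing a text feature of the sample multimedia information;
    matching the fingerprint information of the sample multimedia information with fingerprint information in a preset fingerprint database to obtain a matching result;
    when the matching result indicates a successful match, obtaining multimedia information associated with the sample multimedia information.
  2. The method according to claim 1, wherein the fingerprint information of the sample multimedia information further includes second fingerprint information for characterizing a visual feature of the sample multimedia information, and the matching of the fingerprint information of the sample multimedia information with the fingerprint information in the preset fingerprint database comprises:
    matching at least one of the first fingerprint information and the second fingerprint information of the sample multimedia information with the fingerprint information in the preset fingerprint database.
  3. The method according to claim 1, wherein the acquiring of the attribute information of the sample multimedia information based on the multimedia information retrieval request comprises:
    querying a preset multimedia information database based on a sample multimedia information retrieval identifier carried by the multimedia information retrieval request to obtain the attribute information of the sample multimedia information;
    presenting the attribute information of the sample multimedia information in the form of a data stream message.
  4. The method according to claim 3, wherein the presenting of the attribute information of the sample multimedia information in the form of a data stream message comprises:
    encapsulating the attribute information of the sample multimedia information in the form of key-value data pairs based on a preset data stream transmission protocol, to obtain a data stream message corresponding to the attribute information of the sample multimedia information.
  5. The method according to claim 4, wherein the presenting of the attribute information of the sample multimedia information in the form of a data stream message comprises:
    decapsulating, according to the preset data stream transmission protocol, the data stream message corresponding to the attribute information of the sample multimedia information to obtain the key-value data pairs that store the attribute information of the sample multimedia information;
    performing string conversion on the obtained key-value data pairs to obtain the attribute information of the sample multimedia information.
  6. The method according to claim 4, wherein the determining of the fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information comprises:
    encapsulating the fingerprint information of the sample multimedia information in the form of key-value data pairs based on the preset data stream transmission protocol, to obtain a data stream message corresponding to the fingerprint information of the sample multimedia information.
  7. The method according to claim 1 or 2, wherein the determining of the fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information comprises:
    performing feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information, wherein the feature information of the sample multimedia information includes at least: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
    generating the first fingerprint information and the second fingerprint information of the sample multimedia information respectively, based on the feature information of the sample multimedia information.
  8. The method according to claim 7, wherein the determining of the fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information comprises:
    dividing the sample multimedia information into multiple video frames;
    extracting, from the multiple video frames and based on a predetermined feature extraction algorithm, feature frames representing text feature information and visual feature information of the multiple video frames; and
    encoding the extracted feature frames to obtain video fingerprint information of the sample multimedia information.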
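  9. A multimedia information retrieval device, the device comprising an acquisition module, a processing module, and a matching module, wherein:
    the acquisition module is configured to receive a multimedia information retrieval request, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier,
    and to acquire attribute information of the sample multimedia information based on the multimedia information retrieval request;
    the processing module is configured to determine fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information, the fingerprint information of the sample multimedia information including at least: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
    the matching module is configured to match the fingerprint information of the sample multimedia information with fingerprint information in a preset fingerprint database to obtain a matching result,
    and, when the matching result indicates a successful match, to obtain multimedia information associated with the sample multimedia information.
  10. The device according to claim 9, wherein
    the acquisition module is further configured to query a preset multimedia information database based on the sample multimedia information retrieval identifier carried by the multimedia information retrieval request, obtain the attribute information of the sample multimedia information, and present the attribute information of the sample multimedia information in the form of a data stream message.
  11. The device according to claim 10, wherein
    the acquisition module is further configured to encapsulate the attribute information of the sample multimedia information in the form of key-value data pairs based on a preset data stream transmission protocol, to obtain a data stream message corresponding to the attribute information of the sample multimedia information.
  12. The device according to claim 11, wherein
    the processing module is further configured to decapsulate, according to the preset data stream transmission protocol, the data stream message corresponding to the attribute information of the sample multimedia information to obtain the key-value data pairs that store the attribute information of the sample multimedia information;
    perform string conversion on the obtained key-value data pairs to obtain the attribute information of the sample multimedia information;
    generate the fingerprint information of the sample multimedia information according to the attribute information of the sample multimedia information;
    and encapsulate the fingerprint information of the sample multimedia information in the form of key-value data pairs based on the preset data stream transmission protocol, to obtain a data stream message corresponding to the fingerprint information of the sample multimedia information.
  13. The device according to claim 9 or 10, wherein the processing module comprises:
    a feature extraction unit, configured to perform feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information, the feature information of the sample multimedia information including: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
    a fingerprint generation unit, configured to generate the first fingerprint information and the second fingerprint information of the sample multimedia information based on the feature information of the sample multimedia information.
  14. A multimedia information retrieval apparatus, the apparatus comprising: a multimedia information input component, a feature processing component, and a matching component; the multimedia information input component, the feature processing component, and the matching component support being invoked based on an instruction input by a user, and a logical connection relationship is established among the multimedia information input component, the feature processing component, and the matching component; wherein:
    the multimedia information input component is configured to receive a multimedia information retrieval request based on a user's input, the multimedia information retrieval request carrying a sample multimedia information retrieval identifier, and to acquire attribute information of the sample multimedia information based on the multimedia information retrieval request;
    the feature processing component is configured to obtain the attribute information of the sample multimedia information based on its logical connection relationship with the multimedia information input component, and to determine fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information, the fingerprint information of the sample multimedia information including at least: first fingerprint information for characterizing a text feature of the sample multimedia information, and second fingerprint information for characterizing a visual feature of the sample multimedia information;
    the matching component is configured to obtain, based on its logical connection relationship with the feature processing component, fingerprint information including the first fingerprint information and the second fingerprint information, and to match that fingerprint information with fingerprint information in a preset fingerprint database to obtain a matching result,
    and, when the matching result indicates a successful match, to obtain multimedia information associated with the sample multimedia information.
  15. The apparatus according to claim 14, wherein the feature processing component comprises a feature extraction component and a fingerprint generation component, wherein:
    the feature extraction component is configured to perform feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information, the feature information of the sample multimedia information including: a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
    the fingerprint generation component is configured to generate the fingerprint information of the sample multimedia information based on the feature information of the sample multimedia information.
  16. The apparatus according to claim 13 or 14, wherein the apparatus further comprises an output component configured to output the multimedia information associated with the sample multimedia information.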
  17. 一种多媒体信息检索装置,所述装置包括:
    处理器和用于存储能够在处理器上运行的计算机程序的存储器;其中,
    所述处理器用于运行所述计算机程序时,执行权利要求1至8任一项所述的多媒体信息检索方法。
  18. 一种计算机存储介质,所述计算机存储介质中存储有计算机可执行指令,所述计算机可执行指令用于执行权利要求1至8任一项所述的多媒体信息检索方法。
  19. 一种多媒体信息检索方法,所述方法由服务器执行,所述服务器包括有一个或多个处理器以及存储器,以及一个或一个以上的程序,其中,所述一个或一个以上的程序存储于存储器中,所述程序可以包括一个或一个以上的每一个对应于一组指令的单元,所述一个或多个处理器被配置为执行指令;所述方法包括:
    接收到多媒体信息检索请求;所述多媒体信息检索请求携带样本多媒体信息检索标识;
    基于所述多媒体信息检索请求,获取所述样本多媒体信息的属性信息;
    基于所述样本多媒体信息的属性信息确定所述样本多媒体信息的指纹信息;所述样本多媒体信息的指纹信息包括:用于表征所述样本多媒体信息的文本特征的第一指纹信息,以及用于表征所述样本多媒体信息的视觉特征的第二指纹信息;
    将所述样本多媒体信息的指纹信息与预设的指纹数据库中的指纹信息进行匹配,获得匹配结果;
    当所述匹配结果表征匹配成功时,获取与所述样本多媒体信息相关联的多媒体信息。
  20. 根据权利要求19所述的方法,其中,所述基于所述多媒体信息检索请求,获取所述样本多媒体信息的属性信息,包括:
    基于所述多媒体信息检索请求携带的样本多媒体信息检索标识,查询预设的多媒体信息库,得到所述样本多媒体信息的属性信息,并以数据流消息的形式呈现所述样本多媒体信息的属性信息。
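  21. The method according to claim 20, wherein presenting the attribute information of the sample multimedia information in the form of a data stream message comprises:
    encapsulating, based on a preset data stream transmission protocol, the attribute information of the sample multimedia information in the form of a key-value data pair, to obtain a data stream message corresponding to the attribute information of the sample multimedia information.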
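  22. The method according to claim 21, wherein determining the fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information comprises:
    decapsulating, according to the preset data stream transmission protocol, the data stream message corresponding to the attribute information of the sample multimedia information, to obtain the key-value data pair storing the attribute information of the sample multimedia information;
    performing string conversion on the obtained key-value data pair to obtain the attribute information of the sample multimedia information;
    generating the fingerprint information of the sample multimedia information according to the attribute information of the sample multimedia information;
    encapsulating, based on the preset data stream transmission protocol, the fingerprint information of the sample multimedia information in the form of a key-value data pair, to obtain a data stream message corresponding to the fingerprint information of the sample multimedia information.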
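The four steps of claim 22 (decapsulate, convert to strings, generate the fingerprint, re-encapsulate) can be sketched as a single message handler. The claims leave the data stream transmission protocol unspecified, so the length-prefixed binary framing, the JSON-encoded value, and the attribute key `title` below are assumptions; the truncated SHA-256 digest is only a placeholder and is not a similarity-preserving fingerprint.

```python
import hashlib
import json
import struct
from typing import Tuple


def encapsulate(key: str, value: str) -> bytes:
    """Frame a key-value pair as: 4-byte key length, 4-byte value length, key, value."""
    k, v = key.encode("utf-8"), value.encode("utf-8")
    return struct.pack(">II", len(k), len(v)) + k + v


def decapsulate(message: bytes) -> Tuple[bytes, bytes]:
    """Recover the raw key and value bytes from a framed message."""
    key_len, value_len = struct.unpack_from(">II", message)
    key_end = 8 + key_len
    return message[8:key_end], message[key_end:key_end + value_len]


def process(message: bytes) -> bytes:
    # Step 1: decapsulate the attribute message into a key-value pair.
    raw_key, raw_value = decapsulate(message)
    # Step 2: string conversion of the raw bytes, then parse the attributes.
    media_id = raw_key.decode("utf-8")
    attributes = json.loads(raw_value.decode("utf-8"))
    # Step 3: generate a fingerprint (placeholder digest of the title).
    fingerprint = hashlib.sha256(attributes["title"].encode("utf-8")).hexdigest()[:16]
    # Step 4: encapsulate the fingerprint as a new key-value message.
    return encapsulate(media_id, json.dumps({"fingerprint": fingerprint}))
```

Reusing the same framing for the input and output messages lets the fingerprint message flow to a downstream consumer through the same transport as the attribute message.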
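  23. The method according to claim 19 or 20, wherein determining the fingerprint information of the sample multimedia information based on the attribute information of the sample multimedia information comprises:
    performing feature extraction on the attribute information of the sample multimedia information to obtain feature information of the sample multimedia information, the feature information of the sample multimedia information comprising a text feature of the sample multimedia information and a visual feature of the sample multimedia information;
    generating the fingerprint information of the sample multimedia information based on the feature information of the sample multimedia information, wherein the fingerprint information of the sample multimedia information comprises: first fingerprint information representing the text feature of the sample multimedia information, and second fingerprint information representing the visual feature of the sample multimedia information.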
PCT/CN2018/078759 2017-05-10 2018-03-12 Multimedia information retrieval method, apparatus, and storage medium WO2018205736A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710326718.3 2017-05-10
CN201710326718.3A CN108287859B (zh) 2017-05-10 2017-05-10 Multimedia information retrieval method and apparatus

Publications (1)

Publication Number Publication Date
WO2018205736A1 (zh) 2018-11-15

Family

ID=62831278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/078759 WO2018205736A1 (zh) 2017-05-10 2018-03-12 Multimedia information retrieval method, apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN108287859B (zh)
WO (1) WO2018205736A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
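CN109829061A (zh) * 2019-01-14 2019-05-31 北京雷石天地电子技术有限公司 Multimedia information search method and system
CN110532404B (zh) * 2019-09-03 2023-08-04 北京百度网讯科技有限公司 Source multimedia determination method, apparatus, device, and storage medium
CN111639198A (zh) * 2020-06-03 2020-09-08 北京字节跳动网络技术有限公司 Media file identification method, apparatus, readable medium, and electronic device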

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
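CN103593356A (zh) * 2012-08-16 2014-02-19 丁瑞彭 Information search method, system, and application based on multimedia information fingerprint technology
CN103947214A (zh) * 2011-11-28 2014-07-23 雅虎公司 Context-relevant interactive television
CN105138617A (zh) * 2015-08-07 2015-12-09 中国人民大学 System and method for automatic music positioning and annotation
CN105631247A (zh) * 2014-10-31 2016-06-01 腾讯科技(深圳)有限公司 Multimedia copyright management method and apparatus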
US20170068671A1 (en) * 2008-06-18 2017-03-09 Gracenote, Inc. Media Fingerprinting and Identification System

Also Published As

Publication number Publication date
CN108287859B (zh) 2021-03-12
CN108287859A (zh) 2018-07-17

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18798032

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18798032

Country of ref document: EP

Kind code of ref document: A1