CN112905829A - Cross-modal artificial intelligence information processing system and retrieval method - Google Patents

Cross-modal artificial intelligence information processing system and retrieval method

Info

Publication number
CN112905829A
Authority
CN
China
Prior art keywords
modality
information
module
data
artificial intelligence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110320317.3A
Other languages
Chinese (zh)
Inventor
王芳
连芷萱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202110320317.3A
Publication of CN112905829A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

A cross-modal artificial intelligence information processing system and a cross-modal information retrieval method are provided. The system comprises: a separation module configured to separate first-modality information into a plurality of continuous first-modality information fragments; a feature extraction module configured to perform feature extraction on the content expressed by each first-modality information fragment to form an event map; an identification module configured to identify elements in the event map with second-modality information to form second-modality identification information; a second encoding module configured to encode the second-modality identification information to form second-modality information data; an association module configured to associate the second-modality information data with each data frame in the corresponding first-modality information fragment to generate an association identifier; a first insertion module configured to insert the association identifier into the first-modality data frame; and a second insertion module configured to insert the association identifier into the second-modality data frame.

Description

Cross-modal artificial intelligence information processing system and retrieval method
Technical Field
The invention relates to a cross-modal artificial intelligence information processing system and a retrieval method, and belongs to the technical field of artificial intelligence.
Background
In the prior art, text information can be searched over its full text by keywords, but for audio/video information there has been no way to retrieve a segment of interest from within an audio or video stream of a given length.
Disclosure of Invention
The invention aims to provide a cross-modal artificial intelligence information processing system and a retrieval method, which can quickly retrieve and reproduce cross-modal information.
To achieve the above object, the present invention provides a cross-modal artificial intelligence information processing system, comprising: a separation module configured to separate first-modality information into a plurality of continuous first-modality information fragments; a feature extraction module configured to perform feature extraction on the content expressed by each first-modality information fragment to form an event map representing the events, and the relations among them, in the content expressed by each first-modality data fragment; an identification module configured to identify elements in the event map with second-modality information to form second-modality identification information; a second encoding module configured to encode the second-modality identification information to form second-modality information data; an association module configured to associate the second-modality information data with each data frame in the corresponding first-modality information fragment to generate an association identifier; a first insertion module configured to insert the association identifier into the first-modality data frame and then store it in the first-modality information database; and a second insertion module configured to insert the association identifier into the second-modality data frame and then store it in the second-modality information database.
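The module pipeline above can be illustrated with a minimal sketch. All names (Fragment, make_association_id, build_databases) and the use of a truncated SHA-1 digest as the association identifier are illustrative assumptions, not details fixed by the patent:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Fragment:
    """One continuous first-modality fragment and its event label."""
    frames: list       # first-modality data frames
    event_label: str   # second-modality (text) identification of the event

def make_association_id(fragment_index: int, event_label: str) -> str:
    # Derive a short, stable identifier linking the two modalities
    # (a truncated SHA-1 digest here, purely for illustration).
    digest = hashlib.sha1(f"{fragment_index}:{event_label}".encode()).hexdigest()
    return digest[:8]

def build_databases(fragments):
    """Insert the association identifier into every frame of each fragment
    (first-modality DB) and into the text record (second-modality DB)."""
    first_db, second_db = [], {}
    for i, frag in enumerate(fragments):
        assoc = make_association_id(i, frag.event_label)
        first_db.extend((assoc, frame) for frame in frag.frames)
        second_db[frag.event_label] = assoc
    return first_db, second_db

fragments = [Fragment(frames=[b"f0", b"f1"], event_label="event 1"),
             Fragment(frames=[b"f2"], event_label="event 2")]
first_db, second_db = build_databases(fragments)
```

Because the same identifier is written into both stores, either modality can later be reached from the other with a single lookup.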
Preferably, the first modality information includes voice and/or video; the second modality information includes text.
Preferably, the feature extraction module comprises an event map establishing module configured to establish an event map according to contents expressed by the first modality information source and an accumulation module configured to accumulate durations of consecutive identical event maps; the separation module is further configured to separate the first-modality information according to the duration to obtain a plurality of continuous first-modality information fragments.
Preferably, the cross-modal artificial intelligence information processing system further comprises a first encoding module, and the first encoding module is configured to encode the separated first-modal information segment to generate first-modal information data.
Preferably, the first modality information includes video data; the second modality information includes text.
Preferably, the feature extraction module comprises a conversion module, an artificial intelligence module, an event map building module and an accumulation module, wherein the conversion module converts the first-modality information data into two-dimensional images; the artificial intelligence module is configured to identify feature values of each two-dimensional image frame, the feature values comprising foreground image feature values and background image feature values; the event map building module is configured to build an event map according to the relation between the primitives represented by the foreground image feature values and the primitives represented by the background image feature values of each image frame; the accumulation module is configured to accumulate the durations of consecutive identical event maps; and the separation module is further configured to divide the first-modality information into a plurality of continuous first-modality information segments according to these durations.
In order to achieve the above object, the present invention further provides a method for cross-modal information retrieval using the above system, comprising the steps of: searching the second-modality information database for the second-modality data corresponding to input second-modality information; extracting the association head of the second-modality data; and retrieving the first-modality information data frames from the first-modality information database according to the association head, and reproducing the first-modality information from those data frames.
Compared with the prior art, the invention aims to provide a cross-modal artificial intelligence information processing system and a retrieval method, which can quickly perform cross-modal information retrieval.
Drawings
FIG. 1 is a block diagram of a cross-modal artificial intelligence information handling system provided in a first embodiment of the present invention;
FIG. 2 is a schematic diagram showing the separation of information in a first modality into a plurality of information fragments;
FIG. 3 is a block diagram of a first encoding module in a cross-modal artificial intelligence information handling system, according to an embodiment of the present invention;
FIG. 4 is a block diagram of an inter-frame prediction processing module according to an embodiment of the present invention;
FIG. 5 is a block diagram of a cross-modal artificial intelligence information handling system provided by a second embodiment of the present invention;
FIG. 6 is a flowchart of a cross-modal information retrieval method provided by the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
First embodiment
Fig. 1 is a block diagram of a cross-modality artificial intelligence information processing system according to the first embodiment of the present invention. As shown in Fig. 1, the system of the first embodiment includes: a first-modality information source 510, for example an audio information source acquired by an acoustic-electric converter or an image information source acquired by a photoelectric converter; a separating module 520 configured to separate the first-modality information into a plurality of consecutive first-modality information fragments; a feature extraction module configured to extract features from the content expressed by each first-modality information fragment to form an event map representing the events, and the relations among them, in the content expressed by each first-modality data fragment, wherein the event map is organized as a tree structure and each node in the tree is called an element; an identifying module 580 configured to identify elements in the event map with second-modality information to form second-modality identification information; a second encoding module 590 configured to encode the second-modality identification information to form second-modality information data, that is, to encode the second-modality information as a character string, which may be a binary string; an association module 570 configured to associate the second-modality information data with the corresponding first-modality information fragments to generate association identifiers (or association pointers); a first inserting module 540 configured to insert the association identifier into each data frame of the first-modality information data fragment and then either store it in the first-modality information database or send it to a channel encoder and, after channel encoding, to the communication unit; and a second insertion module 600 configured to insert the association identifier into the second-modality data frame and then either store it in the second-modality information database or send it to the channel encoder and, after channel encoding, to the communication unit.
In the first embodiment, the first-modality information includes voice and/or video, the voice including speech in multiple languages, dialects, and the like; the second-modality information includes text, the text including words in multiple languages.
In the first embodiment, each data frame of the first-modality information data has the following format:
[first-modality information data head | first-modality information data]
Each data frame of the second-modality information data has the following format:
[second-modality information data head | second-modality information data]
A first-modality information data frame with the association head inserted has the following format:
[association head | first-modality information data head | first-modality information data]
A second-modality information data frame with the association head inserted has the following format:
[association head | second-modality information data head | second-modality information data]
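The frame layouts above can be sketched as simple byte packing. The 4-byte field widths and big-endian byte order are illustrative assumptions; the patent does not fix them:

```python
import struct

def pack_frame(assoc_head: int, data_head: bytes, payload: bytes) -> bytes:
    # [4-byte association head][4-byte data head][payload]
    assert len(data_head) == 4
    return struct.pack(">I", assoc_head) + data_head + payload

def read_assoc_head(frame: bytes) -> int:
    # Retrieval only inspects the association head; the payload stays opaque.
    return struct.unpack(">I", frame[:4])[0]

frame = pack_frame(0x2A, b"VID0", b"\x00\x01\x02")
```

Prepending the association head, rather than embedding it in the payload, lets the retrieval step read the head without decoding the modality data.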
In the first embodiment, the feature extraction module comprises an event map establishing module 550 and an accumulating module 560. The event map establishing module 550 is configured to establish an event map according to the content expressed by the first-modality information source, and the accumulating module 560 is configured to accumulate the durations of consecutive identical event maps, that is, the time period during which the first-modality information source expresses the same event; the separating module 520 is further configured to separate the first-modality information according to these durations to obtain consecutive first-modality information fragments. As shown in Fig. 2, video information of a set duration T expresses four events (event 1, event 2, event 3 and event 4), and the separating module divides the video into four segments of durations T1, T2, T3 and T4, respectively. Preferably, each event can be further subdivided according to the different content it expresses.
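The accumulate-and-separate step (durations T1 to T4 in Fig. 2) can be sketched as follows, assuming each frame already carries an event label; the names and the per-frame duration are illustrative:

```python
from itertools import groupby

def segment_by_event(frame_events, frame_duration=1.0):
    """frame_events: per-frame event labels in time order.
    Returns (event, duration) pairs, one per continuous fragment."""
    return [(event, sum(1 for _ in run) * frame_duration)
            for event, run in groupby(frame_events)]

# Eight frames expressing four consecutive events, as in Fig. 2.
events = ["e1", "e1", "e2", "e2", "e2", "e3", "e4", "e4"]
fragments = segment_by_event(events)   # durations T1=2, T2=3, T3=1, T4=2
```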
In the first embodiment, the cross-modal artificial intelligence information processing system further includes a first encoding module 530, which is configured to encode the separated first-modal information segment to generate first-modal information data. In the present invention, when the first mode information is video information, the first encoding module adopts the structural form shown in fig. 3 to 4.
FIG. 3 is a block diagram of the first encoding module according to the present invention. As shown in Fig. 3, in the first encoding module, the prediction residual signal generation module 103 takes the difference between the input video signal and the prediction signal output by the inter-prediction processing module 102 and outputs it as a prediction residual signal. The transform module 104 applies an orthogonal transform, such as the discrete cosine transform, to the prediction residual signal, quantizes the transform coefficients, and outputs the quantized transform coefficients. The entropy coding module 105 entropy-codes the quantized transform coefficients and outputs them as a coded stream. The quantized transform coefficients are also fed to the inverse transform module 106, where inverse quantization and the inverse orthogonal transform are performed to recover a prediction residual signal. The decoded video signal generation module 107 adds this prediction residual signal to the prediction signal output by the inter-prediction processing module 102 to generate the decoded video signal of the block being encoded. The decoded video signal is output to the loop filter processing module 108 so that it can serve as a reference image in the inter-prediction processing module 102. The loop filter processing module 108 performs filtering to reduce coding distortion and outputs the filtered image to the inter-prediction processing module 102 as the decoded video signal.
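The closed coding loop of modules 103, 104, 106 and 107 can be illustrated numerically. This toy sketch keeps only prediction, quantization, inverse quantization and reconstruction, omitting the orthogonal transform and entropy coding; the quantization step is an assumption:

```python
QSTEP = 4  # illustrative quantization step

def encode_block(signal, prediction):
    # Module 103: prediction residual; module 104: quantization
    # (the orthogonal transform is omitted in this sketch).
    return [round((s - p) / QSTEP) for s, p in zip(signal, prediction)]

def reconstruct_block(quantized, prediction):
    # Module 106: inverse quantization; module 107: add back the prediction,
    # giving the same decoded signal a decoder would produce.
    return [p + q * QSTEP for p, q in zip(prediction, quantized)]

signal = [10, 22, 35, 41]
prediction = [8, 20, 30, 44]
quantized = encode_block(signal, prediction)
decoded = reconstruct_block(quantized, prediction)
```

Reconstructing inside the encoder keeps its reference frames identical to the decoder's, which is why the loop feeds module 107's output back to module 102.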
Fig. 4 is a block diagram of the inter-prediction processing module 102 according to the present invention. As shown in Fig. 4, the inter-prediction processing module 102 includes a reduced image generation unit 291, a pre-search processing unit 292, a first mode decision unit 293, an integer pixel search processing unit 294, a fractional image generation unit 295, a fractional pixel search processing unit 296, and a second mode decision unit 297. The reduced image generation unit 291 receives the current-frame image signal and the previous-frame image signal, performs reduction processing using, for example, a convolutional neural network (CNN), and outputs the reduced signals. The pre-search processing unit 292 takes the reduced current-frame and previous-frame image signals, performs motion search on the reduced current-frame image signal, and passes the resulting motion vector to the integer pixel search processing unit 294. The first mode decision unit 293 receives encoding mode information from the pre-search processing unit 292. The integer pixel search processing unit 294 performs integer-pixel search according to the motion vector and the encoding mode. The fractional image generation unit 295 generates a fractional-pixel interpolation image at the corresponding previous-frame image position and outputs it to the fractional pixel search processing unit 296. The second mode decision unit 297 receives encoding mode information from the integer pixel search processing unit 294 and passes it to the fractional pixel search processing unit 296. The fractional pixel search processing unit 296 performs fractional-pixel search using the motion vector and the encoding mode specified by the integer pixel search processing unit 294 and the second mode decision unit 297, respectively.
The fractional pixel search processing unit 296 outputs the searched prediction residual image and motion vector information, and a feature value is extracted from them. Through the above scheme, the first embodiment of the present invention improves coding efficiency.
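The coarse-to-fine search (pre-search on reduced images seeding the integer-pixel search) can be sketched in one dimension. The signals, the 2x reduction factor, and the refinement window are illustrative assumptions:

```python
def sad(a, b):
    # Sum of absolute differences between two equal-length blocks.
    return sum(abs(x - y) for x, y in zip(a, b))

def search(block, ref, candidates):
    # Return the candidate offset minimizing the SAD.
    n = len(block)
    return min(candidates, key=lambda d: sad(block, ref[d:d + n]))

ref = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]   # previous-frame row
cur = [5, 9, 5, 1]                            # current block (best match at offset 3)

# Pre-search (unit 292) on 2x-reduced signals...
coarse = search(cur[::2], ref[::2], range(len(ref[::2]) - 1))
# ...then integer-pixel refinement (unit 294) around the up-scaled vector.
lo = max(0, 2 * coarse - 1)
hi = min(len(ref) - len(cur), 2 * coarse + 1)
fine = search(cur, ref, range(lo, hi + 1))
```

Searching the reduced signal first shrinks the candidate set for the full-resolution pass, which is the source of the efficiency gain the paragraph describes.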
Second embodiment
FIG. 5 is a block diagram of a cross-modal artificial intelligence information processing system according to the second embodiment of the present invention. As shown in FIG. 5, the system of the second embodiment includes: a first-modality data source 310 configured to acquire first-modality information data from a plurality of information sources, such as audio and/or video data acquired through a channel decoder or over a network; the first-modality information data has a plurality of time-series data frames and, when reproduced, for example via a display component, expresses one or more events; a separating module 320 configured to separate the first-modality information data into a plurality of consecutive segments, each segment having a plurality of time-series data frames; a feature extraction module configured to extract features from the content expressed by each segment of first-modality information data to form an event map representing the events, and the relations among them, expressed when each segment is reproduced; an identification module 370 configured to identify the elements in the event map with second-modality information to form second-modality identification information; a second encoding module 390 configured to encode the second-modality identification information to form second-modality information data; an association module 380 configured to associate the second-modality information data with each data frame in the corresponding first-modality information data segment to generate an association identifier; a first inserting module 340 configured to insert the association identifier into the first-modality information data frame and then either store it in the first-modality information database or send it to the channel encoder and, after channel encoding, to the communication unit; and a second insertion module 400 configured to insert the association identifier into the second-modality data frame and then either store it in the second-modality information database or send it to the channel encoder and, after channel encoding, to the communication unit.
In the second embodiment, the first-modality information includes voice data and/or video data; the second-modality information includes text.
In the second embodiment, the feature extraction module includes a conversion module 330, an artificial intelligence module 340, an event map creation module 350 and an accumulation module 360. The conversion module 330 converts the first-modality information data into two-dimensional images in time series; the artificial intelligence module is configured to identify deep image feature values of each two-dimensional image frame, the deep image feature values comprising a background image feature value and a plurality of foreground image feature values; the event map creation module 350 is configured to create an event map according to the relations between the primitives represented by the plurality of foreground image feature values of each image frame and the primitives represented by the background image feature value; the accumulation module 360 is configured to accumulate the durations of consecutive identical event maps; and the separating module 320 is further configured to separate the first-modality information according to these durations to obtain a plurality of consecutive first-modality information fragments.
In the second embodiment, each data frame of the first-modality information data has the following format:
[first-modality information data head | first-modality information data]
Each data frame of the second-modality information data has the following format:
[second-modality information data head | second-modality information data]
A first-modality information data frame with the association head inserted has the following format:
[association head | first-modality information data head | first-modality information data]
A second-modality information data frame with the association head inserted has the following format:
[association head | second-modality information data head | second-modality information data]
In the second embodiment, the artificial intelligence module includes a convolutional neural network (CNN) configured to classify an input image into background image feature values and foreground image feature values, and to further classify the foreground image feature values into a plurality of foreground primitive feature values. The convolutional neural network is applied to image recognition, recognizing a predetermined shape or pattern from image data supplied as input, and has an intermediate layer and a fully-connected layer. The intermediate layer is formed by hierarchically connecting a plurality of feature extraction processing layers and includes convolutional layers and pooling layers.
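The two intermediate-layer operations named above can be sketched in pure Python: a valid-padding 2D convolution and 2x2 max pooling. The image and kernel values are illustrative; this shows the layer types only, not the patent's actual network:

```python
def conv2d(image, kernel):
    # Valid-padding 2D convolution (no flipping; cross-correlation form).
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

def max_pool2x2(fmap):
    # 2x2 max pooling with stride 2.
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 1, 0, 2],
         [1, 0, 1, 3]]
kernel = [[1, -1], [-1, 1]]       # illustrative 2x2 feature detector
fmap = conv2d(image, kernel)      # 3x3 feature map
pooled = max_pool2x2(fmap)
```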
Fig. 6 is a flowchart of the artificial intelligence cross-modal information retrieval method provided by the present invention. As shown in Fig. 6, the method for cross-modal information retrieval using the system provided by the present invention includes the following steps: searching the second-modality information database for the second-modality information data corresponding to second-modality information (such as a text keyword) input by a user; extracting the association head of the second-modality information data; and retrieving first-modality information data (such as a video or audio data stream) from the first-modality information database according to the association head, and reproducing the first-modality information from the retrieved data, for example reproducing images on a display device and sounds through a loudspeaker.
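The three steps of Fig. 6 can be sketched with dictionary- and list-backed stand-ins for the two databases; all layouts and values are illustrative assumptions:

```python
# Stand-ins for the two databases: text record -> association head, and
# (association head, data frame) pairs.
second_db = {"event 2": "a1b2"}
first_db = [("a1b2", b"frame-0"), ("a1b2", b"frame-1"),
            ("ffff", b"other")]

def retrieve(keyword):
    # Step 1: look up the second-modality record for the input keyword.
    assoc = second_db.get(keyword)
    if assoc is None:
        return []
    # Steps 2-3: extract its association head and collect the first-modality
    # frames carrying the same head, ready for reproduction (display/speaker).
    return [frame for head, frame in first_db if head == assoc]

frames = retrieve("event 2")
```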
When the technical scheme provided by the invention is used to search for a text keyword, the associated audio/video data segments can be found quickly via the event map and reproduced directly, without processing the entire audio/video stream; cross-modal information retrieval is thus realized and retrieval efficiency is improved. At the same time, the user can watch just the video and/or listen to just the audio clips of interest, without attending to the parts they do not care about, which improves the user's time utilization.
The present invention can be realized by a computer that implements the embodiments described above, or by recording a program implementing the embodiments on a computer-readable recording medium and causing a computer system to read and execute the program recorded on that medium. The term "computer system" as used herein includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" refers to a removable medium such as a flexible disk, a magneto-optical disk, a ROM or a CD-ROM, or a storage device such as a hard disk incorporated in a computer system.
Further, the "computer-readable recording medium" may include a medium that dynamically holds the program for a short time, for example, a communication line that transmits the program through a network such as the internet or a communication line such as a telephone line, or may include a medium that holds the program for a predetermined time, for example, a volatile memory in a computer system serving as a server or a client in this case. The program may be a program for realizing a part of the above-described functions, a program for realizing the above-described functions by combining with a program already recorded in a computer system, or a program realized by using hardware such as PLD or FPGA.
The above embodiments are only used for illustrating the present invention, and the structure, the arrangement position, the connection mode, and the like of each component can be changed, and all equivalent changes and improvements based on the technical scheme of the present invention should not be excluded from the protection scope of the present invention.

Claims (7)

1. A cross-modal artificial intelligence information processing system, comprising: a separation module configured to separate first-modality information into a plurality of continuous first-modality information fragments; a feature extraction module configured to perform feature extraction on the content expressed by each first-modality information fragment to form an event map representing the events, and the relations among them, in the content expressed by each first-modality data fragment; an identification module configured to identify elements in the event map with second-modality information to form second-modality identification information; a second encoding module configured to encode the second-modality identification information to form second-modality information data; an association module configured to associate the second-modality information data with each data frame in the corresponding first-modality information fragment to generate an association identifier; a first insertion module configured to insert the association identifier into the first-modality data frame and then store it in the first-modality information database; and a second insertion module configured to insert the association identifier into the second-modality data frame and then store it in the second-modality information database.
2. The cross-modality artificial intelligence information handling system of claim 1 wherein the first modality information includes voice and/or video; the second modality information includes text.
3. The cross-modality artificial intelligence information handling system of claim 2 wherein the feature extraction module includes an event map creation module configured to create an event map from content expressed by the first modality information source and an accumulation module configured to accumulate durations of consecutive identical event maps; the separation module is further configured to separate the first-modality information according to the duration to obtain a plurality of continuous first-modality information fragments.
4. A cross-modality artificial intelligence information handling system according to claim 3, further comprising a first encoding module for encoding the separated first-modality information fragments to generate first-modality information data.
5. The cross-modality artificial intelligence information handling system of claim 1 wherein the first modality information includes video data; the second modality information includes text.
6. The cross-modality artificial intelligence information processing system of claim 5, wherein the feature extraction module comprises a conversion module, an artificial intelligence module, an event map creation module, and an accumulation module, wherein the conversion module converts the first-modality information data into two-dimensional images; the artificial intelligence module is configured to identify feature values of each two-dimensional image frame, the feature values comprising foreground image feature values and background image feature values; the event map creation module is configured to create an event map according to the relation between the primitives represented by the foreground image feature values and the primitives represented by the background image feature values of each image frame; the accumulation module is configured to accumulate the durations of consecutive identical event maps; and the separation module is further configured to divide the first-modality information into a plurality of continuous first-modality information segments according to these durations.
7. A method for cross-modal information retrieval using the system of any one of claims 1-6, comprising the steps of:
searching a second-modality information database for second-modality data corresponding to the input second-modality information; extracting the association head of the second-modality data; retrieving first-modality information data frames from a first-modality information database according to the association head; and reproducing the first-modality information from the retrieved data frames.
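The retrieval flow of claim 7 can be sketched with dict-backed stand-ins for the two databases. The database shape, the `association_head` field, and all keys are hypothetical; the point is only the chain text query → second-modality record → association head → first-modality data frames.

```python
def retrieve_first_modality(query_text, second_db, first_db):
    """Claim-7 flow: look up second-modality data for the query, follow its
    association head into the first-modality database, return the data frames."""
    record = second_db.get(query_text)   # matching second-modality data, if any
    if record is None:
        return []
    head = record["association_head"]    # link into the first-modality database
    return first_db.get(head, [])        # data frames used to reproduce the video

second_db = {"goal scored": {"association_head": "seg-042"}}
first_db = {"seg-042": ["frame-1000", "frame-1001", "frame-1002"]}
print(retrieve_first_modality("goal scored", second_db, first_db))
```

In the patented system the association head would be stored alongside each encoded segment, so the final step is a direct keyed lookup rather than a content search.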
CN202110320317.3A 2021-03-25 2021-03-25 Cross-modal artificial intelligence information processing system and retrieval method Pending CN112905829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110320317.3A CN112905829A (en) 2021-03-25 2021-03-25 Cross-modal artificial intelligence information processing system and retrieval method

Publications (1)

Publication Number Publication Date
CN112905829A 2021-06-04

Family

ID=76106449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110320317.3A Pending CN112905829A (en) 2021-03-25 2021-03-25 Cross-modal artificial intelligence information processing system and retrieval method

Country Status (1)

Country Link
CN (1) CN112905829A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110099195A1 (en) * 2009-10-22 2011-04-28 Chintamani Patwardhan Method and Apparatus for Video Search and Delivery
US20140328570A1 (en) * 2013-01-09 2014-11-06 Sri International Identifying, describing, and sharing salient events in images and videos
CN105430536A (en) * 2015-10-30 2016-03-23 北京奇艺世纪科技有限公司 Method and device for video push
CN108459785A (en) * 2018-01-17 2018-08-28 中国科学院软件研究所 A kind of video multi-scale visualization method and exchange method
CN109101558A (en) * 2018-07-12 2018-12-28 北京猫眼文化传媒有限公司 A kind of video retrieval method and device
WO2019176398A1 (en) * 2018-03-16 2019-09-19 ソニー株式会社 Information processing device, information processing method, and program
CA3068692A1 (en) * 2019-01-18 2020-07-18 James Carey Investigation generation in an observation and surveillance system
WO2020155423A1 (en) * 2019-01-31 2020-08-06 深圳市商汤科技有限公司 Cross-modal information retrieval method and apparatus, and storage medium
CN111680173A (en) * 2020-05-31 2020-09-18 西南电子技术研究所(中国电子科技集团公司第十研究所) CMR model for uniformly retrieving cross-media information
CN112001265A (en) * 2020-07-29 2020-11-27 北京百度网讯科技有限公司 Video event identification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111488489B (en) Video file classification method, device, medium and electronic equipment
JP2002541738A (en) Image compression
CN101539929A (en) Method for indexing TV news by utilizing computer system
WO2022188644A1 (en) Word weight generation method and apparatus, and device and medium
US9031852B2 (en) Data compression apparatus, computer-readable storage medium having stored therein data compression program, data compression system, data compression method, data decompression apparatus, data compression/decompression apparatus, and data structure of compressed data
CN113327603A (en) Speech recognition method, speech recognition device, electronic equipment and computer-readable storage medium
KR20120090101A (en) Digital video fast matching system using key-frame index method
CN116233445A (en) Video encoding and decoding processing method and device, computer equipment and storage medium
CN114625918A (en) Video recommendation method, device, equipment, storage medium and program product
CN113409803B (en) Voice signal processing method, device, storage medium and equipment
CN112905829A (en) Cross-modal artificial intelligence information processing system and retrieval method
CN114333896A (en) Voice separation method, electronic device, chip and computer readable storage medium
US20220417540A1 (en) Encoding Device and Method for Utility-Driven Video Compression
CN115604475A (en) Multi-mode information source joint coding method
WO2005046213A1 (en) Document image encoding/decoding
CN102047662A (en) Encoder
CA2392644C (en) Coding and decoding apparatus of key data for graphic animation and method thereof
CN114827663A (en) Distributed live broadcast frame insertion system and method
CN115019137A (en) Method and device for predicting multi-scale double-flow attention video language event
KR100348901B1 (en) Segmentation of acoustic scences in audio/video materials
CN105912615A (en) Human voice content index based audio and video file management method
JP4964114B2 (en) Encoding device, decoding device, encoding method, decoding method, encoding program, decoding program, and recording medium
CN113345446B (en) Audio processing method, device, electronic equipment and computer readable storage medium
JP4053251B2 (en) Image search system and image storage method
KR20090012927A (en) Method and apparatus for generating multimedia data with decoding level, and method and apparatus for reconstructing multimedia data with decoding level

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination