CN112215908A - Compressed domain-oriented video content comparison system, optimization method and comparison method - Google Patents


Info

Publication number
CN112215908A
CN112215908A
Authority
CN
China
Prior art keywords
module
video
compressed domain
video content
comparison
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011086137.5A
Other languages
Chinese (zh)
Other versions
CN112215908B (en)
Inventor
李扬曦
缪亚男
袁庆升
胡卫明
李兵
刘雨帆
胡赛军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
National Computer Network and Information Security Management Center
Original Assignee
Institute of Automation of Chinese Academy of Science
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science, National Computer Network and Information Security Management Center filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202011086137.5A priority Critical patent/CN112215908B/en
Publication of CN112215908A publication Critical patent/CN112215908A/en
Application granted granted Critical
Publication of CN112215908B publication Critical patent/CN112215908B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/008 Vector quantisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention belongs to the field of computer vision, and particularly relates to a compressed domain-oriented video content comparison system, optimization method and comparison method, aiming at solving the low efficiency of comparing video content using fully decoded video information. The comparison system of the invention comprises: a feature learning module configured to obtain feature maps of multiple modalities from several kinds of compressed-domain information of the input video; a multi-modal compressed-domain information fusion module configured to fuse the multi-modal feature maps output by the feature learning module into a fused feature vector of the input video; a second module configured to obtain the L1 distance between the fused feature vectors of two input videos; and a classifier, a binary classification network configured to classify the comparison result based on the L1 distance output by the second module. The invention can effectively extract high-level semantic information of video content and ensures both high speed and high accuracy of video content comparison.

Description

Compressed domain-oriented video content comparison system, optimization method and comparison method
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a video content comparison system, an optimization method and a comparison method for a compressed domain.
Background
In content-based video understanding systems, a large amount of video typically needs to be processed. At present, more than 99% of internet video traffic is encoded with standards such as H.264 and H.265. Encoding reduces the volume of the video by tens to hundreds of times, but the image information in the video is also converted into indirect information and can only be restored to the image frames that make up the video by decoding. Most existing algorithms or systems for video recognition, comparison, retrieval, etc. need to decode the video into image frames and then process and analyze the frames of the image sequence. However, video decoding is very computation- and time-consuming, which greatly limits the practicability and flexibility of application systems, especially video retrieval and comparison systems and occasions requiring real-time processing.
Therefore, research oriented to the compressed domain, namely schemes for understanding, comparing and identifying video content under partial-decoding conditions, addresses an urgent problem to be solved. Unlike conventional video processing methods, compressed domain-oriented video comparison must operate directly on compressed data that is not decoded, or decoded as little as possible, omitting the extra steps of decompression and recompression and thereby greatly reducing the overall processing time of the system. Taking the video content comparison task as a representative, how to exploit the efficiency and coarse granularity of video compressed-domain information and design a suitable network structure to efficiently complete video content understanding is the technical problem to be solved.
Disclosure of Invention
In order to solve the above-mentioned problem in the prior art, that is, to solve the problem that the efficiency of comparing video contents using full video decoding information is not high, a first aspect of the present invention provides a video content comparison system for a compressed domain, which includes a first module, a second module, and a classifier, which are connected in sequence;
the first module comprises a feature learning module and a multi-modal compressed domain information fusion module; the feature learning module is configured to obtain feature maps of multiple modalities based on multiple kinds of compressed domain information of the input video; the multi-modal compressed domain information fusion module is configured to perform information fusion on the multi-modal feature maps output by the feature learning module to obtain a fusion feature vector of the input video;
the second module is configured to obtain an L1 distance of a fusion feature vector of two input videos;
the classifier is a binary classification network configured to perform binary classification of the comparison result based on the L1 distance output by the second module.
In some preferred embodiments, the feature learning module is constructed based on a weight-shared twin convolutional neural network.
In some preferred embodiments, the method for the second module to obtain the L1 distance is as follows:
an element-wise difference of the fused feature vectors of the two input videos is taken, and the absolute values of the differences give the corresponding L1 distance.
In a second aspect of the present invention, a method for optimizing a compressed domain-oriented video content comparison system is provided, where the method is used for optimizing the compressed domain-oriented video content comparison system, and includes:
training the first module based on a preset training sample to obtain an optimized first module;
constructing a new comparison system based on the optimized first module, the optimized second module and the classifier;
and fixing the parameters of the optimized first module based on a preset training sample, and training a classifier in the new comparison system to obtain the optimized comparison system.
In some preferred embodiments, the "training of the first module" uses a contrastive loss function L:

L = (1/2N) · Σ_{n=1}^{N} [ Y · D_n² + (1 − Y) · max(m − D_n, 0)² ]

where N is the number of samples, D_n is the Euclidean distance between the fused feature vectors of the two videos in the n-th sample pair, Y is the label indicating whether the two samples match, and m is a preset margin threshold.
In some preferred embodiments, the loss function used in training the classifier in the new comparison system is the cross-entropy loss of the classification.
In some preferred embodiments, the training sample is obtained by:
based on an offline video database, copied video segments that appear in different videos are clipped according to the annotation file; similar video-segment pairs clipped according to the annotation file serve as positive samples, and for each negative sample one video is randomly selected from the remaining video segments and paired with the original video.
The third aspect of the present invention provides a video content comparison method for a compressed domain, where the comparison method includes:
acquiring a video pair to be compared;
respectively carrying out partial decoding on two videos in the video pair to be compared, and extracting video compression domain information;
obtaining a comparison result through an optimized comparison system;
wherein the content of the first and second substances,
the obtaining method of the optimized comparison system comprises the following steps: and optimizing the video content comparison system facing the compressed domain based on the optimization method of the video content comparison system facing the compressed domain.
In a fourth aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, and the programs are adapted to be loaded and executed by a processor to implement the optimization method of the compressed domain-oriented video content comparison system or the compressed domain-oriented video content comparison method.
In a fifth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable for being loaded and executed by a processor to realize the optimization method of the video content comparison system facing the compressed domain or the video content comparison method facing the compressed domain.
The invention has the beneficial effects that:
1. according to the invention, the compressed domain information of the video is fully used, the deep twin neural network is designed, the high-level semantic information of the video content can be effectively extracted, and the high speed and the high performance of the comparison of the video content are ensured. By using the compressed domain information instead of the information of the video full decoding, the calculation amount of the video content understanding task is greatly reduced.
2. The invention designs a multi-modal fusion mode of compressed domain information, so that the information of different modes of a compressed domain is effectively fused, and the representation characteristic of high-level video semantics combined with video spatio-temporal information is constructed. The depth twin neural network effectively uses various coarse compressed domain information, and the video content comparison precision is improved.
3. According to the method, the characteristic of contrast loss in the deep twin neural network is used, namely the characteristic that the characteristic distance of a positive sample is as small as possible, and the characteristic distance of a negative sample is as large as possible, so that the effect similar to an SVM (support vector machine) large-distance classifier is learned by the network, the learned video characteristics are more decisive, and the network performance is more robust.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a block diagram of a compressed domain-oriented video content comparison system according to an embodiment of the present invention;
FIG. 2 is an algorithmic framework schematic of a deep twin neural network;
fig. 3 is a flowchart illustrating an optimization method of a compressed domain-oriented video content comparison system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention relates to a video content comparison system facing a compressed domain, which comprises a first module, a second module and a classifier which are connected in sequence as shown in figure 1;
the first module comprises a feature learning module and a multi-modal compressed domain information fusion module; the feature learning module is configured to obtain feature maps of multiple modalities based on multiple kinds of compressed domain information of the input video; the multi-modal compressed domain information fusion module is configured to perform information fusion on the multi-modal feature maps output by the feature learning module to obtain a fusion feature vector of the input video;
the second module is configured to obtain an L1 distance of a fusion feature vector of two input videos;
the classifier is a binary classification network configured to perform binary classification of the comparison result based on the L1 distance output by the second module.
For the purpose of more clearly illustrating the present invention, reference is now made to the following detailed description taken in conjunction with the accompanying drawings.
The video content comparison system facing the compressed domain comprises a first module, a second module and a classifier which are connected in sequence.
The embodiment further includes a video compression domain information extraction module. Before the video contents are compared, the compressed-domain information of each video in the pair to be compared must be extracted: the video is partially decoded, and the compressed-domain information, including I frames, motion vectors and residuals, is extracted.
The video compression domain information extraction module in this embodiment uses the core video codec framework of FFmpeg and adopts the codec flow of an H.264 code stream; for example, when decoding an I frame, the code stream goes through steps such as entropy decoding, inverse quantization and inverse transformation. For a decoded stream with motion vectors present in macroblock prediction, before entropy decoding is performed it is necessary to first determine the prediction mode or motion vector (MV) of a macroblock and the coded block pattern (CBP), and then perform entropy decoding on luminance and chrominance separately. The FFmpeg-based source code is implemented in C++; unnecessary decoding information and steps are skipped while the key decoding steps are retained, completing efficient extraction of compressed-domain information. In addition, because the whole network in this embodiment is trained end to end, mixed compilation of C++ and Python must be completed in engineering, so that the compressed-domain information extracted from FFmpeg with C++ can exchange data directly during training under the PyTorch framework.
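The C++ extraction layer itself cannot be reproduced briefly here, but the shape of the data it hands to the Python training side can be sketched. The following is illustrative only; the function and field names are assumptions, not an API disclosed by the patent:

```python
import numpy as np

def pack_compressed_domain_sample(i_frames, motion_vectors, residuals):
    """Pack partially decoded compressed-domain information into the
    per-modality tensors the feature learning module consumes.

    i_frames:       list of (H, W, 3) uint8 arrays, decoded I frames
    motion_vectors: list of (H, W, 2) arrays, per-pixel MV (dx, dy)
    residuals:      list of (H, W, 3) arrays, prediction residuals
    Returns a dict of float32 tensors shaped (T, H, W, C).
    """
    return {
        "iframe":   np.stack(i_frames).astype(np.float32),
        "mv":       np.stack(motion_vectors).astype(np.float32),
        "residual": np.stack(residuals).astype(np.float32),
    }
```

In practice the C++ side would fill these arrays directly from the decoder's macroblock data, avoiding a full pixel reconstruction of P/B frames.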
1. First module
The first module comprises a feature learning module and a multi-modal compressed domain information fusion module; it forms a video similarity discrimination network based on a deep twin network and obtains an expressive feature vector of each video from the compressed-domain information of the input video pair.
The feature learning module is configured to obtain feature maps of multiple modalities from several kinds of compressed-domain information of the input video. As shown in fig. 2, the module is a twin convolutional neural network structure based on weight sharing. It takes the compressed-domain information of a pair of videos as input and, for each kind of compressed-domain information (such as I frames and motion vectors), learns it with a multi-stream convolutional neural network serving as one branch of the twin network: specifically, a ResNet-34 backbone is used for I frames and a ResNet-18 backbone for motion vectors, and the feature map output by layer4 of the ResNet structure is taken as the output of the learning module. The feature learning network structure of this design includes, but is not limited to, the above configuration.
The multi-modal compressed-domain information fusion module is configured to fuse the multi-modal feature maps output by the feature learning module into a fused feature vector of the input video. Its inputs are the feature maps output by the compressed-domain information learning networks of the different modalities. Fusion proceeds by first stacking the feature maps, then learning the weights of the different modal information with a convolution layer of kernel size 1 while keeping the number of channels unchanged. The 1x1 convolution layer uses kaiming initialization, and during training its learning rate is set to twice the initial learning rate of the network, which speeds convergence and fuses the multi-modal information effectively. Suppose the two kinds of compressed-domain information are I frames and motion vectors: the feature learning module outputs FeatureMap1 and FeatureMap2, each of size (N, C, T, W, H); these are spliced on the channel axis to obtain FeatureMap3 of size (N, 2C, T, W, H); the conv1x1 then re-weights the channels to obtain the final FeatureMap of size (N, C, T, W, H); finally a Flatten operation yields the unique fused video-level feature vector, completing the fusion of multi-modal information.
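The fusion arithmetic described above (channel-wise splicing, a 1x1 convolution re-weighting 2C channels back to C, then flattening) can be sketched in plain numpy. The kernel below is a random stand-in for the trained, kaiming-initialized 1x1 convolution weights:

```python
import numpy as np

def fuse_modalities(feat1, feat2, rng=None):
    """Fuse two modality feature maps of shape (N, C, T, W, H):
    concatenate on the channel axis, apply a 1x1 convolution
    (a (C, 2C) weight matrix contracted over channels) to map
    2C channels back to C, then flatten to a video-level vector."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, c, t, w, h = feat1.shape
    stacked = np.concatenate([feat1, feat2], axis=1)           # (N, 2C, T, W, H)
    kernel = rng.standard_normal((c, 2 * c)) / np.sqrt(2 * c)  # stand-in 1x1 weights
    fused = np.einsum("oc,nctwh->notwh", kernel, stacked)      # (N, C, T, W, H)
    return fused.reshape(n, -1)                                # Flatten
```

A 1x1 convolution over (T, W, H) positions is exactly this per-position channel mixing, which is why a single weight matrix suffices in the sketch.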
2. Second module
A second module configured to obtain an L1 distance of a fused feature vector of two input videos.
The second module obtains the L1 distance as follows: an element-wise difference of the fused feature vectors of the two input videos is taken, and the absolute values of the differences give the corresponding L1 distance.
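A minimal sketch of this element-wise L1 distance:

```python
import numpy as np

def l1_distance_vector(v1, v2):
    """Element-wise absolute difference of two fused feature vectors.
    The resulting vector (not its scalar sum) is what the binary
    classifier consumes."""
    return np.abs(v1 - v2)
```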
3. Classifier
The classifier is a binary classification network configured to perform binary classification of the comparison result based on the L1 distance output by the second module. The binary classification network in this embodiment is a fully connected layer whose two output neurons correspond to "similar" and "dissimilar", so that whether a video is a copied video can be determined.
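A sketch of such a fully connected binary classifier head; the weights below are illustrative placeholders, not trained values:

```python
import numpy as np

def classify(l1_vec, weights, bias):
    """Fully connected layer mapping the L1-distance vector (length P)
    to two logits; argmax gives the copy-detection decision.
    weights: (2, P), bias: (2,). Returns 0 for 'similar', 1 for 'dissimilar'."""
    logits = weights @ l1_vec + bias
    return int(np.argmax(logits))
```

Intuitively, a small L1-distance vector (near-identical fused features) should land in class 0, a large one in class 1; the trained weights realize that decision boundary.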
A second embodiment of the present invention provides an optimization method for a compressed domain-oriented video content comparison system, which is used for optimizing the compressed domain-oriented video content comparison system.
Training samples need to be constructed before optimization. This embodiment adopts an offline sampling method, which can effectively obtain the large number of positive and negative sample pairs required for training. The public data set VCDB is sampled offline: copied video segments that appear in different videos are clipped according to the annotation file; each pair of similar video segments clipped according to the annotation file serves as a positive sample, and for each negative sample one video is randomly selected from the remaining video segments and paired with the original video. Repeating the above steps completes the construction of the data set.
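The offline pair construction can be sketched as follows; the annotation format is an assumption inferred from the description above (pairs of copied segments plus the pool of all clipped segment ids):

```python
import random

def build_pairs(annotations, all_segments, seed=0):
    """annotations: list of (segment_a, segment_b) copied-segment pairs
    from the label files; all_segments: ids of every clipped segment.
    Returns (pair, label) tuples: label 1 for annotated copy pairs,
    label 0 for the original segment paired with a random non-copy."""
    rng = random.Random(seed)
    samples = []
    for seg_a, seg_b in annotations:
        samples.append(((seg_a, seg_b), 1))                  # positive pair
        negatives = [s for s in all_segments if s not in (seg_a, seg_b)]
        samples.append(((seg_a, rng.choice(negatives)), 0))  # negative pair
    return samples
```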
The optimization method of the embodiment, as shown in fig. 3, includes the following steps:
and S100, training the first module based on a preset training sample to obtain an optimized first module.
The obtained training samples are fed into the feature learning module of the first module in batches, and the feature maps of the different kinds of compressed-domain information are obtained by forward propagation; the feature maps are then sent to the multi-modal compressed-domain information fusion module of the first module to obtain the unique feature vector of each video, and the network is trained by back-propagation using the contrastive loss. The contrastive loss is defined as follows:

L = (1/2N) · Σ_{n=1}^{N} [ Y · D_n² + (1 − Y) · max(m − D_n, 0)² ]

where

D_n = ‖X₁ − X₂‖₂ = ( Σ_{p=1}^{P} (X₁⁽ᵖ⁾ − X₂⁽ᵖ⁾)² )^{1/2}

is the Euclidean distance between the fused feature vectors X₁ and X₂ of the two videos in the n-th sample pair, P is the feature dimension of the fused feature vector, Y is the label indicating whether the two samples match (Y = 1 means the two samples are similar or matched, Y = 0 means they do not match), m is a set margin threshold, and N is the number of samples. The length of the fused feature vector output by the first module is set to 512 here.
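A numpy sketch of this contrastive loss in its standard form, assuming the 1/(2N) averaging convention:

```python
import numpy as np

def contrastive_loss(x1, x2, y, m=1.0):
    """x1, x2: (N, P) fused feature vectors; y: (N,) with 1 = matched pair.
    Matched pairs are pulled together (loss = D^2); unmatched pairs are
    pushed apart until their Euclidean distance exceeds the margin m."""
    d = np.sqrt(((x1 - x2) ** 2).sum(axis=1))                     # D_n
    loss = y * d ** 2 + (1 - y) * np.maximum(m - d, 0.0) ** 2
    return loss.mean() / 2.0
```

Identical matched pairs and well-separated unmatched pairs both contribute zero loss, which is the large-margin behavior the beneficial-effects section refers to.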
And S200, constructing a new comparison system based on the optimized first module, the optimized second module and the classifier.
And (5) fixing the parameters of the first module based on the optimized parameters obtained in the step (S100), and constructing and obtaining a new comparison system together with the second module and the classifier.
And S300, fixing the parameters of the optimized first module based on a preset training sample, and training a classifier in the new comparison system to obtain the optimized comparison system.
During this training step, the entire network is back-propagated using the cross-entropy loss of the classification. The training end condition is set as a number of iterations and/or a preset convergence criterion; the forward and backward propagation described above are repeated for the set number of iterations, training the network until it converges, at which point training stops.
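A minimal sketch of the softmax cross-entropy used in this second stage, computed in a numerically stable way (only the classifier receives gradients at this point; the first module is frozen):

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for the two-logit classifier output.
    logits: (2,) array; label: 0 ('similar') or 1 ('dissimilar')."""
    z = logits - logits.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())   # log-softmax
    return -log_probs[label]
```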
It should be noted that the training process of the present embodiment adopts a staged training method, and adopts two loss functions, including the contrast loss of step S100 and the cross-entropy loss of step S300.
The embodiment may also be systematized, for example, by optimizing an optimization system of a compressed domain-oriented video content comparison system, where the optimization system includes: the system comprises a first training module, an intermediate system building module and a second training module.
The first training module is configured to perform training of the first module based on a preset training sample to obtain an optimized first module;
the intermediate system construction module is configured to construct a new comparison system based on the optimized first module, the optimized second module and the classifier;
and the second training module is configured to fix the parameters of the optimized first module based on a preset training sample, and train the classifier in the new comparison system to obtain the optimized comparison system.
As shown in fig. 2, after the two videos undergo feature extraction by the feature learning module and multi-modal information fusion by the multi-modal compressed-domain information fusion module, the fused feature vectors are differenced element-wise to obtain the difference result of the two fused feature vectors, which is then classified by the fully connected layer to obtain the decision result. When the first module is trained, the contrastive loss is computed on the feature vectors after multi-modal information fusion and used for backward optimization; when the classifier is optimized, the parameters of the first module are kept unchanged, and backward optimization is performed through the cross-entropy loss based on the classification results on the training samples.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and the related description of the optimization method described above may refer to the corresponding description in the foregoing system embodiment, and the specific working process and the related description of the optimization system described above may refer to the corresponding description in the foregoing optimization method embodiment, which are not repeated herein.
A video content comparison method for a compressed domain according to a third embodiment of the present invention includes:
acquiring a video pair to be compared;
respectively carrying out partial decoding on two videos in the video pair to be compared, and extracting video compression domain information;
obtaining a comparison result through an optimized comparison system;
the obtaining method of the optimized comparison system comprises the following steps: and optimizing the video content comparison system facing the compressed domain based on the optimization method of the video content comparison system facing the compressed domain.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and the related description of the above-described video content comparison method for a compressed domain may refer to the corresponding description in the embodiments of the video content comparison system for a compressed domain and the optimization method for a video content comparison system for a compressed domain, and are not described herein again.
A storage device according to a fourth embodiment of the present invention stores a plurality of programs, which are suitable for being loaded and executed by a processor to implement the optimization method of the compressed domain-oriented video content comparison system or the compressed domain-oriented video content comparison method.
A processing apparatus according to a fifth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable for being loaded and executed by a processor to realize the optimization method of the video content comparison system facing the compressed domain or the video content comparison method facing the compressed domain.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing embodiments, and are not described herein again.
It should be noted that, the video content comparison system for the compressed domain provided in the foregoing embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A compressed domain-oriented video content comparison system, characterized by comprising a first module, a second module and a classifier which are connected in sequence;
the first module comprises a feature learning module and a multi-modal compressed domain information fusion module; the feature learning module is configured to obtain feature maps of multiple modalities based on multiple kinds of compressed domain information of the input video; the multi-modal compressed domain information fusion module is configured to perform information fusion on the multi-modal feature maps output by the feature learning module to obtain a fusion feature vector of the input video;
the second module is configured to obtain an L1 distance of a fusion feature vector of two input videos;
the classifier is a two-class classification network and is configured to perform binary classification of the comparison result based on the L1 distance output by the second module.
2. The system according to claim 1, wherein the feature learning module is constructed based on a weight-sharing twin (Siamese) convolutional neural network.
3. The system according to claim 1, wherein the second module obtains the L1 distance by:
performing an element-wise difference on the fusion feature vectors of the two input videos to obtain the corresponding L1 distance.
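As an illustrative sketch (not the claimed implementation), the comparison head of claims 1 to 3 can be exercised with plain vectors standing in for the learned fusion features, and a single logistic layer with made-up weights standing in for the two-class network:

```python
import math

def l1_distance_vector(feat_a, feat_b):
    # Element-wise absolute difference of the two fusion feature
    # vectors (claim 3); this vector is fed to the classifier,
    # and its sum is the scalar L1 distance.
    return [abs(a - b) for a, b in zip(feat_a, feat_b)]

def binary_classify(diff_vec, weights, bias):
    # Stand-in for the two-class network of claim 1: one logistic
    # layer with hypothetical weights; the real classifier is a
    # learned network.
    score = sum(w * d for w, d in zip(weights, diff_vec)) + bias
    prob_same = 1.0 / (1.0 + math.exp(-score))
    return prob_same >= 0.5, prob_same

# Toy fusion vectors for two videos (hypothetical values).
diff = l1_distance_vector([0.9, 0.1, 0.5], [0.8, 0.2, 0.5])
same, prob = binary_classify(diff, [-1.0, -1.0, -1.0], 2.0)
```

With these toy values the summed L1 distance is small (about 0.2), so the stand-in classifier labels the pair as matching.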
4. A method for optimizing a compressed domain-oriented video content comparison system, which is used for optimizing the compressed domain-oriented video content comparison system according to any one of claims 1 to 3, and the method comprises:
training the first module based on a preset training sample to obtain an optimized first module;
constructing a new comparison system based on the optimized first module, the second module and the classifier;
and, based on a preset training sample, fixing the parameters of the optimized first module and training the classifier in the new comparison system, so as to obtain the optimized comparison system.
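The two-stage procedure of claim 4 — first train the feature module, then fix its parameters and train only the classifier — can be illustrated with a minimal parameter-freezing sketch (the class and parameter values below are hypothetical, not part of the patent):

```python
class Module:
    # Minimal stand-in for a trainable network module.
    def __init__(self, params):
        self.params = list(params)
        self.trainable = True

def trainable_parameters(modules):
    # Collect parameters of modules that are not frozen; only
    # these would be updated by the optimizer in a given stage.
    out = []
    for m in modules:
        if m.trainable:
            out.extend(m.params)
    return out

first_module = Module([0.1, 0.2])   # feature learning + fusion
classifier   = Module([0.3])        # two-class network

# Stage 1: optimize the first module alone (e.g. with the metric
# loss of claim 5).
stage1 = trainable_parameters([first_module])

# Stage 2: fix the first module's parameters and train only the
# classifier (claim 4).
first_module.trainable = False
stage2 = trainable_parameters([first_module, classifier])
```

In deep-learning frameworks the same effect is typically achieved by disabling gradient computation for the frozen module's parameters.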
5. The method as claimed in claim 4, wherein the loss function L used in the training of the first module is

L = \frac{1}{2N}\sum_{n=1}^{N}\left[ Y D_n^{2} + (1 - Y)\max(m - D_n,\, 0)^{2} \right]

wherein N is the number of sample pairs, D_n is the Euclidean distance between the fusion feature vectors of the two videos in the n-th sample pair, Y is the label indicating whether the two samples match, and m is a preset margin threshold.
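Using the symbols defined in claim 5, and under the common convention that Y = 1 for a matched pair (an assumption; the patent only states that Y labels whether the two samples match), the contrastive-style loss can be computed as:

```python
import math

def euclidean(u, v):
    # Euclidean distance between two fusion feature vectors (D_n).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(pairs, m):
    # pairs: list of (D_n, Y_n) with Y_n = 1 for matched pairs
    # (assumed convention); m is the preset margin threshold.
    # Matched pairs are pulled together (penalty D_n^2); unmatched
    # pairs are pushed apart until their distance exceeds m.
    n = len(pairs)
    total = 0.0
    for d, y in pairs:
        total += y * d * d + (1 - y) * max(m - d, 0.0) ** 2
    return total / (2 * n)
```

A matched pair at distance 0 and an unmatched pair beyond the margin both contribute zero loss, which is the intended equilibrium of this loss family.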
6. The method as claimed in claim 5, wherein the loss function used in training the classifier in the new comparison system is the cross-entropy loss for binary classification.
7. The method for optimizing a compressed domain-oriented video content comparison system according to any one of claims 4-6, wherein the training samples are obtained by:
based on an offline video database, performing video clipping, according to an annotation file, on the copied video segments that exist across different videos; taking the pairs of similar video segments clipped according to the annotation file as positive samples; and randomly selecting 1 video from the remaining video segments, and taking the pair formed by that video and the original video as a negative sample.
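The sampling rule of claim 7 can be sketched as follows; the annotation format and the segment identifiers are hypothetical stand-ins for whatever the offline database actually stores:

```python
import random

def build_pairs(copy_groups, all_segments, rng):
    # copy_groups: (original, copied) segment-id pairs taken from
    # the annotation file; all_segments: every clipped segment id.
    # Annotated copy pairs become positives (label 1); each original
    # is paired with one randomly drawn unrelated segment as a
    # negative (label 0), per claim 7.
    positives = [(a, b, 1) for a, b in copy_groups]
    negatives = []
    for a, b in copy_groups:
        others = [s for s in all_segments if s not in (a, b)]
        negatives.append((a, rng.choice(others), 0))
    return positives + negatives

rng = random.Random(0)
pairs = build_pairs([("v1", "v1_copy")],
                    ["v1", "v1_copy", "v2", "v3"], rng)
```

Seeding the random generator, as above, makes the negative sampling reproducible across training runs.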
8. A compressed domain-oriented video content comparison method, characterized in that the comparison method comprises the following steps:
acquiring a video pair to be compared;
respectively performing partial decoding on the two videos in the video pair to be compared, and extracting video compressed domain information;
obtaining a comparison result through an optimized comparison system;
wherein:
the optimized comparison system is obtained by optimizing the compressed domain-oriented video content comparison system according to any one of claims 1 to 3 with the optimization method according to any one of claims 4 to 7.
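The overall flow of claim 8 — partial decoding, compressed-domain information extraction, then comparison — can be outlined with placeholder functions. The extract and system callables below are toy stand-ins, not the patented implementation: a real pipeline would partially decode the bitstream (entropy decoding only, without full pixel reconstruction) to read compressed-domain syntax such as motion vectors.

```python
def compare_videos(path_a, path_b, extract, system):
    # extract: partial-decode function returning compressed-domain
    # information for one video; system: the optimized comparison
    # system mapping two such inputs to a 0/1 comparison result.
    info_a = extract(path_a)
    info_b = extract(path_b)
    return system(info_a, info_b)

# Toy stand-ins so the flow can be exercised end to end: the
# "features" are just path lengths, and "same content" means equal
# features. Both are placeholders for learned components.
toy_extract = lambda p: len(p)
toy_system = lambda a, b: int(a == b)
```

The value of operating in the compressed domain is that the expensive full decode is skipped for both videos; only the lightweight extraction step touches the bitstreams.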
9. A storage device, having a plurality of programs stored therein, wherein the programs are adapted to be loaded and executed by a processor to implement the optimization method of the compressed domain-oriented video content comparison system according to any one of claims 4 to 7 or the compressed domain-oriented video content comparison method according to claim 8.
10. A processing device comprising a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable for being loaded and executed by a processor to implement the optimization method of the compressed domain-oriented video content comparison system according to any one of claims 4 to 7 or the compressed domain-oriented video content comparison method according to claim 8.
CN202011086137.5A 2020-10-12 2020-10-12 Compressed domain-oriented video content comparison system, optimization method and comparison method Active CN112215908B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011086137.5A CN112215908B (en) 2020-10-12 2020-10-12 Compressed domain-oriented video content comparison system, optimization method and comparison method

Publications (2)

Publication Number Publication Date
CN112215908A true CN112215908A (en) 2021-01-12
CN112215908B CN112215908B (en) 2022-12-02

Family

ID=74052819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011086137.5A Active CN112215908B (en) 2020-10-12 2020-10-12 Compressed domain-oriented video content comparison system, optimization method and comparison method

Country Status (1)

Country Link
CN (1) CN112215908B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990273A (en) * 2021-02-18 2021-06-18 中国科学院自动化研究所 Compressed domain-oriented video sensitive character recognition method, system and equipment
CN114445918A (en) * 2022-02-21 2022-05-06 支付宝(杭州)信息技术有限公司 Living body detection method, device and equipment
CN114666571A (en) * 2022-03-07 2022-06-24 中国科学院自动化研究所 Video sensitive content detection method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6763069B1 (en) * 2000-07-06 2004-07-13 Mitsubishi Electric Research Laboratories, Inc Extraction of high-level features from low-level features of multimedia content
CN110163079A (en) * 2019-03-25 2019-08-23 腾讯科技(深圳)有限公司 Video detecting method and device, computer-readable medium and electronic equipment
CN110175266A (en) * 2019-05-28 2019-08-27 复旦大学 A method of it is retrieved for multistage video cross-module state
WO2019242222A1 (en) * 2018-06-21 2019-12-26 北京字节跳动网络技术有限公司 Method and device for use in generating information
CN111046766A (en) * 2019-12-02 2020-04-21 武汉烽火众智数字技术有限责任公司 Behavior recognition method and device and computer storage medium
CN111242173A (en) * 2019-12-31 2020-06-05 四川大学 RGBD salient object detection method based on twin network
CN111401267A (en) * 2020-03-19 2020-07-10 山东大学 Video pedestrian re-identification method and system based on self-learning local feature characterization
CN111626178A (en) * 2020-05-24 2020-09-04 中南民族大学 Compressed domain video motion recognition method and system based on new spatio-temporal feature stream

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU XIAOYU et al.: "Special video classification with multi-modal feature fusion and multi-task learning", Optics and Precision Engineering (《光学精密工程》) *


Also Published As

Publication number Publication date
CN112215908B (en) 2022-12-02

Similar Documents

Publication Publication Date Title
CN112215908B (en) Compressed domain-oriented video content comparison system, optimization method and comparison method
CN111382555B (en) Data processing method, medium, device and computing equipment
US8787692B1 (en) Image compression using exemplar dictionary based on hierarchical clustering
CN109614517B (en) Video classification method, device, equipment and storage medium
TWI744827B (en) Methods and apparatuses for compressing parameters of neural networks
KR102299958B1 (en) Systems and methods for image compression at multiple, different bitrates
WO2021205065A1 (en) Training a data coding system comprising a feature extractor neural network
TWI806199B (en) Method for signaling of feature map information, device and computer program
CN112668559A (en) Multi-mode information fusion short video emotion judgment device and method
US10972749B2 (en) Systems and methods for reconstructing frames
CN111970509B (en) Video image processing method, device and system
CN112990273B (en) Compressed domain-oriented video sensitive character recognition method, system and equipment
WO2021205066A1 (en) Training a data coding system for use with machines
CN116978011B (en) Image semantic communication method and system for intelligent target recognition
CN115481283A (en) Audio and video feature extraction method and device, electronic equipment and computer readable storage medium
CN113628116B (en) Training method and device for image processing network, computer equipment and storage medium
KR102315077B1 (en) System and method for the detection of multiple compression of image and video
US20220377342A1 (en) Video encoding and video decoding
CN117014693A (en) Video processing method, device, equipment and storage medium
CN116074574A (en) Video processing method, device, equipment and storage medium
CN116778376B (en) Content security detection model training method, detection method and device
US7747093B2 (en) Method and apparatus for predicting the size of a compressed signal
Grycuk et al. Neural video compression based on SURF scene change detection algorithm
KR20240090245A (en) Scalable video coding system and method for machines
WO2023222313A1 (en) A method, an apparatus and a computer program product for machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant