CN112990273A - Compressed domain-oriented video sensitive character recognition method, system and equipment - Google Patents

Compressed domain-oriented video sensitive character recognition method, system and equipment

Info

Publication number
CN112990273A
CN112990273A (application CN202110190037.5A)
Authority
CN
China
Prior art keywords
face
video
modal
compressed domain
sensitive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110190037.5A
Other languages
Chinese (zh)
Other versions
CN112990273B (en)
Inventor
刘雨帆
李兵
胡卫明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110190037.5A priority Critical patent/CN112990273B/en
Publication of CN112990273A publication Critical patent/CN112990273A/en
Application granted granted Critical
Publication of CN112990273B publication Critical patent/CN112990273B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of image recognition, and particularly relates to a compressed domain-oriented method, system and equipment for recognizing sensitive persons in video, aiming at solving the inefficiency and resource waste of existing sensitive-person recognition methods. The invention comprises the following steps: partially decoding the video to be detected to obtain compressed-domain multi-modal information; performing detection and calibration on the compressed-domain multi-modal information; passing the calibrated compressed-domain face multi-modal information through a trained multi-modal face recognition network to obtain multi-modal face features; and comparing the multi-modal face features against a sensitive-face feature library to determine whether a sensitive face is present. The compressed-domain face multi-modal information is processed by an I branch, an MV branch and a Res branch, which extract different features that are then fused into a single multi-modal face feature. Because the invention completes feature extraction with only partial decoding, it solves the inefficiency and resource waste of the prior art while retaining high recognition accuracy.

Description

Compressed domain-oriented video sensitive character recognition method, system and equipment
Technical Field
The invention belongs to the field of image recognition, and particularly relates to a compressed domain-oriented method, system and equipment for recognizing sensitive persons in video.
Background
In the field of video security, sensitive-person identification is a critical piece of work. The existing approach fully decodes the target video into video frames, performs face detection and face feature extraction on every frame, and finally compares the extracted features one by one against the face features of sensitive persons, judging whether the video contains a sensitive person by means of a preset threshold. This kind of process has two distinct disadvantages. First, full video decoding places heavy demands on computing resources and computing time, so the method can hardly run on mobile terminal devices, and its running time is long even on a cloud server. Second, the method extracts features independently for each face in the video, ignoring the fact that faces in adjacent frames usually belong to the same identity; a large amount of computation is therefore repeated during feature extraction, further wasting computing resources and consuming computing time.
Disclosure of Invention
In order to solve the above problems in the prior art, namely the inefficiency and resource waste of existing sensitive-person identification methods, the present invention provides a compressed domain-oriented video sensitive-person recognition method, comprising:
Step S100, partially decoding the video to be detected using FFmpeg and C++, and extracting the compressed-domain multi-modal information of the video to be detected; the compressed-domain multi-modal information comprises: I frames, motion vector images, residual images, DCT coefficients and partition depth;
Step S200, performing face detection and face calibration on the compressed-domain multi-modal information to obtain calibrated compressed-domain face multi-modal information;
Step S300, inputting the calibrated compressed-domain face multi-modal information into a trained multi-modal face recognition network to acquire the multi-modal face features of each face in the video to be detected;
Step S400, matching the multi-modal face features against a preset sensitive-person feature library to determine whether the video to be detected contains a sensitive person.
Further, the multi-modal face recognition network comprises an I branch, an MV branch, a Res branch and a multi-modal fusion module:
The I branch is constructed from one of ResNet, InceptionNet or DenseNet; its input is the calibrated I frame, a 3-channel RGB image, and its output is the feature map of the I frame;
The MV branch is constructed from one of ResNet, InceptionNet or DenseNet; its input is the calibrated motion vector image, a 2-channel vector image, and its output is the feature map of the motion vector image;
The Res branch is constructed from one of ResNet, InceptionNet or DenseNet; its input is the calibrated residual image, a 2-channel image, and its output is the feature map of the residual image;
The multi-modal fusion module comprises 3 residual modules connected in parallel, each containing two convolution layers with 3 × 3 kernels; it takes as input the feature maps of the I frame, the motion vector image and the residual image, and outputs the multi-modal face feature vector.
Further, the training method of the multi-modal face recognition network comprises:
Step A100, acquiring a training data set through an offline sample collection method;
Step A200, compressing the training data set to obtain training video compressed-domain information, comprising I frames, motion vector images, residual images, DCT coefficients and partition depth;
Step A300, randomly selecting the training video compressed-domain information of any training datum in the training data set, and acquiring the corresponding training multi-modal face feature vectors through the multi-modal face recognition network;
Step A400, calculating a contrastive loss L based on the training multi-modal face feature vectors;
Step A500, repeating steps A300 to A400 and reducing the contrastive loss L through back-propagation training until the network converges, to obtain the trained multi-modal face recognition network.
Further, the contrastive loss L is:
$$L=\frac{1}{2N}\sum_{n=1}^{N}\Big[\,Y d^{2}+(1-Y)\max(m-d,\,0)^{2}\,\Big]+R(X_{1})+R(X_{2})$$

where

$$d=\left\lVert X_{1}-X_{2}\right\rVert_{2}=\sqrt{\sum_{j=1}^{P}\left(X_{1j}-X_{2j}\right)^{2}}$$

is the Euclidean distance between the sample features X1 and X2, P is the feature dimension of a sample, Y is a label indicating whether the two samples match (Y = 1 means the two samples have the same identity, Y = 0 means they have different identities), m is a preset threshold, N is the number of sample pairs, and R(X1) and R(X2) denote sparse regularization terms.
Further, step S300 includes:
Step S310, performing face detection and face calibration on the compressed-domain multi-modal information to obtain calibrated I frame information;
Step S320, performing face calibration on the residual image and the motion vector image based on the calibrated I frame information, to obtain the calibrated residual image and the calibrated motion vector;
Step S330, taking the calibrated I frame information, the calibrated residual image and the calibrated motion vector together as the calibrated face multi-modal information.
Further, step S400 includes:
Step S410, acquiring the face feature vectors of sensitive persons from preset sensitive-person video data by the method of steps A200-A400;
Step S420, constructing a sensitive-person face feature library from the face feature vectors of the sensitive persons;
Step S430, calculating the cosine similarity between the extracted face features and the sensitive-person face features in the sensitive-person face feature library; when the cosine similarity is greater than a preset threshold T, judging that the video to be detected contains a sensitive person.
Further, the offline sample collection method includes:
Step B100, crawling celebrity videos from the Internet;
Step B200, extracting the multi-modal information of each celebrity video;
Step B300, extracting all celebrity face features from the multi-modal information of the celebrity video through a face recognition algorithm;
Step B400, clustering the celebrity face features with a clustering algorithm, and taking the class containing the most faces as the ID of the celebrity video;
Step B500, repeating steps B100-B400 until the number of processed celebrity videos reaches a preset number, obtaining the training data set.
In another aspect of the present invention, a compressed domain-oriented video sensitive-person recognition system is provided, comprising an information extraction module, a face positioning and calibration module, a feature extraction module and a sensitive-person matching module;
The information extraction module is configured to partially decode the video to be detected using FFmpeg and C++, and extract the compressed-domain multi-modal information of the video to be detected; the compressed-domain multi-modal information comprises: I frames, motion vector images, residual images, DCT coefficients and partition depth;
The face positioning and calibration module is configured to perform face detection and face calibration on the compressed-domain multi-modal information to obtain calibrated compressed-domain face multi-modal information;
The feature extraction module is configured to input the calibrated compressed-domain face multi-modal information into a trained multi-modal face recognition network and acquire the multi-modal face features of each face in the video to be detected;
The sensitive-person matching module is configured to match the multi-modal face features against a preset sensitive-person feature library to determine whether the video to be detected contains a sensitive person.
In a third aspect of the present invention, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the processor to implement the compressed domain-oriented video sensitive-person recognition method described above.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, storing computer instructions for execution by a computer to implement the compressed domain-oriented video sensitive-person recognition method described above.
The invention has the following beneficial effects:
(1) The compressed domain-oriented video sensitive-person recognition method makes full use of the compressed-domain information of the video, ensuring that video face features are extracted quickly and accurately; by recognizing from compressed-domain information instead of fully decoded frames, it greatly reduces the computation of the video sensitive-person detection task.
(2) The method obtains a high-level semantic representation of the face that combines facial spatio-temporal information by fusing the multi-modal compressed-domain face information, so that the multi-modal face recognition network makes effective use of the various kinds of compressed-domain face information, improving the accuracy of sensitive-face detection.
(3) The method expands the magnitude of the training samples by crawling Internet face data, and reduces the proportion of dirty samples in the data set through a sample-cleaning technique, thereby improving the recognition performance of the face recognition network.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of the compressed domain-oriented video sensitive-person recognition method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a multi-modal face recognition network architecture in an embodiment of the invention;
FIG. 3 is a block diagram of a computer system of a server for implementing embodiments of the method, system, and apparatus of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a compressed domain-oriented video sensitive-person recognition method that acquires features from compressed-domain information in place of fully decoded video, greatly reducing the computation of the video sensitive-person task.
The compressed domain-oriented video sensitive-person recognition method of the invention comprises steps S100-S400, detailed as follows:
Step S100, partially decoding the video to be detected using FFmpeg and C++, and extracting the compressed-domain multi-modal information of the video to be detected; the compressed-domain multi-modal information comprises: I frames, motion vector images, residual images, DCT coefficients and partition depth;
Step S200, performing face detection and face calibration on the compressed-domain multi-modal information to obtain calibrated compressed-domain face multi-modal information;
Step S300, inputting the calibrated compressed-domain face multi-modal information into a trained multi-modal face recognition network to acquire the multi-modal face features of each face in the video to be detected;
Step S400, matching the multi-modal face features against a preset sensitive-person feature library to determine whether the video to be detected contains a sensitive person.
To solve the problems of existing video sensitive-person recognition, the method provides an efficient, compressed domain-oriented video sensitive-person recognition technique. On the one hand, when processing a video, the method only partially decodes the target video, obtaining the multi-modal information {I frame, motion vector, residual}. Since partial decoding takes only about one tenth of the time of full decoding, the time-consumption problem is greatly mitigated. On the other hand, the method slices the video at I frames and designs a face recognition network for the multi-modal information. Face feature extraction is therefore performed only once per slice (as many times as there are faces detected in the I frame of the current slice); compared with frame-by-frame extraction, this further reduces time consumption and saves operating cost.
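As an illustration of the per-slice saving, the sketch below groups frames into I-frame-anchored slices; the `is_iframe` wrapper attribute is a hypothetical helper, not an API from the patent. Detection and feature extraction then run once per yielded slice rather than once per frame.

```python
def iter_gop_slices(frames):
    """Group partially decoded frames into GOP slices, each led by an I frame.

    `frames` is any iterable of frame objects exposing a boolean `is_iframe`
    attribute (a hypothetical wrapper over the compressed-domain stream).
    Face detection and feature extraction then run once per yielded slice
    instead of once per frame.
    """
    gop = []
    for frame in frames:
        if getattr(frame, "is_iframe", False) and gop:
            yield gop          # a new I frame closes the previous slice
            gop = []
        gop.append(frame)
    if gop:
        yield gop              # flush the final slice
```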
In order to more clearly describe the method for identifying a video sensitive person in a compressed domain according to the present invention, the following will describe each step in the embodiment of the present invention in detail with reference to fig. 1.
The compressed domain-oriented video sensitive-person recognition method of the first embodiment of the invention comprises the following steps S100-S400, described in detail below:
In this embodiment, take a video to be detected in mp4 format as an example. The usual processing means is to decode the video into frame-by-frame images, an operation that consumes a large amount of computing resources. For videos of the same format, the invention accomplishes the person recognition task with only partial decoding, processing the video content in the compressed domain.
Step S100, partially decoding the video to be detected using FFmpeg and C++, and extracting the compressed-domain multi-modal information of the video to be detected; the compressed-domain multi-modal information comprises: I frames, motion vector images, residual images, DCT coefficients and partition depth.
In this embodiment, the open-source tool FFmpeg is used for video decoding, and the compressed-domain information generated during decompression is identified by analyzing the FFmpeg source code.
The I frame carries the key RGB spatial information of the video; the motion vector image carries the video's main motion information; and the residual image carries the boundary information of the moving subjects. The DCT coefficients are produced by applying the discrete cosine transform (DCT) to video frames during encoding; since the DCT filters out high-frequency information and reduces redundancy, the coefficients reflect the texture information of a video frame in the compressed domain. The partition depth arises from video coding: each picture is first divided into macroblocks of different sizes, with the H.264 standard specifying 16 × 16-pixel macroblocks that can be further divided into 16 × 8, 8 × 16 and 8 × 8 sub-macroblocks, and an 8 × 8 sub-macroblock can be divided further into 8 × 4, 4 × 8 and 4 × 4 sub-macroblocks. These partitioning rules constitute the partition depth: the smaller the sub-macroblock, the more drastic the pixel change at that location, i.e. the richer the temporal information carried by the local texture.
The temporal-domain information comprises the motion vector sequence and the residual sequence; the spatial-domain information comprises the I frame information together with the motion vector information and residual information at the current time.
In this embodiment, on the basis of the FFmpeg video coding framework, the encoding and decoding flow of the H.265 code stream was studied and analyzed in depth, covering I frame decoding, entropy decoding of the code stream, inverse quantization and inverse DCT. For a stream whose macroblock prediction carries motion vectors, the prediction mode or motion vector MV of a macroblock and the coded block pattern CBP must be determined before entropy decoding, after which luminance and chrominance are entropy-decoded separately. By studying the FFmpeg source code and writing C++ code for the key decoding steps, unnecessary decoding information and stages are skipped, so that compressed-domain information is extracted efficiently. In addition, since the whole network is trained end to end, mixed compilation of C++ and Python is completed at the engineering level, so that the compressed-domain information extracted from FFmpeg by the C++ code can be consumed directly during training under the PyTorch framework.
While compressed (e.g. in common containers such as AVI or MP4), a video exists as a binary stream. Conventional methods must decode it into frame-by-frame images with a decoder before analysis and processing, a process that consumes much time, especially when the volume of video to be processed is large.
The invention adopts partial decoding in place of full decoding: the binary code stream is entropy-decoded into a readable compressed-domain form, and the compressed-domain information still contains the complete information of the video. Extracting data from the compressed domain for processing reduces decoding time, and the recognition task can be accomplished with a smaller data volume, improving the real-time performance of model processing.
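As a rough Python-level illustration, the sketch below uses PyAV (FFmpeg bindings) to pull I-frame pixels and motion-vector side data from a stream. The `+export_mvs` flag and the side-data API are PyAV features assumed here; note that PyAV still runs the full decoder internally, whereas the patent's C++ hook into FFmpeg's decoding stages skips unnecessary steps altogether.

```python
import av  # PyAV, Python bindings for FFmpeg

def iter_compressed_domain(path):
    """Yield I-frame pixels and motion-vector fields from a video.

    A sketch under the assumption that PyAV's '+export_mvs' decoder flag
    and motion-vector side-data API are available; the patent's pipeline
    instead hooks FFmpeg's C internals from C++ to avoid full decoding.
    """
    container = av.open(path)
    stream = container.streams.video[0]
    stream.codec_context.options = {"flags2": "+export_mvs"}
    for frame in container.decode(stream):
        if frame.key_frame:  # treat key frames as I frames
            yield "I", frame.to_ndarray(format="rgb24")
        else:
            mvs = frame.side_data.get(av.sidedata.sidedata.Type.MOTION_VECTORS)
            if mvs is not None:
                yield "MV", mvs.to_ndarray()  # structured array of motion vectors
    container.close()
```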
The prior art achieves person recognition only by fully decoding the video; no technical scheme has yet been proposed that performs person recognition directly on compressed-domain data as the present invention does, which supports the inventive step of the invention.
The present invention can be implemented in any programming language, without limitation on the means of implementation; any technical solution that achieves the same technical effect using the principles disclosed herein should be considered within the scope of the present application.
Preferably, in this embodiment the invention is implemented on a computer with a 3.2 GHz central processing unit and 8 GB of memory; the training process of the network is implemented under the PyTorch framework, training and testing of the whole network are parallelized on Tesla V100 GPUs, and the working program for compressed-video information extraction is written in the C++ language.
Step S200, carrying out face detection and face calibration on the compressed domain multi-modal information to obtain calibrated compressed domain face multi-modal information;
step S300, inputting the calibrated multi-modal information of the compressed domain face into a trained multi-modal face recognition network, and acquiring multi-modal face features of each face of the video to be detected;
and training a multi-mode face recognition model. Different from the traditional RGB face recognition, the compressed domain information contains three kinds of information, and is a multi-modal face recognition technology. Therefore, aiming at the characteristics of compressed domain information, the invention designs a multi-mode face recognition network structure (which consists of three independent branches and a mode fusion module, namely an I branch corresponding to I frame information processing, an MV branch corresponding to motion vector processing, a Res branch corresponding to residual error processing and a { I, M, Res } fusion module). The invention also designs a multi-modal face recognition model training method based on the network structure, which comprises the steps of establishing a data set and extracting characteristics to train specifically.
In this embodiment, the multi-modal face recognition network includes an I branch, an MV branch, a Res branch, and a multi-modal fusion module, as shown in fig. 2:
the I branch is constructed based on one of ResNet, InceptionNet or DenseNet, and is input as a calibrated I frame and output as a feature map of the I frame; the I frame is a 3-channel RGB image;
the MV branch is constructed based on one of ResNet, InceptionNet or DenseNet, is input into a calibrated motion vector image, and is output as a feature map of the motion vector image; the motion vector image is a 2-channel vector image;
the Res branch is constructed based on one of ResNet, InceptionNet or DenseNet, is input into a calibrated residual image, and is output as a feature map corresponding to the residual image; the residual image is a 2-channel image;
the multi-mode fusion module comprises 3 residual modules connected in parallel, and each residual module comprises two convolution layers with convolution kernels of 3 x 3; the multi-mode fusion module inputs the feature map of the frame I, the feature map corresponding to the motion vector image and the feature map corresponding to the residual image and outputs the feature maps into multi-mode face feature vectors;
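A minimal PyTorch sketch of this architecture follows, using ResNet-18 backbones for the three branches. The exact way the three parallel residual modules combine the branch feature maps is not fully specified in the patent, so the fusion below (one residual module per modality, then pooling, concatenation and a linear projection) is an illustrative assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def make_branch(in_channels):
    # ResNet-18 backbone with the stem adapted to the modality's channel
    # count; the classifier is dropped so the branch outputs a feature map.
    net = resnet18(weights=None)
    net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                          padding=3, bias=False)
    return nn.Sequential(*list(net.children())[:-2])  # keep the conv stages only

class ResidualModule(nn.Module):
    # Residual module with two 3 x 3 convolution layers, as described above.
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class MultiModalFaceNet(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.i_branch = make_branch(3)    # calibrated I frame, 3-channel RGB
        self.mv_branch = make_branch(2)   # calibrated motion vector image, 2-channel
        self.res_branch = make_branch(2)  # calibrated residual image, 2-channel
        self.fusion = nn.ModuleList([ResidualModule(512) for _ in range(3)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(3 * 512, feat_dim)  # projection head (assumption)

    def forward(self, i_frame, mv, res):
        maps = (self.i_branch(i_frame), self.mv_branch(mv), self.res_branch(res))
        pooled = [self.pool(blk(m)).flatten(1) for blk, m in zip(self.fusion, maps)]
        return self.proj(torch.cat(pooled, dim=1))  # multi-modal face feature vector
```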
in this embodiment, the training method of the multi-modal face recognition network includes:
because the structure of the multi-modal neural network of the invention needs a large number of training samples, the invention designs an off-line sample acquisition method, which can effectively acquire a large number of samples required by training.
A100, acquiring a training data set by an off-line sample acquisition method;
in this embodiment, the offline sample collection method includes:
Step B100, crawling celebrity videos from the Internet;
Step B200, extracting the multi-modal information of each celebrity video;
Step B300, extracting all celebrity face features from the multi-modal information of the celebrity video through a face recognition algorithm;
Step B400, clustering the celebrity face features with a clustering algorithm, and taking the class containing the most faces as the ID of the celebrity video;
Step B500, repeating steps B100-B400 until the number of processed celebrity videos reaches a preset number, obtaining the training data set. In this embodiment, the preset number is preferably reached by crawling videos of 10,000 celebrity identities. Because celebrity videos crawled from the web may contain a large amount of dirty data (videos of someone else, or videos containing multiple faces), this method effectively eliminates dirty data and avoids the time and labor of manual annotation. Preferably, the KMeans clustering algorithm is used to cluster the features.
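A minimal sketch of the cleaning in steps B300-B400 follows, assuming precomputed face embeddings and using scikit-learn's KMeans; the cluster count is an illustrative choice, as the patent only specifies clustering and keeping the largest class.

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_identity(face_features, n_clusters=5):
    """Keep only the dominant identity among the faces of one crawled video.

    face_features: (num_faces, dim) array of embeddings from a face
    recognition algorithm (step B300). n_clusters = 5 is an illustrative
    choice, not a value from the patent.
    """
    n_clusters = min(n_clusters, len(face_features))
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(face_features)
    keep = labels == np.bincount(labels).argmax()  # largest cluster = video's ID
    return face_features[keep]                     # dirty samples are dropped
```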
Step A200, compressing the training data set to obtain training video compressed-domain information, comprising I frames, motion vector images, residual images, DCT coefficients and partition depth;
Step A300, randomly selecting the training video compressed-domain information of any training datum in the training data set, and acquiring the corresponding training multi-modal face feature vectors through the multi-modal face recognition network;
Step A400, calculating a contrastive loss L based on the training multi-modal face feature vectors;
Step A500, repeating steps A300 to A400 and reducing the contrastive loss L through back-propagation training until the network converges, to obtain the trained multi-modal face recognition network.
In this embodiment, the contrastive loss L is:
$$L=\frac{1}{2N}\sum_{n=1}^{N}\Big[\,Y d^{2}+(1-Y)\max(m-d,\,0)^{2}\,\Big]+R(X_{1})+R(X_{2})$$

where

$$d=\left\lVert X_{1}-X_{2}\right\rVert_{2}=\sqrt{\sum_{j=1}^{P}\left(X_{1j}-X_{2j}\right)^{2}}$$

is the Euclidean distance between the sample features X1 and X2, P is the feature dimension of a sample, Y is a label indicating whether the two samples match (Y = 1 means the two samples have the same identity, Y = 0 means they have different identities), m is a preset threshold, N is the number of sample pairs, and R(X1) and R(X2) denote sparse regularization terms. The specific form of the sparse regularization term may be the L1 norm, the L2 norm, or another form. Adding a sparse regular constraint to the features X1 and X2 makes the features more separable, reduces redundancy, and improves the generalization ability and processing efficiency of the model, which is a particular advantage when the data volume is small.
In this embodiment, step S300 includes:
Step S310, performing face detection and face calibration on the compressed-domain multi-modal information to obtain calibrated I frame information;
Step S320, performing face calibration on the residual image and the motion vector image based on the calibrated I frame information, to obtain the calibrated residual image and the calibrated motion vector;
Step S330, taking the calibrated I frame information, the calibrated residual image and the calibrated motion vector together as the calibrated face multi-modal information (the cropping involved is illustrated in the sketch below).
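The sketch below illustrates steps S310-S330 under the assumption that the motion-vector and residual maps have been resized to the I frame's spatial resolution; the patent does not spell out the scaling, so this is a hedged simplification.

```python
def calibrate_multimodal(i_frame, mv, res, box):
    """Crop the same face region from the three modalities (steps S310-S330).

    box = (x1, y1, x2, y2) from face detection on the I frame. Assumes the
    motion-vector and residual maps are stored at the I frame's resolution;
    if they are kept at macroblock resolution, scale the box accordingly.
    """
    x1, y1, x2, y2 = box
    return (
        i_frame[y1:y2, x1:x2],  # 3-channel RGB face crop
        mv[y1:y2, x1:x2],       # 2-channel motion-vector crop
        res[y1:y2, x1:x2],      # 2-channel residual crop
    )
```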
Step S400, matching the multi-modal face features against a preset sensitive-person feature library to determine whether the video to be detected contains a sensitive person.
In this embodiment, step S400 includes:
Step S410, acquiring the face feature vectors of sensitive persons from preset sensitive-person video data by the method of steps A200-A400;
Step S420, constructing a sensitive-person face feature library from the face feature vectors of the sensitive persons;
Step S430, calculating the cosine similarity between the extracted face features and the sensitive-person face features in the sensitive-person face feature library; when the cosine similarity is greater than a preset threshold T, judging that the video to be detected contains a sensitive person.
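A minimal NumPy sketch of steps S410-S430 follows; T = 0.6 is an illustrative value, as the patent leaves the threshold as a preset.

```python
import numpy as np

def contains_sensitive_person(feature, library, T=0.6):
    """Match one multi-modal face feature against the sensitive-face library.

    feature: (dim,) query vector; library: (num_persons, dim) matrix built in
    step S420. T = 0.6 is an illustrative threshold, not a value from the patent.
    """
    q = feature / np.linalg.norm(feature)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q                      # cosine similarity to every library entry
    best = int(sims.argmax())
    return bool(sims[best] > T), best   # (sensitive person present?, best match)
```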
The compressed domain-oriented video sensitive-person recognition system of the second embodiment of the invention comprises an information extraction module, a face positioning and calibration module, a feature extraction module and a sensitive-person matching module;
The information extraction module is configured to partially decode the video to be detected using FFmpeg and C++, and extract the compressed-domain multi-modal information of the video to be detected; the compressed-domain multi-modal information comprises: I frames, motion vector images, residual images, DCT coefficients and partition depth;
The face positioning and calibration module is configured to perform face detection and face calibration on the compressed-domain multi-modal information to obtain calibrated compressed-domain face multi-modal information;
The feature extraction module is configured to input the calibrated compressed-domain face multi-modal information into a trained multi-modal face recognition network and acquire the multi-modal face features of each face in the video to be detected;
The sensitive-person matching module is configured to match the multi-modal face features against a preset sensitive-person feature library to determine whether the video to be detected contains a sensitive person.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the video sensitive person identification system for a compressed domain provided in the foregoing embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An electronic device of the third embodiment of the present invention comprises at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the processor to implement the compressed domain-oriented video sensitive-person recognition method described above.
A computer-readable storage medium of the fourth embodiment of the present invention stores computer instructions for execution by a computer to implement the compressed domain-oriented video sensitive-person recognition method described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Reference is now made to FIG. 3, which illustrates a block diagram of a computer system of a server for implementing embodiments of the method, system, and apparatus of the present application. The server shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 3, the computer system includes a Central Processing Unit (CPU)301 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for system operation are also stored. The CPU 301, ROM 302, and RAM 303 are connected to each other via a bus 304. An Input/Output (I/O) interface 305 is also connected to bus 304.
The following components are connected to the I/O interface 305: an input portion 306 including a keyboard, a mouse, and the like; an output section 307 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 308 including a hard disk and the like; and a communication section 309 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 309 performs communication processing via a network such as the internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 310 as necessary, so that a computer program read out therefrom is mounted into the storage section 308 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 309, and/or installed from the removable medium 311. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 301. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A compressed domain-oriented video sensitive-person recognition method, characterized in that the method comprises:
Step S100, partially decoding the video to be detected using FFmpeg and C++, and extracting the compressed-domain multi-modal information of the video to be detected; the compressed-domain multi-modal information comprises: I frames, motion vector images, residual images, DCT coefficients and partition depth;
Step S200, performing face detection and face calibration on the compressed-domain multi-modal information to obtain calibrated compressed-domain face multi-modal information;
Step S300, inputting the calibrated compressed-domain face multi-modal information into a trained multi-modal face recognition network to acquire the multi-modal face features of each face in the video to be detected;
Step S400, matching the multi-modal face features against a preset sensitive-person feature library to determine whether the video to be detected contains a sensitive person.
2. The compressed domain-oriented video sensitive-person recognition method of claim 1, characterized in that the multi-modal face recognition network comprises an I branch, an MV branch, a Res branch and a multi-modal fusion module:
The I branch is constructed from one of ResNet, InceptionNet or DenseNet; its input is the calibrated I frame, a 3-channel RGB image, and its output is the feature map of the I frame;
The MV branch is constructed from one of ResNet, InceptionNet or DenseNet; its input is the calibrated motion vector image, a 2-channel vector image, and its output is the feature map of the motion vector image;
The Res branch is constructed from one of ResNet, InceptionNet or DenseNet; its input is the calibrated residual image, a 2-channel image, and its output is the feature map of the residual image;
The multi-modal fusion module comprises 3 residual modules connected in parallel, each containing two convolution layers with 3 × 3 kernels; it takes as input the feature maps of the I frame, the motion vector image and the residual image, and outputs the multi-modal face feature vector.
3. The compressed domain-oriented video sensitive-person recognition method of claim 2, characterized in that the training method of the multi-modal face recognition network comprises:
Step A100, acquiring a training data set through an offline sample collection method;
Step A200, compressing the training data set to obtain training video compressed-domain information, comprising I frames, motion vector images, residual images, DCT coefficients and partition depth;
Step A300, randomly selecting the training video compressed-domain information of any training datum in the training data set, and acquiring the corresponding training multi-modal face feature vectors through the multi-modal face recognition network;
Step A400, calculating a contrastive loss L based on the training multi-modal face feature vectors;
Step A500, repeating steps A300 to A400 and reducing the contrastive loss L through back-propagation training until the network converges, to obtain the trained multi-modal face recognition network.
4. The method of claim 3, wherein the contrastive loss L is:
$$L=\frac{1}{2N}\sum_{n=1}^{N}\Big[\,Y d^{2}+(1-Y)\max(m-d,\,0)^{2}\,\Big]+R(X_{1})+R(X_{2})$$

where

$$d=\left\lVert X_{1}-X_{2}\right\rVert_{2}=\sqrt{\sum_{j=1}^{P}\left(X_{1j}-X_{2j}\right)^{2}}$$

is the Euclidean distance between the sample features X1 and X2, P is the feature dimension of a sample, Y is a label indicating whether the two samples match (Y = 1 means the two samples have the same identity, Y = 0 means they have different identities), m is a preset threshold, N is the number of sample pairs, and R(X1) and R(X2) denote sparse regularization terms.
5. The compressed domain-oriented video sensitive-person recognition method of claim 1, characterized in that step S300 comprises:
Step S310, performing face detection and face calibration on the compressed-domain multi-modal information to obtain calibrated I frame information;
Step S320, performing face calibration on the residual image and the motion vector image based on the calibrated I frame information, to obtain the calibrated residual image and the calibrated motion vector;
Step S330, taking the calibrated I frame information, the calibrated residual image and the calibrated motion vector together as the calibrated face multi-modal information.
6. The compressed domain-oriented video sensitive-person recognition method of claim 3, characterized in that step S400 comprises:
Step S410, acquiring the face feature vectors of sensitive persons from preset sensitive-person video data by the method of steps A200-A400;
Step S420, constructing a sensitive-person face feature library from the face feature vectors of the sensitive persons;
Step S430, calculating the cosine similarity between the extracted face features and the sensitive-person face features in the sensitive-person face feature library; when the cosine similarity is greater than a preset threshold T, judging that the video to be detected contains a sensitive person.
7. The compressed domain-oriented video sensitive-person recognition method of claim 3, characterized in that the offline sample collection method comprises:
Step B100, crawling celebrity videos from the Internet;
Step B200, extracting the multi-modal information of each celebrity video;
Step B300, extracting all celebrity face features from the multi-modal information of the celebrity video through a face recognition algorithm;
Step B400, clustering the celebrity face features with a clustering algorithm, and taking the class containing the most faces as the ID of the celebrity video;
Step B500, repeating steps B100-B400 until the number of processed celebrity videos reaches a preset number, obtaining the training data set.
8. A compressed domain-oriented video sensitive-person recognition system, characterized in that the system comprises: an information extraction module, a face positioning and calibration module, a feature extraction module and a sensitive-person matching module;
The information extraction module is configured to partially decode the video to be detected using FFmpeg and C++, and extract the compressed-domain multi-modal information of the video to be detected; the compressed-domain multi-modal information comprises: I frames, motion vector images, residual images, DCT coefficients and partition depth;
The face positioning and calibration module is configured to perform face detection and face calibration on the compressed-domain multi-modal information to obtain calibrated compressed-domain face multi-modal information;
The feature extraction module is configured to input the calibrated compressed-domain face multi-modal information into a trained multi-modal face recognition network and acquire the multi-modal face features of each face in the video to be detected;
The sensitive-person matching module is configured to match the multi-modal face features against a preset sensitive-person feature library to determine whether the video to be detected contains a sensitive person.
9. An electronic device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the processor to implement the compressed domain-oriented video sensitive-person recognition method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for execution by a computer to implement the compressed domain-oriented video sensitive-person recognition method of any one of claims 1-7.
CN202110190037.5A 2021-02-18 2021-02-18 Compressed domain-oriented video sensitive character recognition method, system and equipment Active CN112990273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110190037.5A CN112990273B (en) 2021-02-18 2021-02-18 Compressed domain-oriented video sensitive character recognition method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110190037.5A CN112990273B (en) 2021-02-18 2021-02-18 Compressed domain-oriented video sensitive character recognition method, system and equipment

Publications (2)

Publication Number Publication Date
CN112990273A true CN112990273A (en) 2021-06-18
CN112990273B CN112990273B (en) 2021-12-21

Family

ID=76394051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110190037.5A Active CN112990273B (en) 2021-02-18 2021-02-18 Compressed domain-oriented video sensitive character recognition method, system and equipment

Country Status (1)

Country Link
CN (1) CN112990273B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445918A (en) * 2022-02-21 2022-05-06 支付宝(杭州)信息技术有限公司 Living body detection method, device and equipment
CN114666571A (en) * 2022-03-07 2022-06-24 中国科学院自动化研究所 Video sensitive content detection method and system
CN115391751A (en) * 2022-10-31 2022-11-25 知安视娱(北京)科技有限公司 Infringement determination method
CN116778376A (en) * 2023-05-11 2023-09-19 中国科学院自动化研究所 Content security detection model training method, detection method and device


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1960491A (en) * 2006-09-21 2007-05-09 上海大学 Real time method for segmenting motion object based on H.264 compression domain
CN102014295A (en) * 2010-11-19 2011-04-13 嘉兴学院 Network sensitive video detection method
CN103826125A (en) * 2014-01-20 2014-05-28 北京创鑫汇智科技发展有限责任公司 Concentrated analysis method of compressed surveillance video and device
US20200033112A1 (en) * 2015-11-06 2020-01-30 Ap Robotics, Llc Interferometric distance measurement based on compression of chirped interferogram from cross-chirped interference
CN106850515A (en) * 2015-12-07 2017-06-13 中国移动通信集团公司 A kind of data processing method and video acquisition device, decoding apparatus
CN105825176A (en) * 2016-03-11 2016-08-03 东华大学 Identification method based on multi-mode non-contact identity characteristics
CN108319938A (en) * 2017-12-31 2018-07-24 奥瞳系统科技有限公司 High quality training data preparation system for high-performance face identification system
CN109858467A (en) * 2019-03-01 2019-06-07 北京视甄智能科技有限公司 A kind of face identification method and device based on the fusion of key point provincial characteristics
CN110796662A (en) * 2019-09-11 2020-02-14 浙江大学 Real-time semantic video segmentation method
CN111507311A (en) * 2020-05-22 2020-08-07 南京大学 Video character recognition method based on multi-mode feature fusion depth network
CN111860291A (en) * 2020-07-16 2020-10-30 上海交通大学 Multi-mode pedestrian identity recognition method and system based on pedestrian appearance and gait information
CN111914742A (en) * 2020-07-31 2020-11-10 辽宁工业大学 Attendance checking method, system, terminal equipment and medium based on multi-mode biological characteristics
CN112215908A (en) * 2020-10-12 2021-01-12 国家计算机网络与信息安全管理中心 Compressed domain-oriented video content comparison system, optimization method and comparison method
CN112241704A (en) * 2020-10-16 2021-01-19 百度(中国)有限公司 Method and device for judging portrait infringement, electronic equipment and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
C. SOLANA-CIPRES et al.: "Real-time moving object segmentation in H.264 compressed domain based on approximate reasoning", 《INTERNATIONAL JOURNAL OF APPROXIMATE REASONING》 *
LI XIAOGUANG et al.: "Face detection and tracking techniques in the compressed domain", 《MEASUREMENT & CONTROL TECHNOLOGY》 *
TIAN WEI et al.: "A face detection algorithm for the DCT compressed domain", 《MEASUREMENT & CONTROL TECHNOLOGY》 *
JIANG YIWEI: "Research on video/image compressed-domain editing techniques", 《CHINA DOCTORAL DISSERTATIONS FULL-TEXT DATABASE, INFORMATION SCIENCE & TECHNOLOGY》 *
ZHAO JIASHU: "Research and implementation of sensitive face recognition based on an optimized PCA dimensionality-reduction algorithm", 《CHINA MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE & TECHNOLOGY》 *
CHEN GUICAI et al.: "Design and implementation of a video surveillance system using AVS", 《JOURNAL OF IMAGE AND GRAPHICS》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445918A (en) * 2022-02-21 2022-05-06 支付宝(杭州)信息技术有限公司 Living body detection method, device and equipment
CN114666571A (en) * 2022-03-07 2022-06-24 中国科学院自动化研究所 Video sensitive content detection method and system
CN114666571B (en) * 2022-03-07 2024-06-14 中国科学院自动化研究所 Video sensitive content detection method and system
CN115391751A (en) * 2022-10-31 2022-11-25 知安视娱(北京)科技有限公司 Infringement determination method
CN116778376A (en) * 2023-05-11 2023-09-19 中国科学院自动化研究所 Content security detection model training method, detection method and device
CN116778376B (en) * 2023-05-11 2024-03-22 中国科学院自动化研究所 Content security detection model training method, detection method and device

Also Published As

Publication number Publication date
CN112990273B (en) 2021-12-21

Similar Documents

Publication Publication Date Title
CN112990273B (en) Compressed domain-oriented video sensitive character recognition method, system and equipment
Cai et al. End-to-end optimized ROI image compression
US11847816B2 (en) Resource optimization based on video frame analysis
US11074791B2 (en) Automatic threat detection based on video frame delta information in compressed video streams
CN112673625A (en) Hybrid video and feature encoding and decoding
Sun et al. Semantic structured image coding framework for multiple intelligent applications
Zhang et al. A joint compression scheme of video feature descriptors and visual content
CN112235569B (en) Quick video classification method, system and device based on H264 compressed domain
CN116233445B (en) Video encoding and decoding processing method and device, computer equipment and storage medium
US9712828B2 (en) Foreground motion detection in compressed video data
Chakraborty et al. MAGIC: Machine-learning-guided image compression for vision applications in Internet of Things
CN112215908A (en) Compressed domain-oriented video content comparison system, optimization method and comparison method
Beratoğlu et al. Vehicle license plate detector in compressed domain
Uddin et al. Double compression detection in HEVC-coded video with the same coding parameters using picture partitioning information
CN111368593A (en) Mosaic processing method and device, electronic equipment and storage medium
CN115474058A (en) Point cloud encoding processing method, point cloud decoding processing method and related equipment
CN113810654A (en) Image video uploading method and device, storage medium and electronic equipment
US11164328B2 (en) Object region detection method, object region detection apparatus, and non-transitory computer-readable medium thereof
CN114501031B (en) Compression coding and decompression method and device
CN112714336B (en) Video segmentation method and device, electronic equipment and computer readable storage medium
Lyu et al. Apron surveillance video coding based on compositing virtual reference frame with object library
Kuang et al. Fast HEVC to SCC transcoding based on decision trees
WO2013160040A1 (en) Methods and devices for object detection in coded video data
CN113051415B (en) Image storage method, device, equipment and storage medium
WO2024078512A1 (en) Pre-analysis based image compression methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant