CN112927127A - Video privacy data fuzzification method running on edge device - Google Patents


Info

Publication number
CN112927127A
CN112927127A
Authority
CN
China
Prior art keywords
video
algorithm
model
mask
edge device
Prior art date
Legal status
Pending
Application number
CN202110265858.0A
Other languages
Chinese (zh)
Inventor
张泽华
李向阳
高焕丽
罗家祥
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110265858.0A priority Critical patent/CN112927127A/en
Publication of CN112927127A publication Critical patent/CN112927127A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30232 Surveillance
    • G06T2207/30241 Trajectory
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for blurring video privacy data that runs on edge devices, comprising the steps of designing and building the algorithm model, optimizing the model, quantizing and accelerating the model, and migrating the model to a mobile terminal device for operation. The beneficial effects of the invention are: the automatic privacy-blurring system has low computing-resource requirements and a high computing speed, can run on a mobile terminal without depending on other server resources, and protects the privacy of other people in published video; a multi-object tracking algorithm tracks each object, so that a specified object can be blurred or the motion trajectory of a specified object can be recorded; TensorRT is used to quantize, optimize and accelerate the algorithm model, reducing deployment difficulty, and at runtime three threads handle the preprocessing, model inference and post-processing parts respectively, improving operating efficiency.

Description

Video privacy data fuzzification method running on edge device
Technical Field
The invention belongs to the technical field of computer vision, and in particular relates to a method for blurring video privacy data that runs on edge devices.
Background
Cameras such as surveillance devices are used ever more widely: community or kitchen monitoring feeds are made public, and videos are captured and uploaded to the Internet for publication. At the same time, people pay increasing attention to privacy protection. Public kitchen monitoring helps people supervise kitchen hygiene, but it also leaks the privacy of the cooks; publishing video on the Internet spreads it efficiently, yet that same efficiency of network transmission and search makes it easier for the privacy of other people in the video to be exposed.
Existing approaches rely mainly on after-the-fact desensitization, which consumes manual labor and is unsuitable for online monitoring equipment; moreover, most such processing is done frame by frame and is inefficient.
Segmentation techniques in computer vision can quickly obtain masks of all people in a video, and are divided into semantic segmentation and instance segmentation. Semantic segmentation is pixel-level classification: each pixel in the image is classified, for example as part of a person or not. Instance segmentation goes further, distinguishing not only people from the background but also one person from another. Semantic segmentation is fast but not flexible enough, and its accuracy drops when the target occupies a small proportion of the image; instance segmentation is flexible but computationally heavy, and needs extra processing before it can run on low-compute devices. At the same time, object tracking can conveniently be added on top of instance-level processing.
Convolutional neural networks can achieve high-accuracy instance segmentation, but the computation involved is enormous; most such models can only run on a server and are hard to deploy directly to mobile devices. With the rapid development of society, the number of camera devices has also grown explosively, and it is difficult for servers alone to run instance segmentation at that scale; in addition, server-side processing risks privacy leakage while the video data is being uploaded. A method that processes data in real time through edge computing on edge devices is therefore needed, where edge computing means running the application on edge devices close to the object or data source, producing faster service responses and satisfying real-time requirements.
To reduce the demand for computing resources, protect the privacy of other people in published video without depending on other server resources, and make it possible to blur a specified object or record the motion trajectory of a specified target, a video privacy data blurring method running on edge devices is provided.
Disclosure of Invention
The invention aims to provide a method for blurring video privacy data that runs on edge devices, which reduces the demand for computing resources, does not depend on other server resources, protects the privacy of other people in published video, and makes it possible to blur specified objects or record the motion trajectory of specified objects.
To achieve this purpose, the invention provides the following technical scheme: a video privacy data blurring method running on edge devices, comprising algorithm model building and model operation, with the following steps:
Step one: model initialization, including building the model according to configuration files, optimizing and accelerating the trained model, and initializing the tracker;
Step two: acquire a video sequence and import it into the operating platform;
Step three: perform feature extraction using a lightweight network and an FPN structure;
Step four: obtain the detection results and instance masks of the image through an instance segmentation algorithm;
Step five: allocate an ID to each detected object through a multi-object tracking algorithm;
Step six: control through the ID whether an object is blurred.
As a preferred technical solution of the present invention, in step one the optimization and acceleration method is: the PyTorch model is first converted to the intermediate ONNX format and then optimized and accelerated by TensorRT.
As a preferred technical solution of the present invention, in step two the operating platform is a platform supporting C++ or Python, or an NVIDIA Jetson device.
As a preferred technical solution of the present invention, in step three the lightweight feature extraction network ShuffleNetV2 or MobileNet is selected as the feature extraction network.
As a preferred technical solution of the present invention, in step four the instance segmentation algorithm comprises two subtasks, used respectively for object detection and mask generation.
As a preferred embodiment of the present invention, object detection means finding all people in an input image with the algorithm; the result is represented by a bounding box framing each person, and the output of object detection comprises classification and regression.
As a preferred technical solution of the present invention, the mask generation manner combines original (prototype) masks with mask coefficients, where the original masks are independent of any specific person in the image, the mask coefficients are associated with a specific person, and each person generates a set of mask coefficients.
As a preferred technical solution of the present invention, in step five the multi-object tracking algorithm is the SORT algorithm.
As a preferred technical solution of the present invention, in step six specific persons are blurred according to their IDs, and the blurred object can be changed by changing the ID in different time periods, which guards against the tracking algorithm following the wrong object.
As a preferred technical solution of the present invention, the video input may be a video read from a file or a camera stream read directly for processing, and the video-reading library is OpenCV.
As a preferred technical solution of the present invention, TensorRT is used to quantize, optimize and accelerate the algorithm model, reducing deployment difficulty; at the same time, three threads handle the preprocessing, model inference and post-processing parts respectively when the algorithm runs.
Compared with the prior art, the invention has the beneficial effects that:
(1) The automatic privacy-blurring system has low computing-resource requirements, can run on a mobile terminal, does not depend on other server resources, and protects the privacy of other people in published video; a tracking algorithm tracks each object, so a specified object can be blurred or the motion trajectory of a specified target recorded;
(2) TensorRT is used to quantize, optimize and accelerate the algorithm model, reducing deployment difficulty; at runtime three threads handle the preprocessing, model inference and post-processing parts respectively, improving operating efficiency.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a flow chart of the model operation of the present invention;
FIG. 3 is a diagram of the instance segmentation algorithm model of the present invention;
FIG. 4 is a flow chart of model transformation according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 2, fig. 3 and fig. 4, the present invention provides a technical solution: a method for blurring video privacy data running on edge devices, comprising algorithm model building and model operation, with the following steps:
Step one: model initialization, including building the model according to configuration files, optimizing and accelerating the trained model through neural network acceleration, and initializing the tracker. Neural network acceleration further improves the running speed of the model; TensorRT, a high-performance deep learning inference optimizer for NVIDIA GPUs, is used for inference acceleration. A model implemented in the deep learning framework PyTorch cannot be optimized by TensorRT directly, so the PyTorch model is first converted to the intermediate ONNX format and then optimized and accelerated by TensorRT;
Step two: import the acquired video sequence into the operating platform, which is a platform supporting C++ or Python or an NVIDIA Jetson device;
Step three: perform feature extraction on the input image through the feature extraction network, which consists of a convolutional neural network. The input is the preprocessed image and the output is a series of feature maps used by the subsequent instance segmentation algorithm. The lightweight feature extraction network ShuffleNetV2 is selected, but it is not mandatory: other feature extraction networks such as MobileNet can be substituted, or a larger feature extraction network can be used when computing power is sufficient, to handle higher resolutions or obtain higher accuracy;
Step four: identify the pixels where people are located in the input image through the instance segmentation algorithm, and generate an independent mask for each person. The instance segmentation algorithm is YOLACT, which is based on deep learning and comprises two subtasks, used respectively for object detection and mask generation. The input of the algorithm is a preprocessed video frame; after processing, the output comprises human masks and human detection results in one-to-one correspondence. The detection results feed the subsequent tracking algorithm, which allocates an identity (ID) to each object, and the masks are used for the subsequent blurring. Specifically:
Object detection means finding all people in the input image with the algorithm; the result is a bounding box framing each person, so the output comprises classification and regression, where classification judges whether an object is a person and, when it is, the regression task regresses the boundary of the bounding box. The detection mode used is anchor-based: preset detection boxes are tiled densely over the image, so classification in detection judges whether a person is present in a preset detection box, while regression predicts the offset of the person's exact position relative to that detection box.
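For illustration, anchor-offset decoding can be sketched with the common SSD/Faster R-CNN (dx, dy, dw, dh) parameterization; the patent text does not fix an exact formula, so this particular parameterization is an assumption:

```python
import math

# Illustrative anchor decoding: recover an absolute box from a preset
# anchor and the regressed offsets, under the assumed (dx, dy, dw, dh)
# parameterization. Not the patent's own formula.

def decode_box(anchor, offsets):
    """anchor = (cx, cy, w, h) in centre format; offsets = (dx, dy, dw, dh)."""
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = offsets
    cx = acx + dx * aw          # shift the anchor centre by a fraction of its size
    cy = acy + dy * ah
    w = aw * math.exp(dw)       # scale the anchor width/height exponentially
    h = ah * math.exp(dh)
    # return corner coordinates (x1, y1, x2, y2)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

With zero offsets the anchor itself is returned in corner format, which is why a dense tiling of anchors lets the classifier ask "is there a person in this preset box" while regression only has to model small corrections.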
The mask generation manner combines original (prototype) masks with mask coefficients, where the original masks are independent of any specific person in the image, the mask coefficients are associated with a specific person, and each person generates a set of mask coefficients;
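A minimal sketch of this combination, assuming YOLACT-style behavior (linear combination of prototypes, then sigmoid and threshold); two tiny 2x2 prototypes stand in for the 32 full-resolution ones the text describes:

```python
import math

# Toy YOLACT-style mask assembly: an instance mask is the thresholded
# sigmoid of a linear combination of shared prototype masks, weighted
# by that instance's coefficients. Sizes here are illustrative only.

def assemble_mask(prototypes, coeffs, threshold=0.5):
    h, w = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            # linear combination of prototype responses at this pixel
            s = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            # sigmoid maps to (0, 1); threshold splits person vs background
            row.append(1 if 1 / (1 + math.exp(-s)) > threshold else 0)
        mask.append(row)
    return mask

protos = [
    [[4, 4], [-4, -4]],   # prototype responding to the top half
    [[-4, 4], [-4, 4]],   # prototype responding to the right half
]
# A person whose coefficients select only the first prototype:
person_mask = assemble_mask(protos, coeffs=[1.0, 0.0])
```

Because the prototypes are shared across all instances, each additional person only costs one small coefficient vector, which is the source of the speed advantage claimed later over two-stage methods.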
Step five: allocate an ID number to each person in the video sequence through the multi-object tracking algorithm, so that specific persons can be blurred by ID number. The multi-object tracking algorithm is SORT, a high-speed detection-based multi-object tracker which, from the detection results, allocates an identity (ID) number to each person in the video.
Step six: according to the ID number, specific persons can be blurred in a targeted manner, which satisfies requirements such as keeping the main subjects unblurred while blurring everyone in the background when a video is published on the Internet.
It should be added that the video input can be read from a file or taken directly from a camera for processing.
It should be added that, for an input video sequence, all people in the image can be detected and a mask generated for each person. The mask is at the instance level, so different people can be distinguished through their masks, unlike the mask in semantic segmentation, which can only distinguish people from the background. The algorithm model can also be trained on different data according to the task requirements, yielding different kinds of instance segmentation models, so the method does not act only on people.
It should be added that the blur operation can be applied to a specific region according to the generated mask, and the blur algorithm can be any of Gaussian blur, mean blur or median blur. Generating the mask by instance segmentation before blurring yields a clean blur boundary, and after the target is blurred the other information in the video is retained to the greatest extent.
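A toy sketch of mask-restricted blurring on a tiny grayscale grid, using a 3x3 box blur as a simple stand-in for the Gaussian, mean or median blur mentioned above:

```python
# Hypothetical masked blur: replace only the masked pixels with the mean
# of their 3x3 neighbourhood; pixels outside the mask are untouched,
# which is how the rest of the video keeps its information.

def masked_box_blur(img, mask):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # outside the person mask: keep the original pixel
            vals = [img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) // len(vals)  # integer mean of the window
    return out
```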
It should be added that, to speed up the model and reduce deployment difficulty, the model trained on the server has Float32 precision; it can be quantized to Float16 or INT8 precision depending on device support, or mixed precision can be used, which reduces the size of the algorithm model and increases its running speed.
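The idea behind INT8 quantization can be illustrated with a simple per-tensor symmetric scheme; TensorRT's actual calibration is considerably more sophisticated, so this is only a sketch of the principle:

```python
# Sketch of symmetric INT8 quantization: pick one scale from the maximum
# absolute value so every float maps into the signed 8-bit range; the
# 8-bit values are what a quantized engine would actually store/compute.

def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    # recover approximate floats; the error is the quantization cost
    return [x * scale for x in q]
```

Halving (Float16) or quartering (INT8) the bytes per weight is what shrinks the model and speeds up memory-bound inference on the edge device.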
It should be added that, to make further use of the operating platform's resources and improve running speed, three threads are used: the first reads the video data and performs the preprocessing operations; the second invokes the instance segmentation model to obtain its output; and the third converts and post-processes the output, runs the tracking algorithm, and then displays or stores the result. Concurrent multi-thread processing makes fuller use of CPU resources and helps the algorithm run in real time.
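The three-thread division of labor can be sketched with standard queues; the stage bodies below are trivial arithmetic stand-ins for preprocessing, inference and post-processing:

```python
import queue
import threading

# Sketch of the three-stage pipeline: preprocessing, inference and
# post-processing run in separate threads connected by FIFO queues, so a
# new frame can be preprocessed while the previous one is still inferring.

def run_pipeline(frames):
    pre_q, post_q, results = queue.Queue(), queue.Queue(), []

    def preprocess():
        for f in frames:
            pre_q.put(f * 2)          # stand-in for resize/normalise
        pre_q.put(None)               # sentinel: end of stream

    def infer():
        while (item := pre_q.get()) is not None:
            post_q.put(item + 1)      # stand-in for model inference
        post_q.put(None)              # propagate the sentinel downstream

    def postprocess():
        while (item := post_q.get()) is not None:
            results.append(item)      # stand-in for NMS/tracking/blur

    threads = [threading.Thread(target=t)
               for t in (preprocess, infer, postprocess)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The FIFO queues keep frame order while letting the three stages overlap, which is the efficiency gain the text attributes to this design.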
In this embodiment, preferably, TensorRT is used to quantize, optimize and accelerate the algorithm model, reducing deployment difficulty; at runtime, three threads handle the preprocessing, model inference and post-processing parts respectively, improving operating efficiency.
The algorithm model is built as follows:
the main network part selects a light-weight feature extraction network ShuffleNet V2 to extract image features, the ShuffleNet V2 structure can reduce the memory access time and has higher operation speed, the ShuffleNet V2 is a feature extraction network, and the number of parameters and the amount of operation are greatly reduced by adopting packet convolution and Channel shuffle to replace standard convolution; because the complexity of the model and the time cost of memory access are reduced, the precision and the speed of the model are better balanced; the lightweight feature extraction network ShuffleNet V2 is optional, and other feature extraction networks, such as MobileNet, can be used instead, or a larger feature extraction network can be used instead when the computational power is sufficient to meet higher resolution processing or to achieve higher accuracy.
The feature pyramid with multi-scale feature output means the FPN structure is adopted to strengthen the model's multi-scale expressive power. A convolutional neural network relies on stacked convolutional layers, so a single neuron's receptive field is limited; downsampling layers enlarge the receptive field rapidly, but the downsampled feature maps are correspondingly small, and low-resolution feature maps make it hard for the subsequent instance segmentation algorithm to recover mask boundaries accurately. FPN multi-scale fusion is therefore used: the downsampled feature map is upsampled back to the original size and added directly to the original feature map, which effectively enlarges the receptive field and captures deep semantic information while preserving the localization ability of the original feature map, helping the subsequent segmentation algorithm locate segmentation boundaries;
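This upsample-and-add fusion can be sketched minimally on tiny feature maps (nearest-neighbour upsampling; the 1x1 and 3x3 convolutions a real FPN applies around the addition are omitted):

```python
# Minimal FPN-style merge: nearest-neighbour upsample the coarse,
# semantically rich map and add it elementwise to the finer map, so the
# result keeps fine localization plus deep semantics.

def upsample2x(fm):
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(wide[:])                        # repeat each row
    return out

def fpn_merge(fine, coarse):
    up = upsample2x(coarse)
    return [[f + u for f, u in zip(frow, urow)]
            for frow, urow in zip(fine, up)]
```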
The instance segmentation head is YOLACT, and the object detection algorithm used is anchor-based: anchors tile the image with preset detection boxes, and the output determines whether a detected object is present in each corresponding box. The network input is a single picture; after the convolutional network, the output divides into four parts: (1) the classification result, i.e. the probability that a target is present in the region corresponding to each anchor; (2) the regression result, representing the offset of the target's true position relative to the preset detection box, from which the exact position is recovered; (3) and (4) together generate a mask for each object, where (3) is 32 original (prototype) masks covering the whole image, independent of any specific object, and (4) generates 32 coefficients for each object. Linearly combining the 32 original masks of (3) with the 32 corresponding coefficients of (4) yields the mask of each person in the image. In (4), tanh is selected as the activation function, ensuring the coefficients can be positive or negative and strengthening the expressive power of the combination. Because each object's mask is generated by combining original masks with mask coefficients, only a 32-dimensional coefficient vector needs to be produced per object (the original masks resemble semantic segmentation and are independent of specific objects), so the running speed is much higher than that of two-stage algorithms. After the linear summation with the mask coefficients, the output is mapped into the range (0, 1) by a Sigmoid activation function and then divided by a threshold into two classes, person mask and background. Because the original masks span the whole image, noise points whose values also exceed the threshold easily appear in other regions; the detection result is therefore used to limit the range of the linearly summed result, and everything outside the detection box is forcibly set to background, reducing noise interference from other areas;
adam is adopted as an optimizer for algorithm training, is a gradient descent optimizer, and compared with random gradient descent, a momentum mechanism is added, the gradients in all directions are balanced, and the optimization speed is higher; the tracking algorithm adopts SORT (simple Online and real tracking), the SORT is a high-speed associated matching algorithm, and the SORT only adopts Kalman filtering and Hungarian algorithm to match the recognition result without feature extraction through a neural network.
The whole network is trained on a server: after the backbone network is pre-trained on the ImageNet classification dataset, the whole model is trained on instance segmentation data.
Most models trained on a server are difficult to run directly on edge devices, largely because resources such as memory and computing power fall far short of a server's, and because of architectural differences some programs cannot be compiled and run on edge devices directly. The neural network acceleration engine TensorRT is therefore selected to raise inference speed, helping the algorithm run in real time and be applied on edge devices; model quantization and model migration are required to apply the algorithm model to small devices. The PyTorch model is accordingly converted into a TensorRT engine that optimizes subsequent inference. TensorRT cannot act on PyTorch directly: the algorithm model trained with PyTorch must first be converted to the ONNX format supported by TensorRT, and onnx2trt then generates the TensorRT engine from the ONNX model, removing redundant operation nodes and fusing some of them. Because TensorRT does not fully support PyTorch's operation nodes, during conversion the algorithm model may need to be split into several parts or the unsupported operation nodes rewritten.
The details of the model operation are as follows:
the method comprises the steps of firstly loading TensorRT into TensorRT engine for example segmentation reasoning, simultaneously constructing a multi-target tracker, sequentially carrying out image preprocessing (enabling the distribution and the size of images to be consistent with those of an example segmentation model during training) on each frame of image in an input video sequence, carrying out example segmentation model reasoning, carrying out post-processing, finally obtaining detection frames and masks of all targets, carrying out association matching according to the positions and the sizes of the detection frames, the aspect ratio and the like by using a multi-target tracking algorithm, obtaining a tracking result and updating the tracker.
To make better use of system resources, three threads each handle one stage of data processing: video reading and preprocessing, model inference, and post-processing. Preprocessing comprises scaling and normalizing the image, and transferring the data from main memory to GPU memory so that the subsequent GPU stages can read it directly. Model inference means running the forward pass of the instance segmentation algorithm to obtain the original masks, mask coefficients, and the classification and regression outputs of detection. Post-processing restores the output of the instance segmentation algorithm into instance masks and detection results: the detection results are obtained from the classification and regression outputs by recovering coordinates, filtering low-confidence results and applying non-maximum suppression, and the instance masks are obtained by the linear combination described above. Post-processing also includes the tracking task, which allocates a specific ID to each detection result and associates the same person across frames through that ID.
In thread 1 (preprocessing), OpenCV reads video frames from a local video or an external camera (CSI or USB interface) and preprocesses each image so that the mean and variance of the image data distribution match those used when training the model. The image is also rescaled; the resolution directly affects the algorithm's running speed and final precision and is adjusted according to the device's computing power. The image data is then transferred from main memory to GPU memory so that the subsequent algorithm can read it directly when running on the GPU.
In thread 2 (model inference), the instance segmentation algorithm performs forward inference on the preprocessed data and outputs the original masks, the mask coefficients, and the classification and regression outputs of detection. Detection results and masks correspond one to one: the detection results feed the subsequent tracking algorithm, which assigns an identity ID to each object, and the masks are used for the subsequent blurring.
Thread 3 (post-processing) post-processes the results of the instance segmentation algorithm and assigns an ID to each person with a multi-target tracking algorithm. The post-processing flow comprises: ① coordinate conversion, which converts the output position results into absolute coordinates; ② threshold comparison, which filters out low-confidence results; ③ non-maximum suppression, which removes the overlapping duplicates among multiple detections of the same target; and ④ linear combination of the original masks and mask coefficients to obtain the instance masks. Post-processing yields the final detection results, from which the multi-target tracking algorithm assigns an ID to each person and updates the tracker.
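The non-maximum suppression and mask combination parts of this flow can be illustrated as follows. These are plain NumPy sketches of the standard techniques named above (greedy NMS and a YOLACT-style linear combination of original masks and coefficients), not the patented code.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes: keep the
    highest-scoring box, discard the rest that overlap it above iou_thr."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the kept box with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]
    return keep

def assemble_masks(prototypes, coeffs, thr=0.5):
    """Combine the original (prototype) masks, shape (H, W, k), with one
    k-vector of coefficients per instance, then sigmoid and binarize."""
    logits = prototypes @ coeffs.T          # (H, W, k) @ (k, n) -> (H, W, n)
    return 1.0 / (1.0 + np.exp(-logits)) > thr
```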
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (11)

1. A method for fuzzifying video privacy data running on an edge device, characterized in that the method comprises algorithm model construction and model operation, with the following steps:
step one: model initialization, comprising building the model according to configuration files, optimizing and accelerating the trained model, and initializing the tracker;
step two: acquiring a video sequence and importing it into the operation platform;
step three: performing feature extraction with a lightweight network and an FPN structure;
step four: obtaining the detection results and instance masks of the image through an instance segmentation algorithm;
step five: assigning an ID to each detected object through a multi-target tracking algorithm;
step six: controlling, through the ID, whether an object is blurred.
2. The method of claim 1 for obfuscating video privacy data running on an edge device, wherein: in step one, the optimization and acceleration method is as follows: the PyTorch model is first converted into the intermediate format file ONNX, and then optimized and accelerated with TensorRT.
3. The method of claim 1 for obfuscating video privacy data running on an edge device, wherein: in step two, the operation platform is one supporting C++ or Python, or one deployed on NVIDIA Jetson devices.
4. The method of claim 1 for obfuscating video privacy data running on an edge device, wherein: in step three, the lightweight network ShuffleNetV2 or MobileNet is selected as the feature extraction network.
5. The method of claim 1 for obfuscating video privacy data running on an edge device, wherein: in step four, the instance segmentation algorithm comprises two subtasks, used for target detection and mask generation respectively.
6. The method of claim 5 for obfuscating video privacy data running on an edge device, wherein: target detection means finding all persons in the input image through the algorithm; each detected person is framed by a bounding box, and the target detection output comprises classification and regression.
7. The method of claim 5 for obfuscating video privacy data running on an edge device, wherein: the masks are generated by combining original masks with mask coefficients, the original masks being independent of any specific person in the image while the mask coefficients are associated with specific persons, each person generating one set of mask coefficients.
8. The method of claim 1 for obfuscating video privacy data running on an edge device, wherein: in step five, the multi-target tracking algorithm is the SORT algorithm.
9. The method of claim 1 for obfuscating video privacy data running on an edge device, wherein: in step six, selected persons are blurred in a targeted manner through their IDs, and the IDs can be changed in different time periods to change the blurred objects, so as to prevent the tracking algorithm from following the wrong object.
10. The method of claim 1 for obfuscating video privacy data running on an edge device, wherein: the video input can be read from a file or directly from a camera, and the video reading library is OpenCV.
11. The method of claim 1 for obfuscating video privacy data running on an edge device, wherein: TensorRT is used to quantize, optimize, and accelerate the algorithm model, reducing deployment difficulty; when the algorithm runs, three threads are adopted to handle preprocessing, model inference, and post-processing respectively.
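As an illustration of claim 9 (targeted blurring controlled through tracking IDs), the following sketch pixelates the mask region of selected instances only. Pixelation (block averaging), the function name, and its parameters are illustrative assumptions; the claims do not prescribe a particular obfuscation.

```python
import numpy as np

def blur_selected(frame, masks, ids, blur_ids, ksize=16):
    """Pixelate (block-average) the mask region of every instance whose
    tracking ID is in blur_ids; all other persons are left untouched."""
    out = frame.copy()
    h, w = frame.shape[:2]
    for mask, oid in zip(masks, ids):
        if oid not in blur_ids:
            continue
        for y in range(0, h, ksize):
            for x in range(0, w, ksize):
                block = (slice(y, min(y + ksize, h)), slice(x, min(x + ksize, w)))
                sel = mask[block]
                if sel.any():
                    # replace masked pixels with the block's mean colour
                    out[block][sel] = frame[block][sel].mean(axis=0).astype(frame.dtype)
    return out
```

Because the selection is keyed on the tracker's IDs rather than on pixel positions, changing the ID set between time periods changes which persons are obfuscated, as claim 9 describes.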
CN202110265858.0A 2021-03-11 2021-03-11 Video privacy data fuzzification method running on edge device Pending CN112927127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110265858.0A CN112927127A (en) 2021-03-11 2021-03-11 Video privacy data fuzzification method running on edge device

Publications (1)

Publication Number Publication Date
CN112927127A true CN112927127A (en) 2021-06-08

Family

ID=76172669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110265858.0A Pending CN112927127A (en) 2021-03-11 2021-03-11 Video privacy data fuzzification method running on edge device

Country Status (1)

Country Link
CN (1) CN112927127A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436232A (en) * 2021-06-29 2021-09-24 上海律信信息科技有限公司 Hardware acceleration method based on tracking algorithm
WO2023231704A1 (en) * 2022-05-31 2023-12-07 京东方科技集团股份有限公司 Algorithm running method, apparatus and device, and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130108105A1 (en) * 2011-10-31 2013-05-02 Electronics And Telecommunications Research Institute Apparatus and method for masking privacy region based on monitored video image
CN109993207A (en) * 2019-03-01 2019-07-09 华南理工大学 A kind of image method for secret protection and system based on target detection
CN110298296A (en) * 2019-06-26 2019-10-01 北京澎思智能科技有限公司 Face identification method applied to edge calculations equipment
US20200098096A1 (en) * 2018-09-24 2020-03-26 Movidius Ltd. Methods and apparatus to generate masked images based on selective privacy and/or location tracking
CN111209013A (en) * 2020-01-15 2020-05-29 深圳市守行智能科技有限公司 Efficient deep learning rear-end model deployment framework
CN111460926A (en) * 2020-03-16 2020-07-28 华中科技大学 Video pedestrian detection method fusing multi-target tracking clues
CN111968155A (en) * 2020-07-23 2020-11-20 天津大学 Target tracking method based on segmented target mask updating template
CN112184757A (en) * 2020-09-28 2021-01-05 浙江大华技术股份有限公司 Method and device for determining motion trail, storage medium and electronic device
CN112364744A (en) * 2020-11-03 2021-02-12 珠海市卓轩科技有限公司 TensorRT-based accelerated deep learning image recognition method, device and medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
RUBEN PANERO MARTINEZ ET AL: "Real-Time Instance Segmentation of Traffic Videos for Embedded Devices", SENSORS, vol. 21, no. 275, 3 January 2021 (2021-01-03), pages 1 - 19 *
LIU JIAMIN: "Research and Implementation of a Road Section Monitoring System Based on YOLCAT Instance Segmentation", CHINA MASTERS' THESES FULL-TEXT DATABASE (ENGINEERING SCIENCE & TECHNOLOGY II), no. 02, 15 February 2021 (2021-02-15), pages 034 - 662 *
NIU DEJIAO: "Research and Implementation of Video-Based Target Tracking and Privacy Protection Technology", CHINA MASTERS' THESES FULL-TEXT DATABASE (INFORMATION SCIENCE & TECHNOLOGY), no. 01, 15 March 2004 (2004-03-15), pages 138 - 550 *
GAO TAO ET AL: "Multiple Moving Target Tracking Algorithm Based on Traffic Video Sequences", JOURNAL OF CENTRAL SOUTH UNIVERSITY (SCIENCE AND TECHNOLOGY), vol. 41, no. 03, 30 June 2010 (2010-06-30), pages 1028 - 1036 *

Similar Documents

Publication Publication Date Title
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
WO2021043112A1 (en) Image classification method and apparatus
US20180114071A1 (en) Method for analysing media content
CN112446270A (en) Training method of pedestrian re-identification network, and pedestrian re-identification method and device
CN112837344B (en) Target tracking method for generating twin network based on condition countermeasure
CN109753878B (en) Imaging identification method and system under severe weather
Chen et al. Corse-to-fine road extraction based on local Dirichlet mixture models and multiscale-high-order deep learning
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
Wang et al. Removing background interference for crowd counting via de-background detail convolutional network
Pavel et al. Recurrent convolutional neural networks for object-class segmentation of RGB-D video
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
CN112927127A (en) Video privacy data fuzzification method running on edge device
CN112464930A (en) Target detection network construction method, target detection method, device and storage medium
Teimouri et al. A real-time ball detection approach using convolutional neural networks
DE102022100360A1 (en) MACHINE LEARNING FRAMEWORK APPLIED IN A SEMI-SUPERVISED SETTING TO PERFORM INSTANCE TRACKING IN A SEQUENCE OF IMAGE FRAMES
CN114627269A (en) Virtual reality security protection monitoring platform based on degree of depth learning target detection
CN115829915A (en) Image quality detection method, electronic device, storage medium, and program product
Wang et al. Intrusion detection for high-speed railways based on unsupervised anomaly detection models
Sureshkumar et al. Deep learning framework for component identification
CN117079095A (en) Deep learning-based high-altitude parabolic detection method, system, medium and equipment
CN114283087A (en) Image denoising method and related equipment
WO2023069085A1 (en) Systems and methods for hand image synthesis
Muhamad et al. A comparative study using improved LSTM/GRU for human action recognition
CN109643390A (en) The method of object detection is carried out in digital picture and video using spike neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination