CN113743231B - Video target detection avoidance system and method - Google Patents
- Publication number: CN113743231B (application CN202110909116.7A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting (G—Physics; G06—Computing; G06F—Electric digital data processing)
- Y02T10/40 — Engine management systems (Y02T—Climate change mitigation technologies related to transportation)
Abstract
The invention discloses a video target detection avoidance system and method. The model-adaptive training module detects objects to which an avoidance patch is attached and extracts the number and confidence of human-body detections, removing any dependence on the type and parameters of the actual detection model. The patch distance adaptive module updates the patch for different distances, based on a threshold specified by the user or preset by the system, so that the patch remains protective at all ranges. The multi-loss-function calculation module and the digital-world patch attachment module implement clothing-wrinkle simulation of the patch in the digital world, physical-world color transformation, and training-loss constraints on the pictures, ensuring that the patch remains robust when transferred to the physical world. The method is not tied to a specific model, is effective against different models, is robust in the physical world, and meets user-side privacy-protection requirements.
Description
Technical Field
The invention belongs to the technical field of using adversarial samples to protect against the privacy disclosure caused by target detection in computer vision. It relates to a video target detection avoidance system and method, and in particular to a universal privacy-protection system and method against human-body target detection.
Background
Computer vision has developed rapidly in fields such as behavior tracking and intelligent monitoring. Target detection and recognition, as core technologies, bring convenience to people but also pose serious security challenges and risks to personal privacy. Users often rely on camouflage such as glasses, hats and masks to avoid privacy disclosure, but these measures are inconvenient when traveling and cannot technically defeat video target detection.
The main reason target detection and recognition pose such a risk to personal privacy is that the cost of building such a platform is extremely low. According to one survey, the YOLOv3 human-body target detection and recognition model runs at 40 FPS on a Raspberry Pi 4B+ development board. This means that individuals or enterprises can build such a human-body target detection system at very low cost (a camera, a Raspberry Pi development board and open-source model code) and thereby acquire massive amounts of pedestrian data. As shown in figs. 1-2, such a human-body target detection and feature-extraction system collects not only the captured pedestrian images but also personal privacy information covering detailed features such as behavior, whereabouts, faces and clothing, obtained after the detection model analyzes and processes the data.
In recent years, interfering with target detection using adversarial samples has become a hotspot of academic research, but many problems remain. Adversarial-sample-based video target detection interference perturbs the detection network by generating a specific adversarial sample so as to protect the user's whereabouts. However, most existing adversarial samples are generated against a single white-box model, whereas real target detection models are structurally complex; such samples rarely meet users' actual privacy-protection needs and suffer from poor portability, easy detection and failure at long range. How to improve a sample's interference capability against multiple detection models, guarantee its validity over the full distance range, and improve its naturalness and portability is the key open challenge for adversarial-sample-based video target detection interference.
Disclosure of Invention
In view of the drawbacks of conventional privacy-protection schemes and the safety and performance requirements for protecting human privacy features in the real physical world, the invention provides a human-body target detection avoidance system and method, based on a surrogate model generated from multiple models, with high universality, high distance adaptability and good semantics in the physical world.
The technical scheme adopted by the system of the invention is as follows: a video target detection avoidance system comprising a multi-model model-adaptive gray-box training module, a threshold-based patch distance adaptive module, a multi-loss-function calculation module and a digital-world patch attachment module;
the multi-model model-adaptive gray-box training module (YOLO, SSD, Faster RCNN, etc.) detects target pictures to which an avoidance patch is attached and extracts the number and confidence of human-body detections;
the threshold-based patch distance adaptive module sets a user or system threshold, decides the threshold distance and adaptively updates the patch with distance;
the multi-loss-function calculation module applies training-loss constraints, including smoothness loss, pixel-change loss, non-printable-color loss and the like, where the smoothness and pixel-change losses smooth the image to preserve the picture's semantic information;
the digital-world patch attachment module simulates clothing wrinkles and physical-world color transformation, modelling the environmental changes of the physical world and applying the corresponding transformations to the avoidance patch so as to improve its robustness in the physical world.
The technical scheme adopted by the method of the invention is as follows: a video target detection avoidance method comprising the following steps:
step 1: detecting target pictures to which an avoidance patch is attached, based on the multi-model model-adaptive gray-box training module, and extracting the number and confidence of human-body detections; distributing the results to the patch distance adaptive module;
step 2: performing threshold setting and threshold-distance decision, and updating the distance-adaptive features of the relevant region of the target patch according to the result parameters distributed by the system; feeding the iteratively updated patch pictures into the digital-world patch attachment module to generate a new target detection data set bearing the patch pictures;
step 3: simulating clothing wrinkles, performing physical-world color transformation and applying training-loss constraints; calculating the loss indexes of the patch picture during training, then saving and distributing them to the patch distance adaptive module;
the clothing-wrinkle simulation generates wrinkle-simulating distortions for the patch based on a built-in two-dimensional wrinkle data set and stores the distorted patch for later use;
the physical-world color transformation uses a multi-layer perceptron to establish a mapping between digital-world colors and physical-world printable colors, then fits the patch picture's colors to the printable colors;
the training-loss constraint calculates the loss indexes of the patch picture during training based on the surrogate model's output and the loss functions generated in the model-adaptive gray-box training module (smoothness loss, pixel-change loss, non-printable-color loss and the like), and saves and distributes them to the patch distance adaptive module.
Compared with the prior art, the advantages and positive effects of the invention are mainly as follows:
(1) A model-adaptive gray-box training method is proposed, improving the universality of the generated avoidance patches in the physical world and their ability to perturb multiple models; on this basis, an avoidance patch effective against many detection models can be generated.
(2) The invention designs a distance-adaptive patch generation algorithm, improving the adversarial sample's adaptability to attack distance, guaranteeing the effectiveness of the adversarial patch over the full 2-10 m range, and improving its ability to perturb models at all distances; it is suited to the real physical world.
(3) The invention provides a smooth-image-based semantic retention mechanism, which retains the semantics of the original picture to a certain extent and improves the semantics and naturalness of the sample. The clothing pattern is thus not easily noticed by passers-by when worn in the physical world.
Drawings
FIG. 1 is a system frame diagram of an embodiment of the present invention.
FIG. 2 is a schematic diagram of a model adaptive gray box training module based on multiple models in an embodiment of the present invention.
Fig. 3 is a schematic diagram of a threshold-based patch distance adaptation module in an embodiment of the present invention.
Fig. 4 is a schematic diagram of a calculation module based on a plurality of loss functions and a digital world patch fitting module according to an embodiment of the present invention.
FIG. 5 is a block diagram of an alternative Model1 for modeling real world detection in accordance with an embodiment of the present invention.
Fig. 6 is an application scenario diagram of an embodiment of the present invention.
Detailed Description
To facilitate understanding and practice of the invention by those of ordinary skill in the art, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described herein serve only to illustrate and explain the invention and are not intended to limit it.
Referring to fig. 1, the video target detection avoidance system provided by the invention comprises a model adaptive gray box training module based on multiple models, a patch distance self-adaptive module based on threshold values, a calculation module based on multiple loss functions and a digital world patch fitting module.
The multi-model model-adaptive gray-box training module (comprising YOLO, SSD, Faster RCNN and the like) detects target pictures to which an avoidance patch (also called a 'stealth clothing' patch) is attached and extracts the number and confidence of human-body detections; this process requires neither the type nor the parameters of the model used in actual detection, only the surrogate model preset in the system;
the threshold-based patch distance adaptive module sets a user or system threshold, decides the threshold distance and adaptively updates the patch with distance; in this process, the threshold set by the user or preset by the system serves as the decision basis for the patch update region, and the system updates the patch image to ensure its protective performance at different distances;
the multi-loss-function calculation module of this embodiment applies training-loss constraints, including smoothness loss, pixel-change loss, non-printable-color loss and the like, where the smoothness and pixel-change losses smooth the image to preserve the picture's semantic information;
the digital-world patch attachment module simulates clothing wrinkles and physical-world color transformation, modelling the environmental changes of the physical world and applying the corresponding transformations to the avoidance patch, thereby improving its robustness in the physical world.
The video target detection avoidance method provided by the embodiment comprises the following steps:
step 1: detecting a target picture attached with a stealth clothing patch based on a model adaptive gray box training module of a plurality of models, and extracting the number and the confidence of human body detection; distributing the patch distance to a patch distance self-adaptive module;
referring to fig. 2, the specific process of step 1 in this embodiment includes the following steps:
step A1: in the stage of detecting the target bearing the 'stealth clothing' patch, the user or system determines the model detection order and number of passes; a surrogate model is generated from the models built into the system and serial detection is performed on the target picture to which the patch is attached;
a1.1: before the system runs, the model detection order and the number of rounds (epochs) are determined from user-specified or preset parameters, and a surrogate Model1 simulating real-world detection is generated and saved for the training iterations of the patch adaptability training;
referring to FIG. 5, the surrogate Model1 of the present embodiment comprises a YOLOv2 layer, a Faster RCNN layer, a YOLOv2/SSD layer, a Faster RCNN layer and a Faster RCNN layer connected in sequence;
in this embodiment, one system iteration attaches the avoidance patch to the human-target picture through the digital-world patch attachment module, obtains confidences and prediction boxes through the multi-model model-adaptive gray-box training module, obtains the loss values through the loss-function calculation module, and finally updates the patch through the patch distance adaptive module. The user may specify the number of iterations or use the system's built-in count; the trained avoidance patch is output at the end.
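The iteration cycle described above can be sketched as follows. The module interfaces (`attach`, `detect`, `total_loss`, `update`) are hypothetical placeholders standing in for the patent's four modules, not its actual implementation:

```python
def train_patch(patch, images, attach, detect, total_loss, update, steps=10):
    """One full system iteration per step: attach the patch, run the
    surrogate detector, compute the combined loss, update the patch."""
    for _ in range(steps):
        patched = [attach(img, patch) for img in images]   # digital-world patch attachment
        dets = [detect(p) for p in patched]                # gray-box surrogate detection
        loss = total_loss(dets, patch)                     # multi-loss-function calculation
        patch = update(patch, loss)                        # distance-adaptive patch update
    return patch
```

Any concrete attachment, detection, loss and update routines can be passed in; the loop itself only fixes the order in which the four modules interact.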
A1.2: the system distributes the target pictures (in batches) bearing the 'stealth clothing' patch generated by the patch attachment module to the patch distance adaptive module, and performs serial recognition and detection using the surrogate Model1 generated in A1.1.
Step A2: the human body detection number and confidence level stage is extracted, after detection, the system acquires the human body target detection number and confidence level of a target picture output by the substitution model, and distributes the extracted result to an optimizer arranged in the module;
a2.1: through detection by the generated surrogate model, the system extracts the prediction confidences and the number of prediction boxes for the target picture bearing the 'stealth clothing' patch output by that model;
a2.2: after the surrogate-model results are collected and formatted, the system distributes them to the patch distance adaptive module, and the patch picture is actually updated according to these result parameters.
Assuming each batch contains batch pictures, the formatted results are the confidence-weighted sum probe and the number of prediction boxes pred_num. For each picture, probe is the weighted sum of the confidences of all bounding boxes whose confidence exceeds the set threshold conf_thresh and whose predicted class is 'person'; pred_num is the number of bounding boxes whose predicted class is 'person' and whose prediction is correct.
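A minimal sketch of this formatting step, assuming each detection box is a dict with hypothetical `conf`/`cls`/`correct` fields (the patent does not specify the data layout):

```python
def summarize_detections(boxes, conf_thresh=0.5, person_cls="person"):
    """probe: confidence-weighted sum over boxes whose confidence exceeds
    conf_thresh and whose predicted class is 'person';
    pred_num: number of boxes predicted as 'person' and marked correct."""
    probe = sum(b["conf"] for b in boxes
                if b["conf"] > conf_thresh and b["cls"] == person_cls)
    pred_num = sum(1 for b in boxes
                   if b["cls"] == person_cls and b.get("correct", True))
    return probe, pred_num
```

Training then drives probe and pred_num downward, since both quantify how visible the person wearing the patch still is to the surrogate model.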
Step 2: setting a threshold value and deciding a threshold distance, and updating the distance adaptability characteristic of the relevant area of the target patch to be updated according to the result parameters distributed by the system in A2.2; the patch pictures after iteration updating are sent into a digital world patch attaching module to generate a new target detection data set attached with the patch pictures;
please refer to fig. 3, the specific process of step 2 in this embodiment includes the following steps:
step B1: threshold setting: the user decides whether to set the distance threshold parameter autonomously or to adopt the built-in preset threshold S_thres, which the system submits to the distance adaptive module; on receiving the surrogate-model results distributed by the system, the distance adaptive module runs its module function based on the previously submitted threshold S_thres;
step B1.1: before the system operates, the user main body decides to autonomously set a distance threshold or adopts a built-in preset threshold for deciding a target patch image updating range;
step B1.2: the system performs legal inspection on the distance threshold parameters submitted by the user, distributes the distance threshold parameters to the patch distance self-adaptive module after passing the verification, and pre-stores the distance threshold;
step B1.3: after receiving the generation result of the substitution model distributed by the system, the patch distance self-adaptive module extracts a pre-stored distance threshold value and a target patch distributed by the system and executes the corresponding module function;
step B2: threshold distance decision: the patch distance adaptive module normalizes the length and width of the input patch picture to be updated, and determines the patch update range based on the previously set threshold. It should be noted that since the patch picture is square, the update range is also a proportionally scaled square area;
step B2.1: after the patch distance self-adaptive module receives the distributed target patch to be updated, the module normalizes the length and width of the patch picture, so that the patch updating range can be conveniently and subsequently determined;
if the size ratio of the picture identification frame attached with the patch to the patch is smaller than the threshold value, the system decides that the patch is in a long-distance scene at the moment, and decides that the patch updating range is the full-image;
if the size ratio of the picture identification frame attached with the patch to the patch exceeds a threshold value, the system decides that the patch is in a short-distance scene at the moment, decides an updating area as a picture center, and decides the updating range ratio of the patch through the system;
step B2.2: after the system makes the threshold-based decision, it determines the anchor point and anchors the patch update region;
step B2.3: the system passes the anchored update-region information to the patch distance adaptive module, which performs the distance-adaptive update of the patch;
step B3: patch distance adaptive updating, when a patch distance adaptive module obtains the range of a patch updating area to be updated based on a distance threshold value, calculating a corresponding mask M, and updating the distance adaptive characteristics of the relevant area of the target patch to be updated according to the result parameters distributed by the system in A2.2;
step B3.1: after obtaining the patch update region to be updated, the patch distance adaptive module requests from the system the result parameters of A2.2 and the indexes from the loss-function calculation module for the patch update;
step B3.2: once the result parameters and calculated indexes have been pushed to it, the patch distance adaptive module updates the features of the corresponding patch region, including patterns, textures, colors and the like;
step B3.3: the patch pictures after iterative updating are sent into a digital world patch attaching module to generate a new target detection data set attached with the patch pictures;
the L2 regularized weight attenuation method is optimized in updating optimization, and the problem of model overfitting is reduced to a certain extent.
Step 3: simulating clothes folds, carrying out physical world color transformation and training loss constraint, calculating to obtain various loss indexes of the patch picture in the training process, and storing and distributing the loss indexes to a patch distance self-adaptive module;
the method comprises the steps of simulating clothes wrinkles, generating wrinkles to simulate distortion for patches based on a built-in two-position wrinkles data set, and storing the distorted patches for later use;
the physical world color transformation, using a multi-layer perceptron method, establishing a mapping relation between the digital world colors and the physical world printable colors, and then fitting the patch picture colors to the physical world printable colors;
and training loss constraint, calculating various loss indexes of the patch picture in the training process based on the generated output result of the substitution model and various loss functions (including smoothness loss, pixel change loss, non-printable color loss and the like), and saving and distributing the loss indexes to a patch self-adaptive updating module.
Please refer to fig. 4, the specific process of step 3 in this embodiment includes the following steps:
step C1: clothing-wrinkle simulation: wrinkle-simulating distortions are generated for the patch based on the built-in two-dimensional wrinkle data set, and the distorted patches are stored for later use;
step C1.1: the clothing-wrinkle simulation function loads the system's built-in data set data_tps of two-dimensional anchor-point distortions under different human poses, together with the color-converted patch picture patch_cnv, so that the distortion function f_tps can be called to distort the patch picture and provide data support;
step C1.2: the clothing-wrinkle simulation function loads the designed distortion function f_tps, applies a two-dimensional image distortion to the target patch picture, and simulates the appearance of the patch as worn on a human body;
step C1.3: the clothing-wrinkle simulation function saves and distributes the distorted patches to the attacher in the digital-world patch attachment module, which attaches the patch pictures to digital-world human bodies to form the data set of the model-adaptive gray-box training module;
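As a rough illustration of step C1 — not the actual anchor-point TPS distortion f_tps, whose parameters are not given here — a sinusoidal row shift can stand in for cloth-fold distortion on a single-channel patch:

```python
import math

def wrinkle_warp(patch, amp=1.0, freq=0.5):
    """Shift each row horizontally by a sinusoidal offset to mimic cloth
    folds; nearest-neighbour resampling with edge clamping."""
    h, w = len(patch), len(patch[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        shift = int(round(amp * math.sin(freq * i)))   # per-row fold offset
        for j in range(w):
            src = min(max(j - shift, 0), w - 1)        # clamp at the edges
            out[i][j] = patch[i][src]
    return out
```

A real implementation would instead interpolate between the anchor-point pairs stored in data_tps, but the training pipeline consumes the result the same way: a distorted patch of unchanged size.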
step C2: the physical world color transformation, using a multi-layer perceptron method, establishing a mapping relation f between the digital world colors and the physical world printable colors, and then fitting the patch picture colors to the physical world printable colors;
step C2.1: before the system runs, the color-conversion function in the digital-world patch attachment module loads the digital-world colors color_digital and the built-in physical-world colors color_physical into a three-layer fully connected BP network to generate a color fitting between the physical and digital worlds;
step C2.2: the color conversion reads the patch picture patch_origin to be color-converted, transforms its colors through the generated color fitting to produce patch_cnv, and pushes it to the clothing-wrinkle simulation function of the digital-world patch attachment module;
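A minimal sketch of the forward pass of such a color-fitting network, assuming a plain fully connected architecture with sigmoid activations (the exact weights and activations of the patent's three-layer BP network are not specified):

```python
import math

def mlp_color_map(rgb, layers):
    """Forward pass of a small fully connected network mapping a
    digital-world RGB triple toward a printable colour; the sigmoid
    keeps every output channel in [0, 1]."""
    x = list(rgb)
    for weights, biases in layers:   # one (W, b) pair per layer
        x = [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + b)))
             for row, b in zip(weights, biases)]
    return x
```

In training, the layer weights would be fitted on measured pairs of digital colors and their printed appearance, so that the loss can be computed on colors the printer can actually reproduce.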
step C3: training loss constraint, calculating various loss indexes loss of the patch picture in the training process based on the generated output result of the substitution model and various loss functions f (x), and storing and distributing the loss indexes loss to a patch self-adaptive updating module;
step C3.1: the system transmits the result parameters of the model adaptive gray box training module to the loss function calculation module;
step C3.2: the loss function calculation module calculates and obtains various loss indexes loss in patch adaptability training by loading the input result parameters and calling a plurality of loss functions f (x);
step C3.3: the loss function calculation module processes each loss index obtained through calculation and then transmits the processed loss indexes to the patch distance self-adaptive module for updating patch characteristics.
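Two of the loss terms named above can be sketched on a single-channel patch; the patent does not give their exact formulations, so these follow the common total-variation and non-printability definitions:

```python
def tv_loss(patch):
    """Total-variation smoothness loss: squared differences between
    horizontally and vertically adjacent pixels."""
    h, w = len(patch), len(patch[0])
    loss = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                loss += (patch[i][j] - patch[i][j + 1]) ** 2
            if i + 1 < h:
                loss += (patch[i][j] - patch[i + 1][j]) ** 2
    return loss

def non_printable_loss(patch, printable):
    """Distance from every pixel to its nearest printable colour;
    zero when the whole patch already uses printable colours."""
    return sum(min(abs(px - c) for c in printable)
               for row in patch for px in row)
```

Minimizing the first term keeps the patch smooth (preserving picture semantics), while minimizing the second pulls every pixel toward a colour the physical printer can reproduce.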
Please refer to fig. 6, which shows a 'stealth clothing' application scenario of an embodiment of the invention. Facing an illegal person-detection camera, an individual user not wearing the 'stealth clothing' printed with the avoidance patch is detected, while a user wearing it is not. In the simulated camera picture, the user wearing the 'stealth clothing' is on the left and the user without it is on the right.
The invention has the advantages that:
1. the model-adaptive gray-box training method adopted in this scheme improves the universality of the adversarial patch in the physical world; the physical-world interference success rate against multiple target detection models can exceed 50%, the transferability of the adversarial patch is improved, and the privacy-protection performance of adversarial samples against mainstream recognition models in practical application is markedly improved. It is applicable to real physical-world scenarios; please see fig. 5;
2. threshold-based patch distance adaptive update mechanism: the user, as decision maker, sets the patch update threshold; according to this threshold, the system, with the distance adaptive module as the update agent, performs targeted feature updates on the central region or the whole patch;
3. the smooth-image-based semantic retention mechanism improves the semantics and naturalness of the adversarial sample. By designing a semantic loss function based on the change from the initial image, the semantics of the initial image are effectively retained in the adversarial sample during training, improving its naturalness in the physical world.
The invention can effectively protect the user from automatic identification by target recognition and detection models, and prevent target detection and extraction techniques from illegally acquiring, storing and using the user's sensitive information. In the military field, with the continuous development of modern unmanned warfare, the intelligent stealth clothing can effectively avoid the detection and locking of human targets by unmanned weapons and seize the initiative in unmanned combat. In the future, intelligent stealth clothing based on adversarial patches has broad application prospects and can bring great benefit in civil, commercial and military scenarios.
The invention can provide a reliable and convenient sensitive information protection method for users in more fields of civil use, military use and the like.
It should be understood that parts of the specification not specifically set forth herein are all prior art.
It should be understood that the foregoing description of the preferred embodiments is not intended to limit the scope of the invention; the scope of protection is defined by the appended claims, and those skilled in the art can make substitutions or modifications without departing from the scope of the invention as set forth in the appended claims.
Claims (10)
1. A video target detection avoidance method, characterized by comprising the following steps:
step 1: detecting, by a model-adaptive gray-box training module based on multiple models, target pictures to which avoidance patches are attached, and extracting the human-body detection number and confidence; distributing the results to a patch distance self-adaptive module;
step 2: performing threshold setting and threshold distance decision, and updating the distance-adaptive features of the relevant area of the target patch to be updated according to the result parameters distributed by the system; the iteratively updated patch pictures are sent to a digital-world patch attaching module to generate a new target detection data set with the patch pictures attached;
step 3: simulating clothes wrinkles, performing physical-world color transformation and training loss constraint, calculating various loss indexes of the patch picture during training, and storing and distributing them to the patch distance self-adaptive module;
the clothes wrinkle simulation generates wrinkle-simulating distortion for the patch based on a built-in two-dimensional wrinkle data set, and stores the distorted patch for later use;
the physical-world color transformation uses a three-layer fully connected BP network to establish a mapping between digital-world colors and physically printable colors, and then fits the patch picture colors to the printable colors of the physical world;
the training loss constraint calculates, based on the substitute-model output produced by the model-adaptive gray-box training module and various loss functions, including smoothness loss, pixel change loss and non-printable color loss, the loss indexes of the patch picture during training, and stores and distributes them to the patch adaptive update module, wherein the smoothness loss and the pixel change loss constitute a semantic retention mechanism based on image smoothing;
the system iterates as follows: an avoidance patch is attached to human-body target pictures by the digital-world patch attaching module, confidences and prediction boxes are obtained by the model-adaptive gray-box training module based on multiple models, the loss values are obtained by the loss function calculation module, and the patch is updated by the patch distance self-adaptive module; the user may specify the number of iterations or use the system's built-in iteration count, and the trained avoidance patch is finally output.
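The attach → detect → score → update iteration described in claim 1 can be summarized by the skeleton below. This is only an illustration of the control flow: the surrogate detector is a stub and the update rule is a simple accept-if-better random search rather than the patent's gradient-based module pipeline; all names are hypothetical.

```python
import numpy as np

def apply_patch(image, patch):
    # Digital-world patch attaching: paste the patch into the image centre (stub).
    h, w = patch.shape[:2]
    y, x = (image.shape[0] - h) // 2, (image.shape[1] - w) // 2
    out = image.copy()
    out[y:y + h, x:x + w] = patch
    return out

def surrogate_confidence(image):
    # Stand-in for the multi-model gray-box surrogate: returns a "person"
    # confidence. A placeholder score, not a real detector.
    return float(image.mean())

def train_patch(images, patch, iters=10, step=0.05, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    for _ in range(iters):
        loss = np.mean([surrogate_confidence(apply_patch(im, patch)) for im in images])
        candidate = np.clip(patch + step * rng.standard_normal(patch.shape), 0.0, 1.0)
        cand_loss = np.mean([surrogate_confidence(apply_patch(im, candidate)) for im in images])
        if cand_loss < loss:  # keep the patch that lowers detection confidence
            patch = candidate
    return patch

rng = np.random.default_rng(1)
images = [rng.random((16, 16, 3)) for _ in range(2)]
patch0 = rng.random((6, 6, 3))
patch = train_patch(images, patch0, iters=5)
```

Because candidates are only accepted when they lower the mean confidence, the surrogate score is non-increasing over iterations.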
2. The video object detection avoidance method of claim 1, wherein the specific implementation of step 1 comprises the sub-steps of:
step 1.1: before the system operates, determining the detection order and rounds of the models according to user-specified or preset parameters, and generating a substitute model for training based on the multiple models; the model is saved for iteration in patch adaptability training;
the substitution model comprises a YOLOv2 layer, a Faster RCNN layer, a YOLOv2/SSD layer, a Faster RCNN layer and a Faster RCNN layer connected in sequence;
step 1.2: the system distributes the target pictures with the avoidance patches, generated in the patch attaching module, to the adaptive training module, and performs serial recognition detection using the substitution model generated in step 1.1, obtaining the human-body target detection number and the prediction confidence of the target pictures output by the substitution model;
if there are m target pictures with avoidance patches attached in each batch, the prediction confidence is the weighted sum of the confidences of the bounding boxes, in each picture, whose confidence exceeds a set threshold and whose predicted category is person; the human-body target detection number is the number of bounding boxes in each picture that are correctly predicted with category person;
step 1.3: the system collects and merges the results obtained by the substitution model, distributes them to the patch distance self-adaptive module, and updates the patch picture according to the result parameters.
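The aggregation over a batch in claim 2 can be illustrated as follows. The exact weighting is not specified in the claim; here the retained confidences are summed and averaged over the m pictures of the batch, and the class index used for "person" is an assumption.

```python
PERSON = 0  # assumed class index for the "person" category

def aggregate(detections, conf_threshold=0.5):
    """detections: one list per picture of (class_id, confidence) pairs.
    Returns (prediction confidence averaged over the batch,
             number of bounding boxes kept as person detections)."""
    total_conf, total_count = 0.0, 0
    for boxes in detections:
        kept = [c for cls, c in boxes if cls == PERSON and c > conf_threshold]
        total_conf += sum(kept)   # sum of retained person-box confidences
        total_count += len(kept)  # boxes predicted as person above threshold
    m = max(len(detections), 1)
    return total_conf / m, total_count

conf, count = aggregate([[(0, 0.9), (0, 0.4), (1, 0.8)], [(0, 0.6)]])
# conf = (0.9 + 0.6) / 2 = 0.75, count = 2
```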
3. The video object detection avoidance method of claim 1, wherein the specific implementation of step 2 comprises the sub-steps of:
step 2.1: setting a threshold value;
the user, as decision maker, autonomously sets a distance threshold parameter or adopts the built-in preset threshold; the parameter is submitted through the system to the patch distance self-adaptive module, and when the substitution-model results distributed by the system are received, the patch distance self-adaptive module runs its module functions based on the previously submitted threshold parameter;
step 2.2: threshold distance decision;
the patch distance self-adaptive module normalizes the length and width of the input patch picture to be updated, and determines the patch picture update range based on the previously set threshold;
step 2.3: patch distance adaptive updating;
after obtaining, based on the distance threshold, the range of the patch area to be updated, the patch distance self-adaptive module updates the distance-adaptive features of the relevant area of the target patch according to the result parameters distributed by the system.
4. The video object detection avoidance method of claim 3, wherein: in step 2.1, before the system operates, the user as decision maker autonomously sets a distance threshold or adopts the built-in preset threshold, which decides the update range of the target patch image; the system performs a validity check on the distance threshold parameter submitted by the user, distributes it to the patch distance self-adaptive module after verification, and pre-stores the distance threshold; after receiving the substitution-model results distributed by the system, the patch distance self-adaptive module extracts the pre-stored distance threshold and the target patch distributed by the system and executes the corresponding module functions.
5. The video object detection avoidance method of claim 3, wherein: in step 2.2, after the distance self-adaptive module receives the distributed target patch to be updated, it normalizes the length and width of the patch picture to facilitate the subsequent determination of the patch update range; if the size ratio of the recognition box of the picture with the patch attached to the patch is smaller than the threshold, the system decides that the patch is in a long-distance scene and sets the update range to the full image; if the size ratio exceeds the threshold, the system decides that the patch is in a short-distance scene, sets the update area to the picture center, and decides the update-range proportion of the patch; after the decision is made based on the threshold, the patch update area is anchored; the system transmits the anchored update-area information to the patch distance self-adaptive module, which updates the distance-adaptive features of the patch.
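The distance decision of claim 5 — full-image update for long-distance scenes, center-only update for short-distance scenes — can be sketched as a boolean update mask. The threshold and center-fraction values below are illustrative, not taken from the patent.

```python
import numpy as np

def update_mask(box_patch_ratio, patch_shape, threshold=4.0, center_frac=0.5):
    """Decide which pixels of the (normalized) patch to update.
    Ratio below the threshold -> long-distance scene, update the whole patch;
    ratio above it -> short-distance scene, update only the center region."""
    h, w = patch_shape
    mask = np.zeros((h, w), dtype=bool)
    if box_patch_ratio < threshold:
        mask[:, :] = True                       # long distance: full image
    else:
        ch, cw = int(h * center_frac), int(w * center_frac)
        y0, x0 = (h - ch) // 2, (w - cw) // 2
        mask[y0:y0 + ch, x0:x0 + cw] = True     # short distance: center only
    return mask
```

The mask can then gate the feature update so that only the anchored region receives new patterns, textures and colors.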
6. The video object detection avoidance method of claim 3, wherein: in step 2.3, after the patch distance adaptive module obtains the range of the patch update area, it requests from the system the result parameters and the indexes of the loss function calculation module for the patch update; when the result parameters and calculated indexes are pushed to it, the module updates the features of the corresponding patch area, including patterns, textures and colors; the iteratively updated patch pictures are sent to the digital-world patch attaching module to generate a new target detection data set with the patch pictures attached.
7. The video object detection avoidance method of claim 1, wherein the garment fold simulation function in step 3 is specifically implemented by the following sub-steps:
step 3.1.1: the garment fold simulation function loads the system's built-in two-dimensional anchor-point distortion data sets for different human-body states, together with the color-transformed patch picture, providing data support for invoking the TPS distortion function to realize patch picture distortion;
step 3.1.2: the garment fold simulation function loads a designed TPS distortion function, performs two-dimensional image distortion on a target patch picture, and simulates the appearance characteristics of the patch when the patch is worn on a human body;
step 3.1.3: the clothes fold simulation function stores and distributes the distorted patches to the attacher in the digital-world patch attaching module, attaches the patch pictures to digital-world human bodies, and forms the data set of the model-adaptive gray-box training module.
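The TPS distortion of steps 3.1.1–3.1.3 can be reproduced with SciPy's thin-plate-spline interpolator: fit a TPS mapping from destination to source anchor points, then resample the patch along the warped coordinates. This is a generic TPS sketch, not the patent's distortion function; nearest-neighbour resampling is used for brevity.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_warp_coords(src_anchors, dst_anchors, out_shape):
    # Fit a thin-plate spline from destination anchors to source anchors and
    # return, for every output pixel, the source (y, x) coordinate to sample.
    tps = RBFInterpolator(dst_anchors, src_anchors, kernel="thin_plate_spline")
    ys, xs = np.mgrid[0:out_shape[0], 0:out_shape[1]]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    return tps(grid).reshape(out_shape[0], out_shape[1], 2)

def warp_image(img, coords):
    # Nearest-neighbour resampling of img at the TPS source coordinates.
    h, w = img.shape[:2]
    yy = np.clip(np.round(coords[..., 0]).astype(int), 0, h - 1)
    xx = np.clip(np.round(coords[..., 1]).astype(int), 0, w - 1)
    return img[yy, xx]
```

With identical source and destination anchors the warp reproduces the input; perturbing the destination anchors (e.g. from a wrinkle data set) bends the patch as fabric would.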
8. The video object detection avoidance method according to claim 1, wherein the physical-world color transformation function in step 3 is specifically implemented by the following sub-steps:
step 3.2.1: before the system operates, the color conversion function in the digital-world patch attaching module loads the digital-world colors and the built-in physical-world colors into a three-layer fully connected BP network to generate a color fit between the physical world and the digital world;
step 3.2.2: the color conversion function reads the patch picture to be color-converted, converts its colors through the generated color fit, and pushes the patch picture to the clothes fold simulation function in the digital-world patch attaching module.
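A three-layer fully connected (BP) network mapping digital RGB to printable RGB can be sketched as below. The weights here are random and untrained — in the described system the network would be fitted to pairs of digital colors and their measured printed counterparts; the layer sizes and activations are assumptions.

```python
import numpy as np

def make_color_net(rng, hidden=32):
    # Untrained 3-layer fully connected network mapping RGB -> RGB.
    # Random weights stand in for what backpropagation training would learn.
    w1 = rng.standard_normal((3, hidden)) * 0.1
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((hidden, hidden)) * 0.1
    b2 = np.zeros(hidden)
    w3 = rng.standard_normal((hidden, 3)) * 0.1
    b3 = np.zeros(3)

    def forward(rgb):
        h1 = np.tanh(rgb @ w1 + b1)
        h2 = np.tanh(h1 @ w2 + b2)
        return 1.0 / (1.0 + np.exp(-(h2 @ w3 + b3)))  # sigmoid keeps colors in [0, 1]

    return forward

rng = np.random.default_rng(0)
color_net = make_color_net(rng)
out = color_net(np.array([[0.2, 0.5, 0.8], [1.0, 0.0, 0.5]]))
```

Applying the trained map to every patch pixel would pull the digital colors onto the printer's reproducible gamut before physical printing.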
9. The method for avoiding video object detection according to any one of claims 1 to 8, wherein the training loss constraint in step 3 is implemented by the following sub-steps:
step 3.3.1: the system transmits the result parameters of the model adaptive gray box training module to the loss function calculation module;
step 3.3.2: the loss function calculation module calculates and obtains various loss indexes in patch adaptability training by loading the input result parameters and calling a plurality of loss functions;
step 3.3.3: the loss function calculation module processes each loss index obtained through calculation and then transmits the processed loss indexes to the patch distance self-adaptive module for updating patch characteristics.
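Among the losses the calculation module of claim 9 would compute, the non-printable color loss can be illustrated as the mean distance from each patch pixel to its nearest color in the printable palette. This is a simplification and an assumption: published non-printability scores often use a product of distances instead.

```python
import numpy as np

def non_printability_loss(patch, printable):
    """For each pixel of an (H, W, 3) patch, the Euclidean distance to the
    nearest printable color; the loss is the mean over pixels, so it is
    zero when every color in the patch is exactly printable."""
    pixels = patch.reshape(-1, 3)                                    # (P, 3)
    d = np.linalg.norm(pixels[:, None, :] - printable[None, :, :], axis=2)
    return float(d.min(axis=1).mean())
```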
10. A video object detection avoidance system employing the method of any one of claims 1 to 9, characterized in that: the system comprises a model-adaptive gray-box training module based on multiple models, a threshold-based patch distance self-adaptive module, a calculation module based on multiple loss functions, and a digital-world patch attaching module;
the model-adaptive gray-box training module based on multiple models is used for generating a training substitution model, detecting human-body target pictures with avoidance patches attached, and extracting the human-body detection number and confidence; the multiple models include YOLO, SSD and Faster RCNN;
the patch distance self-adaptive module based on the threshold value is used for setting a user or a system threshold value, deciding a threshold distance and adaptively updating the patch distance;
the multi-term loss function-based calculation module is used for calculating losses, including smoothness losses, pixel change losses and non-printable color losses, wherein the smoothness losses and the pixel change losses are used for smoothing images to preserve picture semantic information;
the digital-world patch attaching module is used for simulating clothes wrinkles and physical-world color fitting, simulating various environmental changes of the physical world, applying the corresponding transformations to the avoidance patches, and improving the robustness of the avoidance patches in the physical world.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110909116.7A CN113743231B (en) | 2021-08-09 | 2021-08-09 | Video target detection avoidance system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113743231A CN113743231A (en) | 2021-12-03 |
CN113743231B true CN113743231B (en) | 2024-02-20 |
Family
ID=78730401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110909116.7A Active CN113743231B (en) | 2021-08-09 | 2021-08-09 | Video target detection avoidance system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113743231B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116883520B (en) * | 2023-09-05 | 2023-11-28 | 武汉大学 | Color quantization-based multi-detector physical domain anti-patch generation method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111340008A (en) * | 2020-05-15 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Method and system for generation of counterpatch, training of detection model and defense of counterpatch |
CN111739016A (en) * | 2020-07-20 | 2020-10-02 | 平安国际智慧城市科技股份有限公司 | Target detection model training method and device, electronic equipment and storage medium |
CN112241790A (en) * | 2020-12-16 | 2021-01-19 | 北京智源人工智能研究院 | Small countermeasure patch generation method and device |
CN112597993A (en) * | 2020-11-24 | 2021-04-02 | 中国空间技术研究院 | Confrontation defense model training method based on patch detection |
CN113111731A (en) * | 2021-03-24 | 2021-07-13 | 浙江工业大学 | Deep neural network black box countermeasure sample generation method and system based on channel measurement information |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11205096B2 (en) * | 2018-11-19 | 2021-12-21 | Google Llc | Training image-to-image translation neural networks |
US11948292B2 (en) * | 2019-07-02 | 2024-04-02 | MakinaRocks Co., Ltd. | Systems and methods for detecting flaws on panels using images of the panels |
Non-Patent Citations (1)
Title |
---|
Research on the model of vehicles ahead on the road based on vision sensors; Chen Yong; Chen Yao; Transducer and Microsystem Technologies; 2014-12-31; Vol. 33, No. 9; full text *
Also Published As
Publication number | Publication date |
---|---|
CN113743231A (en) | 2021-12-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||