CN112949849B - Method and device for optimizing an intelligent camera detection model using edge computing


Info

Publication number: CN112949849B (application CN202110122087.XA)
Authority: CN (China)
Prior art keywords: model, training, sample set, weight, sample
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN112949849A
Inventors: 陈辉, 龙祥, 张智, 熊章, 雷奇文, 艾伟, 胡国湖
Assignee (current and original): Wuhan Xingxun Intelligent Technology Co ltd
Application filed by Wuhan Xingxun Intelligent Technology Co ltd; priority to CN202110122087.XA; published as CN112949849A, granted as CN112949849B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention belongs to the technical field of image processing and solves the technical problem that a uniformly updated model has low reliability because different application scenes have different characteristics. It provides a method and a device for optimizing an intelligent camera detection model using edge computing: the current model is trained with a first sample set preset in the camera and a second sample set collected locally, in real time, by the camera, and a verification model is output; the verification model is then compared with the current model to select the better-performing target model. The invention also comprises a device for executing the method. By performing model optimization locally, combining actual data of the local target scene with preset data, the invention improves both the detection precision of the camera in its actual application scene and the safety of the data.

Description

Method and device for optimizing an intelligent camera detection model using edge computing
Technical Field
The invention relates to the technical field of image processing, and in particular to a method and a device for optimizing an intelligent camera detection model using edge computing.
Background
With the wide application of video monitoring equipment for monitoring different scenes in daily life, cameras, and in particular intelligent cameras with image analysis functions, have become a common tool in everyday monitoring and management. In addition to conventional image capture, an intelligent camera is equipped with a chip with processing capability and can perform simple processing and analysis of the captured scene image data.
However, in the prior art, the chip of an intelligent camera has only limited computing power. To enable the intelligent camera to adapt to multi-scene applications, the common practice is to collect, before the camera leaves the factory, as many sample images as possible from a number of different scenes and train a lightweight network model to be deployed on the intelligent camera, so that it can be used in different scenes, for example: infant care management, hospital patient monitoring, school monitoring, mall monitoring, garage monitoring, scenic spot monitoring, farm monitoring, road traffic monitoring and the like.
Although the intelligent camera is trained in advance on a large number of samples from many different application scenes, when it is deployed for monitoring, the actual scenes of different users still differ considerably, so the basic detection model obtained by extensive training before the camera leaves the factory is often not applicable and can show large deviations. At the same time, out of concern for personal privacy and information security, users are reluctant to connect the intelligent camera to a server; and other approaches, such as installing update packages for the basic detection model, increase the hardware cost of the intelligent camera (existing intelligent cameras generally do not have an interactive display interface such as a touch screen, and adding such interactive hardware raises the cost) as well as the upgrade and maintenance time.
Disclosure of Invention
In view of the above, the embodiments of the invention provide a method and a device for optimizing an intelligent camera detection model using edge computing, which solve the technical problem that a uniformly updated model has low reliability because different application scenes have different characteristics.
The technical scheme adopted by the invention is as follows:
The invention provides a method for optimizing an intelligent camera detection model using edge computing, which comprises the following steps:
S10: acquiring a preset first sample set and a second sample set corresponding to local image data acquired by an intelligent camera;
S11: performing fine-tuning training on the current model using the first sample set and the second sample set to obtain a verification model;
S12: comparing the current model with the verification model, and outputting a target model.
Preferably, S10 includes:
S101: collecting a local real-time image of an actual application scene;
S102: detecting the local real-time image with the current model, and outputting a detection result corresponding to the local real-time image;
S103: performing binarization processing on the detection result corresponding to the local real-time image, and taking the processing result as a pseudo label of the local real-time image;
S104: composing the second sample set from each local real-time image and its pseudo label.
Preferably, S103 includes:
S1031: acquiring the confidence and the confidence threshold of each positive sample in the detection result of each local real-time image;
S1032: comparing the confidence with the confidence threshold, resetting the confidence according to the comparison result, and outputting the processed detection result as the pseudo label of the local real-time image;
wherein, if the confidence is greater than or equal to the confidence threshold, the confidence is set to 1, and if the confidence is smaller than the confidence threshold, the confidence is set to 0.
Preferably, S11 includes:
S111: acquiring a first loss function, a second loss function and the reliable weight corresponding to each positive sample;
S112: combining the first sample set and the second sample set, and outputting a third sample set;
S113: performing first-stage training on the current model with the third sample set and the first loss function, and outputting an intermediate training model;
S114: on the basis of the intermediate training model, performing second-stage training with the third sample set, the second loss function and the reliable weights, and outputting the verification model.
Preferably, S12 includes:
S121: acquiring a first detection result corresponding to the current model's detection of the local image;
S122: acquiring a second detection result corresponding to the verification model's detection of the local image;
S123: comparing the first detection result with the second detection result, and outputting the target model.
Preferably, S11 further includes performing data enhancement on the first sample set and/or the second sample set, where the data enhancement includes at least one of: image rotation, mosaic material parameters, color gamut variation and image cropping.
Preferably, when image rotation, mosaic material parameters, color gamut variation and image cropping are all used for data enhancement at the same time, the processing is ordered: image rotation first, then image cropping, then color gamut conversion, and finally mosaic processing according to the mosaic material parameters; the data-enhanced first sample set and second sample set are then output.
The invention also provides a device for optimizing an intelligent camera detection model using edge computing, which comprises:
a data acquisition module, for acquiring a preset first sample set and a locally, real-time acquired second sample set;
a model training module, for performing fine-tuning training on the current model using the first sample set and the second sample set to obtain a verification model;
and a model verification module, for comparing the current model with the verification model and outputting the target model.
The invention also provides an electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the method of any of the above.
The invention also provides a medium having stored thereon computer program instructions which when executed by a processor implement a method as claimed in any one of the preceding claims.
In summary, the beneficial effects of the invention are as follows:
According to the method and device for optimizing an intelligent camera detection model using edge computing, a first sample set preset in the camera and a second sample set corresponding to local image data acquired by the camera are obtained; the currently used model is fine-tuned locally with the first sample set and the second sample set, completing an update of the current model and outputting a verification model; the detection results of the verification model and the current model on the local image data are compared, and a target model is output. Training locally, on real-time images of the actual application scene combined with the preset first sample set, improves the fit between the target model and the actual application scene and thus the detection accuracy; at the same time, training locally on the device improves data security.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below; other drawings obtained from these drawings without inventive effort by a person skilled in the art also fall within the scope of the present invention.
Fig. 1 is a flow chart of a method for optimizing an intelligent camera detection model using edge computing in example 1 of the first embodiment of the present invention;
Fig. 2 is a schematic flow chart of obtaining the second sample set in example 1 of the first embodiment;
Fig. 3 is a schematic flow chart of adding a pseudo label to a sample in example 1 of the first embodiment;
Fig. 4 is a flow chart of fine-tuning training of the model with the third sample set in example 1 of the first embodiment;
Fig. 5 is a schematic flow chart of obtaining the target model in example 1 of the first embodiment;
Fig. 6 is a flow chart of obtaining the basic model of the camera in example 1 of the first embodiment;
Fig. 7 is a schematic flow chart of obtaining the target confidence threshold in example 2 of the first embodiment;
Fig. 8 is a schematic flow chart of outputting the target confidence threshold in example 2 of the first embodiment;
Fig. 9 is a flow chart of obtaining the target mAP value in example 2 of the first embodiment;
Fig. 10 is a schematic diagram of the mAP value curve in example 2 of the first embodiment;
Fig. 11 is a flow chart of obtaining the mAP value of a confidence threshold in example 2 of the first embodiment;
Fig. 12 is a flow chart of calculating the mAP value in example 2 of the first embodiment;
Fig. 13 is a flow chart of outputting the verification model in example 3 of the first embodiment;
Fig. 14 is a flow chart of obtaining the reliable weight of each positive sample in example 3 of the first embodiment;
Fig. 15 is a schematic structural diagram of the network output layer in example 3 of the first embodiment;
Fig. 16 is a schematic flow chart of obtaining the high-reliability weight and the low-reliability weight in example 3 of the first embodiment;
Fig. 17 is a schematic flow chart of the verification model for artificial labels in example 3 of the first embodiment;
Fig. 18 is a schematic flow chart of the verification model for pseudo labels in example 3 of the first embodiment;
Fig. 19 is a schematic structural diagram of a device for continuously optimizing camera performance in example 4 of the second embodiment;
Fig. 20 is a block diagram of a device for selecting the sample confidence threshold of an intelligent camera in example 5 of the second embodiment;
Fig. 21 is a schematic structural diagram of a device for self-training of the intelligent camera model in example 6 of the second embodiment;
Fig. 22 is a schematic structural diagram of an electronic device in the third embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings.

It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between the entities or actions. In the description of the present invention, terms such as "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer" indicate orientations or positional relationships based on those shown in the drawings, merely to facilitate and simplify the description; they do not indicate or imply that the devices or elements referred to must have a specific orientation or be configured and operated in a specific orientation, and should not be construed as limiting the present invention.

Moreover, the terms "comprises," "comprising," and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus comprising a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article or apparatus comprising it.

Provided they do not conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other, and all such combinations are within the protection scope of the present invention.
Embodiment one
Example 1
Referring to fig. 1, which is a flow chart of a method for optimizing an intelligent camera detection model using edge computing in example 1 of the first embodiment of the present invention, the method comprises the following steps:
S10: acquiring a preset first sample set and a locally, real-time acquired second sample set;
Specifically, a basic detection model usable in a plurality of application scenes is built into the intelligent camera; it is obtained by training on a training set A composed of training samples collected from the plurality of application scenes. The scenes include at least one of: infant care, schools, amusement parks, malls, garages, farms and roads. Samples corresponding to a target scene are obtained through specific channels, where a sample includes at least one of: images, sounds and videos, and the specific channels include at least one of: open-source data sets, purpose-built scenes and market acquisition. The training set A used to train the basic detection model is checked manually and given artificial labels; a number of samples are randomly extracted from the training set A as the first sample set. The samples of the second sample set are local images of the actual application scene acquired by the intelligent camera, and the labels of the second sample set are set to pseudo labels.
S11: performing fine tuning training on the current model by adopting the first sample set and the second sample set to obtain a verification model;
s12: comparing the current model with the verification model, and outputting a target model;
Specifically, before the intelligent camera goes to market, the basic detection model is trained on the training set A and implanted into the camera. After a user installs the intelligent camera in the target area and first puts it into use, the basic detection model performs data processing on the local images acquired by the camera, where the data processing includes detection tasks and/or classification tasks; at this point the current model is the basic detection model. During use, the current model is fine-tuned locally with the first sample set and the second sample set to obtain the verification model of the current round of training; the processing results of the verification model and the current model on the second sample set are compared, the target model is output according to the comparison result, and the target model performs data processing on the local images acquired by the camera until the next round of fine-tuning training.
It should be noted that: the first sample set and the second sample set are used to fine-tune the current model locally, so training takes place on the intelligent camera itself; user data need not be uploaded to a server, which prevents leakage of local data and improves data security. At the same time, training on local images improves the fit between the target model and the scene where the camera is installed, ensuring the accuracy of the camera's data processing.
In one embodiment, referring to fig. 2, S10 includes:
S101: collecting a local real-time image of the actual application scene;
Specifically, after the intelligent camera is installed in a specific application scene, it starts monitoring that scene and acquires real-time images of it.
S102: detecting the local real-time image with the current model, and outputting the detection result corresponding to the local real-time image;
Specifically, the collected local real-time images are fed to the current model, which outputs the detection result corresponding to each local image; the detection result includes at least one of: confidence, category information and location information. From the detection results, the recall and precision corresponding to each confidence threshold can be calculated.
For a classification task on category information, in one application embodiment: the local real-time images acquired by the intelligent camera are analyzed by the current model to obtain the category of each image, and a corresponding category label is added according to the category each image belongs to.
For a detection task, in one application embodiment: positive samples corresponding to a first class … and/or a second class … and/or an N-th class are searched for in the local real-time images of the monitored target area captured by the intelligent camera; the current model processes the local real-time images, outputs the positive samples matching the required classes, and outputs the corresponding detection results. The detection result includes at least one of: location information, confidence, category information, etc.; the category information includes at least one of: a person, an animal, a movable object, a fixed object attached to a movable object, etc.
It should be noted that:
for a classification task, a positive sample is a sample corresponding to a target class;
for a detection task, a positive sample is a target at one of the positions annotated in the sample in advance.
S103: performing binarization processing on the detection result corresponding to the local real-time image, and taking the processing result as the pseudo label of the local real-time image;
Specifically, the confidence of each positive sample is binarized according to the confidence threshold, and the processed detection result is taken as the pseudo label of the local real-time image.
Referring to fig. 3, S103 includes:
S1031: acquiring the confidence and the confidence threshold of each positive sample in the detection result of each local real-time image;
Specifically, the intelligent camera inputs a collected local real-time image into the current model, which outputs the detection result of that image. For example: the current model judges one target in the local real-time image to be a person with confidence 0.6, i.e., the model holds the target to be a "person" with a confidence of 0.6. A confidence threshold is acquired at the same time: the confidence expresses how credible the detection result is, while the confidence threshold is used to judge whether the detection result is accepted as true or false. If the confidence is 0.6 and the confidence threshold is 0.7, the detection result is not considered credible. In particular, a sampling step for the confidence threshold can be set, a group of confidence thresholds obtained according to the sampling step, and the final confidence threshold determined by comparing the mAP values corresponding to these confidence thresholds.
S1032: comparing the confidence coefficient with the confidence coefficient threshold value, resetting the confidence coefficient according to the comparison result, and outputting the processed detection result as a pseudo tag of the local real-time image;
and if the confidence coefficient is larger than or equal to the confidence coefficient threshold value, setting the confidence coefficient to be 1, and if the confidence coefficient is smaller than the confidence coefficient threshold value, setting the confidence coefficient to be 0.
Specifically, according to the confidence coefficient threshold value, performing binarization processing on the confidence coefficient in the detection result, if the confidence coefficient threshold value is 0.7 when the target object is judged to be 'person', if the confidence coefficient is less than 0.7, setting the confidence coefficient of the target object judged to be 'person' as '0', and updating the detection result of the positive sample; if the confidence coefficient is greater than or equal to 0.7, the confidence coefficient of the current target judged as 'person' is set as '1', and the updated detection result is made and stored as a pseudo tag of the real-time image.
S104: composing the second sample set from each local real-time image and its pseudo label.
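As an illustration of S101 to S104, the sketch below thresholds per-detection confidences into a pseudo label and pairs each frame with it. The `detect` API and the field names are assumptions made for illustration only; they are not defined by the patent.

```python
# Minimal sketch of pseudo-label generation (S101-S104), assuming a
# `current_model.detect(image)` call that returns a list of detections,
# each with "bbox", "class_id" and "confidence" fields. All names here
# are illustrative, not taken from the patent.

def make_pseudo_label(detections, conf_threshold):
    """Binarize confidences: >= threshold -> 1, < threshold -> 0."""
    pseudo_label = []
    for det in detections:
        kept = 1 if det["confidence"] >= conf_threshold else 0
        pseudo_label.append({
            "bbox": det["bbox"],          # position information
            "class_id": det["class_id"],  # category information
            "confidence": kept,           # binarized confidence
        })
    return pseudo_label

def build_second_sample_set(camera_frames, current_model, conf_threshold):
    """Pair each local real-time image with its pseudo label (S104)."""
    sample_set = []
    for frame in camera_frames:
        detections = current_model.detect(frame)                # S102
        label = make_pseudo_label(detections, conf_threshold)   # S103
        sample_set.append((frame, label))
    return sample_set
```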
In one embodiment, referring to fig. 4, S11 includes:
S111: acquiring the first loss function, the second loss function and the reliable weight corresponding to each positive sample;
Specifically, the first loss function is used in the first-stage training to calculate the loss value of each sample and complete the fine-tuning of the model, and the second loss function is used in the second-stage training to calculate the loss value of each sample and complete the final training of the model. Taking the yolov3 loss function as an example, the first loss function includes:
coordinate loss:

$$L_{coord}=\lambda\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x-\hat{x})^2+(y-\hat{y})^2+(w-\hat{w})^2+(h-\hat{h})^2\right]$$

category loss:

$$L_{cls}=\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}(C-\hat{C})^2$$

confidence loss:

$$L_{conf}=\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}(P-\hat{P})^2+\lambda\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}(P-\hat{P})^2$$

where $L_{coord}$, $L_{cls}$ and $L_{conf}$ denote the coordinate error, category error and confidence error respectively, and $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th candidate box of the i-th grid cell in the application-scene data acquired by the intelligent camera is a positive sample of the data processing task: $\mathbb{1}_{ij}^{obj}=1$ if so, 0 otherwise.

This gives the first loss function $L_1=L_{coord}+L_{cls}+L_{conf}$.
The second loss function is:
coordinate loss:
category loss:
confidence loss:
weight channel loss:
the method comprises the following steps of:
and, a second supervision function: l (L) w =max(0,δ-(w H -w L ))
Wherein L is coord ,L cls ,L conf Respectively representing coordinate error, class error and confidence error, L w And L' w Representing the reliable weight error loss for the target in the second sample set and the target in the first sample set respectively,indicating whether a j candidate frame of an i-th grid in application scene data acquired by an intelligent camera is a positive sample, if yes +. >Otherwise, 0; w (w) ij The reliable weight of positive samples of the jth candidate frame of the ith grid is represented, x and y respectively represent the abscissa and the ordinate of the center point of the selection frame, w and n respectively represent the width and the height of the selection frame, C represents the category, P represents the confidence level, S is the number of grids of the network output layer, B is a fixed reference frame, lambda is a constant, and delta is a weight difference preset value.
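As a concrete reading of the second supervision function $L_w=\max(0,\delta-(w_H-w_L))$, the sketch below assumes $w_H$ and $w_L$ are the mean reliable weights of the more-reliable and less-reliable halves of one batch's pseudo-labelled positive samples; that split is an assumption, since the patent only defines $w_H$ and $w_L$ in example 3.

```python
import torch

# A minimal sketch of L_w = max(0, delta - (w_H - w_L)), assuming the
# batch contains at least two pseudo-labelled positive samples and
# that w_H / w_L are the means of the upper and lower halves of the
# sorted reliable weights (an assumption, not the patent's definition).

def weight_margin_loss(w: torch.Tensor, delta: float) -> torch.Tensor:
    """Encourage a gap of at least `delta` between the high- and
    low-reliability groups of reliable weights."""
    sorted_w, _ = torch.sort(w, descending=True)
    half = max(1, sorted_w.numel() // 2)
    w_high = sorted_w[:half].mean()   # w_H: mean of the high group
    w_low = sorted_w[half:].mean()    # w_L: mean of the low group
    return torch.clamp(delta - (w_high - w_low), min=0.0)
```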
S112: combining the first sample set and the second sample set, and outputting a third sample set;
specifically, a first sample set preset in the intelligent camera and a second sample set corresponding to the local real-time image acquired by the intelligent camera are combined to obtain a training sample set, and the training sample set is recorded as a third sample set.
S113: performing first-stage training on the current model by using the third sample set through a first loss function, and outputting an intermediate training model;
specifically, the loss value of each positive sample is calculated by using the first loss function, then the current model is finely tuned, so that each sample in the third sample set realizes network convergence, and an updated model is obtained and recorded as an intermediate training model. The first-stage training is performed on the detection model by the first loss function and the third sample set, the real detection result of each sample is based on the detection model, the loss value calculated by the real detection result is subjected to fine-tuning training, and no external intervention is added, so that the first-stage training can very naturally complete the convergence of the model, and the stability of the current model in the subsequent training process is ensured.
S114: on the basis of the intermediate training model, performing second-stage training with the third sample set, the second loss function and the reliable weights, and outputting the verification model.
Specifically, on the basis of the intermediate training model, second-stage training continues with the second loss function and the reliable weight corresponding to each positive sample in the third sample set, and finally the verification model is output. Training is divided into multiple rounds, so the current model is updated once after each round; at the next round, the updated current model outputs new reliable weights for each sample in the third sample set, the current model of the round is fine-tuned again with the second loss function and the corresponding new reliable weights, and the model is updated when the round completes. After the prescribed number of rounds, the verification model is output.
It should be noted that: each training round includes multiple training batches, and the model is updated once per batch, so the current models corresponding to different training batches differ (a training batch trains on a small batch of data selected from the sample set). That is, in each round of the second-stage training, the reliable weights of the positive samples of each batch are calculated by a different model; over multiple rounds this reduces the influence of unreliable labels among the pseudo labels during training, where unreliable labels are pseudo labels that clearly differ from the majority of positive samples and are difficult to fit.
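The two-stage procedure of S113 and S114 might look as follows in a PyTorch-style sketch; the model, loader and loss interfaces, including the `reliable_weights` attribute on the model output, are illustrative assumptions rather than the patent's API.

```python
# A sketch of the two-stage fine-tuning (S113-S114) under assumed
# PyTorch-style interfaces. One round = one full pass over the loader.

def fine_tune(model, loader, optimizer, first_loss_fn, second_loss_fn,
              stage1_rounds, stage2_rounds):
    for _ in range(stage1_rounds):        # first-stage training (S113)
        for images, labels in loader:
            loss = first_loss_fn(model(images), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    # after stage 1, the updated model is the intermediate training model

    for _ in range(stage2_rounds):        # second-stage training (S114)
        for images, labels in loader:
            outputs = model(images)       # the model changes batch to batch,
            w = outputs.reliable_weights  # so the reliable weights do too
            loss = second_loss_fn(outputs, labels, w)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model                          # the verification model
```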
In one embodiment, S11 further includes performing data enhancement on the first sample set and/or the second sample set, where the data enhancement includes at least one of: image rotation, mosaic material parameters, color gamut conversion and image cropping.
Specifically, during training, the same sample set is trained multiple times using data enhancement. In one pass, the image is randomly rotated within the angle-rotation range, and/or randomly cropped within the cropping size range, and/or its color gamut is randomly adjusted within the gamut-adjustment range, and/or mosaic processing is applied within the mosaic-material parameter range; this yields the data-enhanced first and second sample sets, which are then trained on, and the corresponding verification model is output. Data enhancement ensures that the trained model judges the same target in different states more accurately, improving the detection accuracy and robustness of the intelligent camera in the target scene.
When data enhancement is performed, two or more enhancement modes are generally combined. When all four modes are used, the processing is ordered: preferably image rotation first, then image cropping, then color gamut conversion, and finally mosaic processing according to the mosaic material parameters. This greatly reduces the data processing load and, within data enhancement, minimizes data distortion.
Specifically, in one pass under this ordering, the image is first rotated within the angle-rotation range (preferably to a designated angle), then cropped once the crop size is determined, then its color gamut is adjusted within the gamut-adjustment range, and finally mosaic processing is applied according to the mosaic material parameters within their range; the data-enhanced first and second sample sets are obtained and trained on, and the corresponding verification model is output, as sketched below.
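A minimal sketch of the preferred ordering (rotate, then crop, then color-gamut adjustment, then mosaic), assuming PIL and illustrative parameter ranges; the patent does not specify its "mosaic material parameters", so coarse pixelation stands in for the mosaic step here.

```python
import random
from PIL import Image, ImageEnhance

# Sketch of the fixed enhancement order. All ranges are illustrative
# assumptions; pixelation is one interpretation of "mosaic processing".

def enhance(image: Image.Image) -> Image.Image:
    # 1. image rotation within an angle range
    image = image.rotate(random.uniform(-15, 15))
    # 2. image cropping within a size range
    w, h = image.size
    scale = random.uniform(0.8, 1.0)
    image = image.crop((0, 0, int(w * scale), int(h * scale)))
    # 3. color-gamut (here: saturation) adjustment within a range
    image = ImageEnhance.Color(image).enhance(random.uniform(0.7, 1.3))
    # 4. mosaic processing: downscale then upscale with nearest-neighbor
    block = 8
    small = image.resize((max(1, image.width // block),
                          max(1, image.height // block)))
    return small.resize((image.width, image.height), Image.NEAREST)
```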
In one embodiment, referring to fig. 5, S12 includes:
S121: acquiring the first detection result corresponding to the current model's detection of the local image;
S122: acquiring the second detection result corresponding to the verification model's detection of the local image;
S123: comparing the first detection result with the second detection result, and outputting the target model.
Specifically, after the current model completes one round of fine-tuning training, the verification model is used to detect the local real-time images acquired by the intelligent camera, giving the corresponding second detection result; this is compared with the first detection result of the model before fine-tuning, and the model whose detection result is better is selected as the target model. That is, if the first detection result is better than the second, the current model is not updated; if the second detection result is better than the first, the current model is updated to the verification model, and the updated model processes the data acquired by the camera. For example: the current model is model 0 and the verification model is model 1; the current model is then updated to model 1. In one application embodiment, the quality of the first output result of the current model and the second output result of the verification model may be judged directly through return visits to the user.
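The comparison in S121 to S123 can be sketched as follows, assuming an `evaluate` helper that scores a model on recent local images (e.g., via mAP or user feedback); both the helper and its score are assumptions, not an API defined by the patent.

```python
# Keep whichever model scores better on recent local images (S12).

def select_target_model(current_model, verification_model,
                        local_images, evaluate):
    first_result = evaluate(current_model, local_images)        # S121
    second_result = evaluate(verification_model, local_images)  # S122
    # S123: update only if the verification model is strictly better
    return verification_model if second_result > first_result else current_model
```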
In one embodiment, the fine-tuning training further comprises a fine-tuning threshold: after it is reached, the effect is taken as optimal by default, and no further training is performed as long as the shooting scene is unchanged. The fine-tuning threshold is reached when, after several rounds of training, the output result of the current model remains better than the second output result of each newly trained verification model. This reduces unnecessary training and energy consumption and extends the service life of the intelligent camera.
In one embodiment, referring to fig. 6, before S10 the method includes:
S1: acquiring the chip computing power, the task types and sample data of at least one application scene of the intelligent camera;
Specifically, the chip computing power is the data-processing capability of the chip built into the intelligent camera, and the task types include classification tasks and detection tasks.
S2: establishing a basic detection model according to the chip computing power, the task types and the sample data;
Specifically, the sample data is randomly trained according to the data-processing parameters and the task types to obtain the basic detection model; corresponding output nodes are arranged on the network output layer for the different tasks, and each output node outputs the detection results of its task type.
S3: screening the sample data and outputting the first sample set;
Specifically, the sample data comprises 2m images of the target scene; n images are randomly selected from the 2m images as the first sample set, where m and n are positive integers.
S4: presetting the first sample set and the basic detection model into the intelligent camera.
Specifically, before the intelligent camera is tested or enters the market, the basic detection model and the corresponding first sample set are implanted into it; the camera is then installed in the actual application scene, where the basic detection model initially processes the local real-time images the camera acquires. During use, the current model is fine-tuned with the first sample data and the second sample data, updating the model and improving its detection performance.
With this method for optimizing an intelligent camera detection model using edge computing, a first sample set preset in the intelligent camera and a second sample set corresponding to local image data acquired by the camera are obtained; the currently used model is fine-tuned locally with the two sample sets, completing an update of the current model and outputting a verification model; the detection results of the verification model and the current model on the local image data are compared, and the target model is output. Training locally, on real-time images of the actual application scene combined with the preset first sample set, improves the fit between the target model and the actual application scene and thus the detection accuracy; at the same time, local training on the device protects user privacy and improves data security.
Example 2
In example 1, the pseudo labels are screened by the confidence threshold to reduce faulty labels, such as missed and wrong labels, among them. When the confidence threshold is set unreasonably, missed and wrong labels remain among the pseudo labels and ultimately affect the accuracy of the fine-tuned target model. Example 2 of the present invention therefore builds on example 1 with a further improvement of the confidence threshold of S1031, proposing a method for selecting the self-training confidence threshold of an intelligent camera. In one embodiment, referring to fig. 7, the method includes:
S20: acquiring a test sample set and a sampling step for the confidence threshold;
Specifically, a number of samples are randomly extracted from the training sample set A as the test sample set; every sample in the test sample set has been manually checked and given an artificial label. At the same time, a sampling step for the confidence threshold is acquired, and a group of confidence thresholds is output according to the sampling step.
Sampling step: the spacing between adjacent confidence-threshold sampling points within the confidence-threshold range. For example, if the confidence threshold ranges from 0 to 1 and the sampling step is 0.2, the sampled confidence thresholds are 0, 0.2, 0.4, 0.6, 0.8 and 1.
S21: testing the current model with the test sample set, and outputting the detection results on the test sample set;
Specifically, the detection result may be for the test sample set as a whole, for each sample in it, or for each positive sample within each sample; the detection result includes at least one of: confidence, location information and category information.
S22: outputting mAP values corresponding to the confidence threshold values one by one according to the confidence threshold values corresponding to the sampling step length and the detection result;
Specifically, under different confidence thresholds of the detection results of the test sample set, the positive samples of each class can obtain corresponding AP values (average precision, AP); the average precision value (mean average precision, mAP) of the AP values of each category can be obtained from the AP values of all the categories; that is, mAP values are obtained that are in one-to-one correspondence with confidence thresholds.
S23: and outputting a target confidence threshold of the current model according to each mAP value.
Specifically, comparing the mAP values, and taking the confidence threshold corresponding to the finally selected mAP value as a target confidence threshold.
In one embodiment, referring to fig. 8, S23 includes:
S231: comparing the mAP values and outputting a target mAP value;
Specifically, all mAP values are compared, and the mAP value meeting the requirement is selected as the target mAP value.
In one embodiment, referring to fig. 9, S231 includes:
S2311: acquiring the group of confidence thresholds corresponding to the sampling step of the confidence threshold, recorded as the first confidence-threshold group;
Specifically, a sampling step for the confidence threshold is set, and a group of confidence thresholds is output according to it and recorded as the first confidence-threshold group.
S2312: according to mAP values corresponding to all confidence thresholds in the first confidence threshold group, outputting a confidence threshold corresponding to the largest mAP value as a reference confidence threshold, and a previous confidence threshold and a next confidence threshold adjacent to the reference confidence threshold;
specifically, comparing mAP values corresponding to the confidence thresholds to obtain a maximum mAP value; and taking the confidence coefficient threshold corresponding to the maximum mAP value as a reference confidence coefficient threshold, and simultaneously acquiring a previous confidence coefficient threshold and a next confidence coefficient threshold adjacent to the reference confidence coefficient threshold.
S2313: reducing a sampling step size, and outputting a first confidence threshold interval between the previous confidence threshold and the reference confidence threshold and a second confidence threshold interval between the reference confidence threshold and the subsequent confidence threshold;
specifically, the interval between the former confidence threshold and the reference confidence threshold is referred to as a first confidence threshold interval, and the interval between the latter confidence threshold and the reference confidence threshold is referred to as a second confidence threshold interval. After obtaining a reference confidence threshold, adjusting the sampling step length, and obtaining a group of new confidence threshold according to the adjusted sampling step length; and taking out the confidence threshold value belonging to the first confidence threshold value interval and the confidence threshold value belonging to the second confidence threshold value interval.
S2314: acquiring confidence thresholds of the first confidence threshold interval and the second confidence threshold interval to obtain a new first confidence threshold group;
s2315: s2312 through S2314 are repeated until the target mAP value is output.
Specifically, after confidence threshold values of a first confidence threshold value interval and a second confidence threshold value interval are obtained, forming a new first confidence threshold value group by the confidence threshold values, then comparing mAP values corresponding to the confidence threshold values again, outputting a new reference confidence threshold value, repeating the process for a plurality of times, and finally outputting the reference confidence threshold value as a target confidence threshold value; the mAP value corresponding to the target confidence threshold can be guaranteed to be the maximum value closest to the mAP curve corresponding to the current model, and the accuracy and recall rate corresponding to the detection result can be guaranteed to be the most reasonable by using the target confidence threshold.
In one embodiment, in S2313, the subsequent sampling step is 1/2 of the previous sampling step.
Specifically, the sampling step is adjusted so that each new step is 1/2 of the previous one. Each round then introduces exactly one new confidence threshold in the first confidence-threshold interval and one in the second, each located at the midpoint of its interval, and at most two mAP comparisons against the previous reference confidence threshold suffice to determine the new reference confidence threshold; this reduces the data processing load and improves computational efficiency. Referring to fig. 10, let b be the sampling step before adjustment and a the adjusted step, with a = b/2. At step b, the mAP value of the reference confidence threshold is point B, the mAP value of the previous confidence threshold is point D, and the mAP value of the next confidence threshold is point F. After the step is adjusted to a, the confidence threshold at the midpoint of the first confidence-threshold interval has its mAP value at point A, and the one at the midpoint of the second interval at point C. The mAP values of points A and C are compared with that of point B, and the confidence threshold with the largest mAP value becomes the new reference confidence threshold; repeating this process finally yields the target confidence threshold. In one embodiment, if the mAP value of point A is larger than that of point B, the confidence threshold corresponding to point A is taken as the new reference confidence threshold; if the mAP value of point A is smaller than that of point B, the mAP values of points B and C are compared, and if the mAP value of point C is also smaller than that of point B, the confidence threshold corresponding to point B remains the reference confidence threshold; the sampling step is then adjusted to a/2 and the process is repeated until the target confidence threshold is obtained.
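The step-halving search of S2311 to S2315 can be sketched as below, assuming a `map_at(threshold)` helper that evaluates the current model's mAP on the test sample set at a given confidence threshold; the helper and the 0.01 stopping step are illustrative assumptions.

```python
# Step-halving search for the target confidence threshold.

def find_target_threshold(map_at, low=0.0, high=1.0,
                          step=0.2, min_step=0.01):
    n = int(round((high - low) / step))
    grid = [low + i * step for i in range(n + 1)]   # first threshold group
    ref = max(grid, key=map_at)                     # reference threshold
    while step > min_step:
        step /= 2                                   # halve the step
        candidates = [t for t in (ref - step, ref, ref + step)
                      if low <= t <= high]          # interval midpoints
        ref = max(candidates, key=map_at)           # new reference
    return ref                                      # target threshold
```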
S232: and taking a confidence threshold corresponding to the target mAP value as the target confidence threshold according to the target mAP value.
In one embodiment, referring to fig. 11, S22 includes:
S221: outputting the precision and recall of each class corresponding one-to-one to each confidence threshold, according to each confidence threshold and the detection result of each sample in the test sample set;
Specifically, the detection result includes at least one of: number of positive samples, location information, confidence and category information. The precision and recall of the current model for each class in the test sample set are determined according to the confidence threshold; different confidence thresholds correspond to different precisions and recalls.
S222: and outputting mAP values corresponding to the confidence threshold values one by one according to the accuracy rates and the recall rates corresponding to the respective types.
In one embodiment, referring to fig. 12, S221 includes:
S2211: according to the detection result, outputting the precision corresponding to each class by the formula

$$Precision=\frac{TP}{TP+FP}$$

S2212: according to the detection result, outputting the recall corresponding to each class by the formula

$$Recall=\frac{TP}{TP+FN}$$

S2213: outputting the mAP values corresponding one-to-one to the confidence thresholds, according to the precision and recall corresponding to each class;

where TP is a positive sample whose confidence in the detection result is greater than the confidence threshold; FP is a positive sample whose confidence in the detection result is smaller than the confidence threshold; FN is a positive sample for which no corresponding position is detected; Precision is the precision and Recall is the recall.
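Under the TP/FP/FN definitions above, the per-class metrics and the mAP aggregation reduce to the following sketch; tallying TP, FP and FN from raw detections at a given confidence threshold, and computing per-class AP, are assumed to happen elsewhere.

```python
# Precision, recall and mAP per the definitions above (S2211-S2213).

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

def mean_average_precision(per_class_ap: list) -> float:
    """mAP: the mean of the per-class AP values (S222)."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```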
In one embodiment, S23 includes:
a first step of acquiring the confidence of each positive sample in the detection results of the test sample set, together with the confidence thresholds corresponding to the sampling step;
a second step of sorting the confidences of the positive samples and outputting a confidence sequence;
a third step of sampling the confidence sequence by percentage to obtain several sampling confidences;
a fourth step of comparing the mAP values corresponding to the sampling confidences with the mAP values corresponding to the confidence thresholds, and outputting the confidence corresponding to the largest mAP value as the target confidence threshold.
Specifically, the test sample set is input into the current model, which detects each sample and outputs the detection results; the confidences of the positive samples in the detection results are sorted and sampled by percentage, and the confidences at the sampling points are recorded as the sampling confidences. For example: 10 positive samples are detected, giving a confidence sequence of length 10; sampling the sequence at equal intervals takes out the confidences at positions 2, 4, 6, 8 and 10. These 5 confidences and the confidence thresholds corresponding to the sampling step are compared by their mAP values, and the confidence threshold or sampling confidence with the largest mAP value is selected as the target confidence threshold.
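The percentile-sampling alternative can be sketched as below, reusing the assumed `map_at` helper; with an interval of 0.2 and ten confidences it takes the values at positions 2, 4, 6, 8 and 10, as in the example above.

```python
# Percentile sampling of detection confidences as threshold candidates.

def percentile_candidates(confidences, interval=0.2):
    ordered = sorted(confidences)
    k = max(1, int(len(ordered) * interval))  # sample every k-th value
    return ordered[k - 1::k]

def pick_target_threshold(confidences, step_thresholds, map_at):
    candidates = list(step_thresholds) + percentile_candidates(confidences)
    return max(candidates, key=map_at)        # largest mAP wins
```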
With this intelligent-camera self-training confidence-threshold selection method, several confidence thresholds are obtained by setting a sampling step, the current model is tested on the test sample set, and the detection results are output; mAP values in one-to-one correspondence with the confidence thresholds are obtained from the detection results, all mAP values are compared, and the confidence threshold corresponding to the largest mAP value is output as the target confidence threshold, so that the precision and recall of the detection results are optimal. When the detection model needs fine-tuning training, this method can determine the confidence threshold directly on the device, reducing missed and wrong labels among the pseudo labels, improving pseudo-label quality and the training effect of the detection model; and since no local data is uploaded to a server, data security and user privacy are preserved.
Example 3
In examples 1 and 2, setting a reasonable confidence threshold reduces faulty labels, such as missed and wrong labels, among the pseudo labels; however, some remain after screening, and if the current model is fine-tuned directly, the updated model is still affected by them and its detection precision is low. Example 3 of the present invention therefore, building on example 1 and/or example 2 and approaching from another angle, further improves the fine-tuning of the current model with the first and second sample sets in S11, proposing a method for self-training of the intelligent camera detection model. In one embodiment, referring to fig. 13, the method includes:
S30: acquiring a training sample set, a first loss function, a second loss function and reliable weights corresponding to positive samples in the training sample set, wherein the training sample set is used for training a current model of an intelligent camera;
Specifically, a basic detection model usable in a plurality of application scenes is built into the intelligent camera, obtained by training on a training set A composed of training samples collected from the plurality of application scenes. The training sample set comprises a number of samples randomly extracted from the training set A (hereinafter the first sample set) and a number of samples acquired from the actual application scene where the intelligent camera is installed (hereinafter the second sample set). After the training sample set is fed into the current model, the reliable weight output by the model's network layer for each positive sample is obtained, together with the first loss function of the first-stage training and the second loss function of the second-stage training.
It should be noted that: the current model may be the basic detection model or an already-trained model. That is, when training on a training sample set for the first time, the current model is the basic detection model. At the second training, on a new training sample set, if the model from the first training performs better than the basic detection model, the current model of the second training is the verification model obtained from the first training; if it performs worse, the current model of the second training is still the basic detection model. By analogy, the current model in the N-th training is either the verification model obtained in the (N-1)-th training or the current model of the (N-1)-th fine-tuning training.
In an embodiment, please refer to fig. 14, adding a weight channel for outputting a reliable weight to the network output layer, the S30 includes:
specifically, referring to fig. 15, a weight channel is added to the object detection network to output a weight channel with reliable weight; the left is the output layer of the common target detection network, the bbox channel represents the target position, the confidence channel is the target foreground confidence, and the class channel is the probability that the target belongs to each class. To the right is the network output with added weights channels. After the target detection network completes the detection of any sample, the weight channel output obtains the reliable weight of each positive sample in the sample through an activation function.
It should be noted that the data processing tasks of the current model on the application scene data acquired by the smart camera include detection tasks and classification tasks; corresponding output nodes are added to the weights channel for the different tasks to output the reliable weights of positive samples.
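A non-authoritative PyTorch-style sketch of such an augmented output layer follows; the layer sizes, anchor count and class names are illustrative assumptions, not the patent's concrete network:

```python
import torch
import torch.nn as nn

class DetectionHeadWithWeights(nn.Module):
    """Per-anchor outputs: bbox (4), confidence (1), classes (C), plus one
    extra weights channel whose sigmoid is the per-sample reliable weight."""
    def __init__(self, in_ch, num_classes, num_anchors=3):
        super().__init__()
        per_anchor = 4 + 1 + num_classes + 1  # bbox + conf + classes + weight
        self.conv = nn.Conv2d(in_ch, num_anchors * per_anchor, kernel_size=1)
        self.num_classes = num_classes
        self.num_anchors = num_anchors

    def forward(self, x):
        n, _, h, w = x.shape
        out = self.conv(x).view(n, self.num_anchors, -1, h, w)
        bbox = out[:, :, 0:4]                                 # target position
        conf = torch.sigmoid(out[:, :, 4:5])                  # foreground confidence
        cls = torch.sigmoid(out[:, :, 5:5 + self.num_classes])  # class probabilities
        # reliable weight of the positive sample at each grid position,
        # mapped into (0, 1) by the sigmoid activation described above
        weight = torch.sigmoid(out[:, :, -1:])
        return bbox, conf, cls, weight
```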
S301: acquiring mapping functions and position information corresponding to positive samples in a training set one by one;
specifically, a sample acquired by the smart camera is divided into grids, and when the target at some position matches a target object of the data processing task, the position information of that target is acquired; the mapping function may be a Sigmoid function.
S302: and outputting the value corresponding to each position information in the weight channel according to the position information, and outputting the reliable weight of each positive sample corresponding to each value by the mapping function.
Specifically, after a target object matching the data processing task is detected, the corresponding channels of the detection network's output layer output the detection result for that target; a pseudo tag for the positive sample at that position is created from the confidence, position information and category information of the detection result; meanwhile, the weights channel outputs a value corresponding to the position information, and this value is passed through the Sigmoid mapping to output the reliable weight of the positive sample at that position.
It should be noted that: further confidence processing may be performed for the confidence of the sample, such as: setting a confidence threshold value, and then performing binarization processing; taking the processed detection result as a pseudo tag; for specific embodiments, please refer to example 2, and no further description is given here.
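A hedged sketch of the pseudo-label screening just referenced; the detection dictionary layout is an assumed format, not the patent's own data structure:

```python
def make_pseudo_labels(detections, conf_threshold):
    """detections: list of dicts with 'bbox', 'confidence', 'class_id'
    (an assumed output format). Keeps detections above the threshold and
    binarizes their confidence, as in embodiment 2."""
    labels = []
    for det in detections:
        if det["confidence"] >= conf_threshold:
            labels.append({
                "bbox": det["bbox"],
                "class_id": det["class_id"],
                "confidence": 1.0,  # binarized: treated as a ground-truth box
            })
        # below-threshold detections are dropped (confidence set to 0)
    return labels
```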
In an embodiment, each training round comprises a plurality of training batches. In S10, the training sample set comprises the preset first sample set and the locally collected second sample set; within each training batch, the positive samples belonging to the second sample set are sorted by reliable weight to output a reliable weight sequence.
Specifically, in both the first-stage and the second-stage training, each training round comprises a plurality of training batches (a training batch selects a small batch of data from the sample set for training); the samples in each training batch come from the first sample set and/or the second sample set. For example, each training batch comprises Q samples, with y samples from the first sample set and z samples from the second sample set, where Q = y + z and Q, y, z are integers greater than or equal to 0. During each training round, the weights channel outputs a reliable weight reflecting the reliability of each positive sample of the second sample set; understandably, the pseudo tags of most hard-to-fit positive samples are given lower reliable weights while reliable positive samples are given higher ones, and sorting the reliable weights yields the reliable weight sequence.
It should be noted that: the labels of each positive sample in the first sample set in the training sample set are artificial labels, and the labels in the second sample set are pseudo labels.
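As a minimal illustration of the Q = y + z batch composition above (function and variable names are assumptions, and the two sample sets are assumed to be in-memory lists):

```python
import random

def mixed_batch(first_set, second_set, y, z):
    """Draw one training batch of Q = y + z samples: y human-labelled samples
    from the preset first set and z pseudo-labelled samples from the locally
    collected second set. y and z are illustrative hyperparameters."""
    batch = random.sample(first_set, y) + random.sample(second_set, z)
    random.shuffle(batch)  # mix the two sources within the batch
    return batch
```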
In one embodiment, referring to fig. 16, the step S30 includes:
S303: acquiring the proportions of high reliable weights and low reliable weights in the reliable weight sequence;
S304: outputting each high reliable weight and each low reliable weight according to the proportion, via the formula M = α × N;
wherein M is the number of high reliable weights, α is the proportion of high reliable weights, and N is the total number of reliable weights in the reliable weight sequence of each batch.
Specifically, the reliable weights corresponding to all positive samples belonging to the second sample set in each training batch are sorted in descending order; with the proportion of high reliable weights set to α, the proportion of low reliable weights is 1 - α, and the number of high reliable weights is obtained by the formula M = α × N.
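A small sketch of this descending sort and M = α × N split, assuming the batch's reliable weights arrive as a 1-D tensor:

```python
import torch

def split_reliable_weights(weights, alpha):
    """Sort the reliable weights of one batch in descending order and split
    them into the top M = alpha * N 'high' weights and the remaining
    'low' weights, following M = α × N above."""
    sorted_w, _ = torch.sort(weights, descending=True)
    n = sorted_w.numel()
    m = max(1, int(alpha * n))           # number of high reliable weights
    return sorted_w[:m], sorted_w[m:]    # high weights, low weights
```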
S31: performing first-stage training on the current model by using the training sample set through the first loss function, and outputting an intermediate training model;
specifically, the original loss value of the training sample set is calculated with the first loss function; the current model is then fine-tuned according to this loss value until the network converges, preserving the integrity of the current model and outputting an intermediate training model for the next stage of training. Note that the pseudo tags include both correct and incorrect tags.
S32: training the current model in a second stage on the basis of the intermediate training model by utilizing the second loss function and the reliable weight, and outputting a verification model;
In each training round of the second-stage training, the corresponding reliable weights of the positive samples are mutually independent.
Specifically, after the first-stage training of the current model is completed, a network-converged intermediate training model trained on all samples is obtained; second-stage training then continues from this intermediate model using the second loss function and the reliable weight of each positive sample, finally outputting a verification model. Each round of the second-stage training outputs a reliable weight per positive sample, and these weights are combined with the second loss function to adjust and update the current model; after the multiple rounds of second-stage training finish, the verification model is output. In other words, the second stage trains on according to the second loss function and the round-specific reliable weights, and the multiple rounds reduce the influence of unreliable labels among the pseudo labels, finally outputting the target training model; unreliable labels are pseudo labels that are hard to fit and deviate markedly from those of the majority of positive samples.
It should be noted that the reliable weights of the positive samples are independent across training rounds; that is, during the current round, the current model outputs a reliable weight for each positive sample to participate in that round's training, and during the next round the model corresponding to that round outputs fresh reliable weights for the next round's training; there is no mapping between the reliable weights of any two training rounds.
It should be noted that each training round comprises a plurality of training batches, and the model is updated once per completed training batch; that is, the model that outputs the reliable weights of the positive samples differs from batch to batch, and the reliable weights of the positive samples within each training batch are likewise mutually independent.
In an embodiment, referring to fig. 17, in each training batch, when the positive sample belongs to the first sample set, the second loss function includes a first supervision function of a weight channel, and the S32 includes:
specifically, the first supervision function supervises each training batch so that, after the second-stage training, the reliable weights output by the weights channel for positive samples of the first sample set are, as a whole, greater than a preset value; the first supervision function thereby guarantees a very high reliable weight for positive samples carrying artificial labels, so that their loss value in the fine-tuning training is minimal.
S321: acquiring the reliable weight of each positive sample output by the training batch currently;
specifically, after the samples of the first sample set of the current training batch are sent to the current model, the weights channel of the network output layer outputs the corresponding reliable weights for each positive sample.
S322: performing the current training batch through the second loss function according to each of the reliable weights and the first supervision function;
specifically, according to the reliable weight of each positive sample and the first supervision function, the parameters of the current model are adjusted through the second loss function, completing the update of the current model by the current training batch.
S323: repeating S321 to S322 until the second stage training is completed, and outputting the verification model;
specifically, under the action of the first supervision function, the multiple rounds of the second training stage drive the overall reliable weights of the first-sample-set positives output by the weight channel of the final target training model above the preset value γ; that is, most of the finally output positive samples in the first sample set have reliable weights greater than γ and only a minority fall below it, ensuring that the overall loss of the first sample set's positive samples stays small.
Wherein the first supervision function is L'_w = (1/y) · Σ_{c=1}^{y} max(0, γ - W_c), y is the total number of positive samples belonging to the first sample set in one training batch, γ is the preset value of the reliable weight, and W_c represents the reliable weight of the c-th positive sample.
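A possible implementation of this hinge-style penalty is sketched below; the exact functional form is reconstructed from the γ and W_c description above and should be read as an assumption:

```python
import torch

def first_supervision_loss(weights_first_set, gamma):
    """Pushes the reliable weight W_c of every human-labelled (first-set)
    positive sample above the preset value gamma; the penalty is zero once
    W_c >= gamma, matching L'_w = (1/y) * sum(max(0, gamma - W_c))."""
    return torch.clamp(gamma - weights_first_set, min=0).mean()
```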
In an embodiment, referring to fig. 18, in each training batch, when the positive sample belongs to a sample of the second sample set, the second loss function includes a second supervision function of the weight channel, and the S32 includes:
Specifically, the second supervision function is used to supervise each training batch so that, after the second-stage training, the difference between the first weight average W_H of the high reliable weights and the second weight average W_L of the low reliable weights output by the weights channel for positive samples of the second sample set is larger than δ, forcing the network to distinguish high reliable weights from low reliable weights; the reliable weights of wrong labels among the pseudo labels are thus gradually reduced during training, improving the accuracy of model self-training.
S324: acquiring the reliable weight sequence output by the training batch currently;
specifically, after the samples of the second sample set of the current training batch are fed into the current model, the weights channel of the network output layer outputs a reliable weight for each positive sample, and these weights are then sorted into a reliable weight sequence; the detection result comprises at least one of: confidence information, category information and location information.
S325: according to each high reliable weight, calculating the first weight average W_H of the current training batch by the formula W_H = (1/M) · Σ_{a=1}^{M} W_a;
S326: according to each low reliable weight, calculating the second weight average W_L of the current training batch by the formula W_L = (1/(N-M)) · Σ_{b=1}^{N-M} W_b;
S327: performing the current training batch through the second loss function according to the first weight average W_H, the second weight average W_L and the second supervision function L_w = max(0, δ - (W_H - W_L));
S328: repeating S324 to S327 until the second-stage training is completed, and outputting the verification model;
wherein M is the number of high reliable weights in one training batch, N is the total number of positive samples belonging to the second sample set in one training batch, W_a is the a-th of the high reliable weights, W_b is the b-th of the low reliable weights, and δ is the preset weight-difference value.
Specifically, under the second supervision function L_w = max(0, δ - (W_H - W_L)), the multiple rounds of the second training stage drive the difference between the first weight average W_H of the high reliable weights and the second weight average W_L of the low reliable weights output by the weights channel of the final target training model above δ, forcing the network to distinguish high reliable weights from low reliable weights, reducing the influence of unreliable labels among the pseudo labels during training, and finally outputting the target training model.
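A sketch of the second supervision function and the two weight averages, directly following the formulas above (tensor inputs are assumed to come from the split shown earlier):

```python
import torch

def second_supervision_loss(high_w, low_w, delta):
    """L_w = max(0, delta - (W_H - W_L)): forces the mean of the high
    reliable weights to exceed the mean of the low reliable weights by at
    least delta, so the network learns to separate the two groups."""
    w_h = high_w.mean()  # first weight average W_H
    w_l = low_w.mean()   # second weight average W_L
    return torch.clamp(delta - (w_h - w_l), min=0)
```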
In an embodiment, in the step S20, the first loss function includes at least one of: coordinate loss, category loss, and confidence loss; the second loss function includes at least one of: coordinate loss, category loss, confidence loss, and weight channel loss.
Specifically, taking the yolov3 loss function as an example, the first loss function includes:
coordinate loss:
category loss:
confidence loss:
wherein L is coord ,L cls ,L conf Respectively representing a coordinate error, a category error and a confidence error,whether the jth candidate frame of the ith grid in the application scene data acquired by the intelligent camera is a positive sample of a data processing task or not is indicated, and if yes +.>Otherwise, 0.
Obtain a first loss function = L coord +L cls +L conf
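A compact sketch of the first loss in the YOLO-style sum-squared form given above; the tensor layout, the λ value and the folding of object and no-object confidence terms are illustrative assumptions:

```python
import torch

def yolo_first_loss(pred, target, obj_mask, lam=0.5):
    """Minimal sketch under the symbol conventions above.
    pred/target: tensors of shape (num_boxes, 5 + C) laid out as
    [x, y, w, h, P, class scores...]; obj_mask marks positive candidates."""
    pos, tgt = pred[obj_mask], target[obj_mask]
    l_coord = lam * ((pos[:, :4] - tgt[:, :4]) ** 2).sum()   # coordinate loss
    l_conf = ((pred[:, 4] - target[:, 4]) ** 2).sum()        # confidence loss
    l_cls = ((pos[:, 5:] - tgt[:, 5:]) ** 2).sum()           # category loss
    return l_coord + l_cls + l_conf                          # first loss
```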
The second loss function is:
coordinate loss: L_coord = λ · Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · W_ij · [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
category loss: L_cls = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · W_ij · Σ_{c∈classes} (C_ij(c) - Ĉ_ij(c))²
confidence loss: L_conf = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · W_ij · (P_ij - P̂_ij)² + λ · Σ_{i=1}^{S} Σ_{j=1}^{B} (1 - 1_{ij}^{obj}) · (P_ij - P̂_ij)²
weight channel loss: σ·L_w + θ·L'_w, comprising
the first supervision function: L'_w = (1/y) · Σ_{c=1}^{y} max(0, γ - W_c)
and the second supervision function: L_w = max(0, δ - (W_H - W_L))
obtain the second loss function = L_coord + L_cls + L_conf + σ·L_w + θ·L'_w
wherein L_coord, L_cls, L_conf respectively represent the coordinate error, the category error and the confidence error; L_w and L'_w represent the reliable-weight supervision losses for targets in the second sample set and in the first sample set respectively; 1_{ij}^{obj} indicates whether the j-th candidate frame of the i-th grid in the application scene data acquired by the smart camera is a positive sample, being 1 if so and 0 otherwise; W_ij represents the reliable weight of the positive sample of the j-th candidate frame of the i-th grid; x and y in the coordinate loss respectively represent the abscissa and the ordinate of the frame center point, and w and h respectively represent the width and the height of the frame; C represents the category, P represents the confidence; S is the number of grids of the network output layer and B is the number of fixed reference frames per grid; λ, σ and θ are constants, and δ is the preset weight-difference value.
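The overall second-stage objective can then be composed as below; σ and θ are the constants named above, and the per-term values are assumed to come from the earlier sketches:

```python
def second_loss(l_coord, l_cls, l_conf, l_w, l_w_prime, sigma, theta):
    """Second-stage objective from the text: the detection terms plus the
    two weight-channel supervision terms, scaled by the constants σ and θ."""
    return l_coord + l_cls + l_conf + sigma * l_w + theta * l_w_prime
```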
By adopting this smart camera model self-training method, the smart camera forms a training sample set from locally acquired image data, directly calculates each sample's loss value locally with the first loss function, and performs the first stage of fine-tuning to bring the model to convergence, making local training feasible. After the first stage is complete, the second-stage training introduces the reliable weights and the second loss function, under whose joint action the model completes fine-tuning in the expected direction without human intervention. The fine-tuning of the detection model is thus completed directly on the user side, with no need to upload local data to a server, safeguarding data security and user privacy.
Embodiment II:
Example 4
Embodiment 4 of the present invention further provides a device for optimizing a smart camera detection model using edge computing, as shown in fig. 19, comprising:
a data acquisition module: used for acquiring a preset first sample set and a second sample set corresponding to local image data acquired by the smart camera;
a model training module: used for performing fine-tuning training on the current model with the first sample set and the second sample set to obtain a verification model;
a model checking module: used for comparing the current model with the verification model and outputting a target model.
By adopting this device for optimizing a smart camera detection model using edge computing, a first sample set preset in the smart camera and a second sample set corresponding to local image data acquired by the camera are obtained; the currently used model is fine-tuned locally with both sample sets to complete its update and output a verification model; the detection effects of the verification model and the current model on local image data are then compared and a target model is output. Training locally with real-time images of the actual application scene combined with the preset first sample set improves the fit between the target model and the actual scene, and hence the detection accuracy; meanwhile, on-device training protects user privacy and improves data security.
Example 5
In embodiment 4, pseudo tags are screened by a confidence threshold to reduce erroneous labels such as missed labels and wrong labels among them; when the confidence threshold is set unreasonably, such erroneous labels persist in the pseudo labels and ultimately affect the accuracy of the fine-tuned target model. A corresponding apparatus that further improves the confidence threshold selection is therefore proposed on the basis of embodiment 4.
Referring to fig. 20, the apparatus includes:
a sample data module: used for acquiring a test sample set, a sampling step and confidence thresholds;
a data detection module: used for testing the current model with the test sample set and outputting the detection result corresponding to each positive sample in the test sample set;
a data processing module: used for outputting mAP values corresponding one-to-one to the confidence thresholds, according to the confidence thresholds given by the sampling step and the detection results;
a target output module: used for outputting the target confidence threshold of the current model according to the mAP values.
By adopting this smart camera sample confidence threshold selection device, a set of confidence thresholds is obtained from the sampling step; the current model tests the test sample set and outputs detection results; mAP values corresponding one-to-one to the confidence thresholds are obtained from these results, all mAP values are compared, and the confidence threshold corresponding to the maximum mAP value is output as the target confidence threshold, so that the precision and recall of the detection results are optimal. When the detection model needs fine-tuning, this method determines the confidence threshold directly on the device, reducing erroneous labels such as missed and wrong labels among the pseudo labels and improving pseudo-label quality and hence training effect; since no local data is uploaded to a server, data security and user privacy are preserved.
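A hedged sketch of this threshold sweep; `evaluate_map` is an assumed callback computing the model's mAP on the test set at a given threshold, not an API from the patent:

```python
def select_confidence_threshold(model, test_set, evaluate_map, step=0.05):
    """Sweep candidate thresholds at the given sampling step, score each by
    mAP on the test sample set, and return the argmax threshold."""
    best_thr, best_map = None, float("-inf")
    for k in range(1, int(round(1.0 / step))):
        thr = k * step                      # candidate thresholds 0.05 ... 0.95
        m = evaluate_map(model, test_set, thr)
        if m > best_map:
            best_thr, best_map = thr, m
    return best_thr
```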
Example 6
In embodiments 4 and 5, setting a reasonable confidence threshold reduces erroneous labels such as missed labels and wrong labels among the pseudo labels; however, some erroneous labels still remain in the screened pseudo labels, and if the current model is fine-tuned on them directly, the updated model is still affected and its detection precision stays low. A device further improving the fine-tuning training of the current model with the first sample set and the second sample set is therefore proposed on the basis of embodiment 4 and/or embodiment 5;
referring to fig. 21, the device includes:
a training data module: used for acquiring a training sample set, a first loss function, a second loss function and the reliable weight corresponding to each positive sample in the training sample set, the training sample set being used for training the current model of the smart camera;
a first training module: used for performing first-stage training on the current model with the training sample set through the first loss function and outputting an intermediate training model;
a second training module: used for performing second-stage training on the current model on the basis of the intermediate training model with the second loss function and the reliable weights, and outputting a verification model;
In each training round of the second-stage training, the corresponding reliable weights of the positive samples are mutually independent.
By adopting this smart camera model self-training device, the smart camera forms a training sample set from locally acquired image data, directly calculates each sample's loss value locally with the first loss function, and performs the first stage of fine-tuning to bring the model to convergence, making local training feasible; after the first stage is complete, the second-stage training introduces the reliable weights and the second loss function, under whose joint action the model completes fine-tuning in the expected direction without human intervention; fine-tuning of the detection model is completed directly on the user side with no need to upload local data to a server, safeguarding data security and user privacy.
Embodiment III:
the present invention provides an electronic device and storage medium, as shown in fig. 22, comprising at least one processor, at least one memory, and computer program instructions stored in the memory.
In particular, the processor may comprise a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured as one or more integrated circuits implementing embodiments of the present invention, the electronic device comprising at least one of: intelligent camera, have intelligent camera's mobile device, have intelligent camera's wearing equipment.
The memory may include mass storage for data or instructions. By way of example, and not limitation, the memory may comprise a Hard Disk Drive (HDD), floppy Disk Drive, flash memory, optical Disk, magneto-optical Disk, magnetic tape, or universal serial bus (Universal Serial Bus, USB) Drive, or a combination of two or more of the foregoing. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile solid state memory. In a particular embodiment, the memory includes Read Only Memory (ROM). The ROM may be mask programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically Erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate.
The processor reads and executes the computer program instructions stored in the memory to implement any one of the methods for optimizing the intelligent camera detection model by using edge calculation, the sample confidence threshold selection method, and the model self-training method in the first embodiment.
In one example, the electronic device may also include a communication interface and a bus. The processor, the memory and the communication interface are connected through a bus and complete communication with each other.
The communication interface is mainly used for realizing communication among the modules, the devices, the units and/or the equipment in the embodiment of the invention.
The bus includes hardware, software, or both that couple components of the electronic device to each other. By way of example, and not limitation, the buses may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of the above. The bus may include one or more buses, where appropriate. Although embodiments of the invention have been described and illustrated with respect to a particular bus, the invention contemplates any suitable bus or interconnect.
In summary, the embodiments of the present invention provide a method for optimizing a smart camera detection model using edge computing, a sample confidence threshold selection method, a model self-training method, and corresponding devices, equipment and storage media.
It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (8)

1. A method for optimizing an intelligent camera detection model by adopting edge calculation, which is characterized by comprising the following steps:
s10: acquiring a preset first sample set and a second sample set corresponding to local image data acquired by an intelligent camera;
s11: performing fine tuning training on the current model by adopting the first sample set and the second sample set to obtain a verification model;
s12: comparing the current model with the verification model, and outputting a target model;
collecting training samples of a plurality of application scenes to form a training set;
randomly extracting a plurality of samples from the training set as a first set of samples;
training with the training set, and outputting a basic detection model as the current model;
Wherein, the S11 includes:
S111: acquiring a first loss function, a second loss function and reliable weights corresponding to positive samples;
S112: combining the first sample set and the second sample set, and outputting a third sample set;
s113: performing first-stage training on the current model by using the third sample set through a first loss function, and outputting an intermediate training model;
s114, training a second stage by using the third sample set through a second loss function and the reliable weight on the basis of the intermediate training model, and outputting the verification model;
wherein, the S12 includes:
s121: acquiring a first detection result corresponding to the detection of the current model on the local image;
s122: obtaining a second detection result corresponding to the detection of the local image by the verification model;
s123: comparing the first detection result with the second detection result, and outputting a target model;
the method comprises the steps that local real-time images acquired through an intelligent camera are analyzed through a current model to obtain categories of the images;
adding a corresponding class label according to the class to which each image belongs;
for a classification task, a positive sample is a sample corresponding to a target class;
For a detection task, a positive sample is a target corresponding to all positions marked in the sample in advance;
wherein the first loss function comprises:
coordinate loss: L_coord = λ · Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²];
category loss: L_cls = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · Σ_{c∈classes} (C_ij(c) - Ĉ_ij(c))²;
confidence loss: L_conf = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · (P_ij - P̂_ij)² + λ · Σ_{i=1}^{S} Σ_{j=1}^{B} (1 - 1_{ij}^{obj}) · (P_ij - P̂_ij)²;
wherein L_coord, L_cls, L_conf respectively represent the coordinate error, the category error and the confidence error, and 1_{ij}^{obj} indicates whether the j-th candidate frame of the i-th grid in the application scene data acquired by the smart camera is a positive sample of the data processing task, being 1 if so and 0 otherwise;
obtain the first loss function = L_coord + L_cls + L_conf;
the second loss function comprises:
coordinate loss: L_coord = λ · Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · W_ij · [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²];
category loss: L_cls = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · W_ij · Σ_{c∈classes} (C_ij(c) - Ĉ_ij(c))²;
confidence loss: L_conf = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · W_ij · (P_ij - P̂_ij)² + λ · Σ_{i=1}^{S} Σ_{j=1}^{B} (1 - 1_{ij}^{obj}) · (P_ij - P̂_ij)²;
weight channel loss: σ·L_w + θ·L'_w, comprising the first supervision function L'_w = (1/y) · Σ_{c=1}^{y} max(0, γ - W_c) and the second supervision function L_w = max(0, δ - (W_H - W_L));
obtain the second loss function = L_coord + L_cls + L_conf + σ·L_w + θ·L'_w;
wherein L_coord, L_cls, L_conf respectively represent the coordinate error, the category error and the confidence error; L_w and L'_w represent the reliable-weight supervision losses for targets in the second sample set and in the first sample set respectively; 1_{ij}^{obj} indicates whether the j-th candidate frame of the i-th grid in the application scene data acquired by the smart camera is a positive sample, being 1 if so and 0 otherwise; W_ij represents the reliable weight of the positive sample of the j-th candidate frame of the i-th grid; x and y in the coordinate loss respectively represent the abscissa and the ordinate of the center point of the candidate frame, and w and h respectively represent the width and the height of the candidate frame; C represents the category, P represents the confidence, S is the number of grids of the network output layer, B is the number of fixed reference frames per grid; λ, σ and θ are constants, and δ is the preset weight-difference value;
wherein y in the first supervision function is the total number of positive samples belonging to the first sample set in the training batch, γ is the preset value of the reliable weight, and W_c represents the reliable weight of the c-th positive sample;
specifically, the second supervision function supervises each training batch so that, after the second-stage training, the difference between the first weight average W_H of the high reliable weights and the second weight average W_L of the low reliable weights output by the weights channel for positive samples of the second sample set is larger than δ, forcing the network to distinguish high reliable weights from low reliable weights;
according to each high reliable weight, the first weight average W_H of the current training batch is calculated by the formula W_H = (1/M) · Σ_{a=1}^{M} W_a;
according to each low reliable weight, the second weight average W_L of the current training batch is calculated by the formula W_L = (1/(N-M)) · Σ_{b=1}^{N-M} W_b;
wherein M is the number of high reliable weights in one training batch, N is the total number of positive samples belonging to the second sample set in one training batch, W_a is the a-th of the high reliable weights, and W_b is the b-th of the low reliable weights.
2. The method for optimizing a smart camera detection model using edge computation according to claim 1, wherein S10 comprises:
S101: collecting a local real-time image of an actual application scene;
s102: detecting the local real-time image by using a current model, and outputting a detection result corresponding to the local real-time image;
s103: performing binarization processing on a detection result corresponding to the local real-time image, and taking the processing result as a pseudo tag of the local real-time image;
s104: the second sample set is composed of each of the local real-time images and each of the pseudo tags.
3. The method for optimizing a smart camera detection model using edge computation according to claim 2, wherein S103 includes:
s1031: acquiring confidence coefficient and confidence coefficient threshold value of each positive sample in the detection result of each local real-time image;
s1032: comparing the confidence coefficient with the confidence coefficient threshold value, resetting the confidence coefficient according to the comparison result, and outputting the processed detection result as a pseudo tag of the local real-time image;
and if the confidence coefficient is larger than or equal to the confidence coefficient threshold value, setting the confidence coefficient to be 1, and if the confidence coefficient is smaller than the confidence coefficient threshold value, setting the confidence coefficient to be 0.
4. The method for optimizing a smart camera detection model using edge computation according to claim 1, further comprising, in S11, performing data enhancement on the first sample set and/or the second sample set, wherein the data enhancement includes at least one of: image rotation, mosaic material parameters, color gamut variation and image cropping.
5. The method for optimizing an intelligent camera detection model by adopting edge calculation according to claim 4, wherein when data enhancement is performed by adopting image rotation, mosaic material parameters, color gamut variation and image cropping at the same time, sequencing the processing sequence, performing image rotation first, then image cropping, then color gamut conversion, and finally performing mosaic processing according to the mosaic material parameters; outputting the first sample set and the second sample set subjected to data enhancement.
6. An apparatus for optimizing an intelligent camera detection model by adopting edge calculation, comprising:
a data acquisition module: used for acquiring a preset first sample set and a locally real-time acquired second sample set;
a model training module: used for performing fine-tuning training on the current model with the first sample set and the second sample set to obtain a verification model;
a model checking module: used for comparing the current model with the verification model and outputting a target model;
collecting training samples of a plurality of application scenes to form a training set;
randomly extracting a plurality of samples from the training set as a first set of samples;
training with the training set, and outputting a basic detection model as the current model;
The step of performing fine tuning training on the current model by using the first sample set and the second sample set to obtain a verification model includes:
acquiring a first loss function, a second loss function and reliable weights corresponding to positive samples;
combining the first sample set and the second sample set, and outputting a third sample set;
performing first-stage training on the current model by using the third sample set through a first loss function, and outputting an intermediate training model;
based on the intermediate training model, performing second-stage training by using the third sample set through a second loss function and the reliable weight, and outputting the verification model;
wherein said comparing said current model to said verification model, outputting a target model comprises:
acquiring a first detection result corresponding to the detection of the current model on the local image;
obtaining a second detection result corresponding to the detection of the local image by the verification model;
comparing the first detection result with the second detection result, and outputting a target model;
the method comprises the steps that local real-time images acquired through an intelligent camera are analyzed through a current model to obtain categories of the images;
Adding a corresponding class label according to the class to which each image belongs;
for a classification task, a positive sample is a sample corresponding to a target class;
for a detection task, a positive sample is a target corresponding to all positions marked in the sample in advance;
wherein the first loss function comprises:
coordinate loss: L_coord = λ · Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²];
category loss: L_cls = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · Σ_{c∈classes} (C_ij(c) - Ĉ_ij(c))²;
confidence loss: L_conf = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · (P_ij - P̂_ij)² + λ · Σ_{i=1}^{S} Σ_{j=1}^{B} (1 - 1_{ij}^{obj}) · (P_ij - P̂_ij)²;
wherein L_coord, L_cls, L_conf respectively represent the coordinate error, the category error and the confidence error, and 1_{ij}^{obj} indicates whether the j-th candidate frame of the i-th grid in the application scene data acquired by the smart camera is a positive sample of the data processing task, being 1 if so and 0 otherwise;
obtain the first loss function = L_coord + L_cls + L_conf;
the second loss function comprises:
coordinate loss: L_coord = λ · Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · W_ij · [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²];
category loss: L_cls = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · W_ij · Σ_{c∈classes} (C_ij(c) - Ĉ_ij(c))²;
confidence loss: L_conf = Σ_{i=1}^{S} Σ_{j=1}^{B} 1_{ij}^{obj} · W_ij · (P_ij - P̂_ij)² + λ · Σ_{i=1}^{S} Σ_{j=1}^{B} (1 - 1_{ij}^{obj}) · (P_ij - P̂_ij)²;
weight channel loss: σ·L_w + θ·L'_w, comprising the first supervision function L'_w = (1/y) · Σ_{c=1}^{y} max(0, γ - W_c) and the second supervision function L_w = max(0, δ - (W_H - W_L));
obtain the second loss function = L_coord + L_cls + L_conf + σ·L_w + θ·L'_w;
wherein L_coord, L_cls, L_conf respectively represent the coordinate error, the category error and the confidence error; L_w and L'_w represent the reliable-weight supervision losses for targets in the second sample set and in the first sample set respectively; 1_{ij}^{obj} indicates whether the j-th candidate frame of the i-th grid in the application scene data acquired by the smart camera is a positive sample, being 1 if so and 0 otherwise; W_ij represents the reliable weight of the positive sample of the j-th candidate frame of the i-th grid; x and y in the coordinate loss respectively represent the abscissa and the ordinate of the center point of the candidate frame, and w and h respectively represent the width and the height of the candidate frame; C represents the category, P represents the confidence, S is the number of grids of the network output layer, B is the number of fixed reference frames per grid; λ, σ and θ are constants, and δ is the preset weight-difference value;
wherein y in the first supervision function is the total number of positive samples belonging to the first sample set in the training batch, γ is the preset value of the reliable weight, and W_c represents the reliable weight of the c-th positive sample;
specifically, the second supervision function supervises each training batch so that, after the second-stage training, the difference between the first weight average W_H of the high reliable weights and the second weight average W_L of the low reliable weights output by the weights channel for positive samples of the second sample set is larger than δ, forcing the network to distinguish high reliable weights from low reliable weights;
according to each high reliable weight, the first weight average W_H of the current training batch is calculated by the formula W_H = (1/M) · Σ_{a=1}^{M} W_a;
according to each low reliable weight, the second weight average W_L of the current training batch is calculated by the formula W_L = (1/(N-M)) · Σ_{b=1}^{N-M} W_b;
wherein M is the number of high reliable weights in one training batch, N is the total number of positive samples belonging to the second sample set in one training batch, W_a is the a-th of the high reliable weights, and W_b is the b-th of the low reliable weights.
7. An electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the method of any one of claims 1-5.
8. A medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1-5.
CN202110122087.XA 2021-01-27 2021-01-27 Method and device for optimizing intelligent camera detection model by adopting edge calculation Active CN112949849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110122087.XA CN112949849B (en) 2021-01-27 2021-01-27 Method and device for optimizing intelligent camera detection model by adopting edge calculation

Publications (2)

Publication Number Publication Date
CN112949849A 2021-06-11
CN112949849B 2024-03-26

Family

ID=76239078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110122087.XA Active CN112949849B (en) 2021-01-27 2021-01-27 Method and device for optimizing intelligent camera detection model by adopting edge calculation

Country Status (1)

Country Link
CN (1) CN112949849B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0392084A (en) * 1989-09-05 1991-04-17 Canon Inc Picture processing device
CN109147254A (en) * 2018-07-18 2019-01-04 武汉大学 A kind of video outdoor fire disaster smog real-time detection method based on convolutional neural networks
CN109711474A (en) * 2018-12-24 2019-05-03 中山大学 A kind of aluminium material surface defects detection algorithm based on deep learning
CN109886138A (en) * 2019-01-27 2019-06-14 武汉星巡智能科技有限公司 Control method, device and computer readable storage medium based on scene Recognition
CN110084165A (en) * 2019-04-19 2019-08-02 山东大学 The intelligent recognition and method for early warning of anomalous event under the open scene of power domain based on edge calculations
CN110133443A (en) * 2019-05-31 2019-08-16 中国科学院自动化研究所 Based on the transmission line part detection method of parallel vision, system, device
CN110378931A (en) * 2019-07-10 2019-10-25 成都数之联科技有限公司 A kind of pedestrian target motion track acquisition methods and system based on multi-cam
CN111160301A (en) * 2019-12-31 2020-05-15 同济大学 Tunnel disease target intelligent identification and extraction method based on machine vision
US10691133B1 (en) * 2019-11-26 2020-06-23 Apex Artificial Intelligence Industries, Inc. Adaptive and interchangeable neural networks
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111914944A (en) * 2020-08-18 2020-11-10 中国科学院自动化研究所 Object detection method and system based on dynamic sample selection and loss consistency

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10354159B2 (en) * 2016-09-06 2019-07-16 Carnegie Mellon University Methods and software for detecting objects in an image using a contextual multiscale fast region-based convolutional neural network
US11030486B2 (en) * 2018-04-20 2021-06-08 XNOR.ai, Inc. Image classification through label progression
CN109345510A (en) * 2018-09-07 2019-02-15 百度在线网络技术(北京)有限公司 Object detecting method, device, equipment, storage medium and vehicle
US11537817B2 (en) * 2018-10-18 2022-12-27 Deepnorth Inc. Semi-supervised person re-identification using multi-view clustering


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Semi-supervised single-sample deep person re-identification method; Shan Chun; Wang Min; Computer Systems & Applications (01); pp. 256-260 *
Intelligent edge computing technology based on federated learning for video surveillance; Zhao Yu; Yang Jie; Liu Miao; Sun Jinlong; Gui Guan; Journal on Communications (10); pp. 109-115 *

Also Published As

Publication number Publication date
CN112949849A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN109190446A Person re-identification method based on a triplet focal loss function
CN112434681B (en) Intelligent camera self-training confidence threshold selection method, device and equipment
CN108039044B (en) Vehicle intelligent queuing system and method based on multi-scale convolutional neural network
CN110675395A (en) Intelligent on-line monitoring method for power transmission line
CN110969215A (en) Clustering method and device, storage medium and electronic device
CN106951828B (en) Urban area function attribute identification method based on satellite images and network
CN110493803B (en) Cell scene division method based on machine learning
CN108388877A A pig face recognition method
CN114360030A (en) Face recognition method based on convolutional neural network
CN114092769A (en) Transformer substation multi-scene inspection analysis method based on federal learning
CN110322687B (en) Method and device for determining running state information of target intersection
CN109874104A (en) User location localization method, device, equipment and medium
CN113035241A (en) Method, device and equipment for identifying baby cry class through multi-feature fusion
CN112434680B (en) Intelligent camera model self-training method, device, equipment and medium
CN112949849B (en) Method and device for optimizing intelligent camera detection model by adopting edge calculation
CN113709562B (en) Automatic editing method, device, equipment and storage medium based on baby action video
CN112101313B (en) Machine room robot inspection method and system
CN108377508B (en) User perception classification method and device based on measurement report data
CN113592785A (en) Target flow statistical method and device
CN117079130A (en) Intelligent information management method and system based on mangrove habitat
CN112699836A (en) Segmentation method and device for low-altitude paddy field image and electronic equipment
CN116091964A (en) High-order video scene analysis method and system
CN113378762A (en) Sitting posture intelligent monitoring method, device, equipment and storage medium
CN114463678A (en) Rainfall type identification method using camera video image
CN113793069A (en) Urban waterlogging intelligent identification method of deep residual error network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant