CN112434680A - Intelligent camera model self-training method, device, equipment and medium

Info

Publication number: CN112434680A
Application number: CN202110109737.7A
Authority: CN (China)
Prior art keywords: training, model, reliable, weight, sample set
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN112434680B (en)
Inventors: 陈辉, 龙祥, 张智, 熊章, 雷奇文, 艾伟, 胡国湖
Current Assignee: Wuhan Xingxun Intelligent Technology Co ltd
Original Assignee: Wuhan Xingxun Intelligent Technology Co ltd
Application filed by Wuhan Xingxun Intelligent Technology Co ltd
Priority to CN202110456609.XA (published as CN113139475A)
Priority to CN202110109737.7A (granted as CN112434680B)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The invention belongs to the technical field of image processing and solves the technical problem that, during self-training, no confidence threshold setting can fully avoid erroneous labels, which degrades the self-training effect. It provides a method, device, equipment and medium for self-training an intelligent camera model. The method obtains a training sample set, a first loss function, a second loss function and the reliable weight of each positive sample in the training sample set; performs first-stage training on the current model with the training sample set through the first loss function and outputs an intermediate training model; then performs second-stage training on the current model, on the basis of the intermediate training model, using the second loss function and the reliable weights, and outputs a verification model. The invention also includes apparatus, devices and media for performing the above method. The first-stage training completes network convergence, and the reliable weights in the second training stage gradually reduce the influence of erroneous labels during training, thereby improving the self-training effect.

Description

Intelligent camera model self-training method, device, equipment and medium
Technical Field
The invention relates to the technical field of image processing, in particular to a method, a device, equipment and a medium for self-training an intelligent camera model.
Background
With the wide application of video monitoring equipment to different scenes in daily life, cameras, especially intelligent cameras with image analysis functions, have become a common tool in everyday monitoring management. Besides conventional image capture, an intelligent camera carries a chip with processor functions and can perform simple processing and analysis of the captured scene image data.
However, in the prior art, because the chip of such an intelligent camera has a certain computing capability, the common practice for adapting the camera to multi-scene applications is to train a lightweight network model, before the camera leaves the factory, on sample images collected from as many different scenes as possible, and to deploy that model on the camera so that it is applicable to different scenes, such as: infant care management, hospital patient monitoring, school monitoring, mall monitoring, garage monitoring, scenic spot monitoring, farm monitoring, road traffic monitoring, and the like.
Although the intelligent camera collects many samples of different application scenes in advance and each scene receives a large amount of sample training, the actual scenes of different users still differ considerably when the camera is used directly for monitoring, so the basic detection model trained before the camera leaves the factory often does not fit well and can deviate significantly. Meanwhile, out of concern for personal privacy and information security, users do not want to connect the intelligent camera to a server; other approaches, such as downloading an update package for the basic detection model, increase the hardware cost of the intelligent camera (existing intelligent cameras generally have no interactive display interface such as a touch screen, and adding such interactive hardware for identification raises the cost) as well as the time spent on updating and maintenance.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device and a medium for self-training an intelligent camera model, so as to solve the technical problem that no confidence threshold setting can avoid erroneous labels during self-training, which leads to a poor self-training effect.
The technical scheme adopted by the invention is as follows:
the invention provides a self-training method of an intelligent camera model, which comprises the following steps:
s30: acquiring a training sample set, a first loss function and a second loss function for training the current model of the camera, and the reliable weight corresponding to each positive sample in the training sample set;
s31: performing first-stage training on the current model with the training sample set through the first loss function, and outputting an intermediate training model;
s32: performing second-stage training on the current model, on the basis of the intermediate training model, using the second loss function and the reliable weights, and outputting a verification model;
wherein, in each round of the second-stage training, the reliable weights corresponding to the positive samples are mutually independent.
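As an illustration of this two-stage schedule, the following Python sketch uses PyTorch-style names; `model`, `first_loss`, `second_loss` and the data loader are assumed stand-ins for the patent's components, and the "weights" output key is hypothetical:

```python
# A minimal sketch of the two-stage self-training schedule (S30-S32),
# assuming PyTorch and a model whose forward pass also emits a weight channel.
import torch

def self_train(model, loader, first_loss, second_loss,
               stage1_epochs=10, stage2_epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # Stage 1: ordinary fine-tuning with the first loss until the net converges.
    for _ in range(stage1_epochs):
        for images, targets in loader:
            opt.zero_grad()
            preds = model(images)
            first_loss(preds, targets).backward()
            opt.step()
    # Stage 2: the reliable weights are re-read from the weight channel in
    # every batch, so each batch's weights come from a differently updated
    # model ("mutually independent" across rounds).
    for _ in range(stage2_epochs):
        for images, targets in loader:
            opt.zero_grad()
            preds = model(images)
            w = preds["weights"]   # reliable weights of the positive samples
            second_loss(preds, targets, w).backward()
            opt.step()
    return model
```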
Preferably, a weight channel for outputting the reliable weights is added to the network output layer, and the S30 includes:
s301: acquiring a mapping function and the position information corresponding one by one to each positive sample in the training set;
s302: outputting, in the weight channel, the values corresponding to the position information, and outputting through the mapping function the reliable weight of the positive sample corresponding to each value.
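A minimal sketch of S301-S302, assuming the mapping function is a sigmoid (the description later says only that an activation function is applied) and that grid coordinates serve as the position information:

```python
import torch

def reliable_weights(weight_channel, positions):
    """Read the weight-channel value at each positive sample's grid position
    (S301) and map it to a reliable weight via the mapping function (S302).

    weight_channel: (S, S) tensor taken from the added weights channel
    positions: list of (i, j) grid coordinates of the positive samples
    """
    values = torch.stack([weight_channel[i, j] for i, j in positions])
    return torch.sigmoid(values)  # sigmoid stands in for the mapping function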
Preferably, each round of training includes a plurality of training batches. In S30, the training sample set includes a preset first sample set and a locally acquired second sample set; in each training batch, the positive samples belonging to the second sample set are sorted by reliable weight, and a reliable weight sequence is output.
Preferably, the S30 includes:
s303: obtaining the number ratio between the high reliable weights and the low reliable weights in the reliable weight sequence;
s304: outputting the high reliable weights and the low reliable weights according to the number ratio by the formula M = α × N;
where M is the number of high reliable weights, α is the ratio of high reliable weights, and N is the total number of reliable weights in the reliable weight sequence of each batch.
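For example (a sketch; the value of α is illustrative, not from the patent):

```python
def split_reliable_weights(weights, alpha=0.7):
    """Sort a batch's reliable weight sequence in descending order and split
    it into M = alpha * N high reliable weights and N - M low ones (S303-S304)."""
    sequence = sorted(weights, reverse=True)   # the reliable weight sequence
    m = int(alpha * len(sequence))             # M = alpha x N
    return sequence[:m], sequence[m:]          # high weights, low weights
```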
Preferably, in each training batch, when the positive sample belongs to the first sample set, the second loss function further includes a first supervision function of the weight channel, and the S32 includes:
s321: obtaining the reliable weight of each such positive sample output in the current training batch;
s322: performing the current training batch through the second loss function according to each reliable weight and the first supervision function

L_w1 = Σ_{c=1}^{y} (W_c − γ)²;

s323: repeating S321 to S322 until the second-stage training is completed, and outputting the verification model;
wherein y is the total number of positive samples belonging to the first sample set in a training batch, γ is the preset value of the reliable weight, and W_c represents the reliable weight of the c-th positive sample.
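A sketch of this first supervision term under the squared-error reading given above (γ = 0.9 is an assumed preset value, not from the patent):

```python
import torch

def first_supervision(w_first, gamma=0.9):
    """Pull the reliable weights of first-sample-set (manually labelled)
    positives toward the preset value gamma (S321-S322).

    w_first: 1-D tensor of reliable weights of the batch's first-set positives
    """
    return ((w_first - gamma) ** 2).sum()
```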
Preferably, in each training batch, when the positive sample belongs to the second sample set, the second loss function further includes a second supervision function of the weight channel, and the S32 includes:
s324: acquiring the reliable weight sequence output in the current training batch;
s325: calculating from the high reliable weights, by the formula

W_H = (1/M) Σ_{a=1}^{M} W_a,

the first weight average W_H corresponding to the high reliable weights in the current training batch;
s326: calculating from the low reliable weights, by the formula

W_L = (1/(N − M)) Σ_{b=1}^{N−M} W_b,

the second weight average W_L corresponding to the low reliable weights in the current training batch;
s327: performing the current training batch through the second loss function according to the first weight average W_H, the second weight average W_L and the second supervision function

L_w2 = max(0, δ − (W_H − W_L));

s328: repeating S324 to S327 until the second-stage training is completed, and outputting the verification model;
wherein M is the number of high reliable weights in a training batch, N is the total number of positive samples belonging to the second sample set in a training batch, W_a is the a-th of the high reliable weights, W_b is the b-th of the low reliable weights, and δ is the preset weight-difference value.
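A sketch of S325-S327 under the hinge reading of the second supervision function above (δ = 0.3 is illustrative):

```python
import torch

def second_supervision(w_high, w_low, delta=0.3):
    """Separate the mean of the high reliable weights (W_H) from the mean of
    the low ones (W_L) by at least the preset weight difference delta."""
    w_h = w_high.mean()   # first weight average over the M high weights
    w_l = w_low.mean()    # second weight average over the N - M low weights
    return torch.clamp(delta - (w_h - w_l), min=0.0)
```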
The invention also provides a method for optimizing the intelligent camera detection model by adopting edge calculation, which comprises any one of the intelligent camera self-training methods.
The invention also provides an intelligent camera model self-training device, which comprises:
a training data module, used for acquiring a training sample set, a first loss function and a second loss function for training the current model of the camera, and the reliable weight corresponding to each positive sample in the training sample set;
a first training module, used for performing first-stage training on the current model with the training sample set through the first loss function, and outputting an intermediate training model;
a second training module, used for performing second-stage training on the current model, on the basis of the intermediate training model, using the second loss function and the reliable weights, and outputting a verification model;
wherein, in each round of the second-stage training, the reliable weights corresponding to the positive samples are mutually independent.
The present invention also provides an electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory that, when executed by the processor, implement the method of any of the above.
The invention also provides a medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any of the above.
In conclusion, the beneficial effects of the invention are as follows:
according to the intelligent camera model self-training method, the intelligent camera model self-training device, the intelligent camera detects a training sample set formed by locally acquired image data, the loss value of each sample is calculated locally by directly utilizing a first loss function, then fine tuning training of a first stage is carried out on a detection model, and convergence of the model is completed, so that possibility is provided for local training of the model; in the process of finishing the first-stage training, performing second-stage training on the detection model by introducing reliable weight and a second loss function, and under the action of the reliable weight and the second loss function, enabling the model to finish fine-tuning training along the expected direction under the condition of lacking human intervention; the method and the device have the advantages that fine tuning training of the detection model is directly completed at the user side, local data do not need to be uploaded to a server, and data safety and user privacy can be guaranteed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required by the embodiments are briefly described below. Those skilled in the art may obtain other drawings from these drawings without creative effort, and all such drawings fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a method for optimizing an intelligent camera detection model by using edge calculation in embodiment 1 of the first embodiment of the present invention;
fig. 2 is a schematic flow chart of acquiring a second sample set in example 1 according to a first embodiment of the present invention;
fig. 3 is a schematic flow chart of adding a pseudo tag to a sample in example 1 according to a first embodiment of the present invention;
FIG. 4 is a schematic flowchart illustrating a model fine tuning training process using a third sample set in example 1 according to a first embodiment of the present invention;
fig. 5 is a schematic flow chart of obtaining a target model in example 1 according to a first embodiment of the present invention;
fig. 6 is a schematic flowchart of a process of acquiring a base model of a camera in example 1 according to a first embodiment of the present invention;
fig. 7 is a schematic flowchart of acquiring a target confidence threshold in example 2 according to a first embodiment of the present invention;
fig. 8 is a schematic flowchart of outputting a target confidence threshold in example 2 according to a first embodiment of the present invention;
fig. 9 is a schematic flow chart illustrating obtaining a target mAP value in example 2 according to a first embodiment of the present invention;
FIG. 10 is a schematic diagram of the mAP value curve in example 2 according to the first embodiment of the present invention;
fig. 11 is a schematic flowchart of obtaining an mAP value of a confidence threshold in example 2 according to a first embodiment of the present invention;
fig. 12 is a schematic flow chart of calculating the mAP value in example 2 according to the first embodiment of the present invention;
fig. 13 is a schematic flowchart of an output verification model in example 3 according to a first embodiment of the present invention;
fig. 14 is a flowchart illustrating reliable weights of positive samples in example 3 according to a first embodiment of the present invention;
fig. 15 is a schematic structural diagram of a network output layer in example 3 according to the first embodiment of the present invention;
fig. 16 is a schematic flow chart of obtaining high reliable weights and low reliable weights in example 3 according to the first embodiment of the present invention;
fig. 17 is a schematic flowchart of a verification model corresponding to an artificial tag in example 3 according to a first embodiment of the present invention;
fig. 18 is a schematic flowchart of a verification model corresponding to a pseudo tag in example 3 according to a first embodiment of the present invention;
fig. 19 is a schematic structural diagram of an apparatus for continuously optimizing the camera effect in embodiment 4 of implementation mode two of the present invention;
fig. 20 is a block diagram of an apparatus for selecting the self-training confidence threshold of an intelligent camera in embodiment 5 of implementation mode two of the present invention;
fig. 21 is a schematic structural diagram of an apparatus for self-training an intelligent camera model in embodiment 6 of implementation mode two of the present invention;
fig. 22 is a schematic structural diagram of an electronic device in a third embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings.

It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between them. In the description of the present invention, terms such as "center", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience and simplicity of description and do not indicate or imply that the referenced devices or elements must have a particular orientation or be constructed and operated in a particular orientation, and thus are not to be construed as limiting the present invention.

Also, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises it. In case of conflict, the embodiments of the present invention and the individual features of the embodiments may be combined with each other within the scope of the present invention.
Implementation mode one
Example 1
Referring to fig. 1, fig. 1 is a schematic flowchart of a method for optimizing an intelligent camera detection model by using edge calculation in embodiment 1 of the present invention. The method for optimizing the intelligent camera detection model by adopting the edge calculation in the embodiment 1 of the invention comprises the following steps:
s10: acquiring a preset first sample set and a second sample set acquired locally in real time;
specifically, a basic detection model usable in multiple application scenes is built into the intelligent camera; the basic detection model is obtained by training on a training set A composed of collected training samples of multiple application scenes. The scenes include at least one of: infant care, schools, amusement parks, malls, garages, farms and roads. Samples corresponding to the target scenes are obtained through specific channels, and each sample includes at least one of: images, sound and video. The specific channels include at least one of: open-source data sets, self-built scenes and market purchase. All samples in the training set A of the basic detection model are manually checked and given manual labels. A plurality of samples are randomly extracted from the training set A to serve as the first sample set. The samples of the second sample set are local images of the actual application scene collected by the intelligent camera, and the labels of the second sample set are set as pseudo labels.
S11: performing fine tuning training on the current model by adopting the first sample set and the second sample set to obtain a verification model;
s12: comparing the current model with the verification model and outputting a target model;
specifically, before the intelligent camera enters the market, the training set A is used for training, the basic detection model is output and implanted into the intelligent camera. After a user installs the intelligent camera in the target area, in the initial use stage the basic detection model performs data processing on the local images acquired by the camera, including detection tasks and/or classification tasks; at this point the current model is the basic detection model. During use, the current model is fine-tuned locally with the first sample set and the second sample set to obtain the verification model of this round of training; the processing results of the verification model and the current model on the second sample set are compared, and the target model is output according to the comparison result. Before the next fine-tuning training, the target model performs the data processing on the local images acquired by the intelligent camera.
It should be noted that fine-tuning the current model locally with the first and second sample sets trains the intelligent camera locally, so user data need not be uploaded to a server, leakage of local data is prevented, and data security is improved. Meanwhile, training with local images improves the fit between the target model and the camera's installation scene, guaranteeing the data-processing accuracy of the intelligent camera.
In one embodiment, referring to fig. 2, the S10 includes:
s101: collecting a local real-time image of an actual application scene;
specifically, after the intelligent camera is installed in the specific application scene, it starts to monitor the actual application scene and collects real-time images of the actual application scene.
S102: detecting the local real-time image by using a current model, and outputting a detection result corresponding to the local real-time image;
specifically, the acquired local real-time images are sent to the current model, and the current model outputs the detection result corresponding to each local image; the detection result includes at least one of: confidence, category information and position information. The recall rate and the accuracy rate corresponding to each confidence threshold are calculated from the detection results.
For the classification task on category information, in one application embodiment: the local real-time images collected by the intelligent camera are analyzed by the current model to obtain the category of each image, and corresponding category labels are added according to the category to which each image belongs.
For the detection task, in one application embodiment, positive samples corresponding to the first category and/or the second category ... and/or the Nth category are searched for in the local real-time images of the monitored target area shot by the intelligent camera; the current model processes the local real-time images, outputs the positive samples in them that meet the category requirements, and outputs the corresponding detection results. The detection result includes at least one of: position information, confidence, category information, etc.; the category information includes at least one of: a person, an animal, a movable object, a stationary object attached to a movable object, etc.
It should be noted that:
for the classification task, a positive sample is a sample corresponding to the target class;
for the detection task, the positive samples are the targets at all positions marked in advance in the sample.
S103: carrying out binarization processing on a detection result corresponding to the local real-time image, and taking a processing result as a pseudo label of the local real-time image;
specifically, binarization processing is carried out on the confidence of each positive sample according to a confidence threshold, and a detection result after processing is used as a pseudo label of a local real-time image;
referring to fig. 3, the S103 includes:
s1031: obtaining the confidence coefficient and the confidence coefficient threshold value of each positive sample in the detection result of each local real-time image;
specifically, the intelligent camera inputs the acquired local real-time image into the current model and outputs the detection result of the real-time image. For example: the current model judges a target in the local real-time image to be a human body with confidence 0.6, i.e., the credibility that the target is a "person" is 0.6. Meanwhile, a confidence threshold is obtained: the confidence represents the reliability of the detection result, and the confidence threshold is used to judge whether the detection result is credible. If the confidence threshold is 0.7 and the confidence in a detection result is 0.6, the detection result is not credible. Specifically, a sampling step of the confidence threshold may be set, a group of confidence thresholds obtained according to the sampling step, and the mAP values corresponding to the confidence thresholds compared to determine the final confidence threshold.
S1032: comparing each confidence with the confidence threshold, resetting each confidence according to the comparison result, and outputting the processed detection result as the pseudo label of the local real-time image;
if a confidence is greater than or equal to the confidence threshold, it is set to 1; if it is less than the confidence threshold, it is set to 0.
Specifically, the confidences in the detection results are binarized according to the confidence threshold. For example, if the confidence threshold for judging a target to be a "person" is 0.7: when the confidence is less than 0.7, the confidence of the current target being judged a "person" is set to 0 and the detection result of the positive sample is updated; when the confidence is greater than or equal to 0.7, the confidence is set to 1. The updated detection result is stored as the pseudo label of the real-time image.
S104: the second sample set is composed of each of the local real-time images and each of the pseudo labels.
In one embodiment, referring to fig. 4, the S11 includes:
s111: obtaining a first loss function, a second loss function and reliable weights corresponding to the positive samples;
specifically, the first loss function is used in the first-stage training to calculate the loss value of each sample and complete the fine-tuning training of the model, and the second loss function is used in the second-stage training to calculate the loss value of each sample and complete the final training of the model. Taking the yolov3 loss function as an example, the first loss function comprises:

coordinate loss:

L_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]

class loss:

L_cls = Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} Σ_{c∈classes} (C_i(c) − Ĉ_i(c))²

confidence loss:

L_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} (P_i − P̂_i)² + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{noobj} (P_i − P̂_i)²

wherein L_coord, L_cls and L_conf respectively represent the coordinate error, the category error and the confidence error, and I_{ij}^{obj} indicates whether the jth candidate box of the ith grid in the application-scene data acquired by the intelligent camera is a positive sample of the data processing task: I_{ij}^{obj} = 1 if it is, otherwise 0.

The first loss function is obtained as Loss_1 = L_coord + L_cls + L_conf.

The second loss function is:

coordinate loss:

L'_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} W_{ij} [(x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]

class loss:

L'_cls = Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} W_{ij} Σ_{c∈classes} (C_i(c) − Ĉ_i(c))²

confidence loss:

L'_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} W_{ij} (P_i − P̂_i)² + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{noobj} (P_i − P̂_i)²

weight channel loss, comprising the first supervision function:

L_w1 = Σ_{c=1}^{y} (W_c − γ)²

and the second supervision function:

L_w2 = max(0, δ − (W_H − W_L));

wherein L'_coord, L'_cls and L'_conf respectively represent the coordinate error, the category error and the confidence error; L_w2 and L_w1 represent the reliable-weight error losses for the targets in the second sample set and the targets in the first sample set respectively; I_{ij}^{obj} indicates whether the jth candidate box of the ith grid in the application-scene data acquired by the intelligent camera is a positive sample (I_{ij}^{obj} = 1 if it is, otherwise 0); W_{ij} represents the reliable weight of the positive sample of the jth candidate box of the ith grid; x and y respectively represent the abscissa and the ordinate of the center point of the selection box; w and h respectively represent the width and the height of the selection box; C represents the category; P represents the confidence; S² is the number of grids of the network output layer; B is the number of fixed reference boxes; λ is a constant; and δ is the preset weight-difference value.
S112: merging the first sample set and the second sample set, and outputting a third sample set;
specifically, a first sample set preset in the intelligent camera and a second sample set corresponding to a local real-time image acquired by the intelligent camera are combined to obtain a training sample set, and the training sample set is recorded as a third sample set.
S113: performing first-stage training on the current model by using the third sample set through a first loss function, and outputting an intermediate training model;
specifically, the loss value of each positive sample is calculated by the first loss function, and the current model is fine-tuned so that the network converges on the third sample set; the updated model is recorded as the intermediate training model. The first-stage training of the detection model with the first loss function and the third sample set fine-tunes on loss values computed from the model's real detection results on each sample, with no external intervention added, so the first stage can naturally complete model convergence and ensures the stability of the current model in the subsequent training process.
S114: and on the basis of the intermediate training model, performing second-stage training by using the third sample set through a second loss function and the reliable weight, and outputting the verification model.
Specifically, on the basis of the intermediate training model, the second-stage training continues with the second loss function and the reliable weights corresponding to the positive samples of the third sample set, and finally the verification model is output. Training of the model is divided into multiple rounds, so the current model is updated once after each round of training. In the next round, the updated current model outputs a new reliable weight for each sample of the third sample set; the current model of that round is then fine-tuned again with the second loss function and the corresponding new reliable weights, and the model is updated after the round. After the established multiple rounds of training are finished, the verification model is output.
It should be noted that each round of training comprises a plurality of training batches; after every training batch, the model completes one update, so the current model corresponding to each training batch is different. A training batch here means selecting a small batch of data from the sample set for training. That is to say, in each round of the second-stage training, the reliable weights corresponding to the positive samples in each training batch are calculated by different models; through multiple rounds of training, the influence of unreliable labels among the pseudo labels is reduced during training, where unreliable labels are pseudo labels that differ obviously from the majority of positive samples and are hard to fit.
In an embodiment, in S11, data enhancement is further performed on the first sample set and/or the second sample set, where the data enhancement includes at least one of: image rotation, mosaic material parameters, color gamut conversion and image cropping.
Specifically, in the training process, the same sample set is trained multiple times using data enhancement. In one training, the image angle is randomly rotated within the rotation range, and/or the image is randomly cropped within the corresponding size range, and/or the color gamut of the image is randomly adjusted within the adjustment range, and/or the image is mosaic-processed within the mosaic material parameter range, to obtain the data-enhanced first sample set and second sample set; these are then trained and the corresponding verification model is output. Through data enhancement, the trained model can judge the same target more accurately in different states, improving the detection accuracy and robustness of the intelligent camera in the target scene.
When data enhancement is performed, two or more image enhancement modes are generally adopted. When all four modes are adopted, the processing needs to be ordered: preferably image rotation first, then image cropping, then color gamut conversion, and finally mosaic processing according to the mosaic material parameters (see the sketch after this subsection). This not only greatly reduces the amount of data processing, but also minimizes data distortion during enhancement.
Specifically, in one such training pass, the image is rotated to a specified angle within the rotation range, then cropped once the crop size is determined, then its color gamut is adjusted within the adjustment range, and finally mosaic processing is applied according to the image mosaic material parameters, to obtain the data-enhanced first sample set and second sample set, which are then trained to output the corresponding verification model.
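A sketch of the preferred order (rotate, then crop, then color gamut, then mosaic) using Pillow; every parameter value here is illustrative, not from the patent:

```python
import random
from PIL import Image, ImageEnhance

def enhance(img, max_angle=15, crop_ratio=0.9, mosaic_block=8):
    """Apply the four enhancement modes in the preferred order."""
    # 1. image rotation within the angle range
    img = img.rotate(random.uniform(-max_angle, max_angle))
    # 2. image cropping within the size range
    w, h = img.size
    cw, ch = int(w * crop_ratio), int(h * crop_ratio)
    x0, y0 = random.randint(0, w - cw), random.randint(0, h - ch)
    img = img.crop((x0, y0, x0 + cw, y0 + ch))
    # 3. color gamut adjustment within the adjustment range
    img = ImageEnhance.Color(img).enhance(random.uniform(0.8, 1.2))
    # 4. mosaic processing: pixelate by down- and up-sampling with an
    #    assumed block size standing in for the mosaic material parameter
    img = img.resize((max(1, cw // mosaic_block), max(1, ch // mosaic_block)))
    return img.resize((cw, ch), Image.NEAREST)
```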
In one embodiment, referring to fig. 5, the S12 includes:
s121: acquiring a first detection result corresponding to the detection of the current model on the local image;
s122: acquiring a second detection result corresponding to the detection of the local image by the verification model;
s123: and comparing the first detection result with the second detection result, and outputting a target model.
Specifically, after one fine-tuning training of the current model is completed, the verification model detects the local real-time images acquired by the intelligent camera to obtain the corresponding second detection result; the second detection result is compared with the first detection result of the model before fine-tuning, and the model whose detection result is better is selected as the target model. That is, if the first detection result is better than the second detection result, the current model is not updated; if the second detection result is better than the first, the current model is updated to the verification model, and the updated model processes the data acquired by the intelligent camera. For example: the current model is model 0 and the verification model is model 1; the current model is then updated to model 1. In one application embodiment, the first output result of the current model and the second output result of the verification model can also be obtained directly through user follow-up feedback.
In one embodiment, the fine-tuning training further comprises a fine-tuning threshold. After the threshold is reached, the effect is by default taken as optimal, and no further training is performed if the shooting scene does not change. The fine-tuning threshold is reached when, after multiple trainings, the output result of the current model remains better than the second output result of each newly trained verification model. This reduces unnecessary training and energy consumption and prolongs the service life of the intelligent camera.
In an embodiment, please refer to fig. 6, before the step S10, the method includes:
s1: acquiring chip computing power of an intelligent camera, a task type and sample data of at least one application scene;
specifically, the chip computing power is the data processing capacity of a chip built in the intelligent camera, and the task types include a classification task and a detection task.
S2: establishing a basic detection model according to the chip computing power, the task type and the sample data;
specifically, the sample data is trained randomly according to the data processing parameters and the task types to obtain a basic detection model, wherein corresponding output nodes are arranged on a network output layer according to different tasks, and the output nodes correspond to detection results of the output task types.
S3: screening the sample data and outputting the first sample set;
specifically, the sample data is 2m images of a target scene, n images are randomly selected from the 2m images as a first sample set, and m and n are positive integers.
S4: and pre-placing the first sample set and the basic detection model into the intelligent camera.
Specifically, before the intelligent camera is tested or enters the market, the basic detection model and the corresponding first sample set are implanted into it. The camera is then installed in an actual application scene; initially the basic detection model processes the local real-time images acquired by the camera, and during use the current model is fine-tuned with the first and second sample data, so that the model is updated and its detection effect improved.
By the method of this embodiment for optimizing the intelligent camera detection model with edge calculation, the first sample set preset in the intelligent camera and the second sample set corresponding to locally acquired image data are obtained; the currently used model is fine-tuned locally with the two sample sets, the update of the current model is completed, and the verification model is output; the detection effects of the verification model and the current model on the local image data are compared and the target model is output. Training locally on real-time images of the actual application scene combined with the preset first sample set improves the fit between the target model and the actual application scene, improving detection accuracy; meanwhile, training locally at the device end protects user privacy and improves data security.
Example 2
In embodiment 1, the pseudo labels are screened by the confidence threshold value to reduce false labels such as missing labels and wrong labels in the pseudo labels. When the confidence threshold is set unreasonably, false labels such as missing labels and wrong labels exist in the false labels, and the accuracy of the target model after fine tuning training is finally affected. Therefore, in embodiment 2 of the present invention, a further improvement is provided on the confidence threshold in S1031 on the basis of embodiment 1, and a method for selecting a confidence threshold for self-training of an intelligent camera is provided, in an embodiment, please refer to fig. 7, where the method includes:
s20: acquiring a test sample set and a sampling step length of a confidence coefficient threshold;
specifically, a plurality of samples are randomly extracted from the training sample set A as the test sample set; all samples in the test sample set are manually verified and given manual labels. Meanwhile, the sampling step of the confidence threshold is acquired, and a group of confidence thresholds is output according to the sampling step.
Sampling step: within the value range of the confidence threshold, sampling points are set at equal intervals, each interval corresponding to two adjacent confidence thresholds. For example, if the confidence threshold ranges from 0 to 1 and the sampling step is 0.2, the sampling points of the confidence threshold are 0, 0.2, 0.4, 0.6, 0.8 and 1.
S21: testing the current model by using the test sample set, and outputting a detection result in the test sample set;
specifically, the detection result may be for the test sample set, or each sample in the test sample set, or each positive sample in each sample; the detection result at least comprises one of the following: confidence, location information, and category information.
S22: outputting mAP values corresponding to the confidence coefficient thresholds one by one according to the confidence coefficient thresholds corresponding to the sampling step length and the detection result;
specifically, under different confidence thresholds, the detection results on the test sample set yield a corresponding average precision (AP) value for the positive samples of each category; the mean average precision (mAP) is then obtained from the AP values of all categories. In this way, the mAP values corresponding one-to-one to the confidence thresholds are obtained.
S23: and outputting a target confidence threshold value of the current model according to each mAP value.
Specifically, comparing each mAP value, and taking a confidence threshold corresponding to the finally selected mAP value as a target confidence threshold.
In one embodiment, referring to fig. 8, the S23 includes:
s231: comparing the mAP values and outputting a target mAP value;
specifically, all the mAP values are compared, and the mAP value meeting the requirement is screened out and used as the target mAP value.
In an embodiment, referring to fig. 9, the S231 includes:
s2311: obtaining a group of confidence coefficient threshold values corresponding to the sampling step length of the confidence coefficient threshold values, and recording as a first confidence coefficient threshold value group;
specifically, a sampling step length of the confidence coefficient threshold value is set, and a group of confidence coefficient threshold values is output according to the sampling step length of the confidence coefficient threshold value and recorded as a first confidence coefficient threshold value group.
S2312: according to the mAP value corresponding to each confidence coefficient threshold value in the first confidence coefficient threshold value set, outputting the confidence coefficient threshold value corresponding to the maximum mAP value as a reference confidence coefficient threshold value, and a previous confidence coefficient threshold value and a next confidence coefficient threshold value which are adjacent to the reference confidence coefficient threshold value;
specifically, the mAP values corresponding to the confidence thresholds are compared to obtain the maximum mAP value; the confidence threshold corresponding to the maximum mAP value is taken as the reference confidence threshold, and the previous and next confidence thresholds adjacent to the reference confidence threshold are acquired at the same time.
S2313: reducing the sampling step size, and outputting a first confidence coefficient threshold interval between the previous confidence coefficient threshold and the reference confidence coefficient threshold and a second confidence coefficient threshold interval between the reference confidence coefficient threshold and the next confidence coefficient threshold;
specifically, the interval between the previous confidence threshold and the reference confidence threshold is recorded as the first confidence threshold interval, and the interval between the reference confidence threshold and the next confidence threshold is recorded as the second confidence threshold interval. After the reference confidence threshold is obtained, the sampling step is adjusted and a new group of confidence thresholds is obtained according to the adjusted sampling step; the confidence thresholds falling within the first and second confidence threshold intervals are extracted.
S2314: obtaining confidence threshold values of the first confidence threshold value interval and the second confidence threshold value interval to obtain a new first confidence threshold value group;
s2315: repeating S2312 through S2314 until the target mAP value is output.
Specifically, after the confidence thresholds of the first and second confidence threshold intervals are obtained, they form a new first confidence threshold group; the mAP values corresponding to these confidence thresholds are compared again and a new reference confidence threshold is output. The process is repeated multiple times, and the finally output reference confidence threshold is taken as the target confidence threshold. This method guarantees that the mAP value corresponding to the target confidence threshold is closest to the maximum of the mAP curve of the current model, so that the accuracy and recall corresponding to the detection results are most reasonable.
In one embodiment, in S2313, the next sampling step is 1/2 of the previous sampling step.
Specifically, the sampling step is adjusted so that the next sampling step is 1/2 of the previous one. Then only one new confidence threshold lies in each of the first and second confidence threshold intervals of each round, always at the midpoint of the corresponding interval. The mAP value of each new confidence threshold is compared directly with that of the previous reference confidence threshold, and the new reference confidence threshold and its adjacent thresholds are determined; at most two mAP comparisons per round suffice to obtain the new reference confidence threshold, reducing the data processing amount and improving calculation efficiency. Referring to fig. 10, let b be the sampling step before adjustment and a the step after adjustment, with a = b/2. With step b, the mAP values of the corresponding confidence thresholds are marked ●; the mAP value of the obtained reference confidence threshold is B, that of the previous confidence threshold is D, and that of the next confidence threshold is F. The step is then adjusted to a, giving the points marked ○: the mAP value of the confidence threshold in the first confidence threshold interval is C and that in the second confidence threshold interval is A. The mAP values of A and C are compared with that of B, and the confidence threshold corresponding to the maximum mAP value becomes the new reference confidence threshold. Repeating this process finally yields the target confidence threshold while reducing data processing and improving calculation efficiency. In one implementation, if the mAP value at A is larger than that at B, the comparison of C with B is skipped and the confidence threshold at A becomes the new reference confidence threshold; if the mAP value at A is smaller than that at B, B and C are compared; if the mAP value at C is also smaller than that at B, the confidence threshold at B remains the reference confidence threshold, the sampling step is adjusted to a/2, and the process repeats until the target confidence threshold is obtained.
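The midpoint-halving search can be written directly; a sketch assuming a callable `map_of(t)` that evaluates the mAP of the current model at threshold t (memoization of repeated evaluations is omitted for brevity):

```python
def search_confidence_threshold(map_of, lo=0.0, hi=1.0, step=0.2, min_step=0.01):
    """Coarse-to-fine search of S2311-S2315: evaluate mAP on a grid, keep the
    best threshold and its interval midpoints, halve the step and repeat."""
    grid = [min(hi, lo + k * step) for k in range(int(round((hi - lo) / step)) + 1)]
    best = max(grid, key=map_of)            # reference confidence threshold
    while step > min_step:
        step /= 2.0                         # next step is 1/2 of the previous
        neighbours = [t for t in (best - step, best + step) if lo <= t <= hi]
        best = max([best] + neighbours, key=map_of)
    return best
```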
S232: and according to the target mAP value, taking a confidence coefficient threshold value corresponding to the target mAP value as the target confidence coefficient threshold value.
In one embodiment, referring to fig. 11, the S22 includes:
s221: according to the confidence coefficient thresholds and the detection results of the samples in the test sample set, outputting the accuracy and the recall rate of each category corresponding to each confidence coefficient threshold one to one;
specifically, the detection result includes at least one of: the number of positive samples, position information, confidence and category information; the accuracy and the recall rate of the current model for detecting each category in the test sample set are determined according to the confidence threshold, and different confidence thresholds correspond to different accuracy and recall rates.
S222: and outputting mAP values corresponding to the confidence degree thresholds one by one according to the accuracy rate and the recall rate corresponding to each category.
In an embodiment, referring to fig. 12, the step S221 includes:
s2211: according to the detection result, the formula
Figure 343038DEST_PATH_IMAGE022
Outputting the quasi-information corresponding to each categoryDetermining the rate;
s2212: according to the detection result, the formula
Figure 87003DEST_PATH_IMAGE023
Outputting the recall rate corresponding to each category;
s2213: outputting mAP values corresponding to the confidence degree thresholds one by one according to the accuracy rate and the recall rate corresponding to each category;
the confidence coefficient of the detection result with TP as the positive sample is greater than the confidence coefficient threshold value; the confidence coefficient in the detection result with FP as the positive sample is less than the confidence coefficient threshold value; FN is a positive sample which is not detected at the corresponding position; precision is the accuracy and recalling is the Recall.
In one embodiment, the S13 includes:
the first step: obtaining the confidences of all positive samples in the detection results on the test sample set, and the confidence thresholds corresponding to the sampling step;
the second step: sorting the confidences of the positive samples and outputting a confidence sequence;
the third step: sampling the confidence sequence by percentage to obtain a plurality of sampling confidences;
the fourth step: comparing the mAP value corresponding to each sampling confidence with the mAP value corresponding to each confidence threshold, and outputting the confidence corresponding to the maximum mAP value as the target confidence threshold.
Specifically, the test sample set is input into the current model; the current model detects each sample and outputs the detection results. The confidences of all positive samples in the detection results are sorted and sampled by percentage, and the confidences at the sampling points are recorded as sampling confidences. For example: 10 positive samples are detected, giving a confidence sequence of length 10; sampling at intervals of 2 takes out the confidences at positions 2, 4, 6, 8 and 10. These 5 confidences are then compared, through their mAP values, with the mAP values of the confidence thresholds corresponding to the sampling step, and the confidence threshold or sampling confidence corresponding to the maximum mAP value is selected as the target confidence threshold.
By the method of this embodiment for selecting the self-training confidence threshold of an intelligent camera, a plurality of confidence thresholds are obtained by setting the sampling step, the current model tests the test sample set and outputs the detection results; the mAP values corresponding one-to-one to the confidence thresholds are obtained from the detection results, all mAP values are compared, and the confidence threshold corresponding to the maximum mAP value is output as the target confidence threshold, which makes the accuracy and recall of the detection results optimal. When the detection model needs fine-tuning training, this method can be adopted directly and locally to determine the confidence threshold, reducing false labels such as missed and wrong labels among the pseudo labels, improving pseudo-label quality and the training effect of the detection model; local data need not be uploaded to a server, guaranteeing data security and user privacy.
Example 3
In embodiment 1 and embodiment 2, setting a reasonable confidence threshold reduces false labels such as missing labels and wrong labels among the pseudo labels; however, some missing and wrong labels still remain among the screened pseudo labels, and if the current model is directly fine-tuned, the updated model is still affected by these wrong labels, so its detection accuracy is not high. Therefore, embodiment 3 of the present invention, based on another aspect of embodiment 1 and/or embodiment 2, aims to further improve the fine tuning training of the current model with the first sample set and the second sample set in S11, and provides a method for self-training of an intelligent camera detection model. In an embodiment, please refer to fig. 13, the method includes:
s30: acquiring a training sample set, a first loss function and a second loss function which are used for training a current model of an intelligent camera, and reliable weights corresponding to positive samples in the training sample set;
specifically, a basic detection model usable in a plurality of application scenes is built into the intelligent camera, and is obtained by training on a training set A formed by training samples collected from the plurality of application scenes; the training sample set comprises a plurality of samples randomly extracted from the training set A (hereinafter the first sample set) and a plurality of samples acquired in the actual application scene in which the intelligent camera is installed (hereinafter the second sample set). After the training sample set is fed into the current model, the reliable weight of each positive sample output by the model's network layer is obtained; the first loss function is used for the first-stage fine tuning training and the second loss function for the second-stage fine tuning training.
It should be noted that: the current model can be either the basic detection model or a previously trained model; that is, when training on a training sample set for the first time, the current model is the basic detection model; when training on a new training sample set for the second time, if the model from the first training performs better than the basic detection model, the current model of the second training is the verification model obtained from the first training; if it performs worse, the current model of the second training is still the basic detection model. By analogy, the current model of the Nth training is either the verification model obtained from the (N−1)th training or the current model used in the (N−1)th fine tuning training.
In an embodiment, please refer to fig. 14, a weight channel for outputting reliable weights is added to the network output layer, and the S30 includes:
specifically, referring to fig. 15, a weights channel is added to the target detection network to output the reliable weight of each positive sample. On the left is the output layer of a common target detection network: the bbox channel represents the target position, the confidence channel is the target foreground confidence, and the classes channel is the probability that the target belongs to each class. On the right is the network output with the added weights channel. After the target detection network finishes detecting a sample, the weight channel output is passed through an activation function to obtain the reliable weight of each positive sample in that sample.
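As an illustrative sketch only (PyTorch-style; the channel layout, layer shape and class count are hypothetical, since the patent fixes no concrete architecture), such an output head could be:

```python
import torch
import torch.nn as nn

class DetectionHeadWithWeights(nn.Module):
    """Per-anchor output: bbox(4) + confidence(1) + classes(K) + weight(1)."""

    def __init__(self, in_ch: int, num_anchors: int, num_classes: int):
        super().__init__()
        self.na, self.nc = num_anchors, num_classes
        out_ch = num_anchors * (4 + 1 + num_classes + 1)   # +1: weights channel
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        b, _, h, w = x.shape
        y = self.conv(x).view(b, self.na, 4 + 1 + self.nc + 1, h, w)
        bbox = y[:, :, 0:4]                          # target position
        conf = torch.sigmoid(y[:, :, 4:5])           # foreground confidence
        cls = torch.sigmoid(y[:, :, 5:5 + self.nc])  # per-class probabilities
        weight = torch.sigmoid(y[:, :, -1:])         # reliable weight (Sigmoid mapping)
        return bbox, conf, cls, weight
```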
It should be noted that: the data processing tasks of the current model on the application scene data acquired by the intelligent camera include detection and classification tasks; for different tasks, corresponding output nodes are added to the weights channel to output the reliable weight of each positive sample.
S301: acquiring a mapping function and position information corresponding to each positive sample in a training set one by one;
specifically, the sample acquired by the intelligent camera is gridded, and when a target object satisfying the data processing task is detected at a certain position, the position information of the target is obtained; the mapping function may be a Sigmoid function.
S302: outputting, in the weight channel, the values corresponding to the position information, and outputting through the mapping function the reliable weight of each positive sample corresponding to each value.
Specifically, after a target object conforming to the data processing task is detected, the corresponding network channels of the output layer of the target detection network output the detection result for the target; a pseudo label for the positive sample at that position is made from the confidence, position information and category information of the detection result; meanwhile, the weights channel outputs the value corresponding to the position information, and this value is mapped through the Sigmoid function to the reliable weight of the positive sample at that position.
It should be noted that: the confidence of the sample may be further processed, for example by setting a confidence threshold and then performing binarization, and the processed detection result is taken as the pseudo label; for a detailed implementation please refer to embodiment 2, which is not repeated here.
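A sketch of this pseudo-labelling step for a single sample, reusing the head above; the dictionary layout and the 0.5 threshold are assumptions (embodiment 2 describes how the threshold would actually be chosen):

```python
import torch

def make_pseudo_labels(bbox, conf, cls, weight, conf_threshold=0.5):
    """Turn one sample's detections into pseudo labels with reliable weights.

    bbox: (A, 4, H, W); conf and weight: (A, 1, H, W); cls: (A, K, H, W) --
    per-anchor outputs of the head sketched above, batch dimension removed.
    """
    keep = conf.squeeze(1) > conf_threshold            # positive-sample mask
    labels = []
    for a, y, x in keep.nonzero(as_tuple=False).tolist():
        labels.append({
            "bbox": bbox[a, :, y, x].tolist(),         # position information
            "class": int(cls[a, :, y, x].argmax()),    # category information
            "weight": float(weight[a, 0, y, x]),       # reliable weight at this position
        })
    return labels
```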
In an embodiment, each round of training includes a plurality of training batches; in S30, the training sample set includes a preset first sample set and a locally acquired second sample set, and in each training batch the positive samples belonging to the second sample set are sorted by reliable weight, outputting a reliable weight sequence.
Specifically, in both the first-stage and second-stage training, each round of training comprises a plurality of training batches (a training batch is a small batch of data selected from the sample set for training); the samples in each training batch come from the first sample set and/or the second sample set. For example: each training batch comprises Q samples, y samples from the first sample set and z samples from the second sample set, where Q = y + z and Q, y and z are integers greater than or equal to 0. In each round of training, the weights channel outputs a reliable weight according to the reliability of each positive sample in the second sample set; it can be understood that the pseudo labels of hard-to-fit positive samples are mostly given lower reliable weights, while reliable positive samples are given higher reliable weights; the reliable weights are then sorted to obtain the reliable weight sequence.
It should be noted that: the labels of the positive samples in the first sample set of the training sample set are artificial labels, while the labels in the second sample set are pseudo labels.
In one embodiment, referring to fig. 16, the S30 includes:
s303: obtaining the number ratio of high reliable weights to low reliable weights in the reliable weight sequence;
s304: outputting the high reliable weights and the low reliable weights according to the number ratio by the formula M = α × N;
wherein M is the number of high reliable weights, α is the number ratio of high reliable weights, and N is the total number of reliable weights in the reliable weight sequence of each batch.
Specifically, the reliable weights of all positive samples belonging to the second sample set in each training batch are sorted in descending order; with the ratio of high reliable weights set to α and the ratio of low reliable weights to 1 − α, the number of high reliable weights is obtained by the formula M = α × N, as sketched below.
S31: performing a first-stage training on the current model by using the training sample set through the first loss function, and outputting an intermediate training model;
specifically, the original loss value of the training sample set is calculated with the first loss function; the current model is then fine-tuned according to this original loss value until the network converges, which preserves the integrity of the current model, and an intermediate training model is output for the next-stage training. Note that the pseudo labels include both correct labels and wrong labels.
S32: performing second-stage training on the current model on the basis of the intermediate training model by using the second loss function and the reliable weight, and outputting a verification model;
and in each round of training of the second stage of training, the reliable weights corresponding to the positive samples are mutually independent.
Specifically, after the first-stage training of the current model is completed, an intermediate training model trained on all samples and with a converged network is obtained; the second-stage training then continues from this intermediate training model using the second loss function and the reliable weight of each positive sample, and finally a verification model is output. Each round of the second-stage training outputs the reliable weight of each positive sample, and these reliable weights are combined with the second loss function to adjust, and thereby update, the current model; after the multiple rounds of second-stage training are finished, the verification model is output. That is to say, the second-stage training continues according to the second loss function and the reliable weights, which differ from round to round; over multiple rounds the influence of unreliable labels among the pseudo labels is reduced, and finally the target training model is output. Here an unreliable label is a pseudo label that differs significantly from its target, which is the case for most hard-to-fit positive samples.
It should be noted that: the reliable weight of each positive sample is independent in each round of training; that is, during the current round, the current model outputs a reliable weight for each positive sample to participate in this round of model training, and during the next round, the current model of that round outputs new reliable weights for the positive samples to participate in that round of training; there is no mapping between the reliable weights of any two rounds.
It should also be noted that: each round of training comprises a plurality of training batches, and the model is updated once at the end of each training batch; that is, the model that outputs the reliable weights of the positive samples differs from one training batch to the next, and the reliable weights of the positive samples within each training batch are likewise independent of one another.
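The two stages could be organized as in the following skeleton; the loader format, round counts, optimizer choice and the way the weights channel is passed into the second loss are all assumptions for illustration:

```python
import torch

def self_train(model, loader, first_loss, second_loss,
               rounds_stage1=10, rounds_stage2=10, lr=1e-3):
    """Two-stage self-training skeleton (model returns bbox, conf, cls, weight)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(rounds_stage1):                   # first-stage training
        for images, labels in loader:                # one training batch
            opt.zero_grad()
            first_loss(model(images), labels).backward()
            opt.step()                               # -> intermediate training model
    for _ in range(rounds_stage2):                   # second-stage training
        for images, labels in loader:
            opt.zero_grad()
            bbox, conf, cls, weight = model(images)  # reliable weights read anew
            second_loss((bbox, conf, cls), labels, weight).backward()
            opt.step()                               # model updated once per batch
    return model                                     # verification model
```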
In one embodiment, referring to fig. 17, in each training batch, when the positive sample belongs to the first sample set, the second loss function includes a first supervision function of the weight channel, and the S32 includes:
specifically, the first supervision function supervises each training batch so that, after the second-stage training is finished, the reliable weights output by the weights channel for the positive samples belonging to the first sample set are overall larger than a preset value; the first supervision function thus ensures that positive samples with artificial labels keep high reliable weights, so that their loss during fine tuning training of the model is minimal.
S321: obtaining the reliable weight of each positive sample output by the training batch at present;
specifically, after the samples of the first sample set of the currently trained batch are sent to the current model, the weights channel of the network output layer outputs corresponding reliable weights to the positive samples.
S322: performing the current training batch through the second loss function according to each reliable weight and the first supervision function Lw1 = (1/y) × Σ_{c=1..y} max(0, γ − W_c);
specifically, according to the reliable weight of each positive sample and the first supervision function, the parameters of the current model are adjusted through the second loss function, and the current training batch updates the current model.
S323: repeating S321 to S322 until the second stage training is completed, and outputting the verification model;
specifically, under the action of the first supervision function, after the multiple rounds of the second-stage training, the reliable weights of the positive samples of the first sample set output by the weights channel of the final target training model are overall larger than the preset value γ; that is, most of the finally output reliable weights of the positive samples in the first sample set are greater than γ, and very few may be less than γ, thereby ensuring that the overall loss of the positive samples of the first sample set is small.
Wherein y is the total number of positive samples belonging to the first sample set in a training batch, γ is the preset value of the reliable weight, and W_c represents the reliable weight of the c-th positive sample.
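A hinge-style sketch of this penalty, consistent with the stated goal (the original formula is given only as an image, and γ's value here is an assumption):

```python
import torch

def first_supervision_loss(weights_first_set: torch.Tensor, gamma: float = 0.8):
    """Penalize first-sample-set reliable weights that fall below gamma."""
    return torch.clamp(gamma - weights_first_set, min=0).mean()
```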
In one embodiment, referring to fig. 18, in each training batch, when the positive sample belongs to the second sample set, the second loss function includes a second supervision function of the weight channel, and the S32 includes:
specifically, the second supervision function supervises each training batch so that, after the second-stage training is finished, the difference between the first weight average W_H of the high reliable weights and the second weight average W_L of the low reliable weights of the positive samples belonging to the second sample set output by the weights channel is larger than δ, forcing the network to distinguish high reliable weights from low reliable weights; the reliable weights of wrong labels among the pseudo labels are thus gradually reduced during training, improving the accuracy of model self-training.
S324: acquiring the reliable weight sequence output by the training batch currently;
specifically, after the samples of the second sample set in the current training batch are fed into the current model, the weights channel of the network output layer outputs a reliable weight for each positive sample, and the reliable weights are then sorted to obtain the reliable weight sequence; the detection result comprises at least one of the following: confidence information, category information and location information.
S325: calculating, according to each of the high reliable weights, the first weight average of the current training batch by the formula W_H = (1/M) × Σ_{a=1..M} W_a;
S326: calculating, according to each of the low reliable weights, the second weight average of the current training batch by the formula W_L = (1/(N − M)) × Σ_{b=1..N−M} W_b;
S327: performing the current training batch through the second loss function according to the first weight average W_H, the second weight average W_L and the second supervision function Lw2 = max(0, δ − (W_H − W_L));
s328: repeating S324 to S327 until the second-stage training is completed, and outputting the verification model;
wherein M is the number of high reliable weights in a training batch, N is the total number of positive samples belonging to the second sample set in a training batch, W_a is the a-th of the high reliable weights, W_b is the b-th of the low reliable weights, and δ is the preset weight difference.
In particular, under the action of the second supervision function, through the multiple rounds of the second-stage training, the difference between the first weight average W_H of the high reliable weights and the second weight average W_L of the low reliable weights output by the weights channel of the final target training model is greater than δ; the network is forced to distinguish high reliable weights from low reliable weights, the influence of unreliable labels among the pseudo labels is reduced during training, and finally the target training model is output.
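A matching sketch of the second supervision term, reusing split_reliable_weights from above (the α and δ values are assumptions):

```python
import torch

def second_supervision_loss(weights_second_set: torch.Tensor,
                            alpha: float = 0.7, delta: float = 0.3):
    """Force mean(high weights) - mean(low weights) to exceed delta (hinge sketch)."""
    high, low = split_reliable_weights(weights_second_set, alpha)
    w_h, w_l = high.mean(), low.mean()      # first / second weight averages
    return torch.clamp(delta - (w_h - w_l), min=0)
```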
In one embodiment, in the S30, the first loss function includes at least one of: coordinate loss, category loss and confidence loss; the second loss function includes at least one of: coordinate loss, category loss, confidence loss and weight channel loss.
Specifically, taking the yolov3 loss function as an example, the first loss function includes:
coordinate loss:
L_coord = λ × Σ_{i=1..S} Σ_{j=1..B} 1_{ij}^{obj} × [(x_{ij} − x̂_{ij})² + (y_{ij} − ŷ_{ij})² + (w_{ij} − ŵ_{ij})² + (h_{ij} − ĥ_{ij})²]
category loss:
L_cls = σ × Σ_{i=1..S} Σ_{j=1..B} 1_{ij}^{obj} × CE(C_{ij}, Ĉ_{ij})
confidence loss:
L_conf = θ × Σ_{i=1..S} Σ_{j=1..B} CE(P_{ij}, P̂_{ij})
wherein L_coord, L_cls and L_conf denote the coordinate error, category error and confidence error respectively, CE(·,·) denotes the cross-entropy between prediction and label, and 1_{ij}^{obj} indicates whether the jth candidate box of the ith grid in the application scene data acquired by the intelligent camera is a positive sample of the data processing task: it is 1 for a positive sample and 0 otherwise.
The first loss function is then Loss1 = L_coord + L_cls + L_conf.
The second loss function takes the same coordinate loss, category loss and confidence loss as the first loss function, except that each positive-sample term is additionally scaled by the reliable weight W_{ij}:
coordinate loss:
L'_coord = λ × Σ_{i=1..S} Σ_{j=1..B} 1_{ij}^{obj} × W_{ij} × [(x_{ij} − x̂_{ij})² + (y_{ij} − ŷ_{ij})² + (w_{ij} − ŵ_{ij})² + (h_{ij} − ĥ_{ij})²]
category loss:
L'_cls = σ × Σ_{i=1..S} Σ_{j=1..B} 1_{ij}^{obj} × W_{ij} × CE(C_{ij}, Ĉ_{ij})
confidence loss:
L'_conf = θ × Σ_{i=1..S} Σ_{j=1..B} [1_{ij}^{obj} × W_{ij} + (1 − 1_{ij}^{obj})] × CE(P_{ij}, P̂_{ij})
weight channel loss, comprising the first supervision function
Lw1 = (1/y) × Σ_{c=1..y} max(0, γ − W_c)
and the second supervision function
Lw2 = max(0, δ − (W_H − W_L)).
The second loss function is then Loss2 = L'_coord + L'_cls + L'_conf + Lw1 + Lw2;
wherein L'_coord, L'_cls and L'_conf denote the coordinate error, category error and confidence error respectively; Lw2 and Lw1 represent the reliable-weight error losses for the targets in the second sample set and the targets in the first sample set respectively; 1_{ij}^{obj} indicates whether the jth candidate box of the ith grid in the application scene data acquired by the intelligent camera is a positive sample, being 1 for a positive sample and 0 otherwise; W_{ij} represents the reliable weight of the positive sample of the jth candidate box of the ith grid; x and y represent the abscissa and ordinate of the box center, w and h represent the width and height of the box, C represents the category, P represents the confidence, S is the number of grids of the network output layer, B is the number of fixed reference boxes, λ, σ and θ are constants, and δ is the preset weight difference.
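Putting the pieces together, a hedged sketch of Loss2 using the helpers above; the per-box detection loss is assumed to be precomputed, and the negative-sample handling follows the reconstruction of the confidence loss given above:

```python
def second_loss_total(per_box_det_loss, obj_mask, reliable_w,
                      weights_first_set, weights_second_set):
    """Loss2 = reliably weighted detection losses + weight channel loss.

    per_box_det_loss: coordinate + category + confidence loss per candidate box
    obj_mask: the indicator 1_ij^obj (1 for positive samples, 0 otherwise)
    reliable_w: weights-channel output W_ij per candidate box
    """
    weighted = per_box_det_loss * (obj_mask * reliable_w + (1 - obj_mask))
    return (weighted.sum()
            + first_supervision_loss(weights_first_set)
            + second_supervision_loss(weights_second_set))
```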
By adopting this method for self-training the intelligent camera model, the intelligent camera runs detection on a training sample set formed from locally acquired image data, the loss value of each sample is calculated locally with the first loss function, and the detection model then undergoes first-stage fine tuning training until the model converges, which makes local training of the model possible. After the first-stage training is completed, second-stage training is performed on the detection model by introducing the reliable weights and the second loss function, under whose combined action the model completes fine tuning training in the expected direction without human intervention. Fine tuning training of the detection model is thus completed directly on the user side, local data does not need to be uploaded to a server, and data security and user privacy can be guaranteed.
Second embodiment
Example 4
Embodiment 4 of the present invention further provides an apparatus for optimizing an intelligent camera detection model by using edge calculation, as shown in fig. 19, including:
a data acquisition module: used for acquiring a preset first sample set and a second sample set corresponding to local image data acquired by the intelligent camera;
a model training module: used for performing fine tuning training on the current model by using the first sample set and the second sample set to obtain a verification model;
a model checking module: used for comparing the current model with the verification model and outputting a target model.
With the device of this embodiment for optimizing an intelligent camera detection model by edge computing, a first sample set preset in the intelligent camera and a second sample set corresponding to locally acquired image data are obtained, and the currently used model is fine-tuned locally with the first and second sample sets, completing the update of the current model and outputting a verification model; the detection effects of the verification model and the current model on the local image data are compared, and a target model is output. Training locally on real-time images of the actual application scene combined with the preset first sample set improves the fit between the target model and the actual application scene and thus the detection accuracy; meanwhile, training on the device side protects user privacy and improves data security.
Example 5
In embodiment 4, pseudo labels are screened by a confidence threshold to reduce false labels such as missing labels and wrong labels among them; when the confidence threshold is set unreasonably, missing and wrong labels remain among the pseudo labels and ultimately affect the accuracy of the fine-tuned target model. Therefore, on the basis of embodiment 4, a corresponding sub-apparatus for further improving the confidence threshold is proposed.
Please refer to fig. 20, which includes:
a sample data module: used for obtaining a test sample set and the sampling step length of the confidence threshold;
a data detection module: used for testing the current model with the test sample set and outputting the detection result corresponding to each positive sample in the test sample set;
a data processing module: used for outputting mAP values corresponding one by one to the confidence thresholds according to the confidence thresholds corresponding to the sampling step length and the detection results;
a target output module: used for outputting the target confidence threshold of the current model according to each mAP value.
With this device for selecting the intelligent camera sample confidence threshold, a plurality of confidence thresholds are obtained by setting the sampling step length, the current model tests the test sample set, and the detection results are output; mAP values corresponding one by one to the confidence thresholds are obtained from the detection results, all the mAP values are compared, and the confidence threshold corresponding to the maximum mAP value is output as the target confidence threshold, so that the accuracy and the recall rate of the detection results are optimal. When the detection model needs fine tuning training, the confidence threshold can be determined directly on the local device, which reduces false labels such as missing labels and wrong labels among the pseudo labels, improves the quality of the pseudo labels and thus the effect of detection model training; local data does not need to be uploaded to a server, so data security and user privacy can be guaranteed.
Example 6
In embodiments 4 and 5, setting a reasonable confidence threshold reduces false labels such as missing labels and wrong labels among the pseudo labels; however, some missing and wrong labels still remain among the screened pseudo labels, and if the current model is directly fine-tuned, the updated model is still affected by these wrong labels, so its detection accuracy is not high. Therefore, on the basis of embodiment 4 and/or embodiment 5, a device is further proposed that improves the fine tuning training of the current model by using the first sample set and the second sample set.
please refer to fig. 21, which includes:
a training data module: used for obtaining a training sample set, a first loss function and a second loss function for training the current model of the intelligent camera, and the reliable weights corresponding to the positive samples in the training sample set;
a first training module: used for performing first-stage training on the current model with the training sample set through the first loss function and outputting an intermediate training model;
a second training module: used for performing second-stage training on the current model on the basis of the intermediate training model by using the second loss function and the reliable weights, and outputting a verification model;
and in each round of training of the second stage of training, the reliable weights corresponding to the positive samples are mutually independent.
With this device for self-training the intelligent camera model, the intelligent camera runs detection on a training sample set formed from locally acquired image data, the loss value of each sample is calculated locally with the first loss function, and the detection model then undergoes first-stage fine tuning training until the model converges, which makes local training of the model possible. After the first-stage training is completed, second-stage training is performed on the detection model by introducing the reliable weights and the second loss function, under whose combined action the model completes fine tuning training in the expected direction without human intervention. Fine tuning training of the detection model is thus completed directly on the user side, local data does not need to be uploaded to a server, and data security and user privacy can be guaranteed.
The third embodiment is as follows:
the present invention provides an electronic device and storage medium, as shown in FIG. 22, comprising at least one processor, at least one memory, and computer program instructions stored in the memory.
Specifically, the processor may comprise a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention; the electronic device includes at least one of the following: a wearable device with an intelligent camera, or a mobile device with an intelligent camera.
The memory may include mass storage for data or instructions. By way of example and not limitation, the memory may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is non-volatile solid-state memory. In a particular embodiment, the memory includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor reads and executes the computer program instructions stored in the memory to realize any one of the methods in the above embodiments: the method for optimizing the intelligent camera detection model by edge computing, the method for selecting the sample confidence threshold, and the method for model self-training.
In one example, the electronic device may also include a communication interface and a bus. The processor, the memory and the communication interface are connected through a bus and complete mutual communication.
The communication interface is mainly used for realizing communication among modules, devices, units and/or equipment in the embodiment of the invention.
A bus comprises hardware, software, or both that couple components of an electronic device to one another. By way of example and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or another suitable bus, or a combination of two or more of these. A bus may include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.
In summary, embodiments of the present invention provide a method for optimizing an intelligent camera detection model using edge computing, a sample confidence threshold selection method, a model self-training method, and corresponding apparatuses, devices and storage media.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A self-training method of an intelligent camera model is characterized by comprising the following steps:
s30: acquiring a training sample set, a first loss function and a second loss function for training a current model of a camera, and the reliable weight corresponding to each positive sample in the training sample set;
s31: performing a first-stage training on the current model by using the training sample set through the first loss function, and outputting an intermediate training model;
s32: performing second-stage training on the current model on the basis of the intermediate training model by using the second loss function and the reliable weight, and outputting a verification model;
and in each round of training of the second stage of training, the reliable weights corresponding to the positive samples are mutually independent.
2. The smart camera model self-training method of claim 1, wherein a weight channel for outputting a reliable weight is added to a network output layer, and the S30 includes:
s301: acquiring a mapping function and position information corresponding to each positive sample in a training set one by one;
s302: and outputting values corresponding to the position information in the weight channel according to the position information, and outputting the reliable weight of each positive sample corresponding to each value by the mapping function.
3. The smart camera model self-training method of claim 2, wherein each round of training includes a plurality of training batches; in the S30, the training sample set includes a preset first sample set and a locally acquired second sample set, and in each training batch the positive samples belonging to the second sample set are sorted by reliable weight, outputting a reliable weight sequence.
4. The smart camera model self-training method of claim 3, wherein the S30 comprises:
s303: obtaining the number ratio of the high reliable weight and the low reliable weight in the reliable weight sequence;
s304: outputting each high reliability weight and each low reliability weight according to the number ratio by a formula M = α × N;
m is the number of high reliable weights, alpha is the ratio of the number of high reliable weights, and N is the total number of reliable weights of the reliable weight sequences in each batch.
5. The smart camera model self-training method of claim 3, wherein in each training batch, when the positive sample belongs to the first sample set, the second loss function further comprises a first supervision function of a weight channel, and the S32 comprises:
s321: obtaining the reliable weight of each positive sample output by the training batch at present;
s322: performing the current training batch through the second loss function according to each reliable weight and the first supervision function Lw1 = (1/y) × Σ_{c=1..y} max(0, γ − W_c);
s323: repeating S321 to S322 until the second stage training is completed, and outputting the verification model;
wherein y is the total number of positive samples belonging to the first sample set in a training batch, γ is the preset value of the reliable weight, and W_c represents the reliable weight of the c-th positive sample.
6. The smart camera model self-training method of any one of claims 3 to 5, wherein in each training batch, when a positive sample belongs to a second sample set, the second loss function further comprises a second supervision function of a weight channel, and the S32 comprises:
s324: acquiring the reliable weight sequence output by the training batch currently;
s325: calculating, according to each of the high reliable weights, the first weight average of the current training batch by the formula W_H = (1/M) × Σ_{a=1..M} W_a;
s326: calculating, according to each of the low reliable weights, the second weight average of the current training batch by the formula W_L = (1/(N − M)) × Σ_{b=1..N−M} W_b;
s327: performing the current training batch through the second loss function according to the first weight average W_H, the second weight average W_L and the second supervision function Lw2 = max(0, δ − (W_H − W_L));
s328: repeating S324 to S327 until the second stage training is completed, and outputting the verification model;
wherein M is the number of high reliable weights in a training batch, N is the total number of positive samples belonging to the second sample set in a training batch, W_a is the a-th of the high reliable weights, W_b is the b-th of the low reliable weights, and δ is the preset weight difference.
7. A method for optimizing an intelligent camera detection model by using edge calculation, comprising the intelligent camera model self-training method of any one of claims 1 to 6.
8. An intelligent camera model self-training device, characterized by comprising:
a training data module: used for obtaining a training sample set, a first loss function and a second loss function for training a current model of a camera, and the reliable weights corresponding to the positive samples in the training sample set;
a first training module: used for performing first-stage training on the current model with the training sample set through the first loss function and outputting an intermediate training model;
a second training module: used for performing second-stage training on the current model on the basis of the intermediate training model by using the second loss function and the reliable weights, and outputting a verification model;
and in each round of training of the second stage of training, the reliable weights corresponding to the positive samples are mutually independent.
9. An electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory that, when executed by the processor, implement the method of any of claims 1-7.
10. A storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1-7.
CN202110109737.7A 2021-01-27 2021-01-27 Intelligent camera model self-training method, device, equipment and medium Active CN112434680B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110456609.XA CN113139475A (en) 2021-01-27 2021-01-27 Intelligent camera model terminal training method, device, equipment and medium
CN202110109737.7A CN112434680B (en) 2021-01-27 2021-01-27 Intelligent camera model self-training method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110109737.7A CN112434680B (en) 2021-01-27 2021-01-27 Intelligent camera model self-training method, device, equipment and medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110456609.XA Division CN113139475A (en) 2021-01-27 2021-01-27 Intelligent camera model terminal training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN112434680A true CN112434680A (en) 2021-03-02
CN112434680B CN112434680B (en) 2021-05-14

Family

ID=74697342

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110456609.XA Pending CN113139475A (en) 2021-01-27 2021-01-27 Intelligent camera model terminal training method, device, equipment and medium
CN202110109737.7A Active CN112434680B (en) 2021-01-27 2021-01-27 Intelligent camera model self-training method, device, equipment and medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110456609.XA Pending CN113139475A (en) 2021-01-27 2021-01-27 Intelligent camera model terminal training method, device, equipment and medium

Country Status (1)

Country Link
CN (2) CN113139475A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000000824U (en) * 1998-06-17 2000-01-15 지천만 Electromagnetic shielding device of ballast
CN106548210A (en) * 2016-10-31 2017-03-29 腾讯科技(深圳)有限公司 Machine learning model training method and device
CN110674932A (en) * 2019-09-30 2020-01-10 北京小米移动软件有限公司 Two-stage convolutional neural network target detection network training method and device
US10672109B2 (en) * 2018-03-29 2020-06-02 Pixar Multi-scale architecture of denoising monte carlo renderings using neural networks
CN111612168A (en) * 2020-06-30 2020-09-01 腾讯科技(深圳)有限公司 Management method and related device for machine learning task

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3203679A1 (en) * 2016-02-04 2017-08-09 ABB Schweiz AG Machine learning based on homomorphic encryption
US11475350B2 (en) * 2018-01-22 2022-10-18 Google Llc Training user-level differentially private machine-learned models
US11170320B2 (en) * 2018-07-19 2021-11-09 Adobe Inc. Updating machine learning models on edge servers
CN112102386A (en) * 2019-01-22 2020-12-18 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
US10691133B1 (en) * 2019-11-26 2020-06-23 Apex Artificial Intelligence Industries, Inc. Adaptive and interchangeable neural networks
CN111062885B (en) * 2019-12-09 2023-09-12 中国科学院自动化研究所 Mark detection model training and mark detection method based on multi-stage transfer learning
CN111966875B (en) * 2020-08-18 2023-08-22 中国银行股份有限公司 Sensitive information identification method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743459A (en) * 2021-07-29 2021-12-03 深圳云天励飞技术股份有限公司 Target detection method and device, electronic equipment and storage medium
CN113743459B (en) * 2021-07-29 2024-04-02 深圳云天励飞技术股份有限公司 Target detection method, target detection device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113139475A (en) 2021-07-20
CN112434680B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112434681B (en) Intelligent camera self-training confidence threshold selection method, device and equipment
CN111181939A (en) Network intrusion detection method and device based on ensemble learning
CN105100789A (en) Method for evaluating video quality
CN112132119A (en) Passenger flow statistical method and device, electronic equipment and storage medium
CN110969215A (en) Clustering method and device, storage medium and electronic device
CN112560547A (en) Abnormal behavior judgment method and device, terminal and readable storage medium
CN106951828B (en) Urban area function attribute identification method based on satellite images and network
CN105163106A (en) Multi-data-processing video quality evaluation system
CN112434680B (en) Intelligent camera model self-training method, device, equipment and medium
CN114360030A (en) Face recognition method based on convolutional neural network
CN110795975B (en) Face false detection optimization method and device
CN113468703A (en) ADS-B message anomaly detector and detection method
CN113658192A (en) Multi-target pedestrian track acquisition method, system, device and medium
CN110866473B (en) Target object tracking detection method and device, storage medium and electronic device
CN110290466B (en) Floor distinguishing method, device, equipment and computer storage medium
CN112101114A (en) Video target detection method, device, equipment and storage medium
CN112949849B (en) Method and device for optimizing intelligent camera detection model by adopting edge calculation
CN113052308A (en) Method for training target cell identification model and target cell identification method
CN114219051B (en) Image classification method, classification model training method and device and electronic equipment
CN112990350B (en) Target detection network training method and target detection network-based coal and gangue identification method
CN116091964A (en) High-order video scene analysis method and system
CN109993185A (en) Wireless signaling analysis method, calculates equipment and storage medium at device
CN112001211B (en) Object detection method, device, equipment and computer readable storage medium
CN113378762A (en) Sitting posture intelligent monitoring method, device, equipment and storage medium
CN114095947B (en) Network coverage evaluation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant