CN116129368A - Picture processing method, device, system and storage medium - Google Patents

Picture processing method, device, system and storage medium

Info

Publication number
CN116129368A
Authority
CN
China
Prior art keywords
bounding boxes
picture
target picture
bounding
intra
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211667478.0A
Other languages
Chinese (zh)
Inventor
马瑞峰
曹斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Reach Automotive Technology Shenyang Co Ltd
Original Assignee
Neusoft Reach Automotive Technology Shenyang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Reach Automotive Technology Shenyang Co Ltd
Priority to CN202211667478.0A
Publication of CN116129368A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a picture processing method, device, system and storage medium, applied to the automotive field. The method comprises the following steps: in response to receiving a target picture, a plurality of bounding boxes corresponding to the target picture are determined. A conflict coefficient of the target picture is then determined according to the plurality of bounding boxes, so that whether the target picture is used as a picture for training a target object recognition model is decided according to the conflict coefficient. The conflict coefficient represents the difficulty of recognizing the target object in the target picture. High-value pictures are thus screened out by the conflict coefficient, low-value pictures are filtered out, and the high-value pictures are used as pictures for training the recognition model, avoiding the high labor cost of labeling pictures that results from training the vehicle recognition model with low-value pictures.

Description

Picture processing method, device, system and storage medium
Technical Field
The present disclosure relates to the field of vehicle identification, and in particular, to a method, an apparatus, a system, and a storage medium for processing a picture.
Background
With the development of automobile technology, vehicle identification models, which can identify vehicles in complex scenes, have become an important technical means for detecting traffic jams, optimizing traffic flow, and the like.
In the prior art, massive picture data are often collected in advance and then used to train the vehicle recognition model. However, this massive picture data includes both high-value pictures and low-value pictures. A high-value picture is a picture whose vehicle-related information value is higher than a preset relevance threshold; a low-value picture is a picture whose vehicle-related information value is lower than the preset relevance threshold. For example, a picture containing no vehicle information is a low-value picture. These low-value pictures contribute nothing to the training of the vehicle identification model, yet increase the labor cost of labeling the pictures.
Disclosure of Invention
In view of this, the present application provides a method, apparatus, system and storage medium for processing pictures, which aims to reduce the labor cost when labeling pictures by screening high-value pictures from a large number of pictures as pictures for training a vehicle recognition model.
In a first aspect, the present application provides a method for processing a picture, the method including:
in response to receiving a target picture, determining a plurality of bounding boxes corresponding to the target picture;
calculating the bounding box deviation of the plurality of bounding boxes according to the plurality of bounding boxes;
determining a conflict coefficient of the target picture according to the bounding box deviations of the plurality of bounding boxes; the conflict coefficient is used for indicating the difficulty of recognizing the target object in the target picture;
and in response to the conflict coefficient being greater than a preset conflict threshold, using the target picture as a picture for training a target object recognition model.
Optionally, the method further comprises:
obtaining intra-bounding-box predicted values of the plurality of bounding boxes;
calculating intra-bounding-box prediction deviations of the plurality of bounding boxes according to the intra-bounding-box predicted values of the plurality of bounding boxes;
wherein the determining a conflict coefficient of the target picture according to the plurality of bounding box deviations includes:
determining the conflict coefficient of the target picture according to the bounding box deviation and the intra-bounding-box prediction deviations of the plurality of bounding boxes.
Optionally, the intra-bounding-box predicted value is a softmax layer output result;
the calculating the intra-bounding-box prediction deviations of the plurality of bounding boxes according to the intra-bounding-box predicted values of the plurality of bounding boxes includes:
dividing the softmax layer outputs of the plurality of bounding boxes into softmax layer outputs for a plurality of targets;
for the softmax layer outputs of the same target, obtaining the intra-bounding-box prediction deviation of each of the plurality of targets based on a cross entropy calculation;
and averaging the intra-bounding-box prediction deviations of the plurality of targets to obtain the intra-bounding-box prediction deviations of the plurality of bounding boxes.
Optionally, the determining the conflict coefficient of the target picture according to the bounding box deviation and the intra-bounding-box prediction deviations of the plurality of bounding boxes includes:
substituting the bounding box deviation and the intra-bounding-box prediction deviation into a preset conflict coefficient calculation formula to calculate the conflict coefficient of the target picture;
wherein the preset conflict coefficient calculation formula is positively correlated with the bounding box deviation and positively correlated with the intra-bounding-box prediction deviation.
Optionally, the plurality of bounding boxes includes a plurality of first bounding boxes and a plurality of second bounding boxes;
the determining a plurality of bounding boxes corresponding to the target picture includes:
inputting the target picture into a first preset model to acquire the plurality of first bounding boxes; and inputting the target picture into a second preset model to acquire the plurality of second bounding boxes;
wherein the first preset model and the second preset model are models with different principles;
the calculating the bounding box deviation of the plurality of bounding boxes according to the plurality of bounding boxes includes:
calculating the bounding box deviation of the plurality of bounding boxes from the plurality of first bounding boxes and the plurality of second bounding boxes based on a preset bounding box deviation calculation formula;
wherein the bounding box deviation calculation formula is positively correlated with the degree of difference among the plurality of bounding boxes.
Optionally, the preset bounding box deviation calculation formula is a GIoU calculation formula.
In a second aspect, the present application provides an apparatus for processing a picture, the apparatus comprising:
a first response unit, configured to determine, in response to receiving a target picture, a plurality of bounding boxes corresponding to the target picture;
a first calculation unit, configured to calculate the bounding box deviation of the plurality of bounding boxes according to the plurality of bounding boxes;
a determining unit, configured to determine a conflict coefficient of the target picture according to the bounding box deviations of the plurality of bounding boxes; the conflict coefficient is used for indicating the difficulty of recognizing the target object in the target picture;
and a second response unit, configured to use the target picture as a picture for training a target object recognition model in response to the conflict coefficient being greater than a preset conflict threshold.
Optionally, the apparatus further includes:
a second calculation unit, configured to calculate intra-bounding-box prediction deviations of the plurality of bounding boxes; the second calculation unit is specifically configured to: obtain intra-bounding-box predicted values of the plurality of bounding boxes, and calculate the intra-bounding-box prediction deviations of the plurality of bounding boxes according to the intra-bounding-box predicted values;
wherein the determining unit is specifically configured to determine the conflict coefficient of the target picture according to the bounding box deviations of the plurality of bounding boxes and the intra-bounding-box prediction deviations.
In a third aspect, the present application provides a vehicle system comprising an apparatus for picture processing as in the second aspect.
In a fourth aspect, the present application provides a computer storage medium having code stored therein, which when executed, causes an apparatus for executing the code to carry out the method of any one of the preceding aspects.
The application discloses a picture processing method, device, system and storage medium. When the method is executed, a plurality of bounding boxes corresponding to a target picture are first determined in response to receiving the target picture. The bounding box deviations of the plurality of bounding boxes are determined from the bounding boxes, and the conflict coefficient of the target picture is calculated from the bounding box deviations, so that whether the target picture is used as a picture for training the vehicle recognition model is decided according to its conflict coefficient. The conflict coefficient indicates the difficulty of recognizing the target object in the target picture. High-value pictures are thus screened out by the conflict coefficient, low-value pictures are filtered out, and the high-value pictures are used as pictures for training the recognition model, avoiding the high labor cost of labeling pictures that results from training the vehicle recognition model with low-value pictures.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a flowchart of a method for processing a picture according to an embodiment of the present application;
fig. 2 is a schematic diagram of a picture according to an embodiment of the present application;
FIG. 3 is a schematic diagram of acquiring a plurality of first bounding boxes and a plurality of second bounding boxes according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of bounding box calculation according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a device for processing a picture according to an embodiment of the present application.
Detailed Description
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "a", "an" and "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
As described above, when training the vehicle recognition model, a large amount of image data is collected first, and then the large amount of image data is input into the vehicle recognition model to train and recognize the vehicle. But the collected massive picture data may include low-value pictures, i.e. pictures with vehicle related information lower than a preset related threshold. Inputting such low value pictures into a vehicle identification model does not improve the model training effect, but increases the labor cost for labeling subsequent pictures.
Based on the above, the present application provides a picture processing method that acquires different bounding boxes corresponding to a target picture, calculates the bounding box deviation, and from it a conflict coefficient of the target picture; high-value pictures are then screened out according to the conflict coefficient and low-value pictures are filtered out. The high-value pictures are used as target pictures for training the recognition model, avoiding the high labeling cost incurred when low-value pictures enter the training of the vehicle recognition model.
The following describes a method for processing a picture provided in an embodiment of the present application in detail with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a picture processing method provided in an embodiment of the present application. The method can be applied to a detection system for detecting vehicles, and the execution subject of the method is a detection server in the detection system. The method comprises the following steps:
s101: the detection server receives the massive picture data.
The massive picture data may include high-value pictures and low-value pictures. A high-value picture is a picture whose target-related information value is higher than a preset relevance threshold; a low-value picture is a picture whose target-related information value is lower than the preset relevance threshold.
It will be appreciated that a low-value picture indicates that object detection in the picture is easy, for example because the picture has a simple layout or contains few or no targets. Exemplary description: assuming the target object is a "vehicle", a low-value picture is one with few or no vehicles. Referring to fig. 2, a schematic diagram of pictures provided in an embodiment of the present application. As shown in fig. 2(a), a low-value picture provided in the embodiment of the present application contains no vehicle information.
A high-value picture indicates that target detection in the picture is difficult, and the difficulty can have many causes, for example the picture containing more targets. Exemplary description: assuming the target object is a "vehicle", fig. 2(b) shows a schematic diagram of a high-value picture provided in an embodiment of the present application, which contains more vehicles.
In the embodiments of the present application, among massive pictures, low-value pictures contribute extremely little to the training of the vehicle identification model, yet increase the labor cost of labeling the pictures. Therefore, when training a vehicle identification model, low-value pictures need to be filtered out; for example, pictures of the type of fig. 2(a) are filtered out of the vehicle identification model. Only high-value pictures are selected; for example, pictures of the type of fig. 2(b) are selected as samples for model training.
S102: a plurality of bounding boxes corresponding to the target picture are determined.
The massive pictures received by the detection server include the target picture. In the embodiment of the present application, the target picture may be any picture among the massive pictures.
To screen out high-value pictures and filter out low-value pictures, in one possible implementation the target picture is processed to obtain a plurality of bounding boxes corresponding to it.
In the embodiment of the present application, the plurality of bounding boxes may be acquired in multiple ways, for example through different recognition models, which yield the bounding boxes corresponding to the target object in the target picture together with the predicted values within those bounding boxes.
In one possible implementation, the plurality of bounding boxes includes a plurality of first bounding boxes and a plurality of second bounding boxes. A bounding box is used to represent the position of the target object in the target picture and may be a rectangular box or a box of another shape; the shape of the bounding box is not limited here. The following description takes rectangular boxes as an example.
In the embodiment of the application, the first bounding box and the second bounding box are bounding boxes corresponding to the target picture, which are acquired through different methods.
In one possible implementation, the first bounding boxes may be obtained by inputting the target picture into a first preset model, and the second bounding boxes by inputting the target picture into a second preset model. The first preset model and the second preset model differ in implementation principle, but both take the target picture as input and output bounding boxes.
In one possible implementation, the first preset model is a first-order (one-stage) target recognition model, such as a yolo-series algorithm, and the second preset model is a second-order (two-stage) target recognition model, such as the Faster RCNN model.
In one possible implementation, the first predetermined model is a second order object recognition model, and the second predetermined model is a first order object recognition model.
It should be noted that, in the embodiments of the present application, the first bounding box and the second bounding box may be obtained in other manners, and may be adjusted by those skilled in the art as required.
To improve both the screening efficiency and the screening accuracy on the massive picture data, the Faster RCNN model may be selected as the second-order target recognition model, and the yoloV7 model as the first-order target recognition model.
YoloV7 employs trainable bag-of-freebies (BoF) methods, including planned re-parameterized convolution, and uses E-ELAN (extended efficient layer aggregation network) together with model scaling for concatenation-based models. Model scaling for a concatenation-based model may take into account: resolution (the size of the input image), width (the number of channels), depth (the number of network layers), and so on.
Multiple models are trained using different training data but the same settings. Their weights are then averaged to obtain the final model. The model weights for different epochs are averaged to obtain a planned re-parameterized convolution.
Its processing speed and accuracy are therefore higher than those of earlier models in the yolo series, such as the yoloV4 model.
The specific flow by which yoloV7 acquires bounding boxes is as follows:
s1021: the number of categories is set.
For a large-size picture, such as 640×640 or 1280×1280, the number of grid cells into which the picture is divided and the number of boxes predicted per grid cell are first preset. For example, the picture is divided into a 7×7 grid, and each grid cell predicts one box.
S1022: the pictures are scaled to 448 x 448, input to the CNN convolutional network process, and output to the bounding box.
The CNN includes convolution and pooling operations and two fully connected layers. Processing yields the 7×7×2 bounding boxes, which are output.
S1023: setting a threshold filtering frame.
The vehicle scores of all bounding boxes, combining the bounding box confidence and the bounding box position, are calculated. The highest class score of each bounding box is obtained and its index recorded. A mask is created according to the threshold and used for filtering. The threshold-filtered scores, bounding boxes, and class indices are output using the mask.
S1024: non-maximum suppression is performed.
The bounding box with the highest score is selected, added to the output list, and deleted from the candidate list. The IoU between this highest-scoring bounding box and each remaining candidate box is calculated, and candidate boxes whose IoU is larger than the set IoU threshold are deleted. This is repeated until the candidate list is empty.
S1025: and displaying the bounding box of the final output list in the target picture.
The Faster RCNN network framework can be divided into two parts according to function, namely a feature extraction part and a decision part. The feature extraction part generates high-quality region proposal candidate boxes, judges and corrects the candidate boxes through a classification function and a box regression function, and preliminarily locates the target.
Features are first extracted from the target picture, proposal feature blocks of the same size are extracted and input into the decision part; the category of each proposal feature block is computed by the classification function, and the box position is precisely detected by the box regression function.
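As a concrete way to obtain the second bounding boxes with a two-stage detector, the sketch below uses the off-the-shelf Faster R-CNN from torchvision; the choice of backbone weights and the 0.5 score threshold are assumptions made here for illustration only:

    import torch
    import torchvision

    # Pretrained two-stage detector; any Faster-RCNN-style model would do.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    def second_bounding_boxes(picture, score_thresh=0.5):
        # picture: float tensor of shape (3, H, W) with values in [0, 1].
        with torch.no_grad():
            out = model([picture])[0]  # dict with 'boxes', 'labels', 'scores'
        keep = out["scores"] > score_thresh
        return out["boxes"][keep], out["scores"][keep]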
Exemplary description: referring to fig. 3, a schematic diagram of acquiring a plurality of first bounding boxes and a plurality of second bounding boxes is provided in an embodiment of the present application. The white bounding box is a second bounding box, acquired through a master RCNN, and the gray bounding box is a first bounding box, acquired through yoloV 7.
In the embodiment of the present application, the color and thickness of the boxes and the text displayed above them are set by the image-processing program and are unrelated to the models; they can be adjusted by those skilled in the art.
It is noted that the first bounding box and the second bounding box are only illustrative; a third bounding box, a fourth bounding box, and so on may also be obtained from the target picture. The present application does not limit the number of ways in which bounding boxes may be acquired.
In addition, while acquiring a plurality of bounding boxes, the embodiment of the application may also acquire the intra-bounding box predicted values of the plurality of bounding boxes at the same time, so as to determine the intra-bounding box predicted value deviation by using the intra-bounding box predicted values, and acquire the conflict coefficient of the target picture in combination with the bounding box deviation.
In one possible implementation, the intra-bounding-box predicted value is the softmax layer output result, i.e., the output layer combined with the softmax activation function outputs probability values between 0 and 1.
Alternatively, the intra-bounding-box predicted value may be a probability value output by a sigmoid, i.e., the result of the output layer activated by the sigmoid function. The intra-bounding-box predicted value is not limited here to the softmax layer output result.
The softmax layer outputs of the plurality of bounding boxes are divided into softmax layer outputs for a plurality of targets; for the softmax layer outputs of the same target, the intra-bounding-box prediction deviation of each of the plurality of targets is obtained based on a cross entropy calculation; and the intra-bounding-box prediction deviations of the plurality of targets are averaged to obtain the intra-bounding-box prediction deviations of the plurality of bounding boxes.
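A minimal sketch of this cross-entropy calculation is given below; it assumes the targets of the two models have already been matched one-to-one (the matching step is not specified here) and that the deviation is measured as the cross entropy of one model's softmax output against the other's:

    import numpy as np

    def intra_box_prediction_deviation(softmax_a, softmax_b, eps=1e-12):
        # softmax_a, softmax_b: arrays of shape (num_targets, num_classes),
        # the two models' softmax outputs for the same targets, one row per target.
        # Per-target deviation via cross entropy H(p, q) = -sum_i p_i * log(q_i).
        per_target = -np.sum(softmax_a * np.log(softmax_b + eps), axis=1)
        # Average over all targets to obtain the prediction deviation P.
        return float(per_target.mean())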
S103: from the plurality of bounding boxes, bounding box deviations for the plurality of bounding boxes are calculated.
The detection server calculates a bounding box deviation of the plurality of bounding boxes from the acquired plurality of bounding boxes.
In one possible implementation, the plurality of bounding boxes includes a plurality of first bounding boxes acquired by a first preset model and a plurality of second bounding boxes acquired by a second preset model. Inputting the first bounding boxes and the second bounding boxes into a preset bounding box deviation calculation formula, and calculating to obtain bounding box deviations of the bounding boxes.
The bounding box deviation calculation formula is positively correlated with the degree of difference among the bounding boxes. It can be appreciated that when the target object in the target picture is marked in different ways, the resulting bounding boxes differ. If the picture is a low-value picture, for example one without a target object and hence without bounding boxes, the bounding boxes acquired in different ways show no difference, i.e., the degree of difference is small and the bounding box deviation is almost 0. If the picture is a high-value picture, for example one containing many targets of multiple types, the acquired bounding boxes differ more, i.e., the degree of difference is high and the bounding box deviation is larger. The bounding box difference can therefore be used to determine whether the target picture is a high-value picture.
In one possible implementation, the bounding box deviation calculation formula is the GIoU formula.
Assume the first bounding box is A and the second bounding box is B. As shown in fig. 4, a schematic diagram of bounding box calculation provided in the embodiment of the present application, the minimum enclosing box C containing both A and B is first obtained. Then:
IoU = |A∩B| / |A∪B|    (1)
GIoU = IoU − |C\(A∪B)| / |C|    (2)
where A∩B denotes the intersection area of the first bounding box and the second bounding box, A∪B denotes their union area, and C\(A∪B) denotes the area of C minus the area of the union A∪B.
By calculating the average value of the GIoU values of the plurality of bounding boxes, the bounding box deviation of the plurality of bounding boxes can be obtained.
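A sketch of this computation follows. The per-pair GIoU implements formulas (1)-(2); how first and second bounding boxes are paired is not specified above, so the sketch greedily pairs each first box with the second box of highest GIoU, and it reports 1 − mean(GIoU) so that the deviation W grows with the degree of difference (both choices are assumptions):

    def giou(box_a, box_b):
        # Boxes as (x1, y1, x2, y2). Formulas (1)-(2): IoU minus the share of
        # the minimum enclosing box C not covered by the union A∪B.
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        inter = (max(0.0, min(ax2, bx2) - max(ax1, bx1))
                 * max(0.0, min(ay2, by2) - max(ay1, by1)))
        union = ((ax2 - ax1) * (ay2 - ay1)
                 + (bx2 - bx1) * (by2 - by1) - inter)
        iou_val = inter / union
        c_area = ((max(ax2, bx2) - min(ax1, bx1))
                  * (max(ay2, by2) - min(ay1, by1)))
        return iou_val - (c_area - union) / c_area

    def bounding_box_deviation(boxes_a, boxes_b):
        # Average the best-pair GIoU values, then flip the sign convention so
        # that a larger result means more disagreement between the two models.
        if len(boxes_a) == 0 or len(boxes_b) == 0:
            return 0.0
        best = [max(giou(a, b) for b in boxes_b) for a in boxes_a]
        return 1.0 - sum(best) / len(best)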
In another possible implementation, DIoU may be selected to compute the bounding box deviation. Compared with GIoU, it converges faster and more accurately, so the resulting conflict coefficient is also more accurate.
DIoU = IoU − ρ²(b, b^gt) / c²    (3)
where ρ denotes the Euclidean distance between b and b^gt, b denotes the center point of the first prediction box, b^gt denotes the center point of the second prediction box, so ρ² is the squared distance between the two center points, and c denotes the diagonal length of the minimum enclosing rectangle of the two boxes. If the two boxes overlap perfectly, IoU = 1 and DIoU = 1 − 0 = 1; if the two boxes are far apart, DIoU = 0 − 1 = −1. The value range of DIoU is therefore [−1, 1].
By calculating the average value of DIoU values of a plurality of bounding boxes, the bounding box deviation of the plurality of bounding boxes can be obtained.
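Under the same box convention, DIoU can be computed as below and substituted for giou in the bounding_box_deviation sketch above; this follows formula (3):

    def diou(box_a, box_b):
        # Formula (3): DIoU = IoU - rho^2(b, b_gt) / c^2, with rho the distance
        # between the two box centers and c the diagonal of the enclosing box.
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        inter = (max(0.0, min(ax2, bx2) - max(ax1, bx1))
                 * max(0.0, min(ay2, by2) - max(ay1, by1)))
        union = ((ax2 - ax1) * (ay2 - ay1)
                 + (bx2 - bx1) * (by2 - by1) - inter)
        iou_val = inter / union
        # squared center distance rho^2 (the 1/4 covers both halved coordinates)
        rho2 = (((ax1 + ax2) - (bx1 + bx2)) ** 2
                + ((ay1 + ay2) - (by1 + by2)) ** 2) / 4.0
        # squared diagonal c^2 of the minimum enclosing rectangle
        c2 = ((max(ax2, bx2) - min(ax1, bx1)) ** 2
              + (max(ay2, by2) - min(ay1, by1)) ** 2)
        return iou_val - rho2 / c2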
In the embodiment of the application, other calculation formulas for calculating the deviation of the bounding box are also possible, and the calculation formulas can be adjusted by a person skilled in the art according to the needs.
S104: and determining the conflict coefficient of the target picture according to the boundary frame deviations of the boundary frames.
The conflict coefficient is used to represent the difficulty of target detection in the target picture. It is positively correlated with that difficulty: the greater the difficulty, the larger the conflict coefficient. A high-value picture contains more effective information, so its target detection difficulty is higher and its conflict coefficient is larger.
In the embodiments of the present application, the conflict coefficient of the target picture can be determined according to the bounding box deviations of the plurality of bounding boxes.
In one possible implementation, the bounding box deviation is used directly as the conflict coefficient of the target picture. This is because a high-value picture has many marked bounding boxes and a large bounding box deviation, whereas a low-value picture has few marked bounding boxes and a small bounding box deviation. The bounding box deviation can therefore be used directly as the conflict coefficient of the target picture.
In another possible implementation, the conflict coefficient of the target picture is determined from both the bounding box deviation and the intra-bounding-box prediction deviation. Specifically, the bounding box deviation and the intra-bounding-box prediction deviation are substituted into a preset conflict coefficient calculation formula to calculate the conflict coefficient of the target picture. The preset conflict coefficient calculation formula is positively correlated with the bounding box deviation and positively correlated with the intra-bounding-box prediction deviation.
Assuming the intra-bounding-box prediction deviation is P, the bounding box deviation is W, and the conflict coefficient of the target picture is C, the preset conflict coefficient formula may be:
C=aP+bW (4)
wherein a and b are parameters which can be adjusted by a person skilled in the art according to the need.
In addition, the preset conflict coefficient formula may instead be proportional to the square of the intra-bounding-box prediction deviation and to the square of the bounding box deviation. Other positive correlations are also possible and can be chosen by those skilled in the art as needed.
The intra-bounding-box prediction deviation is obtained as follows:
dividing the softmax layer outputs of the plurality of bounding boxes into softmax layer outputs for a plurality of targets;
for the softmax layer outputs of the same target, obtaining the intra-bounding-box prediction deviation of each of the plurality of targets based on a cross entropy calculation;
and averaging the intra-bounding-box prediction deviations of the plurality of targets to obtain the intra-bounding-box prediction deviations of the plurality of bounding boxes.
Compared with calculating the conflict coefficient from the bounding box deviation alone, this screening method identifies high-value pictures more accurately.
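The combination can be written directly from formula (4); the default weights of 1.0 and the sample P, W and threshold values below are placeholders for illustration:

    def conflict_coefficient(p, w, a=1.0, b=1.0):
        # Formula (4): C = aP + bW, positively correlated with both deviations.
        return a * p + b * w

    # Example of the S105 decision with the 0.45 threshold used in the example below:
    C = conflict_coefficient(p=0.40, w=0.35)
    is_high_value = C > 0.45  # True: keep the picture as a training sample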
S105: and judging whether the conflict coefficient is larger than a preset conflict threshold value. If yes, S106 is executed. Otherwise, S107 is performed.
And the detection server judges whether the target picture is a high-value picture according to the acquired conflict coefficient. And when the conflict coefficient is larger than a preset conflict threshold value, determining that the target picture is a high-value picture, otherwise, determining that the target picture is a low-value picture.
For example, the preset conflict threshold is set to be 0.45, and when the obtained conflict coefficient is 0.75, the target picture is a high-value picture because the conflict coefficient is larger than the preset conflict threshold. When the obtained conflict coefficient is 0.15, the target picture is a low-value picture because the conflict coefficient is smaller than a preset conflict threshold value.
S106: and taking the target picture as a picture for training a vehicle identification model.
For pictures whose conflict coefficient is greater than the preset conflict threshold, the target picture is used as a picture for training the target object recognition model, such as a vehicle recognition model, and is placed in the training sample database. A high-value picture such as fig. 2(b), for example, is placed in the training sample database.
S107: the target picture is filtered out.
And deleting the target picture from the training sample database when the conflict coefficient is not greater than a preset conflict threshold value.
S108: and circularly executing S101-S107 to acquire a training sample set.
The detection server executes S101-S107 in a loop until all of the pre-acquired massive pictures have been processed and the training sample set is obtained. At that point the training sample set contains only high-value pictures.
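Putting the steps together, the following sketch mirrors the S101-S108 loop. Here detect_a and detect_b stand for the two preset models (e.g., a yoloV7-style and a Faster-RCNN-style detector) and are assumed to return, for each picture, its bounding boxes and per-target softmax outputs with targets already matched across the two models; the helpers reuse the sketches given earlier:

    def build_training_sample_set(pictures, detect_a, detect_b,
                                  conflict_threshold=0.45):
        training_sample_set = []
        for picture in pictures:                     # S101: massive pictures
            boxes_a, probs_a = detect_a(picture)     # S102: first bounding boxes
            boxes_b, probs_b = detect_b(picture)     # S102: second bounding boxes
            w = bounding_box_deviation(boxes_a, boxes_b)           # S103
            p = intra_box_prediction_deviation(probs_a, probs_b)
            c = conflict_coefficient(p, w)                         # S104
            if c > conflict_threshold:               # S105
                training_sample_set.append(picture)  # S106: high-value picture
            # else: S107, the low-value picture is filtered out
        return training_sample_set                   # S108: only high-value left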
In the picture processing method disclosed in the embodiments of the present application, a plurality of bounding boxes corresponding to a target picture are first determined in response to receiving the target picture. The bounding box deviations of the plurality of bounding boxes are determined from the bounding boxes, and the conflict coefficient of the target picture is determined from the bounding box deviations, so that whether the target picture is used for training the vehicle recognition model is decided according to its conflict coefficient. The conflict coefficient indicates the difficulty of recognizing the target object in the target picture. High-value pictures are thus screened out by the conflict coefficient, low-value pictures are filtered out, and the high-value pictures are used as pictures for training the recognition model, avoiding the high labor cost of labeling pictures that results from training the target object recognition model with low-value pictures.
In addition, the embodiment of the application also provides a device for processing the picture. Referring to fig. 5, a schematic structural diagram of an apparatus 500 for processing pictures according to an embodiment of the present application is provided. The device comprises:
a first response unit 501 configured to determine, in response to a received target picture, a plurality of bounding boxes corresponding to the target picture;
a first calculating unit 502 for calculating a bounding box deviation of the plurality of bounding boxes according to the plurality of bounding boxes;
a determining unit 503, configured to determine the conflict coefficient of the target picture according to the bounding box deviations of the plurality of bounding boxes; the conflict coefficient is used for indicating the difficulty of recognizing the target object in the target picture;
and a second response unit 504, configured to use the target picture as a picture for training the target object recognition model in response to the conflict coefficient being greater than the preset conflict threshold.
Optionally, the apparatus 500 further includes:
a second calculation unit, configured to calculate the intra-bounding-box prediction deviations of the plurality of bounding boxes; the second calculation unit is specifically configured to: obtain the intra-bounding-box predicted values of the plurality of bounding boxes, and calculate the intra-bounding-box prediction deviations of the plurality of bounding boxes according to those predicted values;
the determining unit 503 is specifically configured to determine the conflict coefficient of the target picture according to the bounding box deviations of the plurality of bounding boxes and the intra-bounding-box prediction deviations.
Optionally, the intra-bounding-box predicted value is a softmax layer output result;
and calculating the intra-bounding-box prediction deviations of the plurality of bounding boxes from the intra-bounding-box predicted values of the plurality of bounding boxes includes:
dividing the softmax layer outputs of the plurality of bounding boxes into softmax layer outputs for a plurality of targets;
for the softmax layer outputs of the same target, obtaining the prediction deviation of each of the plurality of targets based on a cross entropy calculation;
and averaging the prediction deviations of the plurality of targets to obtain the intra-bounding-box prediction deviations of the plurality of bounding boxes.
Optionally, the determining unit 503 is further configured to:
substituting the bounding box deviation and the intra-bounding-box prediction deviation into a preset conflict coefficient calculation formula to calculate the conflict coefficient of the target picture;
wherein the preset conflict coefficient calculation formula is positively correlated with the bounding box deviation and positively correlated with the intra-bounding-box prediction deviation.
Optionally, the plurality of bounding boxes includes a plurality of first bounding boxes and a plurality of second bounding boxes;
the first response unit 501 is further configured to:
inputting the target picture into a first preset model to acquire the plurality of first bounding boxes; and inputting the target picture into a second preset model to acquire the plurality of second bounding boxes;
wherein the first preset model and the second preset model are models with different principles;
the first computing unit 502 is specifically configured to:
calculating the bounding box deviations of the plurality of bounding boxes by using the plurality of first bounding boxes and the plurality of second bounding boxes based on a preset bounding box deviation calculation formula;
the boundary frame deviation calculation formula has positive correlation with the boundary frame difference degrees.
Optionally, the preset bounding box deviation calculation formula is a GIoU calculation formula.
The implementation manner of the specific structure is described in the above embodiment of the method for image processing, and will not be described herein.
In the picture processing device disclosed in the embodiments of the present application, the first response unit 501 determines, in response to a received target picture, a plurality of bounding boxes corresponding to the target picture. The first calculation unit 502 calculates the bounding box deviation of the plurality of bounding boxes from the bounding boxes. The determining unit 503 determines the conflict coefficient of the target picture from the bounding box deviations. The second response unit 504 determines, according to the conflict coefficient of the target picture, whether the target picture is used as a picture for training the vehicle recognition model. The conflict coefficient indicates the difficulty of recognizing the target object in the target picture. High-value pictures are thus screened out by the conflict coefficient, low-value pictures are filtered out, and the high-value pictures are used as pictures for training the recognition model, avoiding the high labor cost of labeling pictures that results from training the vehicle recognition model with low-value pictures.
The embodiment of the application also provides a vehicle system, which comprises the device for processing the pictures. The embodiment of the application also provides corresponding equipment and a computer readable storage medium, which are used for realizing the scheme provided by the embodiment of the application.
The device comprises a memory and a processor, wherein the memory is used for storing instructions or codes, and the processor is used for executing the instructions or codes to enable the device to execute a picture processing method according to any embodiment of the application.
In practical applications, the computer-readable storage medium may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely one specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of picture processing, the method comprising:
in response to receiving a target picture, determining a plurality of bounding boxes corresponding to the target picture;
calculating the bounding box deviation of the plurality of bounding boxes according to the plurality of bounding boxes;
determining a conflict coefficient of the target picture according to the bounding box deviations of the plurality of bounding boxes; the conflict coefficient is used for indicating the difficulty of recognizing the target object in the target picture;
and in response to the conflict coefficient being greater than a preset conflict threshold, using the target picture as a picture for training a target object recognition model.
2. The method according to claim 1, wherein the method further comprises:
obtaining intra-bounding-box predicted values of the plurality of bounding boxes;
calculating intra-bounding-box prediction deviations of the plurality of bounding boxes according to the intra-bounding-box predicted values of the plurality of bounding boxes;
wherein the determining a conflict coefficient of the target picture according to the plurality of bounding box deviations comprises:
determining the conflict coefficient of the target picture according to the bounding box deviation and the intra-bounding-box prediction deviations of the plurality of bounding boxes.
3. The method of claim 2, wherein the intra-bounding-box predicted value is a softmax layer output result;
the calculating the intra-bounding-box prediction deviations of the plurality of bounding boxes according to the intra-bounding-box predicted values of the plurality of bounding boxes comprises:
dividing the softmax layer outputs of the plurality of bounding boxes into softmax layer outputs for a plurality of targets;
for the softmax layer outputs of the same target, obtaining the intra-bounding-box prediction deviation of each of the plurality of targets based on a cross entropy calculation;
and averaging the intra-bounding-box prediction deviations of the plurality of targets to obtain the intra-bounding-box prediction deviations of the plurality of bounding boxes.
4. The method of claim 2, wherein the determining the conflict coefficient of the target picture according to the bounding box deviation and the intra-bounding-box prediction deviations of the plurality of bounding boxes comprises:
substituting the bounding box deviation and the intra-bounding-box prediction deviation into a preset conflict coefficient calculation formula to calculate the conflict coefficient of the target picture;
wherein the preset conflict coefficient calculation formula is positively correlated with the bounding box deviation and positively correlated with the intra-bounding-box prediction deviation.
5. The method of claim 1, wherein the plurality of bounding boxes comprises a plurality of first bounding boxes and a plurality of second bounding boxes;
the determining a plurality of bounding boxes corresponding to the target picture includes:
inputting the target picture into a first preset model to acquire the plurality of first bounding boxes; and inputting the target picture into a second preset model to acquire the plurality of second bounding boxes;
wherein the first preset model and the second preset model are models with different principles;
the calculating the bounding box deviation of the plurality of bounding boxes according to the plurality of bounding boxes comprises:
calculating the bounding box deviation of the plurality of bounding boxes from the plurality of first bounding boxes and the plurality of second bounding boxes based on a preset bounding box deviation calculation formula;
wherein the bounding box deviation calculation formula is positively correlated with the degree of difference among the plurality of bounding boxes.
6. The method of claim 5, wherein the predetermined bounding box deviation calculation formula is a GIoU calculation formula.
7. An apparatus for processing pictures, said apparatus comprising:
a first response unit, configured to determine, in response to receiving a target picture, a plurality of bounding boxes corresponding to the target picture;
a first calculation unit, configured to calculate the bounding box deviation of the plurality of bounding boxes according to the plurality of bounding boxes;
a determining unit, configured to determine a conflict coefficient of the target picture according to the bounding box deviations of the plurality of bounding boxes; the conflict coefficient is used for indicating the difficulty of recognizing the target object in the target picture;
and a second response unit, configured to use the target picture as a picture for training a target object recognition model in response to the conflict coefficient being greater than a preset conflict threshold.
8. The apparatus of claim 7, wherein the apparatus further comprises:
a second calculation unit, configured to calculate intra-bounding-box prediction deviations of the plurality of bounding boxes; the second calculation unit is specifically configured to: obtain intra-bounding-box predicted values of the plurality of bounding boxes, and calculate the intra-bounding-box prediction deviations of the plurality of bounding boxes according to the intra-bounding-box predicted values;
wherein the determining unit is specifically configured to determine the conflict coefficient of the target picture according to the bounding box deviations of the plurality of bounding boxes and the intra-bounding-box prediction deviations.
9. A vehicle system, comprising the device for picture processing according to claim 7 or 8.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon code which, when run, performs the steps of the method according to any of claims 1-6.
CN202211667478.0A 2022-12-23 2022-12-23 Picture processing method, device, system and storage medium Pending CN116129368A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211667478.0A CN116129368A (en) 2022-12-23 2022-12-23 Picture processing method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211667478.0A CN116129368A (en) 2022-12-23 2022-12-23 Picture processing method, device, system and storage medium

Publications (1)

Publication Number Publication Date
CN116129368A true CN116129368A (en) 2023-05-16

Family

ID=86309251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211667478.0A Pending CN116129368A (en) 2022-12-23 2022-12-23 Picture processing method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN116129368A (en)

Similar Documents

Publication Publication Date Title
US9940548B2 (en) Image recognition method for performing image recognition utilizing convolution filters
CN110853033B (en) Video detection method and device based on inter-frame similarity
CN108256404B (en) Pedestrian detection method and device
CN109191498B (en) Target detection method and system based on dynamic memory and motion perception
CN110659658B (en) Target detection method and device
CN110533046B (en) Image instance segmentation method and device, computer readable storage medium and electronic equipment
CN112419202B (en) Automatic wild animal image recognition system based on big data and deep learning
JP7026165B2 (en) Text recognition method and text recognition device, electronic equipment, storage medium
CN109740416B (en) Target tracking method and related product
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN112651274B (en) Road obstacle detection device, road obstacle detection method, and recording medium
JP6577397B2 (en) Image analysis apparatus, image analysis method, image analysis program, and image analysis system
CN114332708A (en) Traffic behavior detection method and device, electronic equipment and storage medium
CN111353440A (en) Target detection method
CN114332778B (en) Intelligent alarm work order generation method and device based on people stream density and related medium
CN106651803B (en) Method and device for identifying house type data
JP2014110020A (en) Image processor, image processing method and image processing program
CN112597995A (en) License plate detection model training method, device, equipment and medium
CN114445716B (en) Key point detection method, key point detection device, computer device, medium, and program product
CN112380948A (en) Training method and system for object re-recognition neural network and electronic equipment
JP7165353B2 (en) Image feature output device, image recognition device, image feature output program, and image recognition program
CN116129368A (en) Picture processing method, device, system and storage medium
CN111027551A (en) Image processing method, apparatus and medium
CN112150529A (en) Method and device for determining depth information of image feature points
CN116152751A (en) Image processing method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination