CN114913232B - Image processing method, device, equipment, medium and product - Google Patents

Image processing method, device, equipment, medium and product

Info

Publication number
CN114913232B
Authority
CN
China
Prior art keywords
image
processed
image processing
processing model
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210654662.5A
Other languages
Chinese (zh)
Other versions
CN114913232A (en)
Inventor
Wu Xintao (吴新涛)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiayang Smart Security Technology Beijing Co ltd
Original Assignee
Jiayang Smart Security Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiayang Smart Security Technology Beijing Co ltd
Priority to CN202210654662.5A
Publication of CN114913232A
Application granted
Publication of CN114913232B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Control And Safety Of Cranes (AREA)

Abstract

The embodiment of the application provides an image processing method, apparatus, device, medium and product, including the following steps: acquiring an image to be processed in real time, the image to be processed including a crane image; extracting image features from the image to be processed through a first network of an image processing model, and determining position information of a first bounding box based on the image features, the image framed by the first bounding box including a crane leg image; and performing classification calculation on the position information of the first bounding box through a second network of the image processing model to obtain a target confidence, the target confidence representing the probability that a base plate has been added at the crane leg. According to the embodiment of the application, whether a base plate has been added at the crane leg can be judged in time, so that the safety of crane operation is safeguarded.

Description

Image processing method, device, equipment, medium and product
Technical Field
The application belongs to the technical field of image processing, and particularly relates to an image processing method, apparatus, device, medium and product.
Background
In practice, a base plate is usually placed under each crane leg so that the contact area between the leg and the ground is enlarged, the pressure exerted on the ground by the vehicle's weight is spread out, and safe crane operation is ensured. At present, while a crane is working, a worker checks whether base plates have been added under the crane legs according to the prescribed operating procedure. As a result, whether a base plate has been added at a crane leg cannot be judged in time, and the safety of crane operation cannot be guaranteed.
Disclosure of Invention
The embodiments of the present application provide an image processing method, apparatus, device, medium and product, which can judge in time whether a base plate has been added at a crane leg, so as to safeguard crane operation.
In a first aspect, an embodiment of the present application provides an image processing method, including: acquiring an image to be processed in real time, wherein the image to be processed includes a crane image;
extracting image features from the image to be processed based on a first network of an image processing model, and determining position information of a first bounding box based on the image features, wherein the image framed by the first bounding box includes a crane leg image;
and performing classification calculation on the position information of the first bounding box through a second network of the image processing model to obtain a target confidence, wherein the target confidence represents the probability that a base plate has been added at the crane leg.
In an optional implementation of the first aspect, performing classification calculation on the position information of the first bounding box through the second network of the image processing model to obtain the target confidence includes:
cropping the image to be processed based on the position information of the first bounding box to obtain a crane leg image;
and performing classification calculation on the crane leg image through the second network of the image processing model to obtain the target confidence.
In an alternative embodiment of the first aspect, the method further comprises:
and sending the image to be processed to an alarm platform when the target confidence is greater than a preset threshold, for the alarm platform to generate alarm information.
In an optional implementation of the first aspect, before extracting image features from the image to be processed based on the first network of the image processing model and determining the first processing result based on the image features, the method further includes:
acquiring a training sample set, wherein the training sample set includes a plurality of image samples to be processed and a label confidence corresponding to each image sample to be processed;
and training a preset image processing model by using the image samples to be processed in the training sample set and the label confidence corresponding to each image sample to be processed, to obtain a trained image processing model.
In an optional implementation of the first aspect, training a preset image processing model by using the image samples to be processed in the training sample set to obtain a trained image processing model includes:
extracting reference image features from an image sample to be processed based on a first network of the preset image processing model, and determining position information of a first reference bounding box based on the reference image features, wherein the image sample framed by the first reference bounding box includes a crane leg image sample;
performing classification calculation on the position information of the first reference bounding box through a second network of the preset image processing model to obtain a reference confidence, the reference confidence representing the probability that a base plate has been added at the crane leg;
determining a loss function value of the preset image processing model according to the reference confidence of a target image sample to be processed and the label confidence of the target image sample to be processed, wherein the target image sample to be processed is any one of the image samples to be processed;
and training the preset image processing model with the image samples to be processed based on the loss function value, to obtain a trained image processing model.
In an optional implementation manner of the first aspect, before acquiring the training sample set, the method further includes:
acquiring a plurality of original images, wherein the original images comprise crane images;
and carrying out data enhancement processing on the plurality of original images according to a preset data enhancement strategy to obtain a plurality of image samples to be processed corresponding to each original image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the acquisition module is used for acquiring an image to be processed in real time, wherein the image to be processed includes a crane image;
the determining module is used for extracting image features from the image to be processed based on a first network of the image processing model and determining position information of a first bounding box based on the image features, wherein the image framed by the first bounding box includes a crane leg image;
the classification calculation module is used for performing classification calculation on the position information of the first bounding box through a second network of the image processing model to obtain a target confidence, wherein the target confidence represents the probability that a base plate has been added at the crane leg.
In a third aspect, there is provided an electronic device comprising: a memory for storing computer program instructions; a processor for reading and executing computer program instructions stored in a memory to perform the image processing method provided in any of the alternative embodiments of the first aspect.
In a fourth aspect, there is provided a computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement the image processing method provided by any of the alternative embodiments of the first aspect.
In a fifth aspect, a computer program product is provided; when instructions in the computer program product are executed by a processor of an electronic device, they cause the electronic device to perform the image processing method provided in any of the optional implementations of the first aspect.
In the embodiments of the present application, after the image to be processed, including a crane image, is acquired in real time, the image features in the image to be processed are extracted based on the first network of the image processing model, and the position information of the first bounding box is determined based on the extracted image features, the image framed by the first bounding box including a crane leg image. Further, classification calculation can be performed on the position information of the first bounding box through the second network of the image processing model to obtain a target confidence, which represents the probability that a base plate has been added at the crane leg. Thus, by acquiring the image to be processed in real time and processing it, the probability that a base plate has been added at the crane leg is obtained, whether a base plate has been added can be judged in time, and the safety of crane operation is safeguarded.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below; a person skilled in the art may derive other drawings from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a training flow of an image processing model in an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training flow of an image processing model in another image processing method according to an embodiment of the present application;
FIG. 3 is a schematic structural view of a feature pyramid provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a path aggregation feature pyramid provided in an embodiment of the present application;
fig. 5 is a schematic flow chart of an image processing method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application are described in detail below to make the objects, technical solutions and advantages of the present application more apparent, and to further describe the present application in conjunction with the accompanying drawings and the detailed embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative of the application and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by showing examples of the present application.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone.
To solve the prior-art problem that whether a base plate has been added at a crane leg cannot be judged in time to ensure safe crane operation, the embodiments of the present application provide an image processing method, apparatus, device, medium and product. By acquiring the image to be processed in real time and processing it to obtain the probability that a base plate has been added at the crane leg, whether the base plate is in place can be judged in time, thereby safeguarding crane operation.
For the image processing method provided in the embodiments of the present application, the execution subject may be an image processing apparatus, or a control module within the image processing apparatus for executing the image processing method. In the embodiments of the present application, the method being performed by an image processing apparatus is taken as an example to describe the image processing scheme provided herein.
In addition, in the image processing method provided in the embodiment of the present application, the image to be processed needs to be processed by using the pre-trained image processing model, so the image processing model needs to be trained before the image processing is performed by using the image processing model. Accordingly, a specific implementation of the training method for an image processing model provided in the embodiments of the present application is described below with reference to the accompanying drawings.
The embodiment of the application provides a training method of an image processing model, an execution subject of the method is an image processing device, and the method can be realized by the following steps:
1. acquiring training sample sets
The training sample set may include a plurality of image samples to be processed and a label confidence corresponding to each image sample to be processed. Each image sample to be processed may include a crane image sample, and the label confidence may represent the probability that a base plate has been added at the crane leg.
In order to obtain a more accurate training sample set and further train the image processing model better, in a specific embodiment, as shown in fig. 1, the step of obtaining the training sample set may specifically include the following steps:
s110, acquiring a plurality of image samples to be processed.
Specifically, the image processing apparatus may directly obtain, through the monitoring device, a plurality of image samples to be processed within a preset time period. The preset time period may be set based on actual experience or requirements, for example one month or three months, and is not specifically limited here.
Specifically, the image processing apparatus may use video of crane operation at a work site within the preset time period and, in a grouped frame-extraction manner, take one image sample containing the crane from the video every N frames, where N is a positive integer. The monitoring device may be a camera installed on a telegraph pole or street lamp at the work site and may be used to capture crane image samples at the site. In addition, the horizontal distance between the monitoring device and the crane may be kept within a preset distance, for example within 100 meters, which is not specifically limited here.
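As a concrete illustration of the grouped frame extraction described above, the following Python sketch samples one frame out of every N from a surveillance video with OpenCV; the file paths and the value of N are hypothetical placeholders, not values fixed by the patent.

```python
import cv2

def extract_samples(video_path: str, n: int, out_dir: str) -> int:
    """Save every N-th frame of a surveillance video as an image sample."""
    cap = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % n == 0:  # grouped frame extraction: keep one frame per group of N
            cv2.imwrite(f"{out_dir}/sample_{saved:06d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Hypothetical usage: one sample every 50 frames (N is chosen per deployment)
# extract_samples("site_camera.mp4", 50, "./samples")
```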
S120, labeling label confidence degrees corresponding to the plurality of image samples to be processed one by one.
Specifically, the label confidence of each image sample to be processed may be annotated manually, or may be annotated directly by the image processing apparatus; the specific annotation method is not limited here.
In the annotation process, 75% of the labeled sample data may be used as training samples and 25% as test samples; the exact split ratio between training and test samples is not strictly limited here.
The image processing model needs to be iterated many times to adjust the loss function value until the loss function value meets the training stop condition, yielding the trained image processing model. In each training iteration, however, if only one image sample to be processed were input, the sample size would be too small for effective training adjustment. The training sample set is therefore built from a plurality of image samples to be processed, so that the image processing model can be trained iteratively with them.
In this way, by annotating the acquired image samples to be processed, the label confidences corresponding one-to-one to the plurality of image samples are obtained, together with a training sample set containing those samples, which facilitates subsequent model training.
2. Training a preset image processing model by using the image samples to be processed in the training sample set and the label confidence corresponding to each image sample, to obtain a trained image processing model.
As shown in fig. 2, the present step may specifically include the following steps:
s210, extracting reference image features in an image sample to be processed based on a first network of a preset image processing model, and determining position information of a first reference boundary box based on the reference image features.
The first network may be a YOLOv5 network; the specific first network used is not limited here. In addition, the YOLOv5 network in this embodiment may use feature maps downsampled by factors of 8, 16 and 32. The reference image features are the image features at the crane legs contained in the image sample to be processed. The image sample framed by the first reference bounding box may include a crane leg image sample. The position information of the first reference bounding box may be the position coordinates of the bounding box's top-left and bottom-right pixel vertices in the image to be processed, which is not strictly limited here.
Specifically, after acquiring the training sample set, the image processing apparatus may input the image samples to be processed into the first network of the preset image processing model and determine the position information of the first reference bounding box based on the reference image features extracted from the image samples by the first network.
In one example, after acquiring the training sample set, the image processing apparatus may input the image samples to be processed into the first network of the preset image processing model; the first network may extract the reference image features of each image sample, process them, and output the position information of any first reference bounding box whose reference confidence is greater than a first preset threshold. The first preset threshold may be set based on practical experience or requirements, for example 0.4, which is not limited here.
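For illustration, a minimal sketch of this confidence filtering, assuming the first network returns each detection as a (box, confidence) pair with the box given by its top-left and bottom-right pixel vertices; this representation is an assumption, not the YOLOv5 API.

```python
from typing import List, Tuple

# (x1, y1, x2, y2): top-left and bottom-right pixel vertices of the bounding box
Box = Tuple[float, float, float, float]

def filter_detections(detections: List[Tuple[Box, float]],
                      first_threshold: float = 0.4) -> List[Box]:
    """Keep only bounding boxes whose confidence exceeds the first preset threshold."""
    return [box for box, conf in detections if conf > first_threshold]
```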
On this basis, it should be noted that the feature pyramid is a method commonly used in the field of object detection; a typical structure is shown in fig. 3. Low-level features carry relatively little semantic information but locate targets accurately, while high-level features are semantically rich but locate targets only coarsely. A feature pyramid fuses features of different resolutions and detects targets of different sizes independently at different feature levels.
On this basis, a path-aggregation feature pyramid structure is used (a typical structure is shown in fig. 4) to realize multi-scale image features and further improve detection performance, especially for small targets. First, bottom-up path enhancement is introduced, which shortens the information propagation path while exploiting the accurate localization information of low-level features. Second, dynamic feature pooling is used so that each proposal region draws on features from all pyramid levels, avoiding arbitrary assignment of proposal regions to levels. Third, fused fully connected layers are used to improve the model's ability to capture information at different scales.
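The following PyTorch sketch illustrates the top-down pass plus bottom-up path enhancement that characterizes a path-aggregation pyramid. The channel count, the assumption that the three input maps already share that channel count, and the 8/16/32-times spatial relation are illustrative choices, not the patent's exact structure.

```python
import torch.nn as nn
import torch.nn.functional as F

class PathAggregationNeck(nn.Module):
    """Top-down FPN pass followed by a bottom-up augmentation path (PANet-style)."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.smooth = nn.ModuleList(nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3))
        self.down = nn.ModuleList(nn.Conv2d(channels, channels, 3, stride=2, padding=1) for _ in range(2))

    def forward(self, c3, c4, c5):
        # Inputs assumed to be 8x/16x/32x downsampled maps with equal channel counts.
        # Top-down pass: propagate rich high-level semantics to lower levels.
        p5 = c5
        p4 = c4 + F.interpolate(p5, size=c4.shape[-2:], mode="nearest")
        p3 = c3 + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
        p3, p4, p5 = (s(p) for s, p in zip(self.smooth, (p3, p4, p5)))
        # Bottom-up augmentation: shorten the path carrying accurate low-level localization.
        n3 = p3
        n4 = p4 + self.down[0](n3)
        n5 = p5 + self.down[1](n4)
        return n3, n4, n5
```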
S220, classifying and calculating the position information of the first reference boundary frame through a second network of a preset image processing model to obtain the reference confidence.
The second network may be MobileNetV2, although other networks usable for classification calculation are also possible, which is not specifically limited here. The reference confidence represents the probability that a base plate has been added at the crane leg.
Specifically, since the image sample framed by the first reference bounding box includes a crane leg image sample, after obtaining the position information of the first reference bounding box, the image processing apparatus may perform classification calculation on it through the second network of the preset image processing model to obtain the reference confidence.
It should be noted that, owing to heavy background interference and the similarity between categories, the first network has a clear shortcoming in classification performance on this target, producing a certain number of false alarms and missed detections. In view of this lack of classification capability in the first network, the present application proposes using an additional classifier, namely MobileNetV2, to improve classification performance.
Specifically, in the training stage, the target region is cut out of the image sample to be processed according to the ground-truth annotation (the ground truth is the annotation of the training image, i.e., the ideal model output), and the cropped first reference bounding box region is then resized to 224×224 by zero filling and fed to MobileNetV2 for training. In the inference stage, the first reference bounding box region is cut out of the image sample to be processed according to the position information of the first reference bounding box and resized to 224×224 by zero filling to obtain the target confidence. The position information here may be the pixel position of the first reference bounding box in the image sample to be processed.
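A sketch of the crop-and-zero-fill transform described above, under the assumption that the box is given as (x1, y1, x2, y2) pixel coordinates; placing the padding at the bottom and right is a simplification made here, not a detail fixed by the patent.

```python
import numpy as np
import cv2

def crop_and_letterbox(image: np.ndarray, box: tuple, size: int = 224) -> np.ndarray:
    """Crop the region given by box = (x1, y1, x2, y2), then zero-pad it to size x size."""
    x1, y1, x2, y2 = (int(v) for v in box)
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    scale = size / max(h, w)  # preserve aspect ratio while fitting inside the canvas
    crop = cv2.resize(crop, (int(w * scale), int(h * scale)))
    canvas = np.zeros((size, size, 3), dtype=image.dtype)  # zero filling
    canvas[: crop.shape[0], : crop.shape[1]] = crop
    return canvas
```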
S230, determining a loss function value of a preset image processing model according to the reference confidence coefficient of the target image sample to be processed and the label confidence coefficient of the target image sample to be processed.
Wherein the target image sample to be processed is any one of the image samples to be processed.
Specifically, the image processing apparatus may obtain a reference confidence for any one of the plurality of image samples to be processed, and then accurately determine the loss function value of the preset image processing model according to the label confidence corresponding to that image sample, which facilitates iterative training of the preset image processing model based on the loss function value and thus yields a more accurate image processing model.
S240, training the preset image processing model by using the image sample to be processed based on the loss function value of the preset image processing model, and obtaining a trained image processing model.
Specifically, to obtain a better trained image processing model, when the loss function value does not meet the training stop condition, the model parameters of the preset image processing model are adjusted and the adjusted model is trained further with the image samples to be processed, until the loss function value meets the training stop condition, yielding the trained image processing model.
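A minimal training-loop sketch matching S230 and S240, assuming the second network ends in a sigmoid so that binary cross-entropy can compare the reference confidence with the label confidence; the optimizer settings and the stop criterion are illustrative assumptions.

```python
import torch
import torch.nn as nn

def train_classifier(model, loader, epochs: int = 70, target_loss: float = 0.05):
    """Iterate until the loss value meets the stop condition or epochs run out."""
    criterion = nn.BCELoss()  # compares reference confidence with label confidence
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    for epoch in range(epochs):
        running = 0.0
        for crops, label_conf in loader:        # crops: zero-padded 224x224 leg images
            ref_conf = model(crops).squeeze(1)  # assumed sigmoid output in [0, 1]
            loss = criterion(ref_conf, label_conf.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                    # adjust model parameters
            running += loss.item() * crops.size(0)
        epoch_loss = running / len(loader.dataset)
        if epoch_loss <= target_loss:           # training stop condition met
            break
    return model
```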
In this embodiment, the image processing apparatus may input each image sample to be processed in the training sample set into the first network of the preset image processing model, extract the reference image features, and determine the position information of the first reference bounding box based on those features, the image sample framed by the first reference bounding box including a crane leg image sample. After the position information of the first reference bounding box is determined, the second network of the preset image processing model may obtain, based on that position information, the reference confidence representing the probability that a base plate has been added at the crane leg. The loss function value may then be determined from the label confidence corresponding to each image sample, and the preset image processing model may be trained with the image samples based on the loss function value until the loss function value meets the training stop condition, yielding a more accurate image processing model.
It should be noted that, compared with the whole crane, the crane leg occupies only a small area in the image to be processed and its resolution is low, so the proportion of pixels belonging to the crane leg inside the first bounding box is relatively small; meanwhile, other objects may occlude the leg. In addition, the two categories "leg with base plate" and "leg without base plate" are highly similar, with differences mainly in fine-grained detail features, which easily leads to inaccurate output confidences.
Based on this, in one embodiment, the above-mentioned step S220 may specifically include the following steps:
cropping the image sample to be processed based on the position information of the first reference bounding box to obtain a crane leg image sample;
and performing classification calculation on the crane leg image sample through the second network of the preset image processing model to obtain the reference confidence.
In this embodiment, after obtaining the position information of the first reference bounding box, the image processing apparatus may crop the image sample to be processed based on that position information; since the image sample framed by the first reference bounding box includes a crane leg image sample, the crane leg image sample is thereby obtained, and classification calculation may then be performed on it through the second network of the preset image processing model to obtain the reference confidence. This improves the classification accuracy of the preset image processing model and facilitates obtaining a more accurate image processing model.
In addition, the image features of the acquired image samples to be processed vary greatly with the weather, and homogeneous image features are unfavorable for model training. To give the acquired images diversity, so that the trained model is robust to weather, illumination and similar conditions, in one embodiment the image processing method described above may further include, before acquiring the training sample set:
acquiring a plurality of original images;
and carrying out data enhancement processing on the plurality of original images according to a preset data enhancement strategy to obtain a plurality of image samples to be processed corresponding to each original image.
Specifically, the image processing device may acquire a plurality of original images before acquiring the training sample set, and perform data enhancement processing on each original image according to a preset data enhancement policy, so as to obtain a plurality of image samples to be processed corresponding to each original image.
The original images include crane images. The preset data enhancement policy may be a policy preset based on actual needs or experience and used to augment the images. Each preset data enhancement policy may include two data enhancement operations, and each operation may include a probability of using the operation and a magnitude associated with it; in the search space composed of such policies, an optimal combination of data enhancement policies is obtained using reinforcement learning. Note that a probability of 0 or a magnitude of 0 means that the enhancement operation is not used.
In addition, it should be noted that the preset data enhancement policies in the embodiments of the present application may be five strategies selected from among enhancement operations such as ShearX/Y, TranslateX/Y, Rotate, AutoContrast, Invert, Equalize, Solarize, Posterize, Contrast, Color, Brightness, Sharpness, Cutout and Sample Pairing — specifically TranslateX_BBox, Equalize, TranslateY_Only_BBoxes, Cutout, Sharpness, ShearX_BBox, ShearY_BBox, Rotate_BBox and Color.
Among these operations: TranslateX_BBox translates the ground-truth annotation box together with the original image; Equalize performs histogram equalization on each channel; TranslateY_Only_BBoxes randomly translates the ground-truth annotation box; Cutout deletes partial rectangular regions of the image; Sharpness sharpens the image; ShearX_BBox applies a shear transform to the image and the ground-truth box; Rotate_BBox rotates the image and the ground-truth box; and Color applies a color transform to the image.
Specific examples are shown in Table 1:
Table 1 Data enhancement strategies
In addition, the number of samples of each original image can also be increased by copying.
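The sub-policy structure described above — each operation paired with a use probability and a magnitude — might be organized as in the following sketch; the operation implementations are left abstract, and the gating rule for zero probability or zero magnitude follows the note above.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AugOp:
    fn: Callable          # e.g. an Equalize or TranslateX_BBox implementation
    probability: float    # probability of applying the operation (0 disables it)
    magnitude: float      # operation strength (0 disables it)

def apply_sub_policy(image, boxes, ops: List[AugOp]):
    """Apply a sub-policy of two operations, each gated by its own probability."""
    for op in ops:
        if op.probability > 0 and op.magnitude > 0 and random.random() < op.probability:
            image, boxes = op.fn(image, boxes, op.magnitude)
    return image, boxes
```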
In this embodiment, the image processing apparatus may acquire a plurality of original images before acquiring the training sample set and perform data enhancement processing on each of them according to a preset data enhancement policy, obtaining a plurality of image samples to be processed corresponding to each original image. This avoids homogeneous image features and yields a large number of image samples to be processed, which facilitates training a more accurate image processing model.
Based on the image processing model obtained through training in the above embodiment, the embodiment of the present application further provides a specific implementation manner of the image processing method, which is specifically described in detail with reference to fig. 5.
S510, acquiring an image to be processed in real time.
The image processing apparatus may take one image to be processed every N frames from the real-time surveillance video captured by a monitoring device installed at the work site. The image to be processed includes a crane image. The monitoring device may be a camera installed on a telegraph pole or street lamp at the work site and may be used to capture crane images during site operation. In addition, the horizontal distance between the monitoring device and the crane may be kept within a preset distance, for example within 100 meters, which is not specifically limited here.
S520, extracting image features in the image to be processed based on the first network of the image processing model, and determining the position information of the first boundary frame based on the image features.
The first network may be a YOLOv5 network; the specific first network used is not limited here. The image features may be the image features at the crane legs contained in the image to be processed. The image framed by the first bounding box includes a crane leg image. The position information of the first bounding box may be the position coordinates of its top-left and bottom-right pixel vertices in the image to be processed, which is not strictly limited here.
Specifically, the image processing apparatus may input the image to be processed, acquired in real time, into the first network of the image processing model, so that the first network extracts the image features in the image to be processed and the position information of the first bounding box can then be determined based on those features.
In one example, the image processing apparatus may input the image to be processed, acquired in real time, into the first network of the image processing model, extract the image features of the image through the first network, and process the features to output the position information of any first bounding box whose confidence is greater than a first preset threshold. The first preset threshold may be set based on practical experience or requirements, and is not specifically limited here.
And S530, performing classification calculation on the position information of the first boundary frame through a second network of the image processing model to obtain the target confidence coefficient.
The second network may be MobileNetV2, although other networks usable for classification calculation are also possible, which is not specifically limited here. The target confidence represents the probability that a base plate has been added at the crane leg.
Specifically, since the image framed by the first bounding box includes the crane leg image, after obtaining the position information of the first bounding box the image processing apparatus may perform classification calculation on it through the second network of the image processing model to obtain the target confidence.
In the embodiments of the present application, after the image to be processed, including a crane image, is acquired in real time, the image features in the image to be processed are extracted based on the first network of the image processing model, and the position information of the first bounding box is determined based on the extracted image features, the image framed by the first bounding box including a crane leg image. Further, classification calculation can be performed on the position information of the first bounding box through the second network of the image processing model to obtain a target confidence, which represents the probability that a base plate has been added at the crane leg. Thus, by acquiring the image to be processed in real time and processing it, the probability that a base plate has been added at the crane leg is obtained, whether a base plate has been added can be judged in time, and the safety of crane operation is safeguarded.
In order to obtain the target confidence more accurately, in one embodiment, the step S530 may specifically include the following steps:
cropping the image to be processed based on the position information of the first bounding box to obtain a crane leg image;
and performing classification calculation on the crane leg image through the second network of the image processing model to obtain the target confidence.
In this embodiment, the image processing apparatus may crop the image to be processed based on the acquired position information of the first bounding box to obtain the crane leg image, and may then perform classification calculation on the crane leg image through the second network of the image processing model to obtain the probability that a base plate has been added at the crane leg. In this way, the image to be processed can be cropped and the cropped image processed by the image processing model, so that the target confidence is obtained accurately and the base plate can be added at the crane leg in time, thereby safeguarding crane operation.
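Putting S510 through S530 together, a hedged end-to-end sketch of the two-stage inference; the detector and classifier call signatures and the crop_and_letterbox helper from the training discussion are assumptions made for illustration.

```python
import torch

@torch.no_grad()
def infer_pad_probability(frame, detector, classifier, first_threshold: float = 0.4):
    """Detect crane legs, then classify each cropped leg; returns target confidences."""
    confidences = []
    for box, det_conf in detector(frame):       # first network: YOLOv5-style detections
        if det_conf <= first_threshold:
            continue
        crop = crop_and_letterbox(frame, box)   # crop and zero-pad to 224x224
        x = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        target_conf = classifier(x).item()      # second network: MobileNetV2 classifier
        confidences.append((box, target_conf))
    return confidences
```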
In order to describe the image processing method provided in the embodiments of the present application more accurately and comprehensively, in one embodiment, the image processing method related to the foregoing may further include the following steps:
sending the image to be processed to the alarm platform when the target confidence is greater than a preset threshold, so that the alarm platform can generate alarm information.
The preset threshold may be a confidence threshold set based on practical experience or requirements.
Specifically, when the target confidence is greater than the preset threshold, the image processing apparatus may send the image to be processed to the alarm platform so that the alarm platform can generate alarm information, reminding personnel in time and allowing the crane legs to be adjusted, thereby safeguarding crane operation.
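A minimal sketch of this alarm step, assuming the alarm platform exposes an HTTP endpoint; the URL, payload shape and threshold value are hypothetical.

```python
import requests

ALARM_ENDPOINT = "http://alarm-platform.example/api/alerts"  # hypothetical URL

def maybe_alarm(image_bytes: bytes, target_confidence: float, preset_threshold: float = 0.5):
    """Forward the frame to the alarm platform when the confidence exceeds the threshold."""
    if target_confidence > preset_threshold:
        requests.post(
            ALARM_ENDPOINT,
            files={"image": ("frame.jpg", image_bytes, "image/jpeg")},
            data={"confidence": f"{target_confidence:.3f}"},
            timeout=5,
        )
```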
Table 2 Statistics of test results

| Algorithm type | Test scenes | Video frames | Correctly identified | Incorrectly identified | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Crane leg without base plate | 2 | 6320 | 5963 | 357 | 94.3% |
| Crane leg without base plate | 2 | 6542 | 6171 | 371 | 94.3% |
| Crane leg without base plate | 1 | 5324 | 5003 | 321 | 93.9% |
| Crane leg without base plate | 1 | 4890 | 4643 | 247 | 94.9% |
Based on the image processing method provided in the embodiments of the present application, whether a base plate has been added to a crane in operation can be detected in real time, meeting management requirements so that potential safety hazards are reported in time and accidents are avoided. The method was trained and tested on a self-collected on-site scene dataset using an Intel Core i7 CPU, 4 GB of memory and an NVIDIA GeForce 2080 Ti discrete graphics card. The model was trained for 70 iterations starting from ImageNet pre-trained weights; four time periods were covered — morning, noon, afternoon and night — with 1–3 test scenes selected per period, giving a sufficient number of test images. The specific results are shown in Table 2.
Based on the same inventive concept, the embodiment of the application also provides an image processing device. The image processing apparatus provided in the embodiment of the present application will be described in detail with reference to fig. 6.
Fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
As shown in fig. 6, the image processing apparatus 600 may include: an acquisition module 610, a determination module 620, and a classification calculation module 630.
The acquiring module 610 is configured to acquire an image to be processed in real time, where the image to be processed includes a crane image.
The determining module 620 is configured to extract image features from the image to be processed based on the first network of the image processing model and to determine position information of a first bounding box based on the image features, where the image framed by the first bounding box includes a crane leg image.
The classification calculation module 630 is configured to perform classification calculation on the position information of the first bounding box through the second network of the image processing model to obtain a target confidence, where the target confidence represents the probability that a base plate has been added at the crane leg.
In an embodiment, the acquiring module described above is further configured to crop the image to be processed based on the position information of the first bounding box to obtain a crane leg image.
The classification calculation module is further configured to perform classification calculation on the crane leg image through the second network of the image processing model to obtain the target confidence.
In one embodiment, the image processing apparatus described above may further include a sending module.
The sending module is configured to send the image to be processed to the alarm platform when the target confidence is greater than a preset threshold, so that the alarm platform can generate alarm information.
In one embodiment, the acquiring module is further configured to acquire a training sample set before the first network of the image processing model extracts image features from the image to be processed and the first processing result is determined based on the image features, where the training sample set includes a plurality of image samples to be processed and a label confidence corresponding to each image sample to be processed.
The image processing apparatus described above may further include a training module.
The training module is configured to train a preset image processing model by using the image samples to be processed in the training sample set and the label confidence corresponding to each image sample to be processed, to obtain a trained image processing model.
In one embodiment, the training module may be specifically configured to:
extracting reference image features from an image sample to be processed based on a first network of the preset image processing model, and determining position information of a first reference bounding box based on the reference image features, wherein the image sample framed by the first reference bounding box includes a crane leg image sample;
performing classification calculation on the position information of the first reference bounding box through a second network of the preset image processing model to obtain a reference confidence, the reference confidence representing the probability that a base plate has been added at the crane leg;
determining a loss function value of the preset image processing model according to the reference confidence of a target image sample to be processed and the label confidence of the target image sample to be processed, wherein the target image sample to be processed is any one of the image samples to be processed;
and training the preset image processing model with the image samples to be processed based on the loss function value, to obtain a trained image processing model.
In one embodiment, the acquiring module is further configured to acquire, before the training sample set is acquired, a plurality of original images including crane images.
The image processing apparatus described above may further include a data enhancement module.
The data enhancement module is configured to perform data enhancement processing on the plurality of original images according to a preset data enhancement policy to obtain a plurality of image samples to be processed corresponding to each original image.
In the embodiments of the present application, after the image to be processed, including a crane image, is acquired in real time, the image features in the image to be processed are extracted based on the first network of the image processing model, and the position information of the first bounding box is determined based on the extracted image features, the image framed by the first bounding box including a crane leg image. Further, classification calculation can be performed on the position information of the first bounding box through the second network of the image processing model to obtain a target confidence, which represents the probability that a base plate has been added at the crane leg. Thus, by acquiring the image to be processed in real time and processing it, the probability that a base plate has been added at the crane leg is obtained, whether a base plate has been added can be judged in time, and the safety of crane operation is safeguarded.
The modules in the image processing apparatus provided in the embodiments of the present application may implement the method steps in the embodiments shown in fig. 1, fig. 2, or fig. 5, and may achieve technical effects corresponding to the steps, which are not described herein for brevity.
Fig. 7 shows a schematic hardware structure of an electronic device according to an embodiment of the present application.
The electronic device may include a processor 701 and a memory 702 in which computer program instructions are stored.
In particular, the processor 701 described above may include a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured to implement one or more integrated circuits of embodiments of the present application.
Memory 702 may include mass storage for data or instructions. By way of example and not limitation, memory 702 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 702 may include removable or non-removable (or fixed) media, where appropriate, and may be internal or external to the electronic device, where appropriate. In a particular embodiment, memory 702 is a non-volatile solid-state memory.
The memory may include Read Only Memory (ROM), random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors) it is operable to perform the operations described with reference to methods in accordance with aspects of the present disclosure.
The processor 701 implements any one of the image processing methods of the above embodiments by reading and executing computer program instructions stored in the memory 702.
In one example, the electronic device may also include a communication interface 703 and a bus 710. As shown in fig. 7, the processor 701, the memory 702, and the communication interface 703 are connected by a bus 710 and perform communication with each other.
The communication interface 703 is mainly used for implementing communication between each module, device, unit and/or apparatus in the embodiments of the present application.
Bus 710 includes hardware, software, or both, coupling the components of the electronic device to each other. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), another suitable bus, or a combination of two or more of the above. Bus 710 may include one or more buses, where appropriate. Although the embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.
In addition, in combination with the image processing method in the above embodiment, the embodiment of the application may be implemented by providing a computer storage medium. The computer storage medium has stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement the image processing method provided in the embodiments of the present application.
Embodiments of the present application also provide a computer program product, where instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform an image processing method as provided in the embodiments of the present application.
It should be clear that the present application is not limited to the particular arrangements and processes described above and illustrated in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions, or change the order between steps, after appreciating the spirit of the present application.
The functional blocks shown in the above block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems in terms of a series of steps or devices. However, the present application is not limited to the order of the steps described above; that is, the steps may be performed in the order mentioned in the embodiments, in an order different from that in the embodiments, or several steps may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable image processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable image processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to being, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware which performs the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing describes only specific embodiments of the present application. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here. It should be understood that the protection scope of the present application is not limited thereto; any equivalent modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present application is intended to fall within the scope of the present application.

Claims (6)

1. An image processing method, the method comprising:
acquiring an image to be processed in real time, wherein the image to be processed comprises a crane image;
extracting image features from the image to be processed based on a first network of an image processing model, and determining position information of a first bounding box based on the image features, wherein the image selected by the first bounding box comprises a crane leg image, and the first network comprises a YOLOv5 network;
performing classification calculation on the position information of the first bounding box through a second network of the image processing model to obtain a target confidence, wherein the target confidence represents the probability that a backing plate has been added at a crane supporting leg, and the second network comprises a MobileNetV2 network;
wherein performing the classification calculation on the position information of the first bounding box through the second network of the image processing model to obtain the target confidence comprises:
cropping the image to be processed based on the position information of the first bounding box to obtain a crane leg image;
performing classification calculation on the crane leg image through the second network of the image processing model to obtain the target confidence;
wherein before extracting the image features from the image to be processed based on the first network of the image processing model and determining the position information of the first bounding box based on the image features, the method further comprises:
acquiring a training sample set, wherein the training sample set comprises a plurality of image samples to be processed and a label confidence corresponding to each image sample to be processed;
training a preset image processing model by using the image samples to be processed in the training sample set and the label confidence corresponding to each image sample to be processed to obtain a trained image processing model;
wherein training the preset image processing model by using the image samples to be processed in the training sample set and the label confidence corresponding to each image sample to be processed to obtain the trained image processing model comprises:
extracting reference image features from each image sample to be processed based on a first network of the preset image processing model, and determining position information of a first reference bounding box based on the reference image features, wherein the image sample selected by the first reference bounding box comprises a crane leg image sample;
performing classification calculation on the position information of the first reference bounding box through a second network of the preset image processing model to obtain a reference confidence, wherein the reference confidence represents the probability that a backing plate has been added at a crane supporting leg;
determining a loss function value of the preset image processing model according to the reference confidence of a target image sample to be processed and the label confidence of the target image sample to be processed, wherein the target image sample to be processed is any one of the image samples to be processed; and
training the preset image processing model with the image samples to be processed based on the loss function value of the preset image processing model to obtain the trained image processing model.
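By way of non-limiting illustration, the two-stage pipeline recited in claim 1 can be sketched as follows, assuming PyTorch, the public ultralytics/yolov5 hub model standing in for the trained first network, and torchvision's MobileNetV2 as the second network; the pretrained weights and the class-index convention (index 1 = backing plate present) are assumptions of this sketch, not details fixed by the patent:

import torch
import torchvision.transforms as T
from torchvision.models import mobilenet_v2
from PIL import Image

# First network: YOLOv5 detector. Public COCO weights stand in here for the
# trained crane-leg weights, which this sketch does not have.
detector = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Second network: MobileNetV2 with a two-class head (plate present / absent).
classifier = mobilenet_v2(num_classes=2).eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def target_confidences(image_path):
    # Detect crane legs, crop each first bounding box, and classify each crop.
    img = Image.open(image_path).convert('RGB')
    detections = detector(img).xyxy[0]  # rows: x1, y1, x2, y2, conf, cls
    confidences = []
    for x1, y1, x2, y2, *_ in detections.tolist():
        crop = img.crop((int(x1), int(y1), int(x2), int(y2)))
        with torch.no_grad():
            logits = classifier(preprocess(crop).unsqueeze(0))
        # Softmax over the two classes; index 1 is assumed to mean
        # "backing plate added at the crane supporting leg".
        confidences.append(torch.softmax(logits, dim=1)[0, 1].item())
    return confidences

Cropping to the detected leg region before classification keeps the input to the second network small and focused, which is consistent with the choice of MobileNetV2 as a lightweight classifier.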
2. The method according to claim 1, wherein the method further comprises:
sending the image to be processed to an alarm platform when the target confidence is greater than a preset threshold, so that the alarm platform can generate alarm information.
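A corresponding sketch of the alarm step in claim 2, assuming the alarm platform exposes an HTTP endpoint; the URL, field names, and threshold value below are hypothetical:

import requests

ALARM_URL = "http://alarm.example.com/report"  # hypothetical endpoint
PRESET_THRESHOLD = 0.5                         # illustrative value only

def report_if_needed(image_path, target_confidence):
    # Send the image to be processed to the alarm platform when the target
    # confidence exceeds the preset threshold, so it can generate an alarm.
    if target_confidence > PRESET_THRESHOLD:
        with open(image_path, 'rb') as f:
            requests.post(ALARM_URL,
                          files={'image': f},
                          data={'confidence': str(target_confidence)})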
3. The method of claim 1, wherein prior to acquiring the training sample set, the method further comprises:
acquiring a plurality of original images, wherein the original images comprise crane images;
and performing data enhancement processing on the plurality of original images according to a preset data enhancement strategy to obtain a plurality of image samples to be processed corresponding to each original image.
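One possible preset data enhancement strategy for claim 3, sketched with torchvision transforms; the particular transforms and the number of samples per original image are assumptions, since the claim does not fix them:

import torchvision.transforms as T
from PIL import Image

# Illustrative enhancement strategy: flips, colour jitter, small rotations.
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.RandomRotation(degrees=10),
])

def enhance(original_path, samples_per_image=4):
    # Derive several image samples to be processed from one original image.
    original = Image.open(original_path).convert('RGB')
    return [augment(original) for _ in range(samples_per_image)]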
4. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is configured to acquire an image to be processed in real time, wherein the image to be processed comprises a crane image;
the determining module is configured to extract image features from the image to be processed based on a first network of an image processing model and to determine position information of a first bounding box based on the image features, wherein the image selected by the first bounding box comprises a crane leg image, and the first network comprises a YOLOv5 network;
the classification calculation module is configured to perform classification calculation on the position information of the first bounding box through a second network of the image processing model to obtain a target confidence, wherein the target confidence represents the probability that a backing plate has been added at a crane supporting leg, and the second network comprises a MobileNetV2 network;
the acquisition module is further configured to crop the image to be processed based on the position information of the first bounding box to obtain a crane leg image;
the classification calculation module is specifically configured to perform classification calculation on the crane leg image through the second network of the image processing model to obtain the target confidence;
the apparatus further comprises a training module;
the acquisition module is further configured to acquire a training sample set before the first network of the image processing model extracts the image features from the image to be processed and the position information of the first bounding box is determined based on the image features, wherein the training sample set comprises a plurality of image samples to be processed and a label confidence corresponding to each image sample to be processed;
the training module is configured to train a preset image processing model by using the image samples to be processed in the training sample set and the label confidence corresponding to each image sample to be processed to obtain a trained image processing model;
the training module is specifically configured to:
extract reference image features from each image sample to be processed based on a first network of the preset image processing model, and determine position information of a first reference bounding box based on the reference image features, wherein the image sample selected by the first reference bounding box comprises a crane leg image sample;
perform classification calculation on the position information of the first reference bounding box through a second network of the preset image processing model to obtain a reference confidence, wherein the reference confidence represents the probability that a backing plate has been added at a crane supporting leg;
determine a loss function value of the preset image processing model according to the reference confidence of a target image sample to be processed and the label confidence of the target image sample to be processed, wherein the target image sample to be processed is any one of the image samples to be processed; and
train the preset image processing model with the image samples to be processed based on the loss function value of the preset image processing model to obtain the trained image processing model.
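To make the training elements concrete, a minimal sketch of one optimization step of the second network against the label confidences, assuming PyTorch; the cross-entropy loss and the Adam optimizer are plausible choices of this sketch, not choices fixed by the patent:

import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

model = mobilenet_v2(num_classes=2)   # second network of the preset model
criterion = nn.CrossEntropyLoss()     # assumed loss; the patent leaves it open
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(leg_crops, label_confidences):
    # leg_crops: float tensor of shape (N, 3, 224, 224), cropped leg samples.
    # label_confidences: long tensor of shape (N,), 0/1 = plate absent/present.
    logits = model(leg_crops)                    # reference confidences (logits)
    loss = criterion(logits, label_confidences)  # loss function value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()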
5. An electronic device, the device comprising: a processor and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement the image processing method according to any of claims 1-3.
6. A computer storage medium, characterized in that the computer storage medium has stored thereon computer program instructions which, when executed by a processor, implement the image processing method according to any of claims 1-3.
CN202210654662.5A 2022-06-10 2022-06-10 Image processing method, device, equipment, medium and product Active CN114913232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210654662.5A CN114913232B (en) 2022-06-10 2022-06-10 Image processing method, device, equipment, medium and product


Publications (2)

Publication Number Publication Date
CN114913232A (en) 2022-08-16
CN114913232B (en) 2023-08-08

Family

ID=82769921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210654662.5A Active CN114913232B (en) 2022-06-10 2022-06-10 Image processing method, device, equipment, medium and product

Country Status (1)

Country Link
CN (1) CN114913232B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591967A (en) * 2021-07-27 2021-11-02 南京旭锐软件科技有限公司 Image processing method, device and equipment and computer storage medium
CN114170136A (en) * 2021-11-04 2022-03-11 广州大学 Method, system, device and medium for detecting defects of fasteners of contact net bracket device
CN114299304A (en) * 2021-12-15 2022-04-08 腾讯科技(深圳)有限公司 Image processing method and related equipment
CN114445610A (en) * 2021-12-27 2022-05-06 湖南中联重科应急装备有限公司 Landing leg safety detection method, processor and device for fire fighting truck and fire fighting truck
CN114529543A (en) * 2022-04-20 2022-05-24 清华大学 Installation detection method and device for peripheral screw gasket of aero-engine

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6504957B2 (en) * 1997-07-07 2003-01-07 General Electric Company Method and apparatus for image registration

Also Published As

Publication number Publication date
CN114913232A (en) 2022-08-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: Room 1707, Building 2, East Ring Road, Yanqingyuan, Zhongguancun, Yanqing District, Beijing, 102199
Applicant after: Jiayang Smart Security Technology (Beijing) Co.,Ltd.
Address before: Room 1707, Building 2, East Ring Road, Yanqingyuan, Zhongguancun, Yanqing District, Beijing, 102199
Applicant before: PETROMENTOR INTERNATIONAL EDUCATION (BEIJING) CO.,LTD.
GR01 Patent grant