CN112085031A - Target detection method and system - Google Patents

Target detection method and system

Info

Publication number
CN112085031A
CN112085031A (application CN202010955668.7A)
Authority
CN
China
Prior art keywords
feature layer
target
target area
image
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010955668.7A
Other languages
Chinese (zh)
Inventor
王云峰
黎作鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Engineering
Original Assignee
Hebei University of Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei University of Engineering filed Critical Hebei University of Engineering
Priority to CN202010955668.7A priority Critical patent/CN112085031A/en
Publication of CN112085031A publication Critical patent/CN112085031A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/167 Position within a video image, e.g. region of interest [ROI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/50 Indexing scheme relating to G06F9/50
    • G06F 2209/502 Proximity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of target detection and discloses a target detection method and system. The method comprises the following steps: training a pre-constructed deep learning target detection model through a cloud server to obtain trained model parameters, and sending the trained model parameters to an edge server; acquiring an image to be detected through a terminal device, determining a target area and a non-target area of the image to be detected, compressing the target area according to a first compression ratio and the non-target area according to a second compression ratio to obtain a compressed image to be detected, and sending the compressed image to be detected to the edge server, wherein the first compression ratio is less than the second compression ratio; and obtaining, on the edge server, a trained deep learning target detection model according to the trained model parameters, and carrying out target detection on the compressed image to be detected according to the trained deep learning target detection model to obtain a target detection result. The invention can improve target detection accuracy.

Description

Target detection method and system
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a target detection method and a target detection system.
Background
The rapid development of Internet of Things (IoT) technology has enabled ubiquitous acquisition of information data in production and daily life, while cloud computing and big data technologies provide the computing resources needed for accurate processing of the massive data produced by various information sources, including the IoT. Together, the IoT and cloud computing form a computing architecture that realizes the concept of ubiquitous computing. In the security industry, widely deployed intelligent video sensor (camera) networks can perform multi-dimensional security data acquisition and preprocessing, and cloud computing centers can accurately process massive media data. This model greatly supports information processing capability in the public safety field.
With the continuous expansion of security networks, the amount of data collected and transmitted has grown dramatically, and the "IoT + cloud computing" media data processing model faces significant challenges: (1) massive data transmission causes a shortage of network bandwidth resources; (2) cloud computing resources are strained by massive data processing, making it difficult to meet the real-time requirements of security information processing. To accommodate the changing requirements of the public security industry for media data processing, the "IoT + cloud computing" data processing architecture must be improved. Edge computing is a new computing paradigm: by fully exploiting the computing and storage resources of the edge network, it can greatly reduce the demand that massive data transmission places on network bandwidth and can meet the real-time requirements of security media data processing.
However, most existing target detection methods based on edge computing have low detection accuracy.
Disclosure of Invention
In view of this, embodiments of the present invention provide a target detection method and system, so as to solve the problem of low detection accuracy in the prior art.
The first aspect of the embodiment of the invention provides a target detection method, which is applied to a target detection system, wherein the target detection system comprises a terminal device, an edge server and a cloud server which are sequentially connected;
the target detection method comprises the following steps:
training a pre-constructed deep learning target detection model through a cloud server to obtain trained model parameters, and sending the trained model parameters to an edge server;
acquiring an image to be detected through terminal equipment, determining a target area and a non-target area of the image to be detected, compressing the target area of the image to be detected according to a first compression ratio, compressing the non-target area of the image to be detected according to a second compression ratio to obtain a compressed image to be detected, and sending the compressed image to be detected to an edge server; wherein the first compression ratio is less than the second compression ratio;
and obtaining a trained deep learning target detection model according to the trained model parameters through the edge server, and carrying out target detection on the compressed image to be detected according to the trained deep learning target detection model to obtain a target detection result.
A second aspect of the embodiments of the present invention provides a target detection system, which is characterized by including a terminal device, an edge server, and a cloud server, which are connected in sequence;
the cloud server is used for training a pre-constructed deep learning target detection model to obtain trained model parameters and sending the trained model parameters to the edge server;
the terminal equipment is used for acquiring an image to be detected, determining a target area and a non-target area of the image to be detected, compressing the target area of the image to be detected according to a first compression ratio, compressing the non-target area of the image to be detected according to a second compression ratio to obtain a compressed image to be detected, and sending the compressed image to be detected to the edge server; wherein the first compression ratio is less than the second compression ratio;
and the edge server is used for obtaining the trained deep learning target detection model according to the trained model parameters, and carrying out target detection on the compressed image to be detected according to the trained deep learning target detection model to obtain a target detection result.
Compared with the prior art, the embodiment of the invention has the following beneficial effects: the method comprises the steps that a cloud server is used for training a pre-constructed deep learning target detection model to obtain trained model parameters, and the trained model parameters are sent to an edge server; acquiring an image to be detected through terminal equipment, determining a target area and a non-target area of the image to be detected, compressing the target area and the non-target area of the image to be detected according to different compression rates to obtain a compressed image to be detected, and sending the compressed image to be detected to an edge server; obtaining a trained deep learning target detection model according to the trained model parameters through an edge server, and carrying out target detection on the compressed image to be detected according to the trained deep learning target detection model to obtain a target detection result; the target detection accuracy can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic flow chart illustrating an implementation of a target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an object detection system provided in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of feature layer fusion provided by an embodiment of the present invention;
FIG. 4 is a diagram illustrating a deep learning object detection model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an image to be detected according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an image to be detected with an initial target area marked;
FIG. 7 is a schematic diagram of an image to be detected with a final target area marked thereon according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a compressed image to be detected according to an embodiment of the present invention;
FIG. 9 is a schematic representation of target detection results using the SSD algorithm;
FIG. 10 is a schematic diagram of a target detection result obtained by using a target detection method provided by an embodiment of the present invention;
FIG. 11 is a graphical illustration of detection accuracy for different targets using different methods.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Fig. 1 is a schematic flow chart of an implementation of a target detection method according to an embodiment of the present invention, and for convenience of description, only a part related to the embodiment of the present invention is shown. The target detection method can be applied to a target detection system, as shown in fig. 2, the target detection system includes a terminal device, an edge server, and a cloud server, which are connected in sequence.
Specifically, the target detection system may be divided into three architectural layers: a terminal network layer, an edge server layer and a cloud layer. The terminal network layer may include one or more terminal devices, which may be devices such as intelligent video sensors, for example cameras. The edge server layer may include one or more edge servers, and the cloud layer may include one or more cloud servers.
Each terminal device in the terminal network layer is communicatively connected to all or some of the edge servers in the edge server layer, for example through a wireless network or a local area network. Each edge server in the edge server layer is communicatively connected to all or some of the cloud servers in the cloud layer, for example through a core switching network.
As shown in fig. 1, the target detection method may include the steps of:
s101: training a pre-constructed deep learning target detection model through a cloud server to obtain trained model parameters, and sending the trained model parameters to an edge server.
In the embodiment of the invention, the cloud server is responsible for training the deep learning target detection model, and delivers the training result, namely the trained model parameters to the edge server, so that the edge server can run the deep learning target detection model after training.
The deep learning target detection model may be built by the cloud server, or may be sent to the cloud server after being built by the edge server, which is not limited specifically herein.
S102: acquiring an image to be detected through terminal equipment, determining a target area and a non-target area of the image to be detected, compressing the target area of the image to be detected according to a first compression ratio, compressing the non-target area of the image to be detected according to a second compression ratio to obtain a compressed image to be detected, and sending the compressed image to be detected to an edge server; wherein the first compression ratio is less than the second compression ratio.
In security applications the volume of image data is huge and its generation rate far exceeds the bandwidth of the wireless channel; in addition, the wireless communication environment is somewhat unstable and prone to data loss. Therefore, the acquired image first undergoes preliminary saliency detection at the terminal network layer, and different regions of the image are then compressed at different compression ratios, reducing the amount of data transmitted over the network. A larger compression ratio indicates a higher degree of compression.
Specifically, the terminal device needs to transmit the collected data to the edge server over a wireless network. However, the data generation rate of the terminal device is much greater than the bandwidth of the wireless channel, and wireless transmission is unstable; in order not to degrade recognition accuracy, a reasonable scheme must be chosen to compress the image selectively. Since target detection on the edge server mainly analyzes the target region of an image, the target region and the background region (non-target region) are distinguished at the terminal network layer: the target region is left uncompressed or only lightly compressed, while the background region is compressed heavily. In general, the target region occupies only a small portion of the image, so selective compression reduces the amount of transmitted data while preserving the detection accuracy of the server.
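For illustration, the following Python sketch shows one way to realize this selective compression. The patent relies on the ROI mechanism of the JPEG2000 standard; plain OpenCV does not expose that mechanism, so the sketch approximates it with region-wise JPEG encoding at two quality settings. The function name, parameter names and quality values are illustrative assumptions, not the patented implementation.

```python
import cv2

def compress_selectively(image, target_boxes, target_quality=95, background_quality=40):
    """Compress target regions lightly and the background heavily before upload.

    A simplified sketch: the whole image is JPEG-encoded at a low quality
    (standing in for the larger compression ratio of the background), and each
    target region is re-encoded at a high quality (the smaller compression
    ratio) and pasted back on top.
    """
    # Heavily compress the whole image; this plays the role of the non-target region.
    _, bg_bytes = cv2.imencode(".jpg", image, [cv2.IMWRITE_JPEG_QUALITY, background_quality])
    compressed = cv2.imdecode(bg_bytes, cv2.IMREAD_COLOR)

    # Re-insert each target region compressed with the smaller compression ratio.
    for (x, y, w, h) in target_boxes:
        roi = image[y:y + h, x:x + w]
        _, roi_bytes = cv2.imencode(".jpg", roi, [cv2.IMWRITE_JPEG_QUALITY, target_quality])
        compressed[y:y + h, x:x + w] = cv2.imdecode(roi_bytes, cv2.IMREAD_COLOR)

    return compressed
```

The returned image only illustrates the differential quality of the two regions; in the system described here the terminal device would transmit the selectively compressed bitstream to the edge server rather than a decoded image.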
In an embodiment of the present invention, the "determining the non-target region of the image to be detected" in S102 may include the following steps:
detecting an initial target area of an image to be detected based on a BING algorithm;
expanding the initial target area based on a preset multiple to obtain an expanded initial target area;
smoothly moving a preset detection frame in the expanded initial target area, and calculating edge response values of the detection frame under different coordinates;
and determining the area where the detection frame corresponding to the coordinate with the maximum edge response value is located as a final target area, and determining the area except the final target area in the image to be detected as a non-target area.
The final target area is the target area of the image to be detected, and the non-target area is the non-target area of the image to be detected.
On resource-limited terminal devices, the lightweight, fast BING algorithm with a high detection rate is chosen for salient object detection, and the object region (target region) and background region (non-target region) of the image are then compressed to different degrees using the JPEG2000 standard, which allows arbitrary regions of an image to be designated for compression. When the BING algorithm detects an image, however, the detection frame may deviate considerably from the real target, so the detection frame is not accurately positioned. In that case, part of the target region may be heavily compressed, which degrades subsequent target detection and makes it difficult to meet the requirements of practical scenarios. Therefore, in order to accurately distinguish the target region from the background region in complex scenes, the embodiment of the invention modifies the BING algorithm to further improve positioning accuracy.
Specifically, a target region of the image to be detected is first detected with the BING algorithm and taken as an initial target region; there may be one or more initial target regions. Each initial target region is then expanded by a preset multiple to obtain the expanded initial target regions, where the preset multiple may be 1.5. Next, the detection frame is moved smoothly in the horizontal and vertical directions within the corresponding expanded initial target region, and the edge response values of the detection frame at different coordinates are calculated. Finally, as the detection frame moves within the corresponding expanded initial target region, the region covered by the detection frame at the coordinate with the maximum edge response value is determined as a final target region. After the final target regions corresponding to all initial target regions have been determined, the regions of the image to be detected other than the final target regions are taken as the non-target region.
The detection frame may be a rectangular frame, and the size of the detection frame may be the same as the size of the corresponding initial target area.
Optionally, before the step of moving the preset detection frame smoothly in the expanded initial target region and calculating the edge response values of the detection frame under different coordinates, the method may further include:
and carrying out median filtering on the expanded initial target area to remove noise interference.
In an embodiment of the present invention, the calculating edge response values of the detection frame at different coordinates includes:
determining an edge segmentation threshold T of the region where the detection frame is located based on the Sobel operator;
according to
Figure BDA0002678499700000071
And
Figure BDA0002678499700000072
calculating the coordinate of the upper left corner of the detection box as (x)i,yi) And the coordinate of the lower right corner is (x)j,yj) An edge response value S;
wherein f (x, y) is a pixel value of the coordinate point (x, y); g (x, y) is a target value of the coordinate point (x, y), G (x, y) is 1, the coordinate point (x, y) is represented as a target pixel point, G (x, y) is 0, and the coordinate point (x, y) is represented as a non-target pixel point; x is the number ofa≤xi<xj≤xb,ya≤yi<yj≤yb,(xa,ya) To enlarge the upper left corner coordinate of the initial target region, (x)b,yb) The coordinates of the lower right corner of the expanded initial target area.
Specifically, edge information of the region is extracted by using a Sobel (Sobel) operator to obtain an edge segmentation threshold T, and then, the edge segmentation threshold T is projected in the horizontal and vertical directions of the image after edge threshold segmentation, and an edge response value is calculated as shown in the above formula.
Here x_j − x_i = W and y_j − y_i = H, where W is the width of the detection frame and H is the height of the detection frame.
Target pixel points are the pixel points belonging to the target to be detected, and non-target pixel points are the remaining pixel points of the image to be detected. The maximum edge response value indicates that the region contains the most target pixel points and is therefore the most likely to be the target region.
Alternatively, the position coordinates after the movement of the detection frame may be obtained from the initial position coordinates of the detection frame, the horizontal movement distance and the vertical movement distance during the smooth movement.
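As a concrete illustration of the refinement procedure above, the following Python sketch expands a BING proposal, thresholds the Sobel edge magnitude, and slides a same-size window to find the position with the largest edge response (here computed as the number of edge pixels above the threshold inside the window). The use of Otsu's method to pick the threshold T, the sliding step size, and all function and variable names are assumptions added for the example.

```python
import cv2
import numpy as np

def refine_target_box(gray, box, expand=1.5, step=2):
    """Refine a BING proposal (x, y, w, h) inside its expanded region."""
    x, y, w, h = box
    H, W = gray.shape

    # 1. Expand the initial target region by the preset multiple (e.g. 1.5x).
    cx, cy = x + w / 2.0, y + h / 2.0
    ew, eh = int(w * expand), int(h * expand)
    xa, ya = max(0, int(cx - ew / 2)), max(0, int(cy - eh / 2))
    xb, yb = min(W, xa + ew), min(H, ya + eh)

    # 2. Median filtering to suppress noise, then Sobel edge magnitude.
    roi = cv2.medianBlur(gray[ya:yb, xa:xb].copy(), 3)
    gx = cv2.Sobel(roi, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(roi, cv2.CV_64F, 0, 1, ksize=3)
    mag = cv2.convertScaleAbs(np.sqrt(gx ** 2 + gy ** 2))

    # 3. Edge segmentation: G(x, y) = 1 where the edge magnitude exceeds T.
    _, g = cv2.threshold(mag, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # 4. Slide a w x h window over the expanded region; the edge response S
    #    is the number of target pixels inside the window.
    integral = cv2.integral(g)
    best_s, best_xy = -1, (x, y)
    for yy in range(0, (yb - ya) - h + 1, step):
        for xx in range(0, (xb - xa) - w + 1, step):
            s = (integral[yy + h, xx + w] - integral[yy, xx + w]
                 - integral[yy + h, xx] + integral[yy, xx])
            if s > best_s:
                best_s, best_xy = s, (xa + xx, ya + yy)

    return best_xy[0], best_xy[1], w, h  # final target region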
S103: and obtaining a trained deep learning target detection model according to the trained model parameters through the edge server, and carrying out target detection on the compressed image to be detected according to the trained deep learning target detection model to obtain a target detection result.
And in the edge server layer, the compressed image to be detected is subjected to target detection through the deep learning target detection model after the operation training of the edge server, so that a target detection result is obtained.
As can be seen from the above description, in the embodiment of the present invention, the cloud server is used to train the pre-constructed deep learning target detection model to obtain the trained model parameters, and the trained model parameters are sent to the edge server; acquiring an image to be detected through terminal equipment, determining a target area and a non-target area of the image to be detected, compressing the target area and the non-target area of the image to be detected according to different compression rates to obtain a compressed image to be detected, and sending the compressed image to be detected to an edge server; obtaining a trained deep learning target detection model according to the trained model parameters through an edge server, and carrying out target detection on the compressed image to be detected according to the trained deep learning target detection model to obtain a target detection result; the target detection accuracy can be improved.
In an embodiment of the present invention, the target detection method further includes:
constructing an SSD network model through an edge server; wherein the SSD network model comprises a conv4_3 feature layer, a conv5_3 feature layer and a conv10_2 feature layer;
fusing the conv4_3 feature layer, the conv5_3 feature layer and the conv10_2 feature layer to obtain a conv'4_3 feature layer;
and replacing the conv4_3 feature layer in the SSD network model with the conv'4_3 feature layer to obtain a deep learning target detection model.
In order to realize efficient and real-time target detection on the edge server, a reasonable scheme is to adopt the deep-learning-based SSD (Single Shot MultiBox Detector) algorithm. Compared with the Fast R-CNN algorithm, the SSD algorithm has an obvious speed advantage, and its detection precision is comparable to that of YOLO. Although the SSD performs well on the VOC data sets, its detection of small objects (objects whose pixel area is smaller than 30 pixels × 30 pixels) is poor. If it is applied directly to target detection in security images, targets may be missed. The output of each detection layer in the SSD's multi-layer detection is independent, information transfer between different layers is ignored, and the shallow network lacks high-level semantic information; therefore, the detection performance for small targets can be improved by passing the high-level semantic information captured during the forward convolution back to a shallow feature layer. Based on this principle, the embodiment of the present invention provides a feature-fusion-based 2FSSD (Feature Fusion-SSD) algorithm, i.e., the trained deep learning target detection model described above, for target detection.
Specifically, since the conv5_3 feature layer has a larger receptive field and the conv10_2 feature layer carries stronger semantic information, the embodiment of the present invention fuses the conv5_3 and conv10_2 feature layers of the SSD backbone network with the conv4_3 layer, from which small-target features are originally extracted, to form a new feature extraction layer conv'4_3 that enhances the extraction of small-target features. In the SSD network the conv4_3 feature layer serves as one of the outputs; in the embodiment of the present invention the fused conv'4_3 feature layer replaces conv4_3 as that output, i.e., in the deep learning target detection model the conv4_3 feature layer is no longer used as one of the outputs and the conv'4_3 feature layer is used instead.
In an embodiment of the present invention, referring to fig. 3, the above-mentioned fusing of the conv4_3 feature layer, the conv5_3 feature layer and the conv10_2 feature layer to obtain a conv'4_3 feature layer includes:
carrying out bilinear interpolation on the conv5_3 feature layer, so that the size of the feature map extracted by the conv5_3 feature layer after interpolation is the same as that of the feature map extracted by the conv4_3 feature layer;
convolving the conv10_2 feature layer with a 1 × 1 convolution kernel, so that the depth of the convolved conv10_2 feature layer is the same as that of the conv4_3 feature layer;
carrying out bilinear interpolation on the convolved conv10_2 feature layer to enable the size of the feature map extracted by the conv10_2 feature layer after interpolation to be the same as the size of the feature map extracted by the conv4_3 feature layer;
and fusing the conv4_3 feature layer, the interpolated conv5_3 feature layer and the interpolated conv10_2 feature layer by an element fusion method to obtain a conv'4_3 feature layer.
Specifically, to ensure that the resolution and depth are the same between feature layers to be fused, adjustments are made to the conv5_3 and conv10_2 feature layers. The specific steps of feature fusion are shown in fig. 3.
Step 1: the conv5_3 feature layer is made to be the same resolution (size) as the conv4_3 feature layer. Bilinear Interpolation (Bilinear Interpolation) is a method for amplifying feature maps, and has high calculation speed and good Interpolation characteristic. The image features extracted from the conv4_3 layer are used as a target image, the image features extracted from the conv5_3 layer are used as an original image, and the image of the conv5_3 feature layer is interpolated.
Assume that the size of the original image is srcW × srcH; for the conv5_3 feature layer, srcW = srcH = 19. The size of the target image is dstW × dstH; for the conv4_3 feature layer, dstW = dstH = 38. Then, by the formulas

srcX = dstX × (srcW / dstW), srcY = dstY × (srcH / dstH),

the position of the target pixel in the original image can be calculated. The correspondence between target image pixel points and original image pixel points is determined through these formulas.
Let the coordinate of the pixel point to be solved be Q(x, y), and let the coordinates of its adjacent pixel points be M11(x1, y1), M12(x1, y2), M21(x2, y1) and M22(x2, y2), where x1 = x − 1, y1 = y − 1, x2 = x + 1 and y2 = y + 1. Linear interpolation is first carried out in the x direction, giving

f(x, y1) ≈ ((x2 − x)/(x2 − x1)) · f(M11) + ((x − x1)/(x2 − x1)) · f(M21)
f(x, y2) ≈ ((x2 − x)/(x2 − x1)) · f(M12) + ((x − x1)/(x2 − x1)) · f(M22),

and interpolation is then carried out in the y direction, giving

f(Q) ≈ ((y2 − y)/(y2 − y1)) · f(x, y1) + ((y − y1)/(y2 − y1)) · f(x, y2).

The values of the target image pixel points can be determined through these two formulas. The interpolation of the image is completed by combining the correspondence between target image pixel points and original image pixel points with the values of the target image pixel points.
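The mapping and interpolation formulas above can be written out directly; the following Python sketch enlarges a 2-D feature map, for example from 19 × 19 to 38 × 38. The function name and the explicit loops are illustrative; in practice a framework routine such as torch.nn.functional.interpolate would normally be used instead.

```python
import numpy as np

def bilinear_resize(src, dst_h, dst_w):
    """Enlarge a 2-D feature map with bilinear interpolation."""
    src_h, src_w = src.shape
    dst = np.zeros((dst_h, dst_w), dtype=np.float64)
    for dy in range(dst_h):
        for dx in range(dst_w):
            # Map the target pixel back into the original image.
            sx = dx * (src_w / dst_w)
            sy = dy * (src_h / dst_h)
            x1, y1 = int(np.floor(sx)), int(np.floor(sy))
            x2, y2 = min(x1 + 1, src_w - 1), min(y1 + 1, src_h - 1)
            fx, fy = sx - x1, sy - y1
            # Linear interpolation in the x direction, then in the y direction.
            r1 = (1 - fx) * src[y1, x1] + fx * src[y1, x2]
            r2 = (1 - fx) * src[y2, x1] + fx * src[y2, x2]
            dst[dy, dx] = (1 - fy) * r1 + fy * r2
    return dst
```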
The conv5_3 feature layer is resized to 38 × 38 using bilinear interpolation, and Batch Normalization (BN) is then applied to enhance the robustness of the network and prevent the network weights and biases from overflowing.
Step 2: the resolution of the conv10_2 feature layer is 3 × 3 and its depth is 256. To allow fusion with the conv4_3 feature layer, the conv10_2 feature layer is first convolved with a 1 × 1 convolution kernel to convert its depth to 512, and bilinear interpolation and batch normalization are then applied. The transformed layer has the same feature map size and depth as conv4_3.
Step 3: the conv4_3 layer, the adjusted conv5_3 layer and the adjusted conv10_2 layer are fused in an element-wise fusion manner.
Through the above process, a deep learning target detection model based on SSD improvement, namely 2FSSD, can be obtained, and the structure is shown in FIG. 4.
In an embodiment of the present invention, fusing the conv4_3 feature layer with the interpolated conv5_3 feature layer and the interpolated conv10_2 feature layer by the element fusion method to obtain the conv'4_3 feature layer includes:

according to

Z_conv'4_3 = Σ_{i=1}^{c} K_i ∗ (X_i + Y_i + D_i),

obtaining the conv'4_3 feature layer;

wherein Z_conv'4_3 is the feature map extracted by the obtained conv'4_3 feature layer, X_i is the feature map extracted by the conv4_3 feature layer, Y_i is the feature map extracted by the interpolated conv5_3 feature layer, D_i is the feature map extracted by the interpolated conv10_2 feature layer, K_i are the convolution kernel parameters, and c is the depth of the conv4_3 feature layer.
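The following PyTorch sketch shows one plausible realization of the fusion described above: conv5_3 is bilinearly resized and batch-normalized, conv10_2 is reduced to depth 512 with a 1 × 1 convolution and then resized and batch-normalized, and the three maps are summed element-wise before a convolution that plays the role of the kernel parameters K_i. The module name, the 3 × 3 kernel size of the final convolution, and the exact placement of batch normalization are assumptions, not details taken from the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Build the fused conv'4_3 layer from conv4_3 (38x38x512),
    conv5_3 (19x19x512) and conv10_2 (3x3x256)."""

    def __init__(self, c4_channels=512, c10_channels=256):
        super().__init__()
        # 1x1 convolution to bring conv10_2 to the same depth as conv4_3.
        self.conv1x1 = nn.Conv2d(c10_channels, c4_channels, kernel_size=1)
        # Batch normalization applied to the resized feature maps.
        self.bn5 = nn.BatchNorm2d(c4_channels)
        self.bn10 = nn.BatchNorm2d(c4_channels)
        # Convolution over the element-wise sum (the K_i term in the formula).
        self.fuse = nn.Conv2d(c4_channels, c4_channels, kernel_size=3, padding=1)

    def forward(self, conv4_3, conv5_3, conv10_2):
        size = conv4_3.shape[-2:]                     # target resolution, 38x38
        y = F.interpolate(conv5_3, size=size, mode="bilinear", align_corners=False)
        y = self.bn5(y)
        d = self.conv1x1(conv10_2)                    # depth 256 -> 512
        d = F.interpolate(d, size=size, mode="bilinear", align_corners=False)
        d = self.bn10(d)
        # Element-wise fusion followed by convolution to form conv'4_3.
        return self.fuse(conv4_3 + y + d)
```

With the 300 × 300 SSD input assumed here, the fused output keeps the 38 × 38 × 512 shape of conv4_3 and is used in place of conv4_3 as one of the detection outputs.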
In an embodiment of the present invention, the target detection method further includes:
sending the target detection result to a cloud server through an edge server;
and storing the target detection result through the cloud server.
In the embodiment of the present invention, the edge server may send the target detection result to the cloud server, and the cloud server stores the target detection result.
Optionally, the edge server may upload data that needs further mining for computation and long-term storage to the cloud layer through the core network, and further mining or long-term storage is performed through the cloud server.
As can be seen from the above description, the embodiment of the present invention significantly reduces deployment cost by using the three-layer architecture of terminal network layer, edge server layer and cloud layer: at the network edge, terminal devices connect to the server through wireless or local area networks, without deploying a dedicated high-speed optical fiber network. The real-time performance of the detection task is also improved: compared with sending data to a remote cloud, detection can be performed locally in real time without interacting with the cloud over the backbone network. In addition, the spare computing power of the terminals is fully utilized: images are preprocessed on terminal devices with spare computing power, reducing the load on the server.
In the embodiment of the invention, 16551 pictures of the training part of VOC2007 and VOC2012 are used as a training data set, and 4952 pictures of the testing part of VOC2007 are used as a testing set.
With the designed edge computing target detection method and system, the training of the detection algorithm is carried out in the cloud. The training parameters set batch_size to 16 and the initial learning rate lr to 0.001. After 80k training iterations, the learning rate is reduced to 1/10 every 20k iterations, and the total number of iterations is set to 120k. Detection accuracy and detection speed are used as evaluation indexes of the model. Detection accuracy is expressed as the Average Precision (AP) of the algorithm for a single class and the mean Average Precision (mAP) of the algorithm over all target classes.
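As a sketch, the training schedule described above (batch size 16, initial learning rate 0.001, learning rate reduced to 1/10 every 20k iterations after the first 80k, 120k iterations in total) could be expressed in PyTorch as follows; the placeholder model and the SGD momentum and weight-decay values are assumptions for illustration.

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)  # placeholder standing in for the 2FSSD network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=5e-4)
# Learning rate is cut to 1/10 at 80k and again at 100k iterations,
# giving the "every 20k after the first 80k" schedule over 120k iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80000, 100000], gamma=0.1)

batch_size = 16
for iteration in range(120000):
    # ... forward pass on a batch of 16 images, loss computation, backward pass ...
    optimizer.step()
    scheduler.step()
```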
Processing results of the images at various stages are selected for comparative analysis; the processing effects at the different stages are shown in FIGS. 5-10.
Fig. 5 is the original image to be detected, and fig. 6 is the image to be detected with the initial target area marked by the BING algorithm; as can be seen from fig. 6, the BING algorithm can detect the target area of the image but with a position offset. Fig. 7 shows the image to be detected with the final target area marked by the target area detection method provided by the embodiment of the present invention, i.e. the improved BING algorithm; the detection frame is adjusted, so the position deviation is avoided. Fig. 8 shows the result of the compression carried out on the basis of fig. 7; the non-target area (background area) of fig. 8 is relatively blurred because, with selective compression, a compression rate of 60% is applied to the background area, reducing the data of the original image. Fig. 9 shows the target detection result of the SSD algorithm; the SSD algorithm misses small targets. Fig. 10 shows the target detection result of the target detection method provided by the embodiment of the invention; the information of small targets is expressed more fully, so the small targets in the image can be detected.
In summary, the embodiment of the present invention modifies the BING algorithm at the terminal layer, which enhances the recognition rate of the edge server layer algorithm to a certain extent, and the improved 2FSSD target detection algorithm further enhances the detection rate for small targets. The processing effect of the embodiment of the invention on the image at each stage is therefore ideal.
The target detection method provided by the embodiment of the present invention is compared experimentally with the original SSD algorithm and its typical improved variants DSSD and FSSD, in terms of detection accuracy and detection speed. The comparison of detection accuracy is shown in fig. 11.
As can be seen from fig. 11, the 2FSSD algorithm provided in the embodiment of the present invention adopts feature fusion, which improves detection accuracy by 2.3% over the original SSD algorithm and by 0.7% over the DSSD algorithm. Compared with FSSD, the feature layers fused in the embodiment of the present invention contain richer semantic information, and the terminal-layer preprocessing removes some background information, so the detection accuracy is slightly improved. In addition, as can be seen from the data in the figure, the average detection accuracy of the method provided by the embodiment of the present invention on six classes of small targets (birds, ships, bottles, chairs, potted plants and televisions) is 66.9%, an improvement of 3.8%, 1.1% and 1.8% respectively over the other algorithms, with the detection accuracy for birds and bottles improved most noticeably. The detection accuracy for small targets is therefore significantly improved.
In terms of detection speed, the detection speeds of the proposed target detection method and of the SSD, DSSD and FSSD algorithms are 27.3 fps, 35.7 fps, 9.3 fps and 29.5 fps respectively. Compared with the SSD algorithm, the added feature fusion operation reduces the detection speed. Compared with the DSSD algorithm, the method neither adopts the deeper ResNet-101 network nor uses complex deconvolution operations, so the detection speed is greatly improved. Compared with the FSSD algorithm, only the fused feature layers differ, so the detection speeds are close.
In summary, compared with other related algorithms, the target detection method according to the embodiment of the present invention has significantly improved detection accuracy (especially for detecting small targets), and can also meet the requirement of real-time detection (>25fps) in detection speed. Therefore, the target detection method provided by the embodiment of the invention can meet the real-time and efficient detection requirement on the image data in the security field.
Aiming at the application of deep learning algorithms to target detection in security images in an edge computing environment, the embodiment of the present invention provides a target detection system for the edge environment, i.e. a layered architecture. On this basis, the salient region detection algorithm BING at the terminal layer is improved to increase positioning accuracy, and an improved algorithm, 2FSSD, is provided at the edge server layer for the problem of missed detection of small targets in security images. The performance of the algorithm in small target detection is tested, and the results show that it improves detection accuracy over other algorithms and can meet the requirement of real-time detection. The embodiment of the invention is therefore applicable to image target detection in security applications.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Corresponding to the target detection method, an embodiment of the present invention further provides a target detection system, which has the same beneficial effects as the target detection method. Fig. 2 is a schematic diagram of an object detection system according to an embodiment of the present invention, and only a part related to the embodiment of the present invention is shown for convenience of description.
In the embodiment of the present invention, the target detection system may include a terminal device 21, an edge server 22, and a cloud server 23, which are connected in sequence;
the cloud server 23 is configured to train a pre-constructed deep learning target detection model to obtain trained model parameters, and send the trained model parameters to the edge server;
the terminal device 21 is configured to obtain an image to be detected, determine a target region and a non-target region of the image to be detected, compress the target region of the image to be detected according to a first compression ratio, compress the non-target region of the image to be detected according to a second compression ratio to obtain a compressed image to be detected, and send the compressed image to be detected to the edge server; wherein the first compression ratio is less than the second compression ratio;
and the edge server 22 is configured to obtain a trained deep learning target detection model according to the trained model parameters, and perform target detection on the compressed image to be detected according to the trained deep learning target detection model to obtain a target detection result.
Optionally, the terminal device 21 is further configured to:
detecting an initial target area of an image to be detected based on a BING algorithm;
expanding the initial target area based on a preset multiple to obtain an expanded initial target area;
smoothly moving a preset detection frame in the expanded initial target area, and calculating edge response values of the detection frame under different coordinates;
and determining the area where the detection frame corresponding to the coordinate with the maximum edge response value is located as a final target area, and determining the area except the final target area in the image to be detected as a non-target area.
Optionally, the terminal device 21 is further configured to:
determining an edge segmentation threshold T of the region where the detection frame is located based on the Sobel operator;
according to

G(x, y) = 1 if f(x, y) ≥ T, otherwise G(x, y) = 0,

and

S = Σ_{x = x_i}^{x_j} Σ_{y = y_i}^{y_j} G(x, y),

calculating the edge response value S of the detection frame whose upper left corner coordinate is (x_i, y_i) and whose lower right corner coordinate is (x_j, y_j);

wherein f(x, y) is the pixel value at the coordinate point (x, y); G(x, y) is the target value at the coordinate point (x, y), G(x, y) = 1 indicating that (x, y) is a target pixel point and G(x, y) = 0 indicating that (x, y) is a non-target pixel point; x_a ≤ x_i < x_j ≤ x_b and y_a ≤ y_i < y_j ≤ y_b, where (x_a, y_a) is the upper left corner coordinate of the expanded initial target region and (x_b, y_b) is the lower right corner coordinate of the expanded initial target region.
Optionally, the edge server 22 may also be configured to:
constructing an SSD network model; wherein the SSD network model comprises a conv4_3 feature layer, a conv5_3 feature layer and a conv10_2 feature layer;
fusing the conv4_3 feature layer, the conv5_3 feature layer and the conv10_2 feature layer to obtain a conv'4_3 feature layer;
and replacing the conv4_3 feature layer in the SSD network model with the conv'4_3 feature layer to obtain a deep learning target detection model.
Optionally, the edge server 22 may also be configured to:
carrying out bilinear interpolation on the conv5_3 feature layer, so that the size of the feature map extracted by the conv5_3 feature layer after interpolation is the same as that of the feature map extracted by the conv4_3 feature layer;
convolving the conv10_2 feature layer with a 1 × 1 convolution kernel, so that the depth of the convolved conv10_2 feature layer is the same as that of the conv4_3 feature layer;
carrying out bilinear interpolation on the convolved conv10_2 feature layer to enable the size of the feature map extracted by the conv10_2 feature layer after interpolation to be the same as the size of the feature map extracted by the conv4_3 feature layer;
and fusing the conv4_3 feature layer, the interpolated conv5_3 feature layer and the interpolated conv10_2 feature layer by an element fusion method to obtain a conv'4_3 feature layer.
Optionally, the edge server 22 may also be configured to:
according to

Z_conv'4_3 = Σ_{i=1}^{c} K_i ∗ (X_i + Y_i + D_i),

obtaining the conv'4_3 feature layer;

wherein Z_conv'4_3 is the feature map extracted by the obtained conv'4_3 feature layer, X_i is the feature map extracted by the conv4_3 feature layer, Y_i is the feature map extracted by the interpolated conv5_3 feature layer, D_i is the feature map extracted by the interpolated conv10_2 feature layer, K_i are the convolution kernel parameters, and c is the depth of the conv4_3 feature layer.
Optionally, the edge server 22 may also be configured to: and sending the target detection result to a cloud server.
The cloud server 23 may also be used to: and storing the target detection result.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing functional units and modules are merely illustrated in terms of division, and in practical applications, the foregoing functional allocation may be performed by different functional units and modules as needed, that is, the internal structure of the object detection system is divided into different functional units or modules to perform all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the above-mentioned apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed object detection system and method may be implemented in other ways. For example, the above-described embodiments of the object detection system are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain other components which may be suitably increased or decreased as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media which may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. The target detection method is characterized by being applied to a target detection system, wherein the target detection system comprises a terminal device, an edge server and a cloud server which are sequentially connected;
the target detection method comprises the following steps:
training a pre-constructed deep learning target detection model through the cloud server to obtain trained model parameters, and sending the trained model parameters to the edge server;
acquiring an image to be detected through the terminal equipment, determining a target area and a non-target area of the image to be detected, compressing the target area of the image to be detected according to a first compression ratio, compressing the non-target area of the image to be detected according to a second compression ratio to obtain a compressed image to be detected, and sending the compressed image to be detected to the edge server; wherein the first compression ratio is less than the second compression ratio;
and obtaining a trained deep learning target detection model according to the trained model parameters through the edge server, and carrying out target detection on the compressed image to be detected according to the trained deep learning target detection model to obtain a target detection result.
2. The object detection method according to claim 1, wherein said determining the target area and the non-target area of the image to be detected comprises:
detecting an initial target area of the image to be detected based on a BING algorithm;
expanding the initial target area based on a preset multiple to obtain an expanded initial target area;
smoothly moving a preset detection frame in the expanded initial target area, and calculating edge response values of the detection frame under different coordinates;
and determining the area where the detection frame corresponding to the coordinate with the maximum edge response value is located as a final target area, and determining the area except the final target area in the image to be detected as a non-target area.
3. The object detection method of claim 2, wherein the calculating the edge response values of the detection frame at different coordinates comprises:
determining an edge segmentation threshold T of the region where the detection frame is located based on a Sobel operator;
calculating, according to the formulas shown in images FDA0002678499690000021 and FDA0002678499690000022, the edge response value S of the detection frame whose upper-left corner coordinate is (x_i, y_i) and whose lower-right corner coordinate is (x_j, y_j);
wherein f(x, y) is the pixel value of the coordinate point (x, y); G(x, y) is the target value of the coordinate point (x, y), G(x, y) = 1 indicates that the coordinate point (x, y) is a target pixel point, and G(x, y) = 0 indicates that the coordinate point (x, y) is a non-target pixel point; x_a ≤ x_i < x_j ≤ x_b and y_a ≤ y_i < y_j ≤ y_b, where (x_a, y_a) is the coordinate of the upper-left corner of the expanded initial target area and (x_b, y_b) is the coordinate of the lower-right corner of the expanded initial target area.
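By way of illustration, a hedged sketch of the edge-response search in claim 3. The claimed formulas are published only as images, so the snippet follows the variable definitions in the claim: a Sobel-based threshold T segments the expanded region into target pixels G(x, y), and S is taken here as the number of target pixels inside the detection frame. The use of Otsu to choose T, the sliding stride, and the function name best_detection_frame are assumptions, not the patented formulas.

```python
import cv2
import numpy as np

def best_detection_frame(gray_region, frame_w, frame_h, stride=4):
    """Slide a frame of size (frame_w, frame_h) over the expanded region and
    return the position with the largest edge response S."""
    # Sobel gradient magnitude of the expanded initial target area.
    gx = cv2.Sobel(gray_region, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray_region, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = cv2.magnitude(gx, gy)

    # Edge segmentation threshold T (here chosen by Otsu on the normalized magnitude).
    mag8 = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    T, G = cv2.threshold(mag8, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Summed-area table so each box sum (the response S) is O(1).
    integral = cv2.integral(G)
    best_S, best_box = -1, None
    H, W = G.shape
    for y in range(0, H - frame_h + 1, stride):
        for x in range(0, W - frame_w + 1, stride):
            S = (integral[y + frame_h, x + frame_w] - integral[y, x + frame_w]
                 - integral[y + frame_h, x] + integral[y, x])
            if S > best_S:
                best_S, best_box = S, (x, y, frame_w, frame_h)
    return best_box, best_S
```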
4. The target detection method according to claim 1, characterized in that the target detection method further comprises:
constructing an SSD network model through the edge server; wherein the SSD network model comprises a conv4_3 feature layer, a conv5_3 feature layer, and a conv10_2 feature layer;
fusing the conv4_3 feature layer, the conv5_3 feature layer and the conv10_2 feature layer to obtain a conv'4_3 feature layer;
and replacing the conv4_3 feature layer in the SSD network model with the conv'4_3 feature layer to obtain the deep learning target detection model.
5. The target detection method of claim 4, wherein the fusing the conv4_3 feature layer, the conv5_3 feature layer and the conv10_2 feature layer to obtain the conv'4_3 feature layer comprises:
carrying out bilinear interpolation on the conv5_3 feature layer, so that the size of the feature map extracted by the interpolated conv5_3 feature layer is the same as that of the feature map extracted by the conv4_3 feature layer;
convolving the conv10_2 feature layer by adopting a convolution kernel of 1 × 1, so that the depth of the convolved conv10_2 feature layer is the same as that of the conv4_3 feature layer;
carrying out bilinear interpolation on the convolved conv10_2 feature layer to enable the size of the feature map extracted by the interpolated conv10_2 feature layer to be the same as the size of the feature map extracted by the conv4_3 feature layer;
and fusing the conv4_3 feature layer, the interpolated conv5_3 feature layer and the interpolated conv10_2 feature layer by adopting an element fusion method to obtain the conv'4_3 feature layer.
6. The target detection method of claim 5, wherein the fusing, by the element fusion method, the conv4_3 feature layer, the interpolated conv5_3 feature layer and the interpolated conv10_2 feature layer to obtain the conv'4_3 feature layer comprises:
obtaining the conv'4_3 feature layer according to the formula shown in image FDA0002678499690000031;
wherein Z_conv'4_3 is the feature map extracted by the obtained conv'4_3 feature layer, X_i is the feature map extracted by the conv4_3 feature layer, Y_i is the feature map extracted by the interpolated conv5_3 feature layer, D_i is the feature map extracted by the interpolated conv10_2 feature layer, K_i is a convolution kernel parameter, and c is the depth of the conv4_3 feature layer.
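By way of illustration, a PyTorch sketch of the fusion in claims 5 and 6: conv5_3 is bilinearly interpolated to conv4_3's feature-map size, conv10_2 is reduced to conv4_3's depth with a 1×1 convolution and then interpolated, and the three maps are fused element-wise before a convolution that stands in for the kernel parameters K_i. The element-wise summation and the channel/spatial sizes (the usual VGG16-SSD layout) are assumptions, since the exact fusion formula is published only as an image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedConv4_3(nn.Module):
    def __init__(self, c4_channels=512, c10_channels=256):
        super().__init__()
        # 1x1 convolution so conv10_2's depth matches conv4_3's depth.
        self.reduce10 = nn.Conv2d(c10_channels, c4_channels, kernel_size=1)
        # 3x3 convolution applied after fusion (stand-in for the kernel K_i).
        self.fuse = nn.Conv2d(c4_channels, c4_channels, kernel_size=3, padding=1)

    def forward(self, conv4_3, conv5_3, conv10_2):
        size = conv4_3.shape[-2:]                      # target spatial size
        y = F.interpolate(conv5_3, size=size, mode="bilinear", align_corners=False)
        d = self.reduce10(conv10_2)                    # match depth of conv4_3
        d = F.interpolate(d, size=size, mode="bilinear", align_corners=False)
        z = conv4_3 + y + d                            # element-wise fusion
        return self.fuse(z)                            # conv'4_3 feature map

# Example shapes for a 300x300 input (assumed): conv4_3 38x38x512,
# conv5_3 19x19x512, conv10_2 3x3x256.
out = FusedConv4_3()(torch.randn(1, 512, 38, 38),
                     torch.randn(1, 512, 19, 19),
                     torch.randn(1, 256, 3, 3))
print(out.shape)  # torch.Size([1, 512, 38, 38])
```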
7. The target detection method according to any one of claims 1 to 6, characterized in that the target detection method further comprises:
sending the target detection result to the cloud server through the edge server;
and storing the target detection result through the cloud server.
8. A target detection system, characterized by comprising a terminal device, an edge server and a cloud server which are connected in sequence;
the cloud server is used for training a pre-constructed deep learning target detection model to obtain trained model parameters and sending the trained model parameters to the edge server;
the terminal device is used for acquiring an image to be detected, determining a target area and a non-target area of the image to be detected, compressing the target area of the image to be detected according to a first compression ratio, compressing the non-target area of the image to be detected according to a second compression ratio to obtain a compressed image to be detected, and sending the compressed image to be detected to the edge server; wherein the first compression ratio is less than the second compression ratio;
and the edge server is used for obtaining a trained deep learning target detection model according to the trained model parameters, and performing target detection on the compressed image to be detected according to the trained deep learning target detection model to obtain a target detection result.
9. The target detection system of claim 8, wherein the terminal device is further configured to:
detecting an initial target area of the image to be detected based on a BING algorithm;
expanding the initial target area based on a preset multiple to obtain an expanded initial target area;
slide a preset detection frame within the expanded initial target area, and calculate edge response values of the detection frame at different coordinates;
and determining the area where the detection frame corresponding to the coordinate with the maximum edge response value is located as a final target area, and determining the area except the final target area in the image to be detected as a non-target area.
10. The target detection system of claim 9, wherein the terminal device is further configured to:
determining an edge segmentation threshold T of the region where the detection frame is located based on a Sobel operator;
calculate, according to the formulas shown in images FDA0002678499690000041 and FDA0002678499690000042, the edge response value S of the detection frame whose upper-left corner coordinate is (x_i, y_i) and whose lower-right corner coordinate is (x_j, y_j);
wherein f(x, y) is the pixel value of the coordinate point (x, y); G(x, y) is the target value of the coordinate point (x, y), G(x, y) = 1 indicates that the coordinate point (x, y) is a target pixel point, and G(x, y) = 0 indicates that the coordinate point (x, y) is a non-target pixel point; x_a ≤ x_i < x_j ≤ x_b and y_a ≤ y_i < y_j ≤ y_b, where (x_a, y_a) is the coordinate of the upper-left corner of the expanded initial target area and (x_b, y_b) is the coordinate of the lower-right corner of the expanded initial target area.
CN202010955668.7A 2020-09-11 2020-09-11 Target detection method and system Pending CN112085031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010955668.7A CN112085031A (en) 2020-09-11 2020-09-11 Target detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010955668.7A CN112085031A (en) 2020-09-11 2020-09-11 Target detection method and system

Publications (1)

Publication Number Publication Date
CN112085031A true CN112085031A (en) 2020-12-15

Family

ID=73737604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010955668.7A Pending CN112085031A (en) 2020-09-11 2020-09-11 Target detection method and system

Country Status (1)

Country Link
CN (1) CN112085031A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159895A (en) * 2003-05-20 2008-04-09 华为技术有限公司 Base station implementing method
CN101188839A (en) * 2007-12-10 2008-05-28 中兴通讯股份有限公司 Data processing method and system for mobile terminal
CN108270999A (en) * 2018-01-26 2018-07-10 中南大学 A kind of object detection method, image recognition server and system
CN109800628A (en) * 2018-12-04 2019-05-24 华南理工大学 A kind of network structure and detection method for reinforcing SSD Small object pedestrian detection performance
CN110781895A (en) * 2019-10-10 2020-02-11 湖北工业大学 Image semantic segmentation method based on convolutional neural network
CN111311698A (en) * 2020-01-17 2020-06-19 济南浪潮高新科技投资发展有限公司 Image compression method and system for multi-scale target

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GUIMEI CAO ET AL: "Feature-Fused SSD: Fast Detection for Small Objects", 《ARXIV》 *
ZUO-XIN LI AND FU-QIANG ZHOU: "FSSD: Feature Fusion Single Shot Multibox Detector", 《ARXIV:1712.00960V3》 *
朱天明等: "Pedestrian detection algorithm based on improved BING model and edge information" (in Chinese), Machinery & Electronics *
欧嘉煜: "Research and implementation of visual object detection based on collaborative edge computing" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *
贱小杜: "How to determine the number of channels of a CNN convolution kernel and the number of output channels (depth) of the convolution" (blog post, in Chinese) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527849A (en) * 2020-12-23 2021-03-19 珠海星客合创科技有限公司 Intelligent data annotation method based on cloud-edge mixed Internet of things system
CN113191210A (en) * 2021-04-09 2021-07-30 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment
CN113191210B (en) * 2021-04-09 2023-08-29 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment
CN113205010A (en) * 2021-04-19 2021-08-03 广东电网有限责任公司东莞供电局 Intelligent disaster-exploration on-site video frame efficient compression system and method based on target clustering
CN113205010B (en) * 2021-04-19 2023-02-28 广东电网有限责任公司东莞供电局 Intelligent disaster-exploration on-site video frame efficient compression system and method based on target clustering
CN113596340A (en) * 2021-09-18 2021-11-02 苏州动体视界科技有限公司 Method for positioning target relative to lens position
CN114627143A (en) * 2021-10-12 2022-06-14 深圳宏芯宇电子股份有限公司 Image processing method and device, terminal equipment and readable storage medium

Similar Documents

Publication Publication Date Title
KR102319177B1 (en) Method and apparatus, equipment, and storage medium for determining object pose in an image
CN112085031A (en) Target detection method and system
US10719940B2 (en) Target tracking method and device oriented to airborne-based monitoring scenarios
TWI709107B (en) Image feature extraction method and saliency prediction method including the same
CN108648161B (en) Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network
CN106875437B (en) RGBD three-dimensional reconstruction-oriented key frame extraction method
CN112052839A (en) Image data processing method, apparatus, device and medium
US11367195B2 (en) Image segmentation method, image segmentation apparatus, image segmentation device
CN115205489A (en) Three-dimensional reconstruction method, system and device in large scene
CN110827312B (en) Learning method based on cooperative visual attention neural network
KR102141319B1 (en) Super-resolution method for multi-view 360-degree image and image processing apparatus
CN108668069B (en) Image background blurring method and device
CN110570457A (en) Three-dimensional object detection and tracking method based on stream data
CN110688905A (en) Three-dimensional object detection and tracking method based on key frame
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
CN113837275B (en) Improved YOLOv3 target detection method based on expanded coordinate attention
CN106575447A (en) Constructing a 3D structure
CN114693760A (en) Image correction method, device and system and electronic equipment
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN110889868B (en) Monocular image depth estimation method combining gradient and texture features
CN114170290A (en) Image processing method and related equipment
KR102464271B1 (en) Pose acquisition method, apparatus, electronic device, storage medium and program
CN112149528A (en) Panorama target detection method, system, medium and equipment
CN110322479B (en) Dual-core KCF target tracking method based on space-time significance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201215)