CN112084815A - Target detection method based on camera focal length conversion, storage medium and processor - Google Patents

Target detection method based on camera focal length conversion, storage medium and processor

Info

Publication number
CN112084815A
Authority
CN
China
Prior art keywords
target
data
training set
focal length
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910510818.0A
Other languages
Chinese (zh)
Inventor
刘若鹏
栾琳
季春霖
刘凯品
陈欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Tianfu New District Guangqi Future Technology Research Institute
Original Assignee
Chengdu Tianfu New District Guangqi Future Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Tianfu New District Guangqi Future Technology Research Institute filed Critical Chengdu Tianfu New District Guangqi Future Technology Research Institute
Priority to CN201910510818.0A
Publication of CN112084815A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/67 Focus control based on electronic image sensor signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Abstract

The invention provides a target detection method based on camera focal length transformation, a storage medium and a processor. The method comprises: establishing a first data training set and a second data training set, wherein the first data training set consists of original captured images and the second data training set consists of images of the region where the target is located, obtained by simulating a camera focal length change; designing a target detection network and training it on the first and second data training sets respectively; invoking the target detection network trained on the first data training set to detect targets and obtain the target confidence for the first data training set; and judging whether that confidence is within the confidence threshold range: if so, a target object is judged to exist in the region; otherwise, the camera focal length is changed and a third data training set is obtained near the target region. By changing the camera focal length, the target becomes larger in the imaging result, which improves the confidence of the detected target object, reduces the false detection rate, and improves detection performance.

Description

Target detection method based on camera focal length conversion, storage medium and processor
[ technical field ]
The invention relates to the technical field of target detection, in particular to a target detection method based on camera focal length conversion, a storage medium and a processor.
[ background of the invention ]
High-resolution drone imagery is becoming more common worldwide and contains a wealth of information relevant to maintenance, land development, disease control, defect localization, monitoring, and other applications. These data are often transmitted over a network to a ground station, where personnel analyze the image data to determine whether any targets are present and what information those targets contain. Because drones typically shoot large scenes from a high altitude, target objects are small in the imaging results, and the analysis often requires enormous manpower and time. With the rise of artificial intelligence, the precision of conventional target detection has improved greatly, and target detection algorithms based on deep learning have become a research focus in many engineering applications and academic studies. Target detection algorithms are also urgently needed on drones, for example for pedestrian counting and target type judgment in drone surveillance scenes. Automatic target detection in drone overhead-shot scenes is therefore a problem that urgently needs to be solved.
One prior-art method detects foreign matter on the water surface, addressing the low accuracy of water-surface foreign-matter detection; however, it only processes water-surface images shot by a drone, cannot cover other scenes, and is therefore limited to a small range of scenarios. Although it uses the yolo v3 target detection algorithm based on a convolutional neural network, that algorithm has demanding image-input requirements, is troublesome to process, and does not meet real-time requirements.
Another prior-art system is a fully intelligent oil and gas pipeline inspection system based on deep learning: a drone carries a high-resolution visible-light camera to acquire image data from the air, and users on the ground can view the processed images in real time through image transmission equipment; a preset deep neural network recognizer then automatically discovers behaviors that endanger pipeline safety; once such behavior is discovered, the relevant data are saved, the location, time and image data are transmitted to the regulatory authority over a 4G network, and an alarm is triggered. This scheme is mainly aimed at pipeline inspection scenes and identifies behaviors that endanger pipeline safety, but it does not detect the targets performing those behaviors; it also involves many modules and is complex.
In short, these prior-art schemes detect targets only in specific scenes and are not widely applicable.
[ summary of the invention ]
The invention aims to solve the above technical problems by providing a target detection method based on camera focal length conversion, a storage medium and a processor, which increase the size of the target in the imaging result by changing the camera focal length, improve the confidence of the detected target object, reduce the false detection rate, and improve detection performance.
In order to solve the above technical problem, an embodiment of the present invention provides a target detection method based on camera focal length conversion, including:
establishing a first data training set and a second data training set, wherein the first data training set consists of original captured images, and the second data training set consists of images of the region where the target is located, obtained by simulating a camera focal length change;
designing a target detection network based on a deep convolutional network, and respectively training a first data training set and a second data training set;
calling a target detection network based on the first data training set to detect the first data training set to obtain a target confidence coefficient of the first data training set;
judging whether the target confidence of the first data training set is in a confidence threshold range, if so, judging that a target object exists in the region, otherwise, changing the focal length of the camera, and obtaining a third data training set near the target region;
and training the third data training set by using a target detection network based on the second data training set to obtain the position information and the confidence coefficient of the target in the first data training set.
Preferably, establishing the first data training set and the second data training set comprises: capturing a video of the target area with a camera, decoding the video, and obtaining the original images.
Preferably, establishing the first data training set and the second data training set comprises: simulating a focal length change with the camera to obtain pictures of the periphery of the target area.
Preferably, the confidence threshold range is 0.5-1.
Preferably, the target detection network based on the deep convolutional network adopts four residual modules to form a backbone network of the detection network.
Preferably, capturing the target area video with the camera, decoding the video and obtaining the original images includes: during decoding, saving one original image frame every 20-30 frames.
Preferably, original images containing three or more target objects are selected from the original images as the data to be labeled in the first data training set.
Preferably, simulating the focal length change with the camera to acquire pictures around the target region includes: changing the camera focal length, recording the periphery of the target area to acquire a video, and decoding that video.
Preferably, a first of the four residual modules includes one residual basic module, the second includes two residual basic modules, the third includes two residual basic modules, and the fourth includes four residual basic modules; the network input of 416 pixels × 416 pixels becomes 26 pixels × 26 pixels after passing through the four residual modules.
Preferably, each target in the original images is labeled, obtaining the coordinates of the top-left and bottom-right corners of its rectangular box in the image and the type of the target object, and the labeling result is saved.
Preferably, during decoding, one frame of the target area peripheral picture is saved every 20-30 frames.
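As a minimal illustration of the decode-and-sample step described above (one original image saved every 20-30 frames), the following Python sketch assumes OpenCV is available; the stride of 25 frames, the file naming and the paths are illustrative assumptions rather than part of the invention.

```python
import cv2

def sample_frames(video_path: str, out_dir: str, every_n: int = 25) -> int:
    """Decode the target-area video and save one original image every
    `every_n` frames (the text above suggests every 20-30 frames)."""
    cap = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:                      # end of video
            break
        if index % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{index:06d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# e.g. sample_frames("target_area.mp4", "dataset/raw")  # hypothetical paths
```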
In another aspect, an embodiment of the present invention provides a storage medium, where the storage medium includes a stored program, and the program executes the above object detection method when running.
An embodiment of the present invention provides a processor, where the processor is configured to execute a program, where the program executes the above object detection method when running.
Compared with the prior art, the technical scheme has the following advantages. Two training sets are established, and a target detection network is designed to train on the two datasets separately. During network design, by observing drone imaging data, a detection model suited to drone overhead-shot data is designed, so that the features of small objects in it are well abstracted. In the model invocation stage, the model trained with the original data is called first; by setting a target confidence threshold, detected targets above the threshold output results directly, while for detected targets not above the threshold, an image near the detected target region is obtained by adjusting the focal length of the onboard camera and the model trained with data simulating the focal length change is called, which detects the targets better and raises their confidence relative to the original image. By changing the focal length of the drone's onboard camera, the target becomes larger in the imaging result and the confidence of the detected target object improves; compared with calling only one model and merely modifying the confidence threshold, this reduces the false detection rate and improves detection performance.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a flow chart of a target detection method based on camera focal length conversion.
FIG. 2 is a flow chart of a two-stage target detection algorithm based on deep learning.
FIG. 3 is a flowchart of a deep learning based one-stage target detection algorithm.
Fig. 4 is a schematic structural diagram of basic modules of a residual error network in a target detection method based on camera focal length transformation.
Fig. 5 is a network training flow chart in the target detection method based on camera focal length conversion.
FIG. 6 is a flow chart of a preferred embodiment of a target detection method based on camera focal length conversion.
[ detailed description of the embodiments ]
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Fig. 1 is a flow chart of a target detection method based on camera focal length conversion. As shown in fig. 1, a target detection method based on camera focal length transformation includes the following steps (a minimal code sketch of the resulting flow is given after the steps):
S11, establishing a first data training set and a second data training set, wherein the first data training set consists of original captured images, and the second data training set consists of images of the region where the target is located, obtained by simulating a camera focal length change;
S12, designing a target detection network based on a deep convolutional network, and training it on the first data training set and the second data training set respectively;
S13, invoking the target detection network trained on the first data training set to detect targets, obtaining the target confidence for the first data training set;
S14, judging whether the target confidence for the first data training set is within the confidence threshold range; if so, judging that a target object exists in the region; otherwise, changing the camera focal length and obtaining a third data training set near the target region;
S15, training the third data training set with the target detection network based on the second data training set, obtaining the position information and confidence of the target in the first data training set.
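The following Python sketch is a minimal illustration of how steps S13-S15 could be cascaded at inference time; the Detection type, the detect and zoom_and_capture interfaces, and the exact handling of the 0.1 and 0.5 thresholds are assumptions based on the description below, not the claimed implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2 in original-image coordinates
    label: str
    confidence: float

CONF_HIGH = 0.5  # confidence threshold range is 0.5-1 (see S14)
CONF_LOW = 0.1   # confidences in (0.1, 0.5) trigger a focal length change

def detect_with_zoom(frame, model_1, model_2, camera) -> List[Detection]:
    """Cascade of S13-S15: detect, then zoom and re-detect low-confidence targets."""
    results: List[Detection] = []
    for det in model_1.detect(frame):            # S13: first model on the original image
        if det.confidence >= CONF_HIGH:          # S14: within the threshold range
            results.append(det)                  # target object judged present
        elif det.confidence >= CONF_LOW:         # S14: change the focal length instead
            zoomed = camera.zoom_and_capture(det.box)   # image near the target region
            redet = model_2.detect(zoomed)       # S15: second model on the zoomed image
            if redet:
                best = max(redet, key=lambda d: d.confidence)
                # keep the higher of the two confidences, report the box
                # in the coordinates of the original image
                det.confidence = max(det.confidence, best.confidence)
            results.append(det)
    return results
```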
FIG. 2 is a flow chart of a two-stage target detection algorithm based on deep learning. Target detection algorithms based on deep learning mainly follow either a one-stage or a two-stage approach. As shown in fig. 2, the two-stage approach needs two steps to obtain a detection result after a picture is input into the detection network: the first step generates candidate boxes that may contain target objects in the image; the second step uses the features of those candidate boxes to fine-tune their position information and classify them, yielding the confidence and position of each target category. The one-stage approach does not generate candidate boxes; the confidence and position of the target category are obtained directly from the feature map generated by a convolutional neural network.
FIG. 3 is a flowchart of a deep learning based one-stage target detection algorithm. As shown in fig. 3, in the step of network design, in order to ensure real-time performance, the present invention designs a detection network with fewer parameters, and designs the detection network for training by using the one-stage idea of the target detection algorithm, that is, only one convolutional neural network is needed from image input to output.
Of the two approaches, the two-stage approach has high precision but low speed and cannot meet the real-time requirements of engineering applications; the one-stage approach has a speed advantage. In recent years, with the development of one-stage algorithms in deep-learning-based target detection, such as the SSD series and the yolo series, the precision of the one-stage approach has become comparable to that of algorithms using the two-stage approach. Therefore, this scheme performs the network design using the one-stage approach.
Fig. 4 is a schematic structural diagram of the basic module of the residual network in the target detection method based on camera focal length transformation. As shown in fig. 4, the target detection network used in the invention uses a deep residual network as its backbone. The residual network, proposed in 2015, has feature extraction performance superior to other deep networks. One-stage target detection networks such as the SSD and yolo series have deep backbone networks, which benefits extracting and abstracting image features. However, in long-range overhead drone shots the target object is small relative to the original image; if a deeper backbone were used for feature extraction, small targets could be abstracted away completely, hurting the algorithm's performance. A shallower residual backbone is therefore used.
The invention adopts four residual modules to form the backbone of the detection network, so that the network extracts image features well while retaining the abstract features of small target objects; the number of residual modules is smaller than in the SSD and yolo series. The basic residual module is shown in fig. 4, where x is the input to the module and F(x) is the original mapping of the convolutional neural network; relu is the activation function in the module, and H(x) is its output: the module adds the original mapping F(x) and the input x to form the new output H(x) = F(x) + x. The first of the four residual modules contains one basic module, the second contains two, the third contains two, and the fourth contains four; the network input of 416 pixels by 416 pixels becomes 26 pixels by 26 pixels after passing through the four residual modules. A minimal sketch of this backbone follows.
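The PyTorch sketch below is an illustration only: the patent does not specify channel widths or how downsampling is performed, so the stride-2 convolutions and channel counts are assumptions chosen so that a 416×416 input reaches 26×26 after the four stages of 1, 2, 2 and 4 basic modules.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual module: H(x) = F(x) + x, with relu activations (Fig. 4)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        fx = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))  # F(x)
        return torch.relu(fx + x)                                       # H(x) = F(x) + x

def _stage(in_ch: int, out_ch: int, n_blocks: int) -> nn.Sequential:
    # A stride-2 convolution halves the spatial size, then n residual blocks follow.
    layers = [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
              nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
    layers += [ResidualBlock(out_ch) for _ in range(n_blocks)]
    return nn.Sequential(*layers)

# Four residual stages with 1, 2, 2 and 4 basic modules; each stage halves the
# resolution, so a 416x416 input becomes 26x26 (416 / 2**4 = 26).
backbone = nn.Sequential(
    _stage(3, 64, 1), _stage(64, 128, 2), _stage(128, 256, 2), _stage(256, 512, 4))

x = torch.randn(1, 3, 416, 416)
print(backbone(x).shape)  # torch.Size([1, 512, 26, 26])
```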
In the one-stage approach, the final detection module generates several boxes with different aspect ratios for each pixel (which can be regarded as a grid cell) of the feature map produced by the backbone, in order to predict the target position. If the center of a target object falls within a certain grid cell of the feature map, that cell participates in predicting the target's position; boxes with different aspect ratios therefore need to be predefined. Because target sizes in common datasets span a large range, boxes covering a wide range of aspect ratios are normally required. In large drone scenes, however, the target is small relative to the original image and its size does not vary over a wide range, so the invention uses box aspect ratios in a narrow range, as the sketch below illustrates.
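A small sketch of predefining boxes per feature-map cell; the scales and the narrow aspect-ratio set (0.8, 1.0, 1.25) are illustrative assumptions reflecting the narrow range described above, not values given by the patent.

```python
import itertools

def make_anchors(feature_size: int = 26, stride: int = 16,
                 scales=(16, 32), aspect_ratios=(0.8, 1.0, 1.25)):
    """Predefine boxes (cx, cy, w, h) per grid cell; a narrow aspect-ratio
    range is used because drone targets are small and vary little in shape."""
    anchors = []
    for gy, gx in itertools.product(range(feature_size), repeat=2):
        cx, cy = (gx + 0.5) * stride, (gy + 0.5) * stride  # cell centre in pixels
        for s, ar in itertools.product(scales, aspect_ratios):
            w, h = s * ar ** 0.5, s / ar ** 0.5            # keep area roughly s*s
            anchors.append((cx, cy, w, h))
    return anchors

print(len(make_anchors()))  # 26 * 26 * 2 * 3 = 4056 predefined boxes
```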
In the deep learning field, a network forms its cognition of the data through learning, i.e., the training process. During training, the algorithm guides network learning through a loss function; when the loss function reaches its minimum value, network training is finished and the optimal state has been reached. The loss function used in the invention is:
$$\mathrm{loss} = \sum_{i=0}^{S^2}\left(\mathrm{coordErr}_i + \mathrm{iouErr}_i + \mathrm{clsErr}_i\right)$$
the invention totally uses the mean square sum error as the loss function, and consists of three parts: coordinate error, IOU error, and classification error. The corrdererr is a coordinate error which mainly guides the network to learn the coordinate position of a frame to be predicted, and the iouErr mainly guides the network to learn whether a certain grid (pixel) on a characteristic diagram contains a target object or not so as to guide the network to predict the target position. clsrer mainly directs learning where a certain target object is contained in the grid. In the above formula, i is the position on the feature, S2Is the size of the feature map, sxs.
Fig. 5 is a network training flow chart in the target detection method based on camera focal length conversion. As shown in fig. 5, a network training procedure in a target detection method based on camera focal length transformation includes the steps of:
S21, establishing the training datasets, including the first data training set and the second data training set; the first data training set consists of original captured images, and the second data training set consists of images of the region where the target is located, obtained by simulating a camera focal length change;
S22, designing the target detection network, which is based on a deep convolutional network and is trained on the first data training set and the second data training set respectively;
S231, training a first model with the first data training set;
S232, training a second model with the second data training set;
S241, outputting the first network model;
S242, outputting the second network model.
After network training, the networks are output. At this point the datasets have been established, the network has been trained on the two datasets under the guidance of the loss function, and the network models are assumed to have reached their optimal state. The model trained with the original images is called the first network model, and the model trained with images of the target peripheral region is called the second network model.
The invention improves the precision of target detection by controlling the change of the focal length of the airborne camera. The method comprises the following specific steps:
when the unmanned aerial vehicle navigates, firstly, the returned image data is decoded, then the model 1 is called to detect the target object in the original picture, and the image is subjected to target detection to obtain the confidence coefficient and the position of the target type. It is generally defined that an object is considered to be a certain object when the confidence is greater than 0.5 and less than 1. In this step, if the confidence of the detected target object is greater than the threshold value 0.5, the target detection result is directly output, and the target object is considered to be the target region.
When the confidence of a detected target object is greater than the threshold 0.5, the result is output directly; however, in large overhead drone scenes the confidence produced by the detection algorithm is often low, and directly setting a lower confidence threshold for original-image detection would cause more false detections. Therefore, in the invention, when the confidence detected in the original image is between 0.1 and 0.5, the focal length of the onboard camera is adjusted by software so as to capture the target region whose confidence fell between 0.1 and 0.5, obtaining a third data training set near the target region; the captured image is input into the second network model and the object's confidence is judged again, which raises the confidence and reduces false detections. In this step, software increases the focal length of the onboard camera, so that the target becomes larger in the imaging result relative to the original image. When the system judges that the confidence of a certain target is low and the drone's focal length needs adjusting, the focal length range to be adjusted is compressed into a binary message and transmitted over network communication or infrared/Bluetooth to the camera's embedded system in the drone, thereby controlling the focal length change. This raises the confidence, lowers the probability of false detection in the original image, and thus improves detection performance. A sketch of such a command message follows.
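The patent does not describe the binary message at all, so the layout below (magic bytes, target id, focal length range, struct format) is entirely an illustrative assumption of how the focal length range could be compressed into a binary payload for the embedded camera system.

```python
import struct

def pack_zoom_command(target_id: int, focal_min_mm: float, focal_max_mm: float) -> bytes:
    """Compress the focal length range to adjust into a small binary payload,
    to be sent to the drone's embedded camera system (e.g. over the network
    or Bluetooth). Layout (assumed): magic, target id, min/max focal length."""
    return struct.pack("<4sIff", b"ZOOM", target_id, focal_min_mm, focal_max_mm)

def unpack_zoom_command(payload: bytes):
    magic, target_id, fmin, fmax = struct.unpack("<4sIff", payload)
    assert magic == b"ZOOM", "unexpected message type"
    return target_id, fmin, fmax

msg = pack_zoom_command(7, 24.0, 48.0)   # ask the camera to zoom within 24-48 mm
print(unpack_zoom_command(msg))          # (7, 24.0, 48.0)
```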
The detection results of the first and second network models are then integrated: for each target, the higher of the two confidences is output, and the target's position information is uniformly output in the coordinates of the original image.
FIG. 6 is a flow chart of a preferred embodiment of the target detection method based on camera focal length conversion. As shown in fig. 6, the network training phase is entered first: the dataset is established, the network is designed and trained, and the first and second network models are output. An original image shot by the camera is input and the first network model is invoked; if the confidence is greater than 0.5, a first result is output; otherwise, the focal length is increased, an image near the target region whose confidence was below 0.5 is acquired, the second network model is invoked, and a second result is output. The first result and the second result are integrated, and the final result is output.
Example two
The embodiment of the invention also provides a storage medium, which comprises a stored program, wherein when the program runs, the target detection method flow based on the camera focal length transformation is executed.
Alternatively, in the present embodiment, the storage medium may be configured to store program codes for executing the following target detection method flow based on camera focal length conversion:
S11, establishing a first data training set and a second data training set, wherein the first data training set consists of original captured images, and the second data training set consists of images of the region where the target is located, obtained by simulating a camera focal length change. Establishing the first data training set and the second data training set includes: capturing a video of the target area with a camera, decoding the video and obtaining the original images; and simulating a focal length change with the camera to obtain pictures of the periphery of the target area. During decoding, one original image frame is saved every 20-30 frames. Original images containing three or more target objects are selected from the original images as the data to be labeled in the first data training set. Simulating the focal length change with the camera to acquire pictures of the periphery of the target area includes: changing the camera focal length, recording the periphery of the target area to acquire a video, and decoding that video.
S12, designing a target detection network based on a deep convolutional network, and training it on the first data training set and the second data training set respectively.
S13, invoking the target detection network trained on the first data training set to detect targets, obtaining the target confidence for the first data training set.
S14, judging whether the target confidence for the first data training set is within the confidence threshold range; if so, judging that a target object exists in the region; otherwise, changing the camera focal length and obtaining a third data training set near the target region. The confidence threshold range is generally set to 0.5-1.
S15, training the third data training set with the target detection network based on the second data training set, obtaining the position information and confidence of the target in the first data training set. The target detection network based on the deep convolutional network adopts four residual modules to form the backbone of the detection network. Each target in the original images is labeled, yielding the coordinates of the top-left and bottom-right corners of its rectangular box in the image and the type of the target object, and the labeling result is saved; a sketch of one possible labeling record follows.
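The labeling step stores, for each target, the top-left and bottom-right corners of its rectangular box plus the object type; the JSON record format, paths and class names in this small sketch are assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class BoxLabel:
    x1: int  # top-left corner of the rectangular box (pixels)
    y1: int
    x2: int  # bottom-right corner
    y2: int
    category: str  # type of the target object

def save_labels(image_name: str, labels: List[BoxLabel], path: str) -> None:
    """Persist the labeling result for one original image."""
    record = {"image": image_name, "objects": [asdict(b) for b in labels]}
    with open(path, "w") as f:
        json.dump(record, f, indent=2)

# e.g. save_labels("frame_000025.jpg",
#                  [BoxLabel(120, 80, 180, 150, "pedestrian")],
#                  "labels/frame_000025.json")  # hypothetical paths and class
```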
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing program codes, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Therefore, with the storage medium of the invention, the required storage is reduced; through the built-in program implementing the target detection method flow based on camera focal length change, changing the camera focal length increases the size of the target in the imaging result, improves the confidence of the detected target object, reduces the false detection rate, and improves detection performance.
EXAMPLE III
Embodiments of the present invention further provide a processor, configured to execute a program, where the program executes to perform the steps in the target detection method based on camera focal length transformation.
Optionally, in this embodiment, the program is configured to perform the following steps:
S11, establishing a first data training set and a second data training set, wherein the first data training set consists of original captured images, and the second data training set consists of images of the region where the target is located, obtained by simulating a camera focal length change;
S12, designing a target detection network based on a deep convolutional network, and training it on the first data training set and the second data training set respectively;
S13, invoking the target detection network trained on the first data training set to detect targets, obtaining the target confidence for the first data training set;
S14, judging whether the target confidence for the first data training set is within the confidence threshold range; if so, judging that a target object exists in the region; otherwise, changing the camera focal length and obtaining a third data training set near the target region;
S15, training the third data training set with the target detection network based on the second data training set, obtaining the position information and confidence of the target in the first data training set.
Optionally, for a specific example in this embodiment, reference may be made to the above-described embodiment and examples described in the specific implementation, and details of this embodiment are not described herein again.
Therefore, with the processor of the invention and its built-in program implementing the target detection method flow based on camera focal length conversion, changing the camera focal length increases the size of the target in the imaging result, improves the confidence of the detected target object, reduces the false detection rate, and improves detection performance.
As can be seen from the above description, with the target detection method based on camera focal length transformation, the storage medium and the processor of the invention, two training sets are established, and a target detection network is designed to train on the two datasets separately. During network design, a detection model suited to drone overhead-shot data is designed by observing drone imaging data, so that the features of small targets in it are well abstracted. In the model invocation stage, the model trained with the original data is called first; by setting a target confidence threshold, detected targets above the threshold output results directly, while for detected targets not above the threshold, an image near the detected target region is obtained by adjusting the onboard camera's focal length and the model trained with data simulating the focal length change is called, which detects the targets better and raises their confidence relative to the original image. By changing the focal length of the drone's onboard camera, the target becomes larger in the imaging result and the confidence of the detected target object improves; compared with calling a single model and merely modifying the confidence threshold, this reduces the false detection rate and improves detection performance.
The above embodiments of the present invention are described in detail, and the principle and the implementation of the present invention are explained by applying specific embodiments, and the above description of the embodiments is only used to help understanding the method of the present invention and the core idea thereof; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (13)

1. A target detection method based on camera focal length conversion is characterized by comprising the following steps:
establishing a first data training set and a second data training set, wherein the first data training set consists of original captured images, and the second data training set consists of images of the region where the target is located, obtained by simulating a camera focal length change;
designing a target detection network based on a deep convolutional network, and respectively training a first data training set and a second data training set;
calling a target detection network based on the first data training set to detect the first data training set to obtain a target confidence coefficient of the first data training set;
judging whether the target confidence of the first data training set is in a confidence threshold range, if so, judging that a target object exists in the region, otherwise, changing the focal length of the camera, and obtaining a third data training set near the target region;
and training the third data training set by using a target detection network based on the second data training set to obtain the position information and the confidence coefficient of the target in the first data training set.
2. The camera focal length transformation-based target detection method of claim 1, wherein establishing the first training set of data and the second training set of data comprises: the method comprises the steps that a camera shoots a target area video, the target area video is decoded, and an original image picture is obtained.
3. The camera focal length transformation-based target detection method of claim 1, wherein establishing the first training set of data and the second training set of data comprises: and simulating focal length conversion by a camera to obtain the peripheral picture of the target area.
4. The target detection method based on camera focal length transformation according to claim 1, wherein the confidence threshold is in a range of 0.5-1.
5. The method according to claim 1, wherein the target detection network based on the deep convolutional network adopts four residual modules to form a backbone network of the detection network.
6. The method for detecting the target based on the focal length transformation of the camera as claimed in claim 2, wherein the step of capturing the target area video by the camera, decoding the target area video, and acquiring the original image comprises the steps of: and in the decoding process, storing one frame of original image every 20-30 frames.
7. The method for detecting the target based on the focal length transformation of the camera as claimed in claim 2, wherein the original image including three or more target objects is selected from the original image as the data to be labeled in the first training set of data.
8. The target detection method based on camera focal length transformation according to claim 3, wherein the obtaining the picture of the periphery of the target area by simulating focal length transformation by the camera comprises: and changing the focal length of the camera, recording the periphery of the target area to acquire a video, and decoding the video.
9. The method of claim 5, wherein a first of the four residual modules comprises one residual basic module, a second comprises two residual basic modules, a third comprises two residual basic modules, and a fourth comprises four residual basic modules, the network input of 416 pixels by 416 pixels becoming 26 pixels by 26 pixels after passing through the four residual modules.
10. The target detection method based on focal length transformation of the camera as claimed in claim 7, characterized in that the target in the original image of the target object is labeled to obtain the coordinates of the upper left corner and the lower right corner of the rectangular frame in the image and the type of the target object, and the labeling result is saved.
11. The method for detecting the target based on the camera focal length transformation as claimed in claim 8, wherein a frame of the target area peripheral picture is saved every 20-30 frames in the decoding process.
12. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program when executed performs the object detection method of any one of claims 1 to 11.
13. A processor, configured to run a program, wherein the program when running performs the object detection method of any one of claims 1 to 11.
CN201910510818.0A 2019-06-13 2019-06-13 Target detection method based on camera focal length conversion, storage medium and processor Pending CN112084815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910510818.0A CN112084815A (en) 2019-06-13 2019-06-13 Target detection method based on camera focal length conversion, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910510818.0A CN112084815A (en) 2019-06-13 2019-06-13 Target detection method based on camera focal length conversion, storage medium and processor

Publications (1)

Publication Number Publication Date
CN112084815A (en) 2020-12-15

Family

ID=73733659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910510818.0A Pending CN112084815A (en) 2019-06-13 2019-06-13 Target detection method based on camera focal length conversion, storage medium and processor

Country Status (1)

Country Link
CN (1) CN112084815A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065513A (en) * 2021-01-27 2021-07-02 武汉星巡智能科技有限公司 Method, device and equipment for optimizing self-training confidence threshold of intelligent camera
WO2023159611A1 (en) * 2022-02-28 2023-08-31 深圳市大疆创新科技有限公司 Image photographing method and device, and movable platform


Similar Documents

Publication Publication Date Title
CN110956094B (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network
CN109840531B (en) Method and device for training multi-label classification model
CN111784685A (en) Power transmission line defect image identification method based on cloud edge cooperative detection
JP2020038662A (en) Learning method and learning device for detecting lane through classification of lane candidate pixel, and test method and test device using the same
CN108537824B (en) Feature map enhanced network structure optimization method based on alternating deconvolution and convolution
CN114445706A (en) Power transmission line target detection and identification method based on feature fusion
KR20200047307A (en) Cnn-based learning method, learning device for selecting useful training data and test method, test device using the same
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
Kim et al. High-speed drone detection based on yolo-v8
CN109635661B (en) Far-field wireless charging receiving target detection method based on convolutional neural network
CN112926461B (en) Neural network training and driving control method and device
CN110705412A (en) Video target detection method based on motion history image
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
CN112085031A (en) Target detection method and system
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111260687B (en) Aerial video target tracking method based on semantic perception network and related filtering
CN112288026A (en) Infrared weak and small target detection method based on class activation diagram
CN112084815A (en) Target detection method based on camera focal length conversion, storage medium and processor
CN114399734A (en) Forest fire early warning method based on visual information
CN113420819A (en) Lightweight underwater target detection method based on CenterNet
CN116721288A (en) Helmet detection method and system based on YOLOv5
CN115205793B (en) Electric power machine room smoke detection method and device based on deep learning secondary confirmation
CN116580324A (en) Yolov 5-based unmanned aerial vehicle ground target detection method
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid
CN115719457A (en) Method for detecting small target in unmanned aerial vehicle scene based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination