CN116740622B - Dense oil drop target detection counting method and device based on multi-scale feature coding - Google Patents

Dense oil drop target detection counting method and device based on multi-scale feature coding

Info

Publication number
CN116740622B
Authority
CN
China
Prior art keywords
oil drop
feature
dense oil
feature map
drop target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311027313.1A
Other languages
Chinese (zh)
Other versions
CN116740622A (en)
Inventor
王安东
刘畅
路峰
宋建彬
车纯广
高树润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yellow River Delta National Nature Reserve Management Committee
Beijing Information Science and Technology University
Original Assignee
Shandong Yellow River Delta National Nature Reserve Management Committee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yellow River Delta National Nature Reserve Management Committee, Beijing Information Science and Technology University filed Critical Shandong Yellow River Delta National Nature Reserve Management Committee
Priority to CN202311027313.1A priority Critical patent/CN116740622B/en
Publication of CN116740622A publication Critical patent/CN116740622A/en
Application granted granted Critical
Publication of CN116740622B publication Critical patent/CN116740622B/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/467 Encoded features or binary features, e.g. local binary patterns [LBP]
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V 10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G06V 10/764 Arrangements using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/766 Arrangements using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Arrangements using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dense oil drop target detection and counting method and device based on multi-scale feature coding. The method comprises the following steps: step 1, collecting dense oil drop target video data; step 2, identifying and locating the oil drops in the video data to obtain the bounding box of each oil drop. Step 2 proceeds as follows: first, multi-scale feature coding is applied to the dense oil drop target video data to obtain feature weights; the feature map is then enhanced with these weights to obtain an enhanced feature map; next, guiding processing is applied to the enhanced feature map to obtain a feature map of the dense oil drop targets; finally, this feature map is detected using the regression and classification ideas to obtain the bounding box of each oil drop. The invention relates to the technical field of computer vision and the field of oil exploitation and processing, and addresses the technical problem that dense oil drop targets are difficult to detect because they take multiple forms and occlude one another. The invention improves the network's ability to learn targets of different forms and enhances recognition in dense, occluded oil drop scenes.

Description

Dense oil drop target detection counting method and device based on multi-scale feature coding
Technical Field
The invention relates to the field of computer vision and the field of oil exploitation and processing, and in particular to a dense oil drop target detection and counting method and device based on multi-scale feature coding.
Background
Machine vision technology can be used to acquire and analyze the parameters of oil-water two-phase and multiphase flows and to monitor oil drops during oil well exploitation. As one of the important branches of artificial intelligence, machine vision is now tightly coupled with oil exploitation and well parameter measurement. Dense oil drop target detection tasks exhibit several typical problems, chiefly loss of target information, low tolerance to bounding box perturbation, and variable oil drop morphology, which make detecting dense oil drop targets in video very difficult. The invention applies image and video recognition and understanding to the field of petroleum exploitation and processing, and uses machine vision technology to detect and count the many oil drops in an oil-water two-phase flow.
Disclosure of Invention
The invention aims to provide a dense oil drop target detection and counting method and device based on multi-scale feature coding that solve the technical problem that dense oil drop targets are difficult to detect owing to their multiple forms and mutual occlusion.
To achieve the above object, the invention provides a dense oil drop target detection and counting method and device based on multi-scale feature coding, which includes:
step 1, a monitoring device collects dense oil drop target video data;
step 2, identifying and positioning all oil drops in the dense oil drop target video data through a dense oil drop target detection network model to obtain the bounding box of each target;
the method for identifying and positioning in the step 2 specifically comprises the following steps:
step 21a, performing multi-scale feature coding on the dense oil drop target video data to obtain feature weights W;
step 22a, performing enhancement processing on the feature weights W of step 21a to obtain an enhanced feature map
Step 23a, conducting guiding treatment on the enhanced feature map in the step 22a to obtain a feature map of the dense oil drop target;
and step 24a, detecting the characteristic diagram of the dense oil drop targets in step 23a by using regression and classification ideas to obtain a boundary box of each target.
Further, step 21a specifically includes:
step 211, outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, and performing multi-scale feature coding on the basic feature map F with three hole (dilated) convolutions of different preset expansion rates according to the following formula (1), obtaining the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c;
step 212, calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation;
step 22a specifically includes: weighting each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the following formula (3), then adding a residual connection to the input feature map to obtain the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
Further, a=3, b=5, c=7.
Further, step 23a specifically includes:
step 231, mapping the enhanced feature map F_w of step 22a into three different projection spaces through three 1×1 convolutions according to the following formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v, i = 1, 2, …, N:
F_q = Conv_1×1(F_w), F_k = Conv_1×1(F_w), F_v = Conv_1×1(F_w) (4)
step 232, taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting the subset features F_q1, F_k1, F_v1 correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the following formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view:
F_1 = nonlocal{F_q1, F_k1, F_v1} (7)
where F_1 denotes the attention feature of the self-attention mechanism and nonlocal denotes the self-attention operation of the self-attention mechanism;
step 233, repeating step 232 until the N attention features {F_1, F_2, …, F_N} corresponding to the N block subsets are obtained, and splicing these attention features to obtain the feature map of the dense oil drop target.
Further, the method for acquiring the dense oil drop target detection network model in step 2 comprises the following steps:
step 21b, setting up Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
step 22b, constructing a dense oil drop target dataset, wherein the dense oil drop target dataset comprises images from the dense oil drop target video data annotated with all oil drops;
step 23b, performing geometric transformations on the annotated images in the dense oil drop target video data to simulate the morphological changes of targets caused by shooting angle and target position during actual filming, and simultaneously performing color transformations on the same images to simulate actual shooting conditions.
The invention also provides a dense oil drop target detection counting device based on multi-scale feature coding, which comprises:
the monitoring device is used for collecting dense oil drop target video data;
the object boundary box unit is used for identifying and positioning all oil drops in the dense oil drop object video data through the dense oil drop object detection network model to obtain a boundary box of each object;
the target bounding box unit specifically comprises:
the feature coding network is used for carrying out multi-scale feature coding on the dense oil drop target video data to obtain feature weights W;
the multi-scale feature coding module is used for carrying out enhancement processing on the feature weights W to obtain an enhanced feature map
The attention guiding module is used for guiding the enhanced feature map of the multi-scale feature encoding module to obtain a feature map of the dense oil drop target;
and the head prediction module is used for detecting the feature map of the dense oil drop targets from the attention guiding module using the regression and classification ideas to obtain the bounding box of each target.
Further, the feature encoding network specifically includes:
the multi-scale feature map module, used for outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, and performing multi-scale feature coding on the basic feature map F with three hole convolutions of different preset expansion rates according to the following formula (1), obtaining the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c;
the weight distribution module, used for calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation;
the multi-scale feature coding module specifically: weights each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the following formula (3), then adds a residual connection to the input feature map to obtain the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
Further, a=3, b=5, c=7.
Further, the attention guiding module specifically comprises:
a subset partition submodule, used for mapping the enhanced feature map F_w of the multi-scale feature coding module into three different projection spaces through three 1×1 convolutions according to the following formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v:
F_q = Conv_1×1(F_w), F_k = Conv_1×1(F_w), F_v = Conv_1×1(F_w) (4)
a small target feature focusing submodule, used for taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting the subset features F_q1, F_k1, F_v1 correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the following formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view:
F_1 = nonlocal{F_q1, F_k1, F_v1} (7)
where F_1 denotes the attention feature of the self-attention mechanism and nonlocal denotes the self-attention operation of the self-attention mechanism;
a small target feature map focusing submodule, used for splicing, after the small target feature focusing submodule has obtained the N attention features {F_1, F_2, …, F_N}, these attention features correspondingly to obtain the feature map of the dense oil drop target.
Further, the method for acquiring the dense oil drop target detection network model comprises the following steps:
step 21b, setting up Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
step 22b, constructing a dense oil drop target dataset, wherein the dense oil drop target dataset comprises images from the dense oil drop target video data annotated with all oil drops;
step 23b, performing geometric transformations on the annotated images in the dense oil drop target video data to simulate the morphological changes of targets caused by shooting angle and target position during actual filming, and simultaneously performing color transformations on the same images to simulate actual shooting conditions.
By adopting the above technical scheme, the invention has the following advantages:
The method and device can acquire the information around dense oil drops, including the internal, contour and peripheral information of a target, which improves the network's ability to learn targets of different forms; they also assign different weights to the feature information, which enhances recognition in occluded dense oil drop scenes.
Drawings
Fig. 1 is a schematic flow chart of the dense oil drop target detection and counting method based on multi-scale feature coding according to an embodiment of the invention.
Fig. 2 is a schematic diagram of obtaining the feature map of a dense oil drop target according to an embodiment of the invention.
Fig. 3 is a schematic diagram of a target bounding box unit according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and examples.
As shown in fig. 1, the method and the device for detecting and counting dense oil drop targets based on multi-scale feature coding provided by the embodiment of the invention include:
and step 1, the monitoring equipment collects the dense oil drop target video data.
In this embodiment, the small targets to be detected are oil drops, where "small" means that the scale of an oil drop target in the image is below 32×32 pixels. The selected platform combines an NVIDIA RTX 3090 GPU with 24 GB of memory; the operating system is Ubuntu 18.04, and the method is implemented under the deep learning framework PyTorch 1.7.0.
Step 2, identifying and positioning all oil drops in the dense oil drop target video data through a dense oil drop target detection network model to obtain the bounding box of each target.
The identification and positioning method in step 2 specifically comprises the following steps:
Step 21a, performing multi-scale feature coding on the dense oil drop target video data to obtain the feature weights W.
Step 22a, performing enhancement processing on the feature weights W of step 21a to obtain an enhanced feature map F_w.
Step 23a, performing guiding processing on the enhanced feature map of step 22a to obtain a feature map of the dense oil drop target. The guiding processing focuses attention on the dense oil drops in the image through an attention guiding mechanism.
Step 24a, detecting the feature map of the dense oil drop targets from step 23a using the regression and classification ideas to obtain the bounding box of each target, realizing localization and classification. The regression and classification ideas mean that the target detection task comprises a classification branch and a regression branch: the classification branch performs target identification, and the regression branch performs target localization.
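As a minimal illustration of these two branches, the sketch below builds a YOLO-style head in PyTorch; the class name, anchor count and channel layout are assumptions for illustration, not the patent's specification.

import torch.nn as nn

class PredictionHead(nn.Module):
    """Illustrative two-branch detection head, not the patented head itself."""
    def __init__(self, channels: int, num_anchors: int = 3, num_classes: int = 1):
        super().__init__()
        # Classification branch: per-anchor class scores for target identification.
        self.cls = nn.Conv2d(channels, num_anchors * num_classes, kernel_size=1)
        # Regression branch: per-anchor box offsets (x, y, w, h) for localization.
        self.reg = nn.Conv2d(channels, num_anchors * 4, kernel_size=1)

    def forward(self, feature_map):
        return self.cls(feature_map), self.reg(feature_map)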
In one embodiment, as a preferred implementation of step 21a, it specifically includes:
Step 211, outputting a basic feature map F from the dense oil drop target video data through a cross-stage partial network (Cross Stage Partial Network, CSP), where the height, width and channel number of F are h, w and C respectively, and performing multi-scale feature coding on F with three hole convolutions of different preset expansion rates according to the following formula (1). The extracted features are the internal features, contour features and peripheral local features of the small targets, which together form the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c.
Step 212, calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation.
In one embodiment, by setting the hole convolution expansion rates to a=3, b=5 and c=7 respectively, the input feature map is sampled at fixed steps (3, 5, 7); the sampled locations are not contiguous but skip several pixels at a time, which helps the model capture the relevance and context between more distant pixels.
Of course, step 21a may instead be implemented with the mask segmentation network of Mask RCNN: on the basis of target localization, a pixel-level mask is generated for each target instance; the mask segmentation network then operates on the candidate boxes and produces an accurate segmentation mask of the target, realizing pixel-level segmentation and thereby accurately detecting small targets.
In one embodiment, step 22a may be implemented as follows:
each scale feature of the multi-scale feature map F_o is weighted by its corresponding feature weight W according to the following formula (3), and a residual connection to the input feature map is added, obtaining the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
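For concreteness, formulas (1) to (3) can be sketched as a small PyTorch module as follows. This is a minimal illustration assuming 3×3 kernels for the hole convolutions, a 1×1 convolution for Conv(·), and a channel-wise Softmax; none of these choices is fixed by the text.

import torch
import torch.nn as nn

class MultiScaleFeatureEncoding(nn.Module):
    """Illustrative sketch of formulas (1)-(3); names and kernel sizes are assumptions."""
    def __init__(self, channels: int, a: int = 3, b: int = 5, c: int = 7):
        super().__init__()
        # Three hole (dilated) convolutions with expansion rates a, b, c;
        # padding equal to the dilation keeps h and w unchanged for 3x3 kernels.
        self.conv_a = nn.Conv2d(channels, channels, 3, padding=a, dilation=a)
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=b, dilation=b)
        self.conv_c = nn.Conv2d(channels, channels, 3, padding=c, dilation=c)
        # Convolution used to produce the per-scale weights of formula (2).
        self.weight_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # Formula (1): F_o = V_a F + V_b F + V_c F
        f_o = self.conv_a(f) + self.conv_b(f) + self.conv_c(f)
        # Formula (2): W = Softmax(Conv(F_o)); the softmax axis is an assumption.
        w = torch.softmax(self.weight_conv(f_o), dim=1)
        # Formula (3): F_w = Scale(W, F_o) + F, i.e. element-wise weighting
        # plus a residual connection to the input feature map.
        return w * f_o + f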
In practice, step 22a may also be implemented using, for example, a VGG16 convolutional network or other prior art technique.
As an implementation of step 23a, this embodiment adopts a self-attention structure, as shown in fig. 2; the module guides the network's attention toward small occluded targets, enhancing the network model's ability to recognize targets in occluded scenes. Step 23a specifically includes:
Step 231, mapping the enhanced feature map F_w of step 22a into three different projection spaces through three 1×1 convolutions according to the following formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v, i = 1, 2, …, N:
F_q = Conv_1×1(F_w), F_k = Conv_1×1(F_w), F_v = Conv_1×1(F_w) (4)
Step 232, taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting the subset features F_q1, F_k1, F_v1 correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the following formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view:
F_1 = nonlocal{F_q1, F_k1, F_v1} (7)
where F_1 denotes the attention feature of the self-attention mechanism and nonlocal denotes the self-attention operation of the self-attention mechanism.
Step 233, repeating step 232 until the N attention features {F_1, F_2, …, F_N} corresponding to the N block subsets are obtained, and splicing these attention features to obtain the feature map of the dense oil drop target.
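The block-partitioned self-attention of formulas (4) and (7) could look as follows in PyTorch. This is a sketch under two stated assumptions: the N block subsets are taken along the flattened spatial axis (which should be divisible by N), and the nonlocal operation is realized as standard scaled dot-product attention.

import torch
import torch.nn as nn

class AttentionGuiding(nn.Module):
    """Illustrative sketch of formulas (4) and (7); the partition axis is an assumption."""
    def __init__(self, channels: int, num_blocks: int):
        super().__init__()
        self.n = num_blocks
        # Formula (4): three 1x1 convolutions project F_w into F_q, F_k, F_v.
        self.proj_q = nn.Conv2d(channels, channels, 1)
        self.proj_k = nn.Conv2d(channels, channels, 1)
        self.proj_v = nn.Conv2d(channels, channels, 1)

    def forward(self, f_w: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_w.shape
        # Flatten the spatial dimensions and split evenly into N block subsets.
        q = self.proj_q(f_w).flatten(2).chunk(self.n, dim=2)
        k = self.proj_k(f_w).flatten(2).chunk(self.n, dim=2)
        v = self.proj_v(f_w).flatten(2).chunk(self.n, dim=2)
        outs = []
        for qi, ki, vi in zip(q, k, v):
            # Formula (7): F_i = nonlocal{F_qi, F_ki, F_vi} -- pixel-to-pixel
            # similarity, softmax normalization, then a weighted sum of values.
            attn = torch.softmax(qi.transpose(1, 2) @ ki / c ** 0.5, dim=-1)
            outs.append(vi @ attn.transpose(1, 2))
        # Step 233: splice the N attention features back together.
        return torch.cat(outs, dim=2).view(b, c, h, w)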
Another implementation of step 23a may use, for example, the existing Convolutional Block Attention Module (CBAM).
In one embodiment, the method for obtaining the dense oil drop target detection network model in step 2 includes:
step 21b, a Yolov5, a multi-scale feature encoding module and an attention guiding module are provided, wherein the Yolov5 comprises a feature encoding network and a head prediction module. Wherein Yolo 5 is the target detection model, the english full name of "Yolo" is "You only look once", the corresponding chinese full name is "you view once", and the chinese of "v5" means "fifth version".
In step 22b, a dense oil drop target dataset is constructed, which includes images in the dense oil drop target video data that have been annotated with all oil drops.
The oil drops in the dense oil drop target video data are annotated with CC Labeler software, which is used to label all oil drops in an image. The annotations take the form of bounding boxes, each comprising four values: the horizontal and vertical coordinates of the top-left and bottom-right corners of the box. To enhance the generalization performance of the model, a small target detection dataset may be constructed and data-enhanced; the enhancement includes multi-mode random adjustments, such as randomly adjusting the number of small targets and the brightness, contrast and saturation of a picture, to obtain a richer image dataset.
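As a small illustration of this annotation layout, the sketch below parses one bounding box per line; the whitespace-separated text format and the function name are assumptions, since the text only fixes the four coordinate values per box.

def load_boxes(path):
    """Read one bounding box per line: x1 y1 x2 y2 (top-left, bottom-right)."""
    boxes = []
    with open(path) as fh:
        for line in fh:
            x1, y1, x2, y2 = map(float, line.split())
            boxes.append((x1, y1, x2, y2))
    return boxes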
In step 23b, to improve the detection performance of the dense oil drop target detection network model on dense oil drop targets, data enhancement is performed during network training so that the data covers as many actual scenes and target morphology changes as possible. For example, geometric transformations such as random cropping, random flipping, random scale scaling and random perspective transformation are applied to the annotated images in the dense oil drop target video data, simulating the morphological changes of targets caused by shooting angle and target position during actual filming. Meanwhile, color transformations such as random blurring, added random noise, and saturation and contrast adjustments are applied to the same images, simulating actual shooting conditions such as the time, weather and equipment of the shoot.
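A pipeline of this kind might be assembled with torchvision as in the following sketch; the parameter values are assumptions, and in a real detection setting the geometric transforms must also be mirrored onto the bounding boxes, which torchvision's basic image transforms do not do.

import torch
from torchvision import transforms

# Illustrative augmentation pipeline; all parameter values are assumptions.
augment = transforms.Compose([
    transforms.RandomResizedCrop(640, scale=(0.5, 1.0)),  # random crop + random scaling
    transforms.RandomHorizontalFlip(p=0.5),               # random flipping
    transforms.RandomPerspective(distortion_scale=0.3),   # random perspective transformation
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),  # color transformations
    transforms.GaussianBlur(kernel_size=3),               # random blurring
    transforms.ToTensor(),
    # torchvision has no additive-noise transform; a simple lambda stands in.
    transforms.Lambda(lambda t: (t + 0.01 * torch.randn_like(t)).clamp(0.0, 1.0)),
])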
To improve the performance of the dense oil drop target detection network model, suitable anchor boxes can also be set to fit the small targets in the dense oil drop target dataset. Re-clustering the anchor boxes of the target dataset with the k-means algorithm yields anchor parameters better suited to the current dense oil drop target dataset, improving the accuracy and convergence speed of the dense oil drop target detection network model.
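A plain k-means re-clustering of the labeled box sizes might look like the following sketch; it clusters (width, height) pairs with a Euclidean distance, whereas YOLO-style implementations often use an IoU-based distance instead.

import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100) -> np.ndarray:
    """Cluster an (M, 2) array of box (width, height) pairs into k anchors."""
    rng = np.random.default_rng(0)
    centers = wh[rng.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every box to the nearest anchor center.
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned boxes.
        for j in range(k):
            if (labels == j).any():
                centers[j] = wh[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # sorted by anchor area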
The embodiment of the invention also provides a dense oil drop target detection and counting device based on multi-scale feature coding, comprising a monitoring device and a target bounding box unit, wherein: the monitoring device is used for collecting the dense oil drop target video data, and the target bounding box unit is used for identifying and positioning all oil drops in the dense oil drop target video data through the dense oil drop target detection network model to obtain the bounding box of each target.
The target bounding box unit specifically includes a feature encoding network, a multi-scale feature encoding module, an attention guiding module and a head prediction module, as shown in fig. 3, wherein:
the feature coding network is used for carrying out multi-scale feature coding on the dense oil drop target video data to obtain feature weights W.
The multi-scale feature coding module is used for carrying out enhancement processing on the feature weights W to obtain an enhanced feature map F_w.
The attention guiding module is used for guiding the enhanced feature map of the multi-scale feature encoding module to obtain a feature map of the dense oil drop target.
The head prediction module is used for detecting the feature map of the dense oil drop targets from the attention guiding module using the regression and classification ideas, obtaining the bounding box of each target and realizing localization and classification.
In one embodiment, the feature encoding network specifically includes a multi-scale feature map module and a weight distribution module, where:
The multi-scale feature map module is used for outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, where the height, width and channel number of F are h, w and C respectively, and then performing multi-scale feature coding on F with three hole convolutions of different preset expansion rates according to the above formula (1), obtaining the multi-scale feature map F_o.
The weight distribution module is used for calculating, with the above formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature.
The multi-scale feature coding module specifically: weights each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the above formula (3), then adds a residual connection to the input feature map to obtain the enhanced feature map F_w.
In one embodiment, the attention guiding module specifically includes a subset partition submodule, a small target feature focusing submodule and a small target feature map focusing submodule, where:
The subset partition submodule is used for mapping the enhanced feature map F_w of the multi-scale feature coding module into three different projection spaces through three 1×1 convolutions according to the above formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v.
The small target feature focusing submodule is used for taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting them correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the above formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view.
The small target feature map focusing submodule is used for splicing, after the small target feature focusing submodule has obtained the N attention features {F_1, F_2, …, F_N}, these attention features correspondingly to obtain the feature map of the dense oil drop target.
The dense oil drop target detection counting device based on multi-scale feature coding in one embodiment further comprises a dense oil drop target detection network model acquisition module, wherein the dense oil drop target detection network model acquisition module is used for acquiring a dense oil drop target detection network model by the following steps:
setting a Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
constructing a dense oil drop target data set, wherein the dense oil drop target data set comprises images in dense oil drop target video data marked with all oil drops;
the method comprises the steps of performing geometric transformation on images in dense oil drop target video data marked with all oil drops, and simulating morphological transformation of various targets caused by shooting angles and positions of the targets during actual shooting; and simultaneously, the color conversion is carried out on the image in the dense oil drop target video data marked with all oil drops, so that the actual shooting environment condition is simulated.
During oil drop detection, each frame of the dense oil drop target video data is fed into the trained and optimized dense oil drop target detection network model. The network performs multi-scale feature coding through the feature coding network and the multi-scale feature coding module, and outputs the feature map of the dense oil drop targets through the attention guiding module. The head prediction module then performs target detection on this feature map using the regression and classification ideas. Finally, the classification and localization results for the small targets are obtained by confidence-threshold filtering. The process is repeated for every frame in the test set until the target detection task over the whole test set is complete.
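Putting the pieces together, an inference-and-counting loop over video frames might look like this sketch; the model object and its (boxes, scores) output layout are assumptions standing in for the trained network.

import torch

@torch.no_grad()
def detect_and_count(model, frames, conf_thresh: float = 0.5):
    """Illustrative per-frame detection and oil drop counting loop."""
    model.eval()
    counts = []
    for frame in frames:                    # frame: (3, H, W) float tensor
        boxes, scores = model(frame[None])  # assumed output layout per image
        keep = scores > conf_thresh         # confidence-threshold filtering
        counts.append(int(keep.sum()))      # oil drop count for this frame
    return counts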
In the above embodiments, the constructed dense oil drop dataset used for training comprises 1000 images containing 43,525 annotated instances. All instances were manually annotated and repeatedly checked, providing a data basis for subsequent research. The trained model reaches a detection accuracy of 92% on the dense oil drop dataset, at a detection speed of 60 FPS.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments may be modified, or some of their technical features replaced equivalently, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (6)

1. A dense oil drop target detection and counting method based on multi-scale feature coding, characterized by comprising the following steps:
step 1, a monitoring device collects dense oil drop target video data;
step 2, identifying and positioning all oil drops in the dense oil drop target video data through a dense oil drop target detection network model to obtain the bounding box of each target;
the identification and positioning method in step 2 specifically comprises the following steps:
step 21a, performing multi-scale feature coding on the dense oil drop target video data to obtain feature weights W;
step 22a, performing enhancement processing on the feature weights W of step 21a to obtain an enhanced feature map F_w;
step 23a, performing guiding processing on the enhanced feature map of step 22a to obtain a feature map of the dense oil drop target, wherein the guiding processing focuses attention on the dense oil drops in the image through an attention guiding mechanism;
step 24a, detecting the feature map of the dense oil drop targets from step 23a using the regression and classification ideas to obtain the bounding box of each target;
step 21a specifically includes:
step 211, outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, and performing multi-scale feature coding on the basic feature map F with three hole convolutions of different preset expansion rates according to the following formula (1), obtaining the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c;
step 212, calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation;
step 22a specifically includes: weighting each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the following formula (3), then adding a residual connection to the input feature map to obtain the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
2. The method for detecting and counting dense oil drop targets based on multi-scale feature coding as claimed in claim 1, wherein a=3, b=5, c=7.
3. The method for detecting and counting dense oil drop targets based on multi-scale feature coding according to any one of claims 1-2, wherein step 23a specifically comprises:
step 231, mapping the enhanced feature map F_w of step 22a into three different projection spaces through three 1×1 convolutions according to the following formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v, i = 1, 2, …, N:
F_q = Conv_1×1(F_w), F_k = Conv_1×1(F_w), F_v = Conv_1×1(F_w) (4)
step 232, taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting the subset features F_q1, F_k1, F_v1 correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the following formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view:
F_1 = nonlocal{F_q1, F_k1, F_v1} (7)
where F_1 denotes the attention feature of the self-attention mechanism and nonlocal denotes the self-attention operation of the self-attention mechanism;
step 233, repeating step 232 until the N attention features {F_1, F_2, …, F_N} are obtained, and splicing the attention features generated from the block subsets to obtain the feature map of the dense oil drop target.
4. The method for detecting and counting dense oil drop targets based on multi-scale feature encoding as claimed in claim 3, wherein the method for acquiring the dense oil drop target detection network model in step 2 comprises the following steps:
step 21b, setting a Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
step 22b, constructing a dense oil drop target data set, wherein the dense oil drop target data set comprises images in dense oil drop target video data marked with all oil drops;
step 23b, performing geometric transformations on the annotated images in the dense oil drop target video data to simulate the morphological changes of targets caused by shooting angle and target position during actual filming, and simultaneously performing color transformations on the same images to simulate actual shooting conditions.
5. A dense oil drop target detection and counting device based on multi-scale feature coding, characterized by comprising:
the monitoring device is used for collecting dense oil drop target video data;
the target bounding box unit is used for identifying and positioning all oil drops in the dense oil drop target video data through the dense oil drop target detection network model to obtain the bounding box of each target;
the target bounding box unit specifically comprises:
the feature coding network is used for carrying out multi-scale feature coding on the dense oil drop target video data to obtain feature weights W;
the multi-scale feature coding module is used for carrying out enhancement processing on the feature weights W to obtain an enhanced feature map F w
The attention guiding module is used for performing guiding processing on the enhanced feature map of the multi-scale feature coding module to obtain a feature map of the dense oil drop target, wherein the guiding processing focuses attention on the dense oil drops in the image through an attention guiding mechanism;
The head prediction module is used for detecting the feature map of the dense oil drop targets from the attention guiding module using the regression and classification ideas to obtain the bounding box of each target;
the feature encoding network specifically comprises:
the multi-scale feature map module, used for outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, and performing multi-scale feature coding on the basic feature map F with three hole convolutions of different preset expansion rates according to the following formula (1), obtaining the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c;
the weight distribution module, used for calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation;
the multi-scale feature coding module specifically: weights each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the following formula (3), then adds a residual connection to the input feature map to obtain the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
6. The multi-scale feature encoding-based dense oil droplet target detection counting device of claim 5, further comprising:
the dense oil drop target detection network model acquisition module is used for acquiring the dense oil drop target detection network model by the following modes:
setting a Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
constructing a dense oil drop target data set, wherein the dense oil drop target data set comprises images in dense oil drop target video data marked with all oil drops;
the method comprises the steps of performing geometric transformation on images in dense oil drop target video data marked with all oil drops, and simulating morphological transformation of various targets caused by shooting angles and positions of the targets during actual shooting; and simultaneously, the color conversion is carried out on the image in the dense oil drop target video data marked with all oil drops, so that the actual shooting environment condition is simulated.
CN202311027313.1A 2023-08-16 2023-08-16 Dense oil drop target detection counting method and device based on multi-scale feature coding Active CN116740622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311027313.1A CN116740622B (en) 2023-08-16 2023-08-16 Dense oil drop target detection counting method and device based on multi-scale feature coding


Publications (2)

Publication Number Publication Date
CN116740622A (en) 2023-09-12
CN116740622B (en) 2023-10-27

Family

ID=87906472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311027313.1A Active CN116740622B (en) 2023-08-16 2023-08-16 Dense oil drop target detection counting method and device based on multi-scale feature coding

Country Status (1)

Country Link
CN (1) CN116740622B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429466A (en) * 2020-03-19 2020-07-17 北京航空航天大学 Space-based crowd counting and density estimation method based on multi-scale information fusion network
CN115272828A (en) * 2022-08-11 2022-11-01 河南省农业科学院农业经济与信息研究所 Intensive target detection model training method based on attention mechanism
CN116168240A (en) * 2023-01-19 2023-05-26 西安电子科技大学 Arbitrary-direction dense ship target detection method based on attention enhancement
CN116563726A (en) * 2023-05-08 2023-08-08 大连海事大学 Remote sensing image ship target detection method based on convolutional neural network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Small-Size Target Detection in Remotely Sensed Image Using Improved Multi-Scale Features and Attention Mechanism";HU ZHAO etc.;《IEEE Access》;第56703-56710页 *

Also Published As

Publication number Publication date
CN116740622A (en) 2023-09-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant