CN116740622B - Dense oil drop target detection counting method and device based on multi-scale feature coding - Google Patents

Dense oil drop target detection counting method and device based on multi-scale feature coding

Info

Publication number
CN116740622B
Authority
CN
China
Prior art keywords
oil drop
feature
dense oil
feature map
drop target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311027313.1A
Other languages
Chinese (zh)
Other versions
CN116740622A (en)
Inventor
王安东
刘畅
路峰
宋建彬
车纯广
高树润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yellow River Delta National Nature Reserve Management Committee
Beijing Information Science and Technology University
Original Assignee
Shandong Yellow River Delta National Nature Reserve Management Committee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yellow River Delta National Nature Reserve Management Committee, Beijing Information Science and Technology University filed Critical Shandong Yellow River Delta National Nature Reserve Management Committee
Priority to CN202311027313.1A priority Critical patent/CN116740622B/en
Publication of CN116740622A publication Critical patent/CN116740622A/en
Application granted granted Critical
Publication of CN116740622B publication Critical patent/CN116740622B/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/467 Encoded features or binary features, e.g. local binary patterns [LBP]
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V 10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G06V 10/764 Arrangements using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/766 Arrangements using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Arrangements using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dense oil drop target detection and counting method and device based on multi-scale feature coding. The method comprises the following steps: step 1, collecting dense oil drop target video data; step 2, identifying and locating the oil drops in the video data to obtain the bounding box of each oil drop. Step 2 proceeds as follows: first, multi-scale feature coding is applied to the dense oil drop target video data to obtain feature weights; the feature map is then enhanced with these weights to obtain an enhanced feature map; next, guiding processing is applied to the enhanced feature map to obtain a feature map of the dense oil drop targets; finally, this feature map is detected using the regression and classification ideas to obtain the bounding box of each oil drop. The invention relates to the technical field of computer vision and the field of oil exploitation and processing, and addresses the technical problem that dense oil drop targets are difficult to detect because they take multiple forms and occlude one another. The invention improves the network's ability to learn targets of different forms and enhances recognition in dense, occluded oil drop scenes.

Description

Dense oil drop target detection counting method and device based on multi-scale feature coding
Technical Field
The invention relates to the field of computer vision and the field of oil exploitation and processing, and in particular to a dense oil drop target detection and counting method and device based on multi-scale feature coding.
Background
Machine vision technology can be used to acquire and analyze the parameters of oil-water two-phase and multiphase flows and to monitor oil drops during oil well exploitation. As one of the important branches of artificial intelligence, machine vision is now tightly coupled with oil exploitation and well parameter measurement. Dense oil drop target detection tasks exhibit several typical problems, chiefly loss of target information, low tolerance to bounding box perturbation, and variable oil drop morphology, which make detecting dense oil drop targets in video very difficult. The invention applies image and video recognition and understanding to the field of petroleum exploitation and processing, and uses machine vision technology to detect and count the many oil drops in an oil-water two-phase flow.
Disclosure of Invention
The invention aims to provide a dense oil drop target detection and counting method and device based on multi-scale feature coding that solve the technical problem that dense oil drop targets are difficult to detect owing to their multiple forms and mutual occlusion.
To achieve the above object, the invention provides a dense oil drop target detection and counting method and device based on multi-scale feature coding, which includes:
step 1, a monitoring device collects dense oil drop target video data;
step 2, identifying and positioning all oil drops in the dense oil drop target video data through a dense oil drop target detection network model to obtain the bounding box of each target;
the method for identifying and positioning in the step 2 specifically comprises the following steps:
step 21a, performing multi-scale feature coding on the dense oil drop target video data to obtain feature weights W;
step 22a, performing enhancement processing on the feature weights W of step 21a to obtain an enhanced feature map
Step 23a, conducting guiding treatment on the enhanced feature map in the step 22a to obtain a feature map of the dense oil drop target;
and step 24a, detecting the characteristic diagram of the dense oil drop targets in step 23a by using regression and classification ideas to obtain a boundary box of each target.
Further, step 21a specifically includes:
step 211, outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, and performing multi-scale feature coding on the basic feature map F with three hole (dilated) convolutions of different preset expansion rates according to the following formula (1), obtaining the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c;
step 212, calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation;
step 22a specifically includes: weighting each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the following formula (3), then adding a residual connection to the input feature map to obtain the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
Further, a=3, b=5, c=7.
Further, step 23a specifically includes:
step 231, mapping the enhanced feature map F_w of step 22a into three different projection spaces through three 1×1 convolutions according to the following formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v, i = 1, 2, …, N:
F_q = Conv_1×1(F_w), F_k = Conv_1×1(F_w), F_v = Conv_1×1(F_w) (4)
step 232, taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting the subset features F_q1, F_k1, F_v1 correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the following formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view:
F_1 = nonlocal{F_q1, F_k1, F_v1} (7)
where F_1 denotes the attention feature of the self-attention mechanism and nonlocal denotes the self-attention operation of the self-attention mechanism;
step 233, repeating step 232 until the N attention features {F_1, F_2, …, F_N} corresponding to the N block subsets are obtained, and splicing these attention features to obtain the feature map of the dense oil drop target.
Further, the method for acquiring the dense oil drop target detection network model in step 2 comprises the following steps:
step 21b, setting up Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
step 22b, constructing a dense oil drop target dataset, wherein the dense oil drop target dataset comprises images from the dense oil drop target video data annotated with all oil drops;
step 23b, performing geometric transformations on the annotated images in the dense oil drop target video data to simulate the morphological changes of targets caused by shooting angle and target position during actual filming, and simultaneously performing color transformations on the same images to simulate actual shooting conditions.
The invention also provides a dense oil drop target detection counting device based on multi-scale feature coding, which comprises:
the monitoring device is used for collecting dense oil drop target video data;
the object boundary box unit is used for identifying and positioning all oil drops in the dense oil drop object video data through the dense oil drop object detection network model to obtain a boundary box of each object;
the target bounding box unit specifically comprises:
the feature coding network is used for carrying out multi-scale feature coding on the dense oil drop target video data to obtain feature weights W;
the multi-scale feature coding module is used for carrying out enhancement processing on the feature weights W to obtain an enhanced feature map
The attention guiding module is used for guiding the enhanced feature map of the multi-scale feature encoding module to obtain a feature map of the dense oil drop target;
and the head prediction module is used for detecting the feature map of the dense oil drop targets from the attention guiding module using the regression and classification ideas to obtain the bounding box of each target.
Further, the feature encoding network specifically includes:
the multi-scale feature map module, used for outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, and performing multi-scale feature coding on the basic feature map F with three hole convolutions of different preset expansion rates according to the following formula (1), obtaining the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c;
the weight distribution module, used for calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation;
the multi-scale feature coding module specifically: weights each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the following formula (3), then adds a residual connection to the input feature map to obtain the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
Further, a=3, b=5, c=7.
Further, the attention guiding module specifically comprises:
a subset partition submodule, used for mapping the enhanced feature map F_w of the multi-scale feature coding module into three different projection spaces through three 1×1 convolutions according to the following formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v:
F_q = Conv_1×1(F_w), F_k = Conv_1×1(F_w), F_v = Conv_1×1(F_w) (4)
a small target feature focusing submodule, used for taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting the subset features F_q1, F_k1, F_v1 correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the following formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view:
F_1 = nonlocal{F_q1, F_k1, F_v1} (7)
where F_1 denotes the attention feature of the self-attention mechanism and nonlocal denotes the self-attention operation of the self-attention mechanism;
a small target feature map focusing submodule, used for splicing, after the small target feature focusing submodule has obtained the N attention features {F_1, F_2, …, F_N}, these attention features correspondingly to obtain the feature map of the dense oil drop target.
Further, the method for acquiring the dense oil drop target detection network model comprises the following steps:
step 21b, setting up Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
step 22b, constructing a dense oil drop target dataset, wherein the dense oil drop target dataset comprises images from the dense oil drop target video data annotated with all oil drops;
step 23b, performing geometric transformations on the annotated images in the dense oil drop target video data to simulate the morphological changes of targets caused by shooting angle and target position during actual filming, and simultaneously performing color transformations on the same images to simulate actual shooting conditions.
By adopting the above technical scheme, the invention has the following advantages:
The method and device can acquire the information around dense oil drops, including the internal, contour and peripheral information of a target, which improves the network's ability to learn targets of different forms; they also assign different weights to the feature information, which enhances recognition in occluded dense oil drop scenes.
Drawings
Fig. 1 is a schematic flow chart of the dense oil drop target detection and counting method based on multi-scale feature coding according to an embodiment of the invention.
Fig. 2 is a schematic diagram of obtaining the feature map of a dense oil drop target according to an embodiment of the invention.
Fig. 3 is a schematic diagram of a target bounding box unit according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and examples.
As shown in fig. 1, the method and the device for detecting and counting dense oil drop targets based on multi-scale feature coding provided by the embodiment of the invention include:
and step 1, the monitoring equipment collects the dense oil drop target video data.
In this embodiment, the small targets to be detected are oil drops, where "small" means that the scale of an oil drop target in the image is below 32×32 pixels. The selected platform combines an NVIDIA RTX 3090 GPU with 24 GB of memory; the operating system is Ubuntu 18.04, and the method is implemented under the deep learning framework PyTorch 1.7.0.
Step 2, identifying and positioning all oil drops in the dense oil drop target video data through a dense oil drop target detection network model to obtain the bounding box of each target.
The identification and positioning method in step 2 specifically comprises the following steps:
Step 21a, performing multi-scale feature coding on the dense oil drop target video data to obtain the feature weights W.
Step 22a, performing enhancement processing on the feature weights W of step 21a to obtain an enhanced feature map F_w.
Step 23a, performing guiding processing on the enhanced feature map of step 22a to obtain a feature map of the dense oil drop target. The guiding processing focuses attention on the dense oil drops in the image through an attention guiding mechanism.
Step 24a, detecting the feature map of the dense oil drop targets from step 23a using the regression and classification ideas to obtain the bounding box of each target, realizing localization and classification. The regression and classification ideas mean that the target detection task comprises a classification branch and a regression branch: the classification branch performs target identification, and the regression branch performs target localization.
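As a minimal illustration of these two branches, the sketch below builds a YOLO-style head in PyTorch; the class name, anchor count and channel layout are assumptions for illustration, not the patent's specification.

import torch.nn as nn

class PredictionHead(nn.Module):
    """Illustrative two-branch detection head, not the patented head itself."""
    def __init__(self, channels: int, num_anchors: int = 3, num_classes: int = 1):
        super().__init__()
        # Classification branch: per-anchor class scores for target identification.
        self.cls = nn.Conv2d(channels, num_anchors * num_classes, kernel_size=1)
        # Regression branch: per-anchor box offsets (x, y, w, h) for localization.
        self.reg = nn.Conv2d(channels, num_anchors * 4, kernel_size=1)

    def forward(self, feature_map):
        return self.cls(feature_map), self.reg(feature_map)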
In one embodiment, as a preferred implementation of step 21a, it specifically includes:
Step 211, outputting a basic feature map F from the dense oil drop target video data through a cross-stage partial network (Cross Stage Partial Network, CSP), where the height, width and channel number of F are h, w and C respectively, and performing multi-scale feature coding on F with three hole convolutions of different preset expansion rates according to the following formula (1). The extracted features are the internal features, contour features and peripheral local features of the small targets, which together form the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c.
Step 212, calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation.
In one embodiment, by setting the hole convolution expansion rates to a=3, b=5 and c=7 respectively, the input feature map is sampled at fixed steps (3, 5, 7); the sampled locations are not contiguous but skip several pixels at a time, which helps the model capture the relevance and context between more distant pixels.
Of course, step 21a may instead be implemented with the mask segmentation network of Mask RCNN: on the basis of target localization, a pixel-level mask is generated for each target instance; the mask segmentation network then operates on the candidate boxes and produces an accurate segmentation mask of the target, realizing pixel-level segmentation and thereby accurately detecting small targets.
In one embodiment, step 22a may be implemented as follows:
each scale feature of the multi-scale feature map F_o is weighted by its corresponding feature weight W according to the following formula (3), and a residual connection to the input feature map is added, obtaining the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
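For concreteness, formulas (1) to (3) can be sketched as a small PyTorch module as follows. This is a minimal illustration assuming 3×3 kernels for the hole convolutions, a 1×1 convolution for Conv(·), and a channel-wise Softmax; none of these choices is fixed by the text.

import torch
import torch.nn as nn

class MultiScaleFeatureEncoding(nn.Module):
    """Illustrative sketch of formulas (1)-(3); names and kernel sizes are assumptions."""
    def __init__(self, channels: int, a: int = 3, b: int = 5, c: int = 7):
        super().__init__()
        # Three hole (dilated) convolutions with expansion rates a, b, c;
        # padding equal to the dilation keeps h and w unchanged for 3x3 kernels.
        self.conv_a = nn.Conv2d(channels, channels, 3, padding=a, dilation=a)
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=b, dilation=b)
        self.conv_c = nn.Conv2d(channels, channels, 3, padding=c, dilation=c)
        # Convolution used to produce the per-scale weights of formula (2).
        self.weight_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # Formula (1): F_o = V_a F + V_b F + V_c F
        f_o = self.conv_a(f) + self.conv_b(f) + self.conv_c(f)
        # Formula (2): W = Softmax(Conv(F_o)); the softmax axis is an assumption.
        w = torch.softmax(self.weight_conv(f_o), dim=1)
        # Formula (3): F_w = Scale(W, F_o) + F, i.e. element-wise weighting
        # plus a residual connection to the input feature map.
        return w * f_o + f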
In practice, step 22a may also be implemented using, for example, a VGG16 convolutional network or other prior art technique.
As an implementation of step 23a, this embodiment adopts a self-attention structure, as shown in fig. 2; the module guides the network's attention toward small occluded targets, enhancing the network model's ability to recognize targets in occluded scenes. Step 23a specifically includes:
Step 231, mapping the enhanced feature map F_w of step 22a into three different projection spaces through three 1×1 convolutions according to the following formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v, i = 1, 2, …, N:
F_q = Conv_1×1(F_w), F_k = Conv_1×1(F_w), F_v = Conv_1×1(F_w) (4)
Step 232, taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting the subset features F_q1, F_k1, F_v1 correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the following formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view:
F_1 = nonlocal{F_q1, F_k1, F_v1} (7)
where F_1 denotes the attention feature of the self-attention mechanism and nonlocal denotes the self-attention operation of the self-attention mechanism.
Step 233, repeating step 232 until the N attention features {F_1, F_2, …, F_N} corresponding to the N block subsets are obtained, and splicing these attention features to obtain the feature map of the dense oil drop target.
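The block-partitioned self-attention of formulas (4) and (7) could look as follows in PyTorch. This is a sketch under two stated assumptions: the N block subsets are taken along the flattened spatial axis (which should be divisible by N), and the nonlocal operation is realized as standard scaled dot-product attention.

import torch
import torch.nn as nn

class AttentionGuiding(nn.Module):
    """Illustrative sketch of formulas (4) and (7); the partition axis is an assumption."""
    def __init__(self, channels: int, num_blocks: int):
        super().__init__()
        self.n = num_blocks
        # Formula (4): three 1x1 convolutions project F_w into F_q, F_k, F_v.
        self.proj_q = nn.Conv2d(channels, channels, 1)
        self.proj_k = nn.Conv2d(channels, channels, 1)
        self.proj_v = nn.Conv2d(channels, channels, 1)

    def forward(self, f_w: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_w.shape
        # Flatten the spatial dimensions and split evenly into N block subsets.
        q = self.proj_q(f_w).flatten(2).chunk(self.n, dim=2)
        k = self.proj_k(f_w).flatten(2).chunk(self.n, dim=2)
        v = self.proj_v(f_w).flatten(2).chunk(self.n, dim=2)
        outs = []
        for qi, ki, vi in zip(q, k, v):
            # Formula (7): F_i = nonlocal{F_qi, F_ki, F_vi} -- pixel-to-pixel
            # similarity, softmax normalization, then a weighted sum of values.
            attn = torch.softmax(qi.transpose(1, 2) @ ki / c ** 0.5, dim=-1)
            outs.append(vi @ attn.transpose(1, 2))
        # Step 233: splice the N attention features back together.
        return torch.cat(outs, dim=2).view(b, c, h, w)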
Another implementation of step 23a may use, for example, the existing Convolutional Block Attention Module (CBAM).
In one embodiment, the method for obtaining the dense oil drop target detection network model in step 2 includes:
step 21b, a Yolov5, a multi-scale feature encoding module and an attention guiding module are provided, wherein the Yolov5 comprises a feature encoding network and a head prediction module. Wherein Yolo 5 is the target detection model, the english full name of "Yolo" is "You only look once", the corresponding chinese full name is "you view once", and the chinese of "v5" means "fifth version".
In step 22b, a dense oil drop target dataset is constructed, which includes images in the dense oil drop target video data that have been annotated with all oil drops.
The oil drops in the dense oil drop target video data are annotated with CC Labeler software, which is used to label all oil drops in an image. The annotations take the form of bounding boxes, each comprising four values: the horizontal and vertical coordinates of the top-left and bottom-right corners of the box. To enhance the generalization performance of the model, a small target detection dataset may be constructed and data-enhanced; the enhancement includes multi-mode random adjustments, such as randomly adjusting the number of small targets and the brightness, contrast and saturation of a picture, to obtain a richer image dataset.
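As a small illustration of this annotation layout, the sketch below parses one bounding box per line; the whitespace-separated text format and the function name are assumptions, since the text only fixes the four coordinate values per box.

def load_boxes(path):
    """Read one bounding box per line: x1 y1 x2 y2 (top-left, bottom-right)."""
    boxes = []
    with open(path) as fh:
        for line in fh:
            x1, y1, x2, y2 = map(float, line.split())
            boxes.append((x1, y1, x2, y2))
    return boxes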
In step 23b, to improve the detection performance of the dense oil drop target detection network model on dense oil drop targets, data enhancement is performed during network training so that the data covers as many actual scenes and target morphology changes as possible. For example, geometric transformations such as random cropping, random flipping, random scale scaling and random perspective transformation are applied to the annotated images in the dense oil drop target video data, simulating the morphological changes of targets caused by shooting angle and target position during actual filming. Meanwhile, color transformations such as random blurring, added random noise, and saturation and contrast adjustments are applied to the same images, simulating actual shooting conditions such as the time, weather and equipment of the shoot.
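A pipeline of this kind might be assembled with torchvision as in the following sketch; the parameter values are assumptions, and in a real detection setting the geometric transforms must also be mirrored onto the bounding boxes, which torchvision's basic image transforms do not do.

import torch
from torchvision import transforms

# Illustrative augmentation pipeline; all parameter values are assumptions.
augment = transforms.Compose([
    transforms.RandomResizedCrop(640, scale=(0.5, 1.0)),  # random crop + random scaling
    transforms.RandomHorizontalFlip(p=0.5),               # random flipping
    transforms.RandomPerspective(distortion_scale=0.3),   # random perspective transformation
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),  # color transformations
    transforms.GaussianBlur(kernel_size=3),               # random blurring
    transforms.ToTensor(),
    # torchvision has no additive-noise transform; a simple lambda stands in.
    transforms.Lambda(lambda t: (t + 0.01 * torch.randn_like(t)).clamp(0.0, 1.0)),
])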
To improve the performance of the dense oil drop target detection network model, suitable anchor boxes can also be set to fit the small targets in the dense oil drop target dataset. Re-clustering the anchor boxes of the target dataset with the k-means algorithm yields anchor parameters better suited to the current dense oil drop target dataset, improving the accuracy and convergence speed of the dense oil drop target detection network model.
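A plain k-means re-clustering of the labeled box sizes might look like the following sketch; it clusters (width, height) pairs with a Euclidean distance, whereas YOLO-style implementations often use an IoU-based distance instead.

import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100) -> np.ndarray:
    """Cluster an (M, 2) array of box (width, height) pairs into k anchors."""
    rng = np.random.default_rng(0)
    centers = wh[rng.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every box to the nearest anchor center.
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned boxes.
        for j in range(k):
            if (labels == j).any():
                centers[j] = wh[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # sorted by anchor area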
The embodiment of the invention also provides a dense oil drop target detection and counting device based on multi-scale feature coding, comprising a monitoring device and a target bounding box unit, wherein: the monitoring device is used for collecting the dense oil drop target video data, and the target bounding box unit is used for identifying and positioning all oil drops in the dense oil drop target video data through the dense oil drop target detection network model to obtain the bounding box of each target.
The target bounding box unit specifically includes a feature encoding network, a multi-scale feature encoding module, an attention guiding module and a head prediction module, as shown in fig. 3, wherein:
the feature coding network is used for carrying out multi-scale feature coding on the dense oil drop target video data to obtain feature weights W.
The multi-scale feature coding module is used for carrying out enhancement processing on the feature weights W to obtain an enhanced feature map F_w.
The attention guiding module is used for guiding the enhanced feature map of the multi-scale feature encoding module to obtain a feature map of the dense oil drop target.
The head prediction module is used for detecting the feature map of the dense oil drop targets from the attention guiding module using the regression and classification ideas, obtaining the bounding box of each target and realizing localization and classification.
In one embodiment, the feature encoding network specifically includes a multi-scale feature map module and a weight distribution module, where:
The multi-scale feature map module is used for outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, where the height, width and channel number of F are h, w and C respectively, and then performing multi-scale feature coding on F with three hole convolutions of different preset expansion rates according to the above formula (1), obtaining the multi-scale feature map F_o.
The weight distribution module is used for calculating, with the above formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature.
The multi-scale feature coding module specifically: weights each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the above formula (3), then adds a residual connection to the input feature map to obtain the enhanced feature map F_w.
In one embodiment, the attention guiding module specifically includes a subset partition submodule, a small target feature focusing submodule and a small target feature map focusing submodule, where:
The subset partition submodule is used for mapping the enhanced feature map F_w of the multi-scale feature coding module into three different projection spaces through three 1×1 convolutions according to the above formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v.
The small target feature focusing submodule is used for taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting them correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the above formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view.
The small target feature map focusing submodule is used for splicing, after the small target feature focusing submodule has obtained the N attention features {F_1, F_2, …, F_N}, these attention features correspondingly to obtain the feature map of the dense oil drop target.
The dense oil drop target detection counting device based on multi-scale feature coding in one embodiment further comprises a dense oil drop target detection network model acquisition module, wherein the dense oil drop target detection network model acquisition module is used for acquiring a dense oil drop target detection network model by the following steps:
setting a Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
constructing a dense oil drop target data set, wherein the dense oil drop target data set comprises images in dense oil drop target video data marked with all oil drops;
the method comprises the steps of performing geometric transformation on images in dense oil drop target video data marked with all oil drops, and simulating morphological transformation of various targets caused by shooting angles and positions of the targets during actual shooting; and simultaneously, the color conversion is carried out on the image in the dense oil drop target video data marked with all oil drops, so that the actual shooting environment condition is simulated.
During oil drop detection, each frame of the dense oil drop target video data is fed into the trained and optimized dense oil drop target detection network model. The network performs multi-scale feature coding through the feature coding network and the multi-scale feature coding module, and outputs the feature map of the dense oil drop targets through the attention guiding module. The head prediction module then performs target detection on this feature map using the regression and classification ideas. Finally, the classification and localization results for the small targets are obtained by confidence-threshold filtering. The process is repeated for every frame in the test set until the target detection task over the whole test set is complete.
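Putting the pieces together, an inference-and-counting loop over video frames might look like this sketch; the model object and its (boxes, scores) output layout are assumptions standing in for the trained network.

import torch

@torch.no_grad()
def detect_and_count(model, frames, conf_thresh: float = 0.5):
    """Illustrative per-frame detection and oil drop counting loop."""
    model.eval()
    counts = []
    for frame in frames:                    # frame: (3, H, W) float tensor
        boxes, scores = model(frame[None])  # assumed output layout per image
        keep = scores > conf_thresh         # confidence-threshold filtering
        counts.append(int(keep.sum()))      # oil drop count for this frame
    return counts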
In the above embodiments, the constructed dense oil drop dataset used for training comprises 1000 images containing 43,525 annotated instances. All instances were manually annotated and repeatedly checked, providing a data basis for subsequent research. The trained model reaches a detection accuracy of 92% on the dense oil drop dataset, at a detection speed of 60 FPS.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments may be modified, or some of their technical features replaced equivalently, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (6)

1. A dense oil drop target detection and counting method based on multi-scale feature coding, characterized by comprising the following steps:
step 1, a monitoring device collects dense oil drop target video data;
step 2, identifying and positioning all oil drops in the dense oil drop target video data through a dense oil drop target detection network model to obtain the bounding box of each target;
the identification and positioning method in step 2 specifically comprises the following steps:
step 21a, performing multi-scale feature coding on the dense oil drop target video data to obtain feature weights W;
step 22a, performing enhancement processing on the feature weights W of step 21a to obtain an enhanced feature map F_w;
step 23a, performing guiding processing on the enhanced feature map of step 22a to obtain a feature map of the dense oil drop target, wherein the guiding processing focuses attention on the dense oil drops in the image through an attention guiding mechanism;
step 24a, detecting the feature map of the dense oil drop targets from step 23a using the regression and classification ideas to obtain the bounding box of each target;
step 21a specifically includes:
step 211, outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, and performing multi-scale feature coding on the basic feature map F with three hole convolutions of different preset expansion rates according to the following formula (1), obtaining the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c;
step 212, calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation;
step 22a specifically includes: weighting each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the following formula (3), then adding a residual connection to the input feature map to obtain the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
2. The method for detecting and counting dense oil drop targets based on multi-scale feature coding as claimed in claim 1, wherein a=3, b=5, c=7.
3. The method for detecting and counting dense oil drop targets based on multi-scale feature coding according to any one of claims 1-2, wherein step 23a specifically comprises:
step 231, mapping the enhanced feature map F_w of step 22a into three different projection spaces through three 1×1 convolutions according to the following formula (4); the feature F_q of one projection space is evenly divided into N block subsets {F_q1, F_q2, …, F_qi, …, F_qN}, where F_qi is the i-th block subset feature of F_q; the feature F_k of another projection space is evenly divided into N block subsets {F_k1, F_k2, …, F_ki, …, F_kN}, where F_ki is the i-th block subset feature of F_k; and the feature F_v of the remaining projection space is evenly divided into N block subsets {F_v1, F_v2, …, F_vi, …, F_vN}, where F_vi is the i-th block subset feature of F_v, i = 1, 2, …, N:
F_q = Conv_1×1(F_w), F_k = Conv_1×1(F_w), F_v = Conv_1×1(F_w) (4)
step 232, taking the first block subset features F_q1, F_k1, F_v1 of F_q, F_k, F_v respectively, letting the subset features F_q1, F_k1, F_v1 correspond to the Query, Key and Value of the self-attention mechanism, and then calculating with the following formula (7) the similarity between the pixels of the enhanced feature map F_w and one of the N block subsets, so that dense oil drop targets are weighted adaptively from the global view:
F_1 = nonlocal{F_q1, F_k1, F_v1} (7)
where F_1 denotes the attention feature of the self-attention mechanism and nonlocal denotes the self-attention operation of the self-attention mechanism;
step 233, repeating step 232 until the N attention features {F_1, F_2, …, F_N} are obtained, and splicing the attention features generated from the block subsets to obtain the feature map of the dense oil drop target.
4. The method for detecting and counting dense oil drop targets based on multi-scale feature encoding as claimed in claim 3, wherein the method for acquiring the dense oil drop target detection network model in step 2 comprises the following steps:
step 21b, setting a Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
step 22b, constructing a dense oil drop target data set, wherein the dense oil drop target data set comprises images in dense oil drop target video data marked with all oil drops;
step 23b, performing geometric transformations on the annotated images in the dense oil drop target video data to simulate the morphological changes of targets caused by shooting angle and target position during actual filming, and simultaneously performing color transformations on the same images to simulate actual shooting conditions.
5. A dense oil drop target detection and counting device based on multi-scale feature coding, characterized by comprising:
the monitoring device is used for collecting dense oil drop target video data;
the target bounding box unit is used for identifying and positioning all oil drops in the dense oil drop target video data through the dense oil drop target detection network model to obtain the bounding box of each target;
the target bounding box unit specifically comprises:
the feature coding network is used for carrying out multi-scale feature coding on the dense oil drop target video data to obtain feature weights W;
the multi-scale feature coding module is used for carrying out enhancement processing on the feature weights W to obtain an enhanced feature map F w
The attention guiding module is used for performing guiding processing on the enhanced feature map of the multi-scale feature coding module to obtain a feature map of the dense oil drop target, wherein the guiding processing focuses attention on the dense oil drops in the image through an attention guiding mechanism;
The head prediction module is used for detecting the feature map of the dense oil drop targets from the attention guiding module using the regression and classification ideas to obtain the bounding box of each target;
the feature encoding network specifically comprises:
the multi-scale feature map module, used for outputting a basic feature map F from the dense oil drop target video data through the cross-stage partial network, and performing multi-scale feature coding on the basic feature map F with three hole convolutions of different preset expansion rates according to the following formula (1), obtaining the multi-scale feature map F_o:
F_o = V_a F + V_b F + V_c F (1)
where V_a denotes the convolution operation of the hole convolution with preset expansion rate a, V_b denotes the convolution operation of the hole convolution with preset expansion rate b, and V_c denotes the convolution operation of the hole convolution with preset expansion rate c;
the weight distribution module, used for calculating, with the following formula (2), the feature weights W that the multi-scale feature map F_o assigns to each scale feature:
W = Softmax(Conv(F_o)) (2)
where Softmax(·) is the normalization function and Conv(·) is a convolution operation;
the multi-scale feature coding module specifically: weights each scale feature of the multi-scale feature map F_o by its corresponding feature weight W according to the following formula (3), then adds a residual connection to the input feature map to obtain the enhanced feature map F_w:
F_w = Scale(W, F_o) + F (3)
where Scale(·) is the element-wise operation function.
6. The multi-scale feature encoding-based dense oil droplet target detection counting device of claim 5, further comprising:
the dense oil drop target detection network model acquisition module is used for acquiring the dense oil drop target detection network model by the following modes:
setting a Yolov5, a multi-scale feature coding module and an attention guiding module, wherein the Yolov5 comprises a feature coding network and a head prediction module;
constructing a dense oil drop target data set, wherein the dense oil drop target data set comprises images in dense oil drop target video data marked with all oil drops;
the method comprises the steps of performing geometric transformation on images in dense oil drop target video data marked with all oil drops, and simulating morphological transformation of various targets caused by shooting angles and positions of the targets during actual shooting; and simultaneously, the color conversion is carried out on the image in the dense oil drop target video data marked with all oil drops, so that the actual shooting environment condition is simulated.
CN202311027313.1A 2023-08-16 2023-08-16 Dense oil drop target detection counting method and device based on multi-scale feature coding Active CN116740622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311027313.1A CN116740622B (en) 2023-08-16 2023-08-16 Dense oil drop target detection counting method and device based on multi-scale feature coding


Publications (2)

Publication Number Publication Date
CN116740622A (en) 2023-09-12
CN116740622B (en) 2023-10-27

Family

ID=87906472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311027313.1A Active CN116740622B (en) 2023-08-16 2023-08-16 Dense oil drop target detection counting method and device based on multi-scale feature coding

Country Status (1)

Country Link
CN (1) CN116740622B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429466A (en) * 2020-03-19 2020-07-17 北京航空航天大学 Space-based crowd counting and density estimation method based on multi-scale information fusion network
CN115272828A (en) * 2022-08-11 2022-11-01 河南省农业科学院农业经济与信息研究所 Intensive target detection model training method based on attention mechanism
CN116168240A (en) * 2023-01-19 2023-05-26 西安电子科技大学 Arbitrary-direction dense ship target detection method based on attention enhancement
CN116563726A (en) * 2023-05-08 2023-08-08 大连海事大学 Remote sensing image ship target detection method based on convolutional neural network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Small-Size Target Detection in Remotely Sensed Image Using Improved Multi-Scale Features and Attention Mechanism";HU ZHAO etc.;《IEEE Access》;第56703-56710页 *

Also Published As

Publication number Publication date
CN116740622A (en) 2023-09-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant