CN112733886A - Sample image processing method, device, equipment and storage medium - Google Patents

Sample image processing method, device, equipment and storage medium

Info

Publication number
CN112733886A
CN112733886A (application CN202011543738.4A)
Authority
CN
China
Prior art keywords
target, image, enhanced, feature, target image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011543738.4A
Other languages
Chinese (zh)
Inventor
聂泳忠
杨素伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiren Ma Diyan Beijing Technology Co ltd
Original Assignee
Xiren Ma Diyan Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiren Ma Diyan Beijing Technology Co ltd
Priority to CN202011543738.4A
Publication of CN112733886A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F18/214 — Pattern recognition; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/04 — Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology
    • G06N3/08 — Neural networks; Learning methods
    • G06T5/70 — Image enhancement or restoration; Denoising; Smoothing


Abstract

The embodiment of the invention discloses a sample image processing method, device, equipment and storage medium. First, a plurality of first target frames of a sample image are acquired, where each first target frame contains a first target image, and each first target image has channel features and spatial features. Then, the first target image in each first target frame is processed by a channel feature calculation module and a spatial feature calculation module contained in a preset neural network, yielding a feature-enhanced target image set. Finally, the feature-enhanced target image set is added to the sample image to obtain an enhanced image. The embodiment of the invention solves the problem in existing sample image processing schemes that a subsequent detection model cannot accurately detect the target image because the target image is poorly enhanced; it strengthens the features of the target image in the sample image so that a subsequent detection model can detect the target image accurately.

Description

Sample image processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to a method, an apparatus, a device, and a storage medium for processing a sample image.
Background
Currently, for the safety of autonomous vehicles, and in order to acquire more comprehensive feature information from high-definition images, it is important to detect target images within those high-definition images.
In order to detect a target image in a high-definition image, an image enhancement method is generally adopted at present.
However, current image enhancement methods supplement positive examples by mining hard negative examples as compensation, and use only the original data set, so they yield no obvious improvement for small target images.
Alternatively, features of different scales are fused to enlarge the receptive field, and multi-scale features are combined to enhance small target images. Because this does not address the problem at the level of the data sample set itself, the improvement is limited.
Therefore, existing sample image processing schemes suffer from a poor target image enhancement effect, so subsequent detection models cannot accurately detect the target image.
Disclosure of Invention
The embodiment of the invention provides a sample image processing method, device, equipment and storage medium. These solve the problem in existing sample image processing schemes that a subsequent detection model cannot accurately detect the target image because the target image is poorly enhanced; they strengthen the features of the target image in the sample image so that a subsequent detection model can detect the target image accurately.
To solve the above technical problems, the invention provides the following technical solutions:
in a first aspect, a method for processing a sample image is provided, the method including:
acquiring a plurality of first target frames of a sample image, wherein the first target frames comprise first target images, and the first target images comprise channel features and spatial features;
calculating a first target image included by each first target frame by using a channel feature calculation module and a spatial feature calculation module included in a preset neural network to obtain a feature-enhanced target image set;
and adding the target image set with the enhanced features into the sample image to obtain an enhanced image.
In some implementations of the first aspect, calculating the first target image included in each first target frame by using the channel feature calculation module and the spatial feature calculation module included in a preset neural network to obtain a feature-enhanced target image set includes:
calculating each channel feature of each first target image by using a channel feature calculation module to obtain the channel weight of each first target image;
calculating each pixel feature of each first target image by using a spatial feature calculation module to obtain the pixel weight of each first target image;
and determining an enhanced target image set according to each first target image and the channel weight and the pixel weight corresponding to the first target image.
In some implementations of the first aspect, obtaining a plurality of first target boxes of the sample image includes:
acquiring a plurality of labeling frames of a sample image, wherein the labeling frames comprise category information of the labeling frames;
and determining a plurality of first target frames according to the category information of the plurality of labeling frames.
In some implementations of the first aspect, adding the feature-enhanced target image set to the sample image, resulting in an enhanced image, includes:
acquiring first region positions of a plurality of labeling frames in a sample image;
and adding the feature-enhanced target image set to the target region position except the first region position in the sample image to obtain an enhanced sample image.
In some implementations of the first aspect, the method further comprises:
determining at least one labeling frame of which the number of the labeling frames corresponding to the category information is less than a threshold value as a second target frame, wherein the second target frame comprises a second target image;
adding the feature-enhanced target image set to target region positions in the sample image except the first region position to obtain an enhanced image, including:
and adding the second target image included by the second target frame and the feature-enhanced target image set to the target area position except the first area position in the sample image to obtain an enhanced image.
In some implementations of the first aspect, adding the second target image included in the second target frame and the feature-enhanced target image set to a target region position other than the first region position in the sample image to obtain an enhanced image includes:
performing at least one of scaling and angular rotation on the second target image included in the second target frame and on the feature-enhanced target image set;
and adding the processed target image set and the second target image to the target area position except the first area position in the sample image to obtain an enhanced image.
In some implementations of the first aspect, a first ratio of the number of first target frames to the total number of labeling frames, and a second ratio of the number of labeling frames other than the first and second target frames to the total number of labeling frames, satisfy a preset condition.
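The rare-category selection described in the implementations above amounts to counting the labeling frames per category and flagging those below the threshold as second target frames; a minimal sketch, with purely illustrative data and names not taken from the patent:

```python
from collections import Counter

def rare_category_boxes(boxes, threshold):
    """boxes: list of (category, box) pairs from the sample image.

    Returns the second target frames: labeling frames whose category
    occurs fewer than `threshold` times in the sample image.
    """
    counts = Counter(cat for cat, _ in boxes)
    return [(cat, box) for cat, box in boxes if counts[cat] < threshold]

# Two "car" frames and one rare "tricycle" frame; with threshold 2,
# only the tricycle frame is selected as a second target frame.
boxes = [("car", (0, 0, 5, 5)), ("car", (10, 10, 15, 15)),
         ("tricycle", (20, 20, 24, 24))]
rare = rare_category_boxes(boxes, threshold=2)
```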
In a second aspect, there is provided an apparatus for processing a sample image, the apparatus comprising:
the device comprises an acquisition module and a processing module, wherein the acquisition module is configured to acquire a plurality of first target frames of a sample image, the first target frames comprise first target images, and the first target images comprise channel features and spatial features;
the processing module is used for calculating the first target image included by each first target frame by using a channel feature calculation module and a spatial feature calculation module included in a preset neural network to obtain a feature-enhanced target image set;
and the processing module is also used for adding the feature-enhanced target image set to the sample image to obtain an enhanced image.
In some implementations of the second aspect, the processing module is further configured to calculate each channel feature of each first target image by using the channel feature calculation module to obtain a channel weight of each first target image; calculate each pixel feature of each first target image by using the spatial feature calculation module to obtain a pixel weight of each first target image; and determine an enhanced target image set according to each first target image and the channel weight and pixel weight corresponding to that first target image.
In some implementations of the second aspect, the obtaining module is further configured to obtain a plurality of labeling boxes of the sample image, where the labeling boxes include category information of the labeling boxes; and determining a plurality of first target frames according to the category information of the plurality of labeling frames.
In some implementations of the second aspect, the processing module is further configured to obtain a first region position of the plurality of annotation boxes in the sample image; and adding the feature-enhanced target image set to the target region position except the first region position in the sample image to obtain an enhanced sample image.
In some implementations of the second aspect, the processing module is further configured to determine, as the second target frame, at least one of the annotation frames corresponding to the category information, in which the number of the annotation frames is less than the threshold, where the second target frame includes a second target image; and adding the second target image included by the second target frame and the feature-enhanced target image set to the target area position except the first area position in the sample image to obtain an enhanced image.
In some implementations of the second aspect, the processing module is further configured to perform at least one of scaling and angular rotation on the second target image included in the second target frame and the feature-enhanced target image set; and adding the processed target image set and the second target image to the target area position except the first area position in the sample image to obtain an enhanced image.
In some implementations of the second aspect, a first ratio of the number of first target frames to the total number of labeling frames, and a second ratio of the number of labeling frames other than the first and second target frames to the total number of labeling frames, satisfy a preset condition.
In a third aspect, an electronic device is provided, the device comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the first aspect and the method of processing a sample image in some implementations of the first aspect.
In a fourth aspect, a computer storage medium is provided, wherein the computer storage medium has stored thereon computer program instructions, which when executed by a processor, implement the first aspect and the method of processing a sample image in some implementations of the first aspect.
The embodiment of the invention provides a sample image processing method, device, equipment and storage medium. The method first acquires a plurality of first target frames of a sample image, where the first target frames contain first target images that have channel features and spatial features; it then uses the channel feature calculation module and the spatial feature calculation module contained in a preset neural network to process the first target image in each first target frame, obtaining a feature-enhanced target image set; finally, it adds the feature-enhanced target image set to the sample image to obtain an enhanced image. Because the channel and spatial feature calculation modules of the preset neural network process each first target image separately to produce the feature-enhanced target image set, and hence the enhanced image, the embodiment solves the problem in existing sample image processing schemes that a subsequent detection model cannot accurately detect the target image because the target image is poorly enhanced; the features of the target image in the sample image are strengthened so that a subsequent detection model can detect the target image accurately.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly described below; those skilled in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a sample image processing method according to an embodiment of the present invention;
fig. 2 is a schematic process diagram of a preset neural network calculating a first target image included in a first target frame to obtain a feature-enhanced target image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an internal structure of a feature fusion memory module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an internal structure of a channel and spatial attention unit residual block according to an embodiment of the present invention;
FIG. 5 is a matrix characterization diagram of a first target image according to an embodiment of the present invention;
FIG. 6 is a channel feature matrix provided by an embodiment of the present invention;
FIG. 7 is a spatial feature matrix provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a sample image processing apparatus according to an embodiment of the present invention;
fig. 9 is a block diagram of a computing device provided by an embodiment of the invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
For the safety of an autonomous vehicle, it is important to detect small targets, long-distance targets, and rare targets in high-definition images. Owing to factors such as the structure of the neural network, machine performance, and the diversity of sample collection, some sample targets fed into the neural network may initially span only a few pixels, or even a single pixel; meanwhile, some sample targets are so few in number that the imbalance of positive and negative samples degrades training. Therefore, preprocessing the image data related to small and rare targets is important.
In the field of automatic driving visual perception, the problems of small target detection and unbalance of positive and negative samples are always the hot points of research of researchers. Currently, there are generally several processing angles and methods:
one, difficult negative sample mining. As the number of group pixel in the picture samples is less, the number of positive samples is less, the number of negative samples is larger than that of the positive samples, in order to improve the classification capability and avoid the situation that the predicted value of the neural network is gathered to the negative samples when a small number of the predicted values obey the majority, and the number of the positive samples and the negative samples is about 1: 3. Since there are few positive samples and the similarity between the false positive negative samples and the positive samples is high, the difficult samples that are easily divided into positive samples in the negative samples are selected for the complementary resampling. Therefore, the purpose of relieving the imbalance of the positive sample and the negative sample is achieved.
Two, data augmentation based on small and rare targets: small and rare targets are cropped and then pasted at random into the original image, increasing the number of positive samples and thereby alleviating the positive/negative sample imbalance.
Three, image pyramids and feature fusion. The image is rescaled and fed into neural network training with multi-scale features, or the shallow and deep output feature maps of the neural network are fused. Fusing the shallow feature maps supplements the features that layers with large receptive fields miss, while also alleviating the loss of shallow feature information as the network deepens.
Four, using a generative adversarial network (GAN) to improve the small target detection rate. In a GAN, the generator converts the low-resolution representation of a small object into a high-resolution representation, and the discriminator and generator distinguish features competitively, so that low-resolution small targets come closer to real high-resolution large-target samples. The GAN mines the structural association between objects of different scales and improves the feature representation of small objects to make them resemble large objects.
However, each of the four methods above has its own limitations. In the first method, hard negatives are used as compensation to supplement positive samples and only the original data set is used; this alleviates the positive/negative imbalance to some extent but brings no obvious improvement for small and rare target frames. In the second method, small and rare targets are cropped and pasted directly at random into the original image, which can occlude originally labeled targets, and the pasted edges are not smooth enough, making the image unrealistic. In the third method, features of different scales are fused to enlarge the receptive field and strengthen the model's detection of small targets, but since the data sample set itself is not enhanced, the improvement is limited, and target frames of only a few pixels still cannot be detected. In the fourth method, small and large target frames are fed into a GAN whose generator produces high-resolution target frames similar to the large ones, which increases time and space complexity and is inefficient on large data sets.
In summary, existing sample image processing schemes enhance the target image so poorly that a subsequent detection model cannot detect it accurately.
In order to solve this problem, the embodiment of the invention provides a sample image processing method, device, equipment and storage medium. The method first acquires a plurality of first target frames of a sample image, where the first target frames contain first target images that have channel features and spatial features; it then uses the channel feature calculation module and the spatial feature calculation module contained in a preset neural network to process the first target image in each first target frame, obtaining a feature-enhanced target image set; finally, it adds the feature-enhanced target image set to the sample image to obtain an enhanced image. Because the channel and spatial feature calculation modules of the preset neural network process each first target image separately to produce the feature-enhanced target image set, and hence the enhanced image, the features of the target image in the sample image are strengthened and a subsequent detection model can detect the target image accurately.
The technical solutions provided by the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a sample image processing method according to an embodiment of the present invention. The execution subject of the method may be a terminal device.
As shown in fig. 1, the method for processing the sample image may include:
s101: a plurality of first target frames of the sample image are acquired, wherein the first target frames include a first target image including a channel feature and a spatial feature.
In one embodiment, the specific process of acquiring the plurality of first target frames of the sample image may be acquiring a plurality of label frames of the sample image, where the label frames include category information of the label frames; and then determining a plurality of first target frames according to the category information of the plurality of label frames.
The first target frame may be a labeling frame among the ground-truth boxes of the sample image that is smaller than the minimum processable size, determined in advance according to the processing performance of the terminal device and the input size of the preset neural network; the first target frame may therefore specifically refer to a small target frame.
After obtaining the plurality of first target frames, the process proceeds to the calculation process of the plurality of first target frames, i.e., S102.
S102: and calculating the first target image included by each first target frame by using a channel characteristic calculation module and a spatial characteristic calculation module included in a preset neural network to obtain a characteristic-enhanced target image set.
Specifically, in the process, a channel feature calculation module may be used to calculate each channel feature of each first target image to obtain a channel weight of each first target image; then, calculating each pixel feature of each first target image by using a spatial feature calculation module to obtain the pixel weight of each first target image; and finally, determining an enhanced target image set according to each first target image and the channel weight and the pixel weight corresponding to the first target image.
In an embodiment, a preset neural network may be used to calculate a first target image included in a first target frame, so as to obtain a feature-enhanced target image.
Fig. 2 is a schematic diagram illustrating a process of calculating a first target image included in a first target frame by using a preset neural network to obtain a feature-enhanced target image.
As shown in fig. 2, a series of feature fusion memory modules in the preset neural network are cascaded together to form a DenseNet to convert the low-resolution features into the high-information-content features, that is, the first target image included in the first target frame with low resolution, i.e., the car image indicated by the arrow 1, is converted into the high-information-content target image, i.e., the car image indicated by the arrow 2.
The car picture indicated by arrow 1 is first processed by two convolution layers (conv), then enters the DenseNet formed by a plurality of feature fusion memory modules for calculation, and finally passes through convolution layers and an upsampling layer (upsample) to obtain a target image with high information content, namely the car picture indicated by arrow 2.
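The pipeline just described — input convolutions, a DenseNet-style cascade of feature fusion memory modules whose inputs concatenate all earlier outputs, then convolution and upsampling — can be sketched structurally. This is a hypothetical skeleton with the layers passed in as callables; none of these names come from the patent:

```python
import numpy as np

def enhancement_net(x, conv_in, dense_blocks, conv_out, upsample):
    """Structural sketch of Fig. 2: conv -> dense cascade -> conv -> upsample.

    Each dense block sees the channel-wise concatenation of all earlier
    feature maps, the DenseNet-style connectivity described above.
    """
    h = conv_in(x)
    feats = [h]
    for block in dense_blocks:
        h = block(np.concatenate(feats, axis=2))  # dense (concat) connectivity
        feats.append(h)
    return upsample(conv_out(h))

# Toy stand-ins: identity convs, one block that keeps 3 channels,
# and 2x nearest-neighbour upsampling.
identity = lambda t: t
block = lambda t: t[..., :3]
up2 = lambda t: np.repeat(np.repeat(t, 2, axis=0), 2, axis=1)
out = enhancement_net(np.ones((4, 4, 3)), identity, [block], identity, up2)
```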
It should be noted that the preset neural network can also be replaced by a residual module based on feature fusion and an attention mechanism, or any of its variants.
Fig. 3 shows an internal structural diagram of the feature fusion memory module.
As shown in FIG. 3, each feature fusion memory module integrates a plurality of channel and spatial attention unit residual blocks and a gated fusion node that retains long-term information. The gated fusion node consists of a concatenation (concat) layer and a 1 × 1 convolution layer (conv).
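Because a 1 × 1 convolution applies the same linear map at every spatial position, the gated fusion node above (concat followed by 1 × 1 conv) reduces to a per-pixel matrix product; a minimal sketch, where the kernel shape convention is an assumption:

```python
import numpy as np

def gated_fusion(feats, kernel):
    """Gated fusion node sketch: channel-wise concat + 1x1 convolution.

    feats:  list of (H, W, C_i) feature maps from earlier residual blocks.
    kernel: (sum(C_i), C_out) weights of the 1x1 convolution.
    """
    stacked = np.concatenate(feats, axis=2)  # concat layer
    return stacked @ kernel                  # 1x1 conv == per-pixel matmul

# Fusing two 3-channel maps down to 2 output channels.
a = np.ones((2, 2, 3))
b = np.full((2, 2, 3), 2.0)
fused = gated_fusion([a, b], np.ones((6, 2)))
```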
Specifically, the channel and space attention unit residual block is used for weighting the features in two directions of the channel and the space, emphasizing more important features and neglecting unimportant features so as to realize the conversion of low-resolution features into features with high information content.
Fig. 4 shows a schematic diagram of the internal structure of the channel and spatial attention cell residual block.
As shown in fig. 4, the channel and spatial attention unit residual block contains a channel feature calculation module (CA Unit), a spatial feature calculation module (SA Unit), convolution layers (conv), and a rectified linear unit (ReLU). The channel feature calculation module applies a channel attention mechanism, and the spatial feature calculation module applies a spatial attention mechanism.
In one embodiment, the channel feature calculation module may calculate each channel feature of the first target image to obtain a channel weight of the first target image.
Specifically, as shown in fig. 5, take as an example a first target image represented as an H × W × C feature map, where H is the height of the feature map, W its width, and C its number of channels.
The channel feature calculation module weights the first target image along the channel direction: each channel learns a different weight that scores that channel's features, while the weights within the H × W plane are identical. In other words, the channel feature calculation module attends to which features of the picture matter, not to their specific positions. The resulting channel feature matrix is 1 × 1 × C, as shown in fig. 6.
In one embodiment, the spatial feature calculation module may calculate each pixel feature of the first target image to obtain a pixel weight of the first target image.
Specifically, taking the matrix feature map shown in fig. 5 as an example, the spatial feature calculation module weights the first target image in the spatial direction: each pixel of the H × W feature map learns a weight, while the weights across the C channels are shared. In other words, the spatial feature calculation module focuses on where the features of the different regions of the picture are located. The resulting spatial feature matrix is H × W × 1, as shown in fig. 7.
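The spatial direction is the mirror image of the channel case: pool across channels, gate with a sigmoid, and obtain one weight per pixel (an H × W × 1 matrix). Again a minimal sketch with an assumed function name, not the patented SA Unit:

```python
import numpy as np

def spatial_attention(feat):
    """Compute per-pixel weights for an H x W x C feature map.

    Channel-mean pooling followed by a sigmoid gate; every pixel gets
    a single weight shared across all C channels.
    """
    pooled = feat.mean(axis=2, keepdims=True)  # shape (H, W, 1)
    return 1.0 / (1.0 + np.exp(-pooled))

feat = np.random.rand(8, 8, 16)
w = spatial_attention(feat)
print(w.shape)  # (8, 8, 1)
```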
The above computation may then be performed for each first target image using the channel feature calculation module and the spatial feature calculation module, and a feature-enhanced target image set may be determined from each first target image together with its corresponding channel weights and pixel weights.
In addition, it should be noted that the value of each pixel point in the enhanced target image set is obtained by taking a weighted average, according to the channel weight and the pixel weight, of the pixel itself and the non-enhanced pixel values in its neighborhood.
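The weighting-plus-neighborhood-average step might be sketched as follows. The 3 × 3 box neighborhood, the `enhance` name, and the simple elementwise use of the two weight maps are assumptions, since the patent does not give the exact kernel:

```python
import numpy as np

def enhance(feat, channel_w, spatial_w, k=3):
    """Apply channel (1x1xC) and pixel (HxWx1) weights to a feature
    map, then replace each pixel by the average of its k x k
    neighborhood (edge pixels handled by replicate padding)."""
    weighted = feat * channel_w * spatial_w  # broadcasted weighting
    pad = k // 2
    padded = np.pad(weighted, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w, _ = weighted.shape
    out = np.zeros_like(weighted)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean(axis=(0, 1))
    return out
```

Replicate padding keeps a constant patch constant, which is a convenient sanity check for the neighborhood average.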
Because the channel feature calculation module and the spatial feature calculation module can integrate related features across all channel maps and selectively emphasize channel maps in which deep and shallow layers are correlated, the deep semantics help the attention unit find useful information present in the shallow network. The image features of small target frames are thus learned better, and the resolution is further improved.
S103: adding the feature-enhanced target image set to the sample image to obtain an enhanced image.
When the enhanced target image set is added to the sample image, it may be added to regions of the sample image without annotation frames so as not to overlap with existing targets. That is, first region positions of the plurality of annotation frames in the sample image may be obtained; the feature-enhanced target image set is then added to target region positions in the sample image other than the first region positions, yielding an enhanced sample image.
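One way to realize "paste only where no annotation frame sits" is rejection sampling over candidate positions. The helper names, the (x1, y1, x2, y2) box convention, and the 100-try budget below are illustrative assumptions:

```python
import random

def overlaps(a, b):
    """Axis-aligned intersection test for (x1, y1, x2, y2) boxes."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def find_paste_position(img_w, img_h, patch_w, patch_h, boxes, tries=100, rng=None):
    """Randomly sample a paste location whose patch box avoids every
    annotation box; return None if no free spot is found in `tries`."""
    rng = rng or random.Random(0)
    for _ in range(tries):
        x = rng.randint(0, img_w - patch_w)
        y = rng.randint(0, img_h - patch_h)
        cand = (x, y, x + patch_w, y + patch_h)
        if not any(overlaps(cand, b) for b in boxes):
            return cand
    return None
```

Returning None when the image is crowded lets the caller simply skip that sample rather than force an overlap.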
In addition, in one embodiment, images included in annotation frames of other categories in the sample image may also be enhanced. A second target frame is determined according to the acquired category information of the plurality of annotation frames, where the second target frame includes a second target image. The second target frame is specifically an annotation frame whose category count is less than a threshold, and thus may be regarded as a rare target frame. The second target image included in the second target frame and the feature-enhanced target image set may then be subjected to random linear-transformation augmentation such as size scaling and angle rotation; the processed target image set and the second target image are added to the target region positions other than the first region positions in the sample image to obtain an enhanced image. In the enhanced image produced by this process, the images included in both the small target frames and the rare target frames are therefore enhanced.
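The random scaling and rotation could be sketched as below. A real pipeline would use an image library with arbitrary-angle rotation; this dependency-free stand-in limits rotation to multiples of 90° and uses nearest-neighbour scaling, and its name and scale range are assumptions:

```python
import numpy as np

def augment(patch, rng=None):
    """Randomly rescale (nearest neighbour) and rotate a 2-D patch by
    a multiple of 90 degrees -- a stand-in for the size scaling plus
    angle rotation described in the text."""
    rng = rng or np.random.default_rng(0)
    scale = rng.uniform(0.5, 1.5)
    h, w = patch.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = (np.arange(nh) * h / nh).astype(int)  # nearest source rows
    cols = (np.arange(nw) * w / nw).astype(int)  # nearest source cols
    scaled = patch[rows][:, cols]
    return np.rot90(scaled, k=int(rng.integers(0, 4)))
```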
According to the embodiment of the invention, starting from the data of the sample image, resolution enhancement and paste expansion are performed on the first target frames in the sample image, and paste expansion is performed on the second target frames, thereby alleviating the difficulty of detecting small target frames and rare target frames.
In addition, in one embodiment, to maintain balance between the first target frames and the second target frames in the sample image, that is, to keep positive and negative samples balanced, a first ratio of the number of first target frames to the total number of annotation frames and a second ratio of the number of annotation frames other than the first and second target frames to that total need to satisfy a preset condition. The preset condition may be that the difference between the first ratio and the second ratio is less than a threshold value.
Specifically, when the preset condition above is not satisfied, the gap between the current counts and the counts that would satisfy it may be calculated by formula (1):

G = S₊ᵢ − S̄  (1)

where G is the difference between the count of first target frames or second target frames that do not satisfy the preset condition and the average count of the other categories of annotation frames; S₊ᵢ represents the number of first target frames, or the number of second target frames, in the sample image that do not satisfy the preset condition; and S̄ represents the average number of annotation frames over all categories other than the first target frame and the second target frame.
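Formula (1) amounts to comparing a class's count with the mean count of the ordinary classes. A small sketch, with the function name, dict layout, and the reading of G as "count minus mean" (so a negative G is the deficit to fill by pasting) all being assumptions:

```python
def balance_gap(counts, special):
    """For each class in `special` (small / rare target frames that
    fail the preset condition), return G = S_plus_i - S_bar: the
    class's count minus the mean count of the ordinary classes.
    A negative G indicates how far the class is below the average."""
    ordinary = [n for cls, n in counts.items() if cls not in special]
    s_bar = sum(ordinary) / len(ordinary)  # average of ordinary classes
    return {cls: counts[cls] - s_bar for cls in special}

gaps = balance_gap({"car": 100, "bus": 90, "tiny": 10, "rare": 5},
                   special={"tiny", "rare"})
```

Here the ordinary classes average 95 instances, so the "tiny" class sits 85 below the mean and the "rare" class 90 below it.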
Further, in one embodiment, to ensure that the edges of the processed target image set and the second target image are sufficiently smooth when they are added to the target region positions other than the first region positions in the sample image, Gaussian filter smoothing may be applied to the image edges of the processed target image set and the second target image. In addition, a threshold distance of a preset number of pixels may be kept from the image edge to improve the detection accuracy of small target frames and rare target frames and the mean Average Precision (mAP). Optionally, the preset number may be 5, or may be adjusted according to the actual situation, and the Gaussian filter smoothing may be replaced by other suitable blur filtering.
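An edge-only Gaussian smoothing pass could look roughly like this. The 5-pixel band follows the optional value mentioned above, while the separable-kernel implementation, sigma value, and function name are assumptions:

```python
import numpy as np

def smooth_edges(patch, band=5, sigma=1.0):
    """Gaussian-blur only a `band`-pixel frame around a 2-D patch so
    the paste boundary blends with the background, leaving the
    interior untouched."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    # Separable Gaussian: convolve along rows, then along columns.
    blurred = patch.astype(float)
    for axis in (0, 1):
        blurred = np.apply_along_axis(
            lambda m: np.convolve(m, kernel, mode="same"), axis, blurred)
    # Blend: keep the interior, use blurred values in the edge band.
    mask = np.zeros(patch.shape, dtype=bool)
    mask[:band, :] = True
    mask[-band:, :] = True
    mask[:, :band] = True
    mask[:, -band:] = True
    out = patch.astype(float).copy()
    out[mask] = blurred[mask]
    return out
```

Because `mode="same"` zero-pads, the outermost pixels are pulled toward zero, which is exactly the softening effect wanted at the paste boundary.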
It should further be noted that the embodiment of the present invention may be applied to target frame detection in any 2D or 3D scene, such as autonomous driving, image target recognition, industrial defect detection, face recognition, security monitoring, and intelligent transportation.
According to the sample image processing method provided by the embodiment of the invention, a plurality of first target frames of a sample image are first obtained, where each first target frame includes a first target image and the first target image includes channel features and spatial features; then the channel feature calculation module and the spatial feature calculation module included in a preset neural network calculate the first target image included in each first target frame to obtain a feature-enhanced target image set; finally, the feature-enhanced target image set is added to the sample image to obtain an enhanced image. Starting from the data of the sample image, the embodiment of the invention performs high-resolution enhancement training and smooth paste expansion on the sample image data based on the channel feature calculation module and the spatial feature calculation module. This improves the resolution of small targets and the sample space of rare targets, reduces time and space complexity, avoids redundant and useless oversampling, and ensures the diversity and effectiveness of the sample data, thereby alleviating the problems of difficult detection of small target frames and rare target frames and of unbalanced positive and negative samples, so that a subsequent detection model can accurately detect the target image.
Corresponding to the flow diagram of the sample image processing method in fig. 1, the embodiment of the present invention further provides a schematic structural diagram of a sample image processing apparatus.
Fig. 8 is a schematic structural diagram of a sample image processing apparatus according to an embodiment of the present invention. As shown in fig. 8, the processing means of the sample image may include:
the obtaining module 801 may be configured to obtain a plurality of first target frames of the sample image, where the first target frames include a first target image, and the first target image includes a channel feature and a spatial feature.
The processing module 802 may be configured to calculate, by using a channel feature calculation module and a spatial feature calculation module included in a preset neural network, a first target image included in each first target frame to obtain a feature-enhanced target image set.
The processing module 802 may further be configured to add the feature-enhanced target image set to the sample image, so as to obtain an enhanced image.
The processing module 802 may also be configured to calculate each channel feature of each first target image by using a channel feature calculation module, so as to obtain a channel weight of each first target image; calculating each pixel feature of each first target image by using a spatial feature calculation module to obtain the pixel weight of each first target image; and determining an enhanced target image set according to each first target image and the channel weight and the pixel weight corresponding to the first target image.
The obtaining module 801 may further be configured to obtain a plurality of labeling boxes of the sample image, where each labeling box includes category information of the labeling box; and determining a plurality of first target frames according to the category information of the plurality of labeling frames.
The processing module 802 may be further configured to obtain first region positions of a plurality of annotation frames in the sample image; and adding the feature-enhanced target image set to the target region position except the first region position in the sample image to obtain an enhanced sample image.
The processing module 802 may be further configured to determine a second target frame according to the category information of the plurality of labeled frames, where the second target frame includes a second target image; and adding the second target image included by the second target frame and the feature-enhanced target image set to the target area position except the first area position in the sample image to obtain an enhanced image.
The processing module 802 may further be configured to perform at least one of scaling and angular rotation on the second target image included in the second target frame and the feature-enhanced target image set; and adding the processed target image set and the second target image to the target area position except the first area position in the sample image to obtain an enhanced image.
In one embodiment, a first ratio of the number of the first target frames to the total number of the labeled frames and a second ratio of the number of the labeled frames except the first target frames and the second target frames to the total number of the labeled frames satisfy a preset condition.
It can be understood that each module in the sample image processing apparatus shown in fig. 8 has a function of implementing each step in fig. 1, and can achieve the corresponding technical effect, and for brevity, no further description is provided herein.
According to the sample image processing apparatus provided by the embodiment of the invention, a plurality of first target frames of a sample image are first obtained, where each first target frame includes a first target image and the first target image includes channel features and spatial features; then the channel feature calculation module and the spatial feature calculation module included in a preset neural network calculate the first target image included in each first target frame to obtain a feature-enhanced target image set; finally, the feature-enhanced target image set is added to the sample image to obtain an enhanced image. Starting from the data of the sample image, the embodiment of the invention performs high-resolution enhancement training and smooth paste expansion on the sample image data based on the channel feature calculation module and the spatial feature calculation module. This improves the resolution of small targets and the sample space of rare targets, reduces time and space complexity, avoids redundant and useless oversampling, and ensures the diversity and effectiveness of the sample data, thereby alleviating the problems of difficult detection of small target frames and rare target frames and of unbalanced positive and negative samples, so that a subsequent detection model can accurately detect the target image.
Fig. 9 is a block diagram of a hardware architecture of a computing device according to an embodiment of the present invention. As shown in fig. 9, computing device 900 includes an input device 901, an input interface 902, a central processor 903, a memory 904, an output interface 905, and an output device 906. The input interface 902, the central processor 903, the memory 904, and the output interface 905 are connected to one another through a bus 910, and the input device 901 and the output device 906 are connected to the bus 910 through the input interface 902 and the output interface 905, respectively, and thereby to the other components of the computing device 900.
Specifically, the input device 901 receives input information from the outside and transmits it through the input interface 902 to the central processor 903; the central processor 903 processes the input information based on computer-executable instructions stored in the memory 904 to generate output information, stores the output information temporarily or permanently in the memory 904, and then transmits it through the output interface 905 to the output device 906; the output device 906 outputs the output information outside the computing device 900 for use by a user.
That is, the computing device shown in fig. 9 may also be implemented as a sample image processing device that may include: a processor and a memory storing computer executable instructions; the processor, when executing the computer-executable instructions, may implement the method for processing the sample image provided by the embodiments of the present invention.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the method for processing a sample image provided by the embodiments of the present invention.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic Circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor Memory devices, Read-Only memories (ROMs), flash memories, Erasable Read-Only memories (EROMs), floppy disks, Compact disk Read-Only memories (CD-ROMs), optical disks, hard disks, optical fiber media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (11)

1. A method for processing a sample image, the method comprising:
obtaining a plurality of first target frames of a sample image, wherein the first target frames comprise first target images, and the first target images comprise channel features and spatial features;
calculating a first target image included in each first target frame by using a channel feature calculation module and a spatial feature calculation module included in a preset neural network to obtain a feature-enhanced target image set;
and adding the feature-enhanced target image set to the sample image to obtain an enhanced image.
2. The method according to claim 1, wherein the calculating the first target image included in each first target frame by using a channel feature calculation module and a spatial feature calculation module included in a preset neural network to obtain a feature-enhanced target image set comprises:
calculating each channel feature of each first target image by using the channel feature calculation module to obtain the channel weight of each first target image;
calculating each pixel feature of each first target image by using the spatial feature calculation module to obtain the pixel weight of each first target image;
determining an enhanced target image set according to each of the first target images and the channel weights and the pixel weights corresponding to the first target images.
3. The method of claim 1, wherein said obtaining a plurality of first target frames of a sample image comprises:
acquiring a plurality of labeling frames of a sample image, wherein the labeling frames comprise category information of the labeling frames;
and determining the plurality of first target frames according to the category information of the plurality of labeling frames.
4. The method of claim 3, wherein adding the feature-enhanced target image set to the sample image results in an enhanced image comprising:
acquiring first region positions of the plurality of labeling frames in the sample image;
and adding the feature-enhanced target image set to the target region position except the first region position in the sample image to obtain an enhanced sample image.
5. The method of claim 4, further comprising:
determining at least one labeling frame of which the number of the labeling frames corresponding to the category information is less than a threshold value as a second target frame, wherein the second target frame comprises a second target image;
adding the feature-enhanced target image set to a target region position in the sample image except the first region position to obtain an enhanced image, including:
and adding a second target image included by the second target frame and the feature-enhanced target image set to the target area position except the first area position in the sample image to obtain an enhanced image.
6. The method of claim 5, wherein adding the second target image included in the second target frame and the feature-enhanced target image set to target region positions in the sample image other than the first region position to obtain an enhanced image comprises:
performing at least one of scaling and angle rotation on a second target image included in the second target frame and the feature-enhanced target image set;
and adding the processed target image set and the second target image to the target area position except the first area position in the sample image to obtain an enhanced image.
7. The method of claim 5, wherein a first ratio of the number of the first target frames to the total number of the labeling frames and a second ratio of the number of the labeling frames other than the first target frames and the second target frames to the total number satisfy a preset condition.
8. An apparatus for processing a sample image, the apparatus comprising:
an obtaining module, configured to obtain a plurality of first target frames of a sample image, where the first target frames include a first target image, and the first target image includes a channel feature and a spatial feature;
the processing module is used for calculating the first target image included by each first target frame by using a channel feature calculation module and a spatial feature calculation module included in a preset neural network to obtain a feature-enhanced target image set;
the processing module is further configured to add the feature-enhanced target image set to the sample image to obtain an enhanced image.
9. The apparatus according to claim 8, wherein the processing module is further configured to calculate each channel feature of each first target image by using the channel feature calculation module, so as to obtain a channel weight of each first target image; calculating each pixel feature of each first target image by using the spatial feature calculation module to obtain the pixel weight of each first target image; determining an enhanced target image set according to each of the first target images and the channel weights and the pixel weights corresponding to the first target images.
10. An electronic device, characterized in that the device comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a method of processing a sample image as claimed in any one of claims 1-7.
11. A computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement a method of processing a sample image as claimed in any one of claims 1 to 7.
CN202011543738.4A 2020-12-24 2020-12-24 Sample image processing method, device, equipment and storage medium Pending CN112733886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011543738.4A CN112733886A (en) 2020-12-24 2020-12-24 Sample image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112733886A true CN112733886A (en) 2021-04-30

Family

ID=75604848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011543738.4A Pending CN112733886A (en) 2020-12-24 2020-12-24 Sample image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112733886A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740571A (en) * 2019-01-22 2019-05-10 南京旷云科技有限公司 The method of Image Acquisition, the method, apparatus of image procossing and electronic equipment
CN110334612A (en) * 2019-06-19 2019-10-15 上海交通大学 Electric inspection process image object detection method with self-learning capability
CN111178183A (en) * 2019-12-16 2020-05-19 深圳市华尊科技股份有限公司 Face detection method and related device
CN111310764A (en) * 2020-01-20 2020-06-19 上海商汤智能科技有限公司 Network training method and device, image processing method and device, electronic equipment and storage medium
CN111862097A (en) * 2020-09-24 2020-10-30 常州微亿智造科技有限公司 Data enhancement method and device for micro defect detection rate


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HU YANTING et al.: "Channel-Wise and Spatial Feature Modulation", IEEE Transactions on Circuits and Systems for Video Technology, 7 May 2019, pages 3-7, XP011817229, DOI: 10.1109/TCSVT.2019.2915238 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516165A (en) * 2021-05-07 2021-10-19 北京惠朗时代科技有限公司 Customer satisfaction judging method based on image pyramid matching posterior
CN113516165B (en) * 2021-05-07 2023-10-10 北京惠朗时代科技有限公司 Customer satisfaction judging method based on image pyramid matching posterior
CN113361588A (en) * 2021-06-03 2021-09-07 北京文安智能技术股份有限公司 Image training set generation method and model training method based on image data enhancement
CN114120056A (en) * 2021-10-29 2022-03-01 中国农业大学 Small target identification method, small target identification device, electronic equipment, medium and product
CN113887545A (en) * 2021-12-07 2022-01-04 南方医科大学南方医院 Laparoscopic surgical instrument identification method and device based on target detection model
CN113887545B (en) * 2021-12-07 2022-03-25 南方医科大学南方医院 Laparoscopic surgical instrument identification method and device based on target detection model
WO2023216269A1 (en) * 2022-05-13 2023-11-16 北京小米移动软件有限公司 Data enhancement method and device

Similar Documents

Publication Publication Date Title
CN112733886A (en) Sample image processing method, device, equipment and storage medium
US11790040B2 (en) Method for object detection and recognition based on neural network
CN111738110A (en) Remote sensing image vehicle target detection method based on multi-scale attention mechanism
CN110675415B (en) Road ponding area detection method based on deep learning enhanced example segmentation
CN113378686A (en) Two-stage remote sensing target detection method based on target center point estimation
CN113536920B (en) Semi-supervised three-dimensional point cloud target detection method
CN112446292B (en) 2D image salient object detection method and system
CN103679740B (en) ROI (Region of Interest) extraction method of ground target of unmanned aerial vehicle
CN110610130A (en) Multi-sensor information fusion power transmission line robot navigation method and system
Bin et al. A design of parking space detector based on video image
CN113743163A (en) Traffic target recognition model training method, traffic target positioning method and device
CN113627299A (en) Intelligent wire floater identification method and device based on deep learning
CN116258940A (en) Small target detection method for multi-scale features and self-adaptive weights
CN115100545A (en) Target detection method for small parts of failed satellite under low illumination
CN111814773A (en) Lineation parking space identification method and system
CN117351374B (en) Remote sensing image saliency target detection method, system, equipment and medium
CN112800932A (en) Method for detecting obvious ship target in marine background and electronic equipment
CN117423077A (en) BEV perception model, construction method, device, equipment, vehicle and storage medium
CN116797789A (en) Scene semantic segmentation method based on attention architecture
CN113284221B (en) Target detection method and device and electronic equipment
CN115082897A (en) Monocular vision 3D vehicle target real-time detection method for improving SMOKE
CN114639084A (en) Road side end vehicle sensing method based on SSD (solid State disk) improved algorithm
CN114581876A (en) Method for constructing lane detection model under complex scene and method for detecting lane line
CN114417946A (en) Target detection method and device
Vasudha et al. Carriageway Edge Detection for Unmarked Urban Roads using Deep Learning Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210430