CN114004963A - Target class identification method and device and readable storage medium

Target class identification method and device and readable storage medium

Info

Publication number
CN114004963A
CN114004963A (application CN202111652406.4A; granted as CN114004963B)
Authority
CN
China
Prior art keywords
feature vector
target
sub
feature
region
Prior art date
Legal status
Granted
Application number
CN202111652406.4A
Other languages
Chinese (zh)
Other versions
CN114004963B (en)
Inventor
艾国
杨作兴
房汝明
向志宏
Current Assignee
Shenzhen MicroBT Electronics Technology Co Ltd
Original Assignee
Shenzhen MicroBT Electronics Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen MicroBT Electronics Technology Co Ltd filed Critical Shenzhen MicroBT Electronics Technology Co Ltd
Priority to CN202111652406.4A priority Critical patent/CN114004963B/en
Publication of CN114004963A publication Critical patent/CN114004963A/en
Application granted granted Critical
Publication of CN114004963B publication Critical patent/CN114004963B/en
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 - Classification techniques
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a target class identification method, a target class identification device and a readable storage medium. The method comprises the following steps: performing feature extraction on an image to be recognized to obtain a first feature vector; finding, in the first feature vector, each first sub-feature vector corresponding to each foreground region and each background region; performing first interpolation processing on each first sub-feature vector to obtain second sub-feature vectors; performing target detection according to the second sub-feature vectors to obtain each target region; finding, in the first feature vector, each first sub-feature vector corresponding to each target region, and splicing these first sub-feature vectors into a third feature vector; performing self-adaptive global average pooling on the first feature vector to obtain a fourth feature vector; superposing the fourth feature vector and the third feature vector to obtain a fifth feature vector; and identifying the target category according to the fifth feature vector to obtain the target categories contained in the image. The embodiment of the invention thereby refines the granularity of target class identification.

Description

Target class identification method and device and readable storage medium
Technical Field
The embodiments of the present invention relate to the field of image processing technologies, and in particular, to a method and an apparatus for identifying a target class, a readable storage medium, and a computer program product.
Background
In many scenarios, targets need to be classified for various purposes, and general image classification methods struggle to distinguish target classes with similar morphology and texture. Underground pipeline construction is one example. China's urbanization has advanced rapidly in recent years, and with hundreds of millions of people moving into cities, the load borne by underground pipelines has grown further. As a fundamental task of urban construction, underground pipeline work affects the stability of normal city operation, so pipe network systems must be overhauled in time to keep urban infrastructure running reliably.
At present, most underground pipeline defect detection proceeds as follows: a robot captures video data inside the well, the defective pipe sections are determined by manually screening the resulting mass of footage, and after a defect-type report is generated, workers repair the pipeline according to the reported defect types. Manual determination of pipe defect types has two disadvantages. First, screeners must have specific expertise, which limits the workforce that can be deployed. Second, people tire during long shifts, which reduces working efficiency. Together, these two drawbacks make it difficult to carry out urban pipeline maintenance in time.
At present, neural networks are used to classify only a few simple drainage-pipe damage categories (deposition, cracks, tree roots and the like); because such classifiers cannot selectively capture the information useful for distinguishing defect categories, they do not generalize to recognizing more pipeline defect categories.
To identify more pipeline defect classes, another existing solution uses a deep convolutional neural network: on the basis of VGG (Visual Geometry Group) 16, a CBAM (Convolutional Block Attention Module) is inserted into each convolution module to compute channel-level and spatial-level attention matrices, which are then multiplied with the original features so that the network extracts discriminative features. This extends pipeline defect classification from the earlier three defect types to seven, including deformation, corrosion, scaling, dislocation, deposition, leakage and fracture. Although more general than previous methods, this approach still has several drawbacks:
1) According to the published industry standard "Technical Regulation for Detection and Evaluation of Urban Drainage Pipelines" of China's Ministry of Housing and Urban-Rural Development, underground drainage pipelines exhibit 17 defect types (hidden connection, deformation, misconnection, residual wall, penetration, corrosion, scum, scaling, undulation, tree root, disjointing, shedding, obstacle, stagger, deposition, leakage and fracture). The above method can identify only the seven most common defect types, so it cannot be widely deployed and does not achieve the goal of reducing labor input: even if it reaches extremely high recognition accuracy on those seven defects, there is no guarantee that the remaining ten defects are not simply classified as normal, and those ten defects cannot be neglected in drainage pipeline repair. The data output by the model must therefore still be screened again by hand.
2) Although channel-level and spatial-level attention mechanisms let the neural network extract the features that matter for a sample, and thereby help distinguish defect types with high inter-class confusion, the powerful fitting ability of neural networks comes from their thousands of neurons, and inserting channel and spatial attention computations into every convolutional layer greatly increases the computational cost of the model.
3) CBAM is essentially a self-attention mechanism that autonomously suppresses or enhances certain feature information on the basis of the current prior. However, the prior information that image-classification labels provide for feature selection is limited, and as more and increasingly complex pipeline defect categories need to be identified, the CBAM mechanism struggles to focus on the truly discriminative features.
In addition, academia has contributed many meaningful explorations of fine-grained image classification, all of which distinguish classes with high inter-class confusion by detecting first and classifying afterwards: the positions of the sub-regions that carry the fine-grained distinctions are detected, and fine classification is then performed on the features of those regions. However, this detect-then-classify paradigm is unsuited to modeling a general pipeline defect classifier, for two reasons. First, a key property of pipeline defects is that they are widely distributed yet occupy a small fraction of the image area (for example, cracks in the pipe wall or tree roots). Second, classifying on the features of locally distinguishable regions alone does not help separate classes that are already far apart between classes.
Disclosure of Invention
The embodiments of the present invention provide a target class identification method, a target class identification device, a readable storage medium and a computer program product, which refine the class granularity of target class identification and improve the identification precision of target class identification.
The technical scheme of the embodiment of the invention is realized as follows:
a method of object class identification, the method comprising:
performing feature extraction on an image to be recognized to obtain a first feature vector;
detecting each foreground region and each background region in the image according to the first feature vector;
searching each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector according to the corresponding region of each foreground region and each background region in the image;
respectively carrying out first interpolation processing on each first sub-feature vector corresponding to each foreground region and each background region to obtain each second sub-feature vector with a first fixed size;
performing target detection according to the second sub-feature vectors to obtain target areas in the image;
searching each first sub-feature vector corresponding to each target area in the first feature vectors according to the corresponding area of each target area in the image, and splicing each first sub-feature vector corresponding to all target areas into a third feature vector;
performing self-adaptive global average pooling processing on the first feature vector to obtain a fourth feature vector; the dimensionality of the fourth feature vector is the same as that of the third feature vector;
superposing the fourth feature vector and the third feature vector to obtain a fifth feature vector;
and identifying the target category according to the fifth feature vector to obtain the target category contained in the image.
The feature extraction of the image to be recognized to obtain a first feature vector comprises the following steps:
inputting an image to be identified into a backbone network of a neural network for feature extraction;
detecting each foreground region and each background region in the image according to the first feature vector, including:
inputting first feature vectors into a region suggestion network of the neural network to detect respective foreground regions and respective background regions in the image;
the performing target detection according to the second sub-feature vectors includes:
inputting the second sub-feature vectors into a target detection network of the neural network for target detection;
the classifying and identifying the target according to the fifth feature vector comprises the following steps:
and inputting the fifth feature vector into a target classification network of the neural network for target class identification.
The backbone network, the area suggestion network, the target detection network and the target classification network of the neural network are obtained through the following training processes:
acquiring a training image set, and labeling each labeling target area and a corresponding labeling target category in each frame of training image;
sequentially taking out a frame of training image from the training image set and inputting the frame of training image into a backbone network of the neural network for feature extraction to obtain a first feature vector of the input training image;
inputting a first feature vector into a region suggestion network of the neural network to detect each foreground region and each background region in an input training image;
searching each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector according to the corresponding region of each foreground region and each background region in the input training image;
respectively carrying out first interpolation processing on each first sub-feature vector corresponding to each foreground region and each background region to obtain each second sub-feature vector with a first fixed size;
inputting the second sub-feature vectors into a target detection network of the neural network to obtain detection target areas and detection target types of the detection target areas in the input training image;
calculating by adopting a preset first loss function according to each detection target area and the detection target category of each detection target area obtained by the target detection network, each labeled target area labeled in the input training image and the corresponding labeled target category to obtain a first prediction deviation;
searching each first sub-feature vector corresponding to each detection target area in the first feature vector according to the corresponding area of each detection target area in the input training image, and splicing each first sub-feature vector corresponding to all detection target areas into a third feature vector;
performing self-adaptive global average pooling processing on the first feature vector to obtain a fourth feature vector; the dimensionality of the fourth feature vector is the same as that of the third feature vector;
superposing the fourth feature vector and the third feature vector to obtain a fifth feature vector;
inputting the fifth feature vector into a target classification network of the neural network to obtain each detection target category contained in the input training image;
calculating by adopting a preset second loss function according to each detection target category contained in the input training image obtained by the target classification network and the labeled target category of each labeled target area labeled in the input training image to obtain a second prediction deviation;
carrying out weighted summation on the first prediction deviation and the second prediction deviation, and adjusting parameters of the neural network according to the weighted summation;
when the neural network converges, the neural network at that time is taken as the neural network to be finally used.
After the first sub-feature vectors corresponding to all the target regions are spliced into the third feature vector, and before the fourth feature vector and the third feature vector are superposed, the method further includes:
performing self-attention mechanism enhancement processing on the third feature vector to obtain a self-attention coefficient of each feature value in the third feature vector, and multiplying each feature value in the third feature vector by the self-attention coefficient to obtain a self-attention mechanism enhancement feature vector of the third feature vector;
the superimposing the fourth feature vector and the third feature vector includes:
and superposing the fourth feature vector and the self-attention mechanism enhanced feature vector of the third feature vector.
The labeling of each labeling target area and the corresponding labeling target category in each frame of training image further comprises:
labeling the outline of each labeling target in each frame of training image;
after finding each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector, and before performing weighted summation on the first prediction deviation and the second prediction deviation, the method further includes:
respectively carrying out second interpolation processing on the first sub-feature vectors corresponding to the foreground regions and the background regions to obtain sixth sub-feature vectors with a second fixed size;
inputting the sixth sub-feature vectors into a semantic segmentation network of the neural network to obtain the outline and the class of each detection target in the input training image;
calculating by adopting a preset third loss function according to the contour and the detection target category of each detection target in the input training image and the contour and the labeling target category of each labeling target labeled in the input training image obtained by the semantic segmentation network to obtain a third prediction deviation;
the weighted summation of the first prediction bias and the second prediction bias comprises:
and carrying out weighted summation on the first prediction deviation, the second prediction deviation and the third prediction deviation.
The image to be identified is a pipeline image, and the target category is a pipeline defect category.
The pipeline defect category comprises one or any combination of the following: blind joint, deformation, misconnection, wall residue, penetration, corrosion, scum, scaling, undulation, tree root, disjointing, shedding, obstacle, stagger, deposition, leakage, rupture.
An object class identification apparatus, the apparatus comprising:
the characteristic extraction module is used for extracting characteristics of the image to be identified to obtain a first characteristic vector;
the region suggestion module is used for detecting each foreground region and each background region in the image to be identified according to the first feature vector;
the region-of-interest alignment module is used for searching each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector according to the corresponding region of each foreground region and each background region in the image to be identified, and performing first interpolation processing on each first sub-feature vector corresponding to each foreground region and each background region respectively to obtain each second sub-feature vector with a first fixed size; searching each first sub-feature vector corresponding to each target area in the first feature vectors according to the corresponding area of each target area in the image detected by the target detection module, and splicing each first sub-feature vector corresponding to all the target areas into a third feature vector;
the target detection module is used for carrying out target detection according to the second sub-feature vectors to obtain each target area in the image to be identified;
the self-adaptive global average pooling processing module is used for carrying out self-adaptive global average pooling processing on the first feature vector to obtain a fourth feature vector; the dimensionality of the fourth feature vector is the same as that of the third feature vector;
the feature fusion module is used for superposing the fourth feature vector and the third feature vector to obtain a fifth feature vector;
and the category identification module is used for identifying the target category according to the fifth feature vector to obtain the target category contained in the image to be identified.
An object class recognition neural network training apparatus, the apparatus comprising:
the image acquisition module is used for acquiring a training image set and marking each marking target area and the corresponding marking target category in each frame of training image;
the characteristic extraction module is used for sequentially extracting a frame of training image from the training image set and inputting the frame of training image into a backbone network of the neural network for characteristic extraction to obtain a first characteristic vector of the input training image;
the region suggestion module is used for inputting the first feature vector into a region suggestion network of the neural network so as to detect each foreground region and each background region in the input training image;
the region-of-interest alignment module is used for searching each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector according to the corresponding region of each foreground region and each background region in the input training image, and performing first interpolation processing on each first sub-feature vector corresponding to each foreground region and each background region respectively to obtain each second sub-feature vector with a first fixed size; searching each first sub-feature vector corresponding to each target area in the first feature vectors according to the corresponding area of each target area in the input training image obtained by the target detection module, and splicing each first sub-feature vector corresponding to all target areas into a third feature vector;
the target detection module is used for inputting the second sub-feature vectors into a target detection network of the neural network to obtain detection target areas and detection target types of the detection target areas in the input training image;
the self-adaptive global average pooling processing module is used for carrying out self-adaptive global average pooling processing on the first feature vector to obtain a fourth feature vector; the dimensionality of the fourth feature vector is the same as that of the third feature vector;
the feature fusion module is used for superposing the fourth feature vector and the third feature vector to obtain a fifth feature vector;
the class identification module is used for inputting a fifth feature vector into a target classification network of the neural network to obtain each detection target class contained in the input training image;
the adjusting module is used for calculating by adopting a preset first loss function according to the detection target areas and the detection target types of the detection target areas obtained by the target detection module, the marking target areas marked in the input training image and the corresponding marking target types to obtain a first prediction deviation; calculating by adopting a preset second loss function according to each detection target category contained in the input training image and each labeled target category of each labeled target area labeled in the input training image obtained by the category identification module to obtain a second prediction deviation; carrying out weighted summation on the first prediction deviation and the second prediction deviation, and adjusting parameters of the neural network according to the weighted summation; when the neural network converges, the neural network at that time is taken as the neural network to be finally used.
A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the target class identification method of any one of the above.
In the embodiment of the present invention, the local features (third feature vector) corresponding to each target region in the image and the global features (fourth feature vector) extracted from the whole image are superimposed and fused before target class identification is performed. As a result, both classes with a high degree of inter-class confusion and classes that are far apart between classes can be distinguished, which refines the class granularity of target class identification and improves the identification precision of target class identification.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a flowchart of a target class identification method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a target class identification method according to another embodiment of the present invention;
FIG. 3 is a flowchart of a method for training a neural network for target class recognition according to an embodiment of the present invention;
FIG. 4 is an original training image acquired: a schematic of a pipeline image;
FIG. 5 is a schematic illustration of a defect of FIG. 4 being labeled;
fig. 6 is a schematic structural diagram of an object class identification apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a target class recognition neural network training device according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present invention will be described in detail with specific examples. Several of the following embodiments may be combined with each other and some details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a flowchart of a target class identification method according to an embodiment of the present invention, which includes the following specific steps:
step 101: and performing feature extraction on the image to be recognized to obtain a first feature vector.
Step 102: and detecting each foreground region and each background region in the image according to the first feature vector.
One foreground region corresponds to one connected region where a target is located, and each of the remaining connected regions corresponds to a background region. For example, if the image to be identified is a pipeline image and the target categories to be identified are pipeline defect categories, each connected region where a defect is located in the pipeline image is a foreground region.
Step 103: and searching each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector according to the corresponding region of each foreground region and each background region in the image.
Each feature value in the first feature vector corresponds to a region of the image to be recognized; that is, each feature value describes one region of that image. Therefore, for each foreground or background region, the corresponding part of the feature values can be found in the first feature vector according to the region's position in the image to be recognized. Since this part of the feature values actually forms a sub-vector of the first feature vector, it is referred to as the corresponding first sub-feature vector.
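As an illustration of this lookup, the following is a minimal PyTorch-style sketch; the stride value, shapes and function name are assumptions for illustration, not taken from the patent. A region's pixel box is scaled by the backbone stride to index the matching slice of the feature map:

```python
import torch

def crop_sub_feature(feature_map: torch.Tensor, box_xyxy, stride: int = 16):
    """feature_map: (C, H, W); box_xyxy: (x1, y1, x2, y2) in image pixels."""
    # Each feature value describes a stride x stride patch of the input image,
    # so the pixel box maps onto this rectangle of feature values.
    x1, y1, x2, y2 = [int(round(v / stride)) for v in box_xyxy]
    return feature_map[:, y1:y2 + 1, x1:x2 + 1]

feat = torch.randn(256, 50, 50)                    # first feature vector of an 800x800 image
sub = crop_sub_feature(feat, (160, 80, 320, 240))  # one first sub-feature vector
print(sub.shape)                                   # torch.Size([256, 11, 11])
```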
Step 104: and respectively carrying out first interpolation processing on the first sub-feature vectors corresponding to each foreground area and each background area to obtain second sub-feature vectors with a first fixed size.
Step 105: and carrying out target detection according to the second sub-feature vectors to obtain each target area in the image.
Step 106: and finding each first sub-feature vector corresponding to each target area in the first feature vectors according to the corresponding area of each target area in the image, and splicing each first sub-feature vector corresponding to all the target areas into a third feature vector.
For each target area, the part of the feature values corresponding to it can be found in the first feature vector according to the target area's position in the image to be recognized; this part of the feature values is called the first sub-feature vector corresponding to the target area.
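A minimal sketch of the splicing in step 106 follows; the shapes and the pooling-then-concatenation reading of "splicing" are assumptions, since the patent does not fix the exact operation:

```python
import torch
import torch.nn.functional as F

# First sub-feature vectors cropped for two detected target areas.
crops = [torch.randn(1, 256, 11, 7), torch.randn(1, 256, 5, 9)]
# Reduce each crop to a fixed-length vector, then concatenate ("splice")
# the per-target vectors into a single third feature vector.
pooled = [F.adaptive_avg_pool2d(c, 1).flatten(1) for c in crops]  # (1, 256) each
third_feature = torch.cat(pooled, dim=1)                          # (1, 512)
```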
Step 107: performing self-adaptive global average pooling processing on the first feature vector to obtain a fourth feature vector; and the dimension of the fourth feature vector is the same as that of the third feature vector.
Step 108: and superposing the fourth feature vector and the third feature vector to obtain a fifth feature vector.
Step 109: and carrying out target classification and identification according to the fifth feature vector to obtain a target class contained in the image.
In the above embodiment, the local features (third feature vector) corresponding to each target region in the image and the global features (fourth feature vector) extracted from the whole image are superimposed and fused before target class identification is performed. As a result, both categories with a high degree of inter-class confusion and categories that are far apart between classes can be distinguished, which refines the category granularity of target category identification and improves its identification precision.
In an alternative embodiment, steps 101, 102, 105 and 109 may be implemented by a neural network, which is mainly composed of a backbone network, a region suggestion network (RPN), a target detection network and a target classification network.
Fig. 2 is a flowchart of a target class identification method according to another embodiment of the present invention, which includes the following specific steps:
step 201: and inputting the image to be identified into a backbone network of a neural network for feature extraction to obtain a first feature vector.
The backbone network may employ a ResNet50 structure.
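A hedged sketch of one way to realize such a backbone with torchvision's ResNet-50 follows; the truncation point and input size are assumptions, not specified by the patent:

```python
import torch
import torchvision

resnet = torchvision.models.resnet50(weights=None)
# Keep everything up to the last convolutional stage; drop avgpool and fc.
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])

image = torch.randn(1, 3, 800, 800)   # image to be identified
first_feature = backbone(image)       # first feature vector
print(first_feature.shape)            # torch.Size([1, 2048, 25, 25])
```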
Step 202: the first feature vectors are input into a region suggestion network (RPN) of the neural network to detect respective foreground regions and respective background regions in the image.
Step 203: and searching each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector according to the corresponding region of each foreground region and each background region in the image.
Step 204: and respectively carrying out first interpolation processing on the first sub-feature vectors corresponding to each foreground area and each background area to obtain second sub-feature vectors with a first fixed size.
Step 205: and inputting each second sub-feature vector into a target detection network of the neural network for target detection to obtain each target area in the image.
Step 206: and finding each first sub-feature vector corresponding to each target area in the first feature vectors according to the corresponding area of each target area in the image, and splicing each first sub-feature vector corresponding to all the target areas into a third feature vector.
Step 207: performing self-adaptive global average pooling processing on the first feature vector to obtain a fourth feature vector; and the dimension of the fourth feature vector is the same as that of the third feature vector.
Step 208: and superposing the fourth feature vector and the third feature vector to obtain a fifth feature vector.
Step 209: and inputting the fifth feature vector into a target classification network of the neural network for target classification and identification to obtain a target class contained in the image.
In an alternative embodiment, after step 106, before step 108, or after step 206, before step 208, further comprising: performing self-attention mechanism enhancement processing on the third feature vector to obtain a self-attention coefficient of each feature value in the third feature vector, and multiplying each feature value in the third feature vector by the self-attention coefficient to obtain a self-attention mechanism enhancement feature vector of the third feature vector;
in step 108 or step 208, the superimposing the fourth feature vector and the third feature vector includes: and superposing the fourth feature vector and the self-attention mechanism enhanced feature vector of the third feature vector.
The value range of the self-attention coefficient of each feature value is [0, 1]; the self-attention mechanism enhancement processing is an existing algorithm and is not described herein.
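A minimal sketch of the enhancement follows; the sigmoid-gated linear layer is one plausible realization under stated assumptions, not the patent's exact layer:

```python
import torch
import torch.nn as nn

class SelfAttentionEnhance(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Maps each feature value to a self-attention coefficient in [0, 1].
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, third: torch.Tensor) -> torch.Tensor:
        coeff = self.gate(third)   # self-attention coefficients
        return third * coeff       # self-attention enhanced feature vector

third = torch.randn(1, 2048)
enhanced = SelfAttentionEnhance(2048)(third)
```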
In the above embodiment, the feature value corresponding to the target region in the image may be enhanced by calculating the self-attention coefficient of each feature value in the third feature vector, so as to improve the accuracy of the final target class identification.
Fig. 3 is a flowchart of a method for training a neural network for target class recognition according to an embodiment of the present invention, which includes the following specific steps:
step 301: and acquiring a training image set, and labeling each target area and the corresponding target class in each frame of training image.
In order to distinguish the target areas from the detection target areas in the subsequent step 306, each target area labeled in the step 301 is referred to as a labeled target area; in order to distinguish from the detection target class in the subsequent steps 306 and 311, the target class labeled in this step 301 is referred to as a labeled target class.
Here, a labeled target area is usually represented by a rectangular box; what is essentially labeled is the position of the target area, which is usually described by the top-left vertex or the center point of the rectangular box.
For example: when the pipeline defect category is to be identified, pipeline images are collected to form a training image set.
FIG. 4 is a schematic of an acquired original training image, a pipeline image, in which the gray circle is a defect (here pipeline defect categories are being identified).
FIG. 5 is a schematic diagram of labeling the defect of FIG. 4, wherein the dashed rectangle is the defect frame (i.e., the smallest rectangle containing a defect); what is essentially labeled is the location of the defect.
The black circle in FIG. 5 is the mask of the region corresponding to the defect (i.e., the smallest connected region formed by the pixel points on the defect). Since the mask conveys the contour of the defect, labeling the mask amounts to labeling the contour of the defect.
Step 302: and sequentially taking out a frame of training image from the training image set and inputting the frame of training image into a backbone network of the neural network for feature extraction to obtain a first feature vector of the input training image.
Step 303: the first feature vectors are input to a region suggestion network (RPN) of the neural network to detect respective foreground regions and respective background regions in the input training image.
Each foreground and background area is represented by a rectangular box.
Step 304: and searching each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector according to the corresponding region of each foreground region and each background region in the input training image.
Each feature value in the first feature vector corresponds to a region in the input training image (a region composed of a plurality of pixel points); according to which region of the training image each foreground or background box occupies (that is, the region of the training image onto which the rectangle of the foreground or background box maps), the corresponding sub-feature vector can be found in the first feature vector.
Step 305: and respectively carrying out first interpolation processing on the first sub-feature vectors corresponding to each foreground area and each background area to obtain second sub-feature vectors with a first fixed size.
The first interpolation processing may be bilinear interpolation; the specific interpolation algorithm is not limited in this embodiment.
The first fixed size can be set as required and is not limited in this embodiment; for example, it may be set to 7 × 7.
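Under the bilinear assumption, this step can be sketched as follows; the shapes are illustrative:

```python
import torch
import torch.nn.functional as F

sub = torch.randn(1, 256, 11, 5)   # one variable-size first sub-feature vector
second = F.interpolate(sub, size=(7, 7), mode="bilinear", align_corners=False)
print(second.shape)                # torch.Size([1, 256, 7, 7]) - first fixed size
```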
Step 306: and inputting each second sub-feature vector into a target detection network of the neural network to obtain each detection target area and the detection target category of each detection target area in the input training image.
A detection target area is a target area detected in the input training image by the target detection network of the neural network.
The detection target area is represented by a detection target frame, i.e., a minimum rectangular frame containing the detected target, and the position of the detection target area is usually described by the top left vertex or the center point of the rectangular frame.
Step 307: and calculating by adopting a preset first loss function according to each detection target area and the detection target type of each detection target area and each labeled target area and labeled target type labeled in the input training image to obtain a first prediction deviation.
Here, the target detection network outputs two quantities for each target: the detection target area and the detection target category. A loss function is computed for each, and the two loss functions may be the same or different. For example, a smooth_L1_loss (smooth L1 loss) function may be used for the detection target area and a cross-entropy function for the detection target category; the prediction deviation corresponding to the detection target area and the prediction deviation corresponding to the detection target category are then added to obtain the first prediction deviation.
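A hedged sketch of this first prediction deviation follows; the shapes, the number of classes and the matching of detections to labels are simplified assumptions:

```python
import torch
import torch.nn.functional as F

pred_boxes = torch.randn(8, 4)          # detected target areas
gt_boxes   = torch.randn(8, 4)          # matched labeled target areas
cls_logits = torch.randn(8, 18)         # e.g. 17 defect classes + background
gt_classes = torch.randint(0, 18, (8,))

box_loss = F.smooth_l1_loss(pred_boxes, gt_boxes)   # area deviation
cls_loss = F.cross_entropy(cls_logits, gt_classes)  # category deviation
first_prediction_deviation = box_loss + cls_loss
```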
Step 308: and searching each first sub-feature vector corresponding to each detection target area in the first feature vector according to the corresponding area of each detection target area in the input training image, and splicing each first sub-feature vector corresponding to all detection target areas into a third feature vector.
Step 309: performing self-adaptive global average pooling processing on the first feature vector to obtain a fourth feature vector; and the dimension of the fourth feature vector is the same as that of the third feature vector.
The adaptive global average pooling process is an existing mature algorithm and is not described herein.
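A minimal sketch of steps 309 and 310, with PyTorch assumed: reading "superposing" as element-wise addition is an assumption, and the third feature vector's size is chosen here to match the pooled output:

```python
import torch
import torch.nn as nn

first = torch.randn(1, 2048, 25, 25)   # first feature vector
third = torch.randn(1, 2048)           # spliced local features (assumed size)

# Adaptive global average pooling to the third feature vector's dimensionality.
fourth = nn.AdaptiveAvgPool2d(1)(first).flatten(1)   # (1, 2048)
fifth = fourth + third                               # superposed fifth feature vector
```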
Step 310: and superposing the fourth feature vector and the third feature vector to obtain a fifth feature vector.
Step 311: and inputting the fifth feature vector into a target classification network of the neural network to obtain a detection target class contained in the input training image.
Step 312: and calculating by adopting a preset second loss function according to the detection target class obtained by the target classification network and each labeled target class labeled in the input training image to obtain a second prediction deviation.
The second loss function may be a cross-entropy function.
Step 313: and carrying out weighted summation on the first prediction deviation and the second prediction deviation, and adjusting the parameters of the neural network according to the weighted summation.
For example, an SGD (Stochastic Gradient Descent) algorithm may be used to adjust the parameters of the neural network according to the weighted sum.
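A minimal sketch of step 313 follows; the stand-in model, stand-in losses and the weight values are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)   # stand-in for the full neural network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

out = model(torch.randn(4, 8))
first_deviation  = out.pow(2).mean()   # stand-in for the first prediction deviation
second_deviation = out.abs().mean()    # stand-in for the second prediction deviation

loss = 1.0 * first_deviation + 0.5 * second_deviation  # weights are assumptions
optimizer.zero_grad()
loss.backward()
optimizer.step()   # adjust the parameters according to the weighted sum
```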
Step 314: when the neural network converges, the neural network at that time is taken as the neural network to be finally used.
In an optional embodiment, after step 308 and before step 310, further comprising: performing self-attention mechanism enhancement processing on the third feature vector to obtain a self-attention coefficient of each feature value in the third feature vector, and multiplying each feature value in the third feature vector by the self-attention coefficient to obtain a self-attention mechanism enhancement feature vector of the third feature vector;
in step 310, the superimposing the fourth feature vector and the third feature vector includes:
and superposing the fourth feature vector and the self-attention mechanism enhanced feature vector of the third feature vector.
The value range of the self-attention coefficient of each feature value is [0, 1]; the self-attention mechanism enhancement processing is an existing algorithm and is not described herein.
In an optional embodiment, in step 301, the contour and category of each labeled target are further labeled in each frame of training image, and the method further includes, after "finding each first sub-feature vector corresponding to each detected target region in the first feature vector" in step 308 and before "performing weighted summation on the first prediction deviation and the second prediction deviation" in step 313:
respectively performing second interpolation processing on each first sub-feature vector to obtain sixth sub-feature vectors of a second fixed size; inputting each sixth sub-feature vector into a semantic segmentation network of the neural network to obtain the contour and category of each detection target in the input training image; and calculating with a preset third loss function, from the contours and categories of the detection targets obtained by the semantic segmentation network and the contours and categories of the targets labeled in the input training image, to obtain a third prediction deviation. Here, the third loss function may be a cross-entropy function. The second interpolation processing may be bilinear interpolation; the specific interpolation algorithm is not limited in this embodiment. The second fixed size can be set as required and is not limited in this embodiment; for example, it may be set to 13 × 13. The labeled contour of a target serves as the target's real contour.
In step 313, the weighted summation of the first prediction bias and the second prediction bias includes: and carrying out weighted summation on the first prediction deviation, the second prediction deviation and the third prediction deviation.
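A sketch of the three-way weighted summation under assumed equal weights; the per-pixel cross-entropy over 13 × 13 maps follows the example sizes in the text, and all values are illustrative:

```python
import torch
import torch.nn.functional as F

seg_logits = torch.randn(8, 18, 13, 13)          # per-pixel class scores
seg_labels = torch.randint(0, 18, (8, 13, 13))   # labeled contours as masks
third_deviation = F.cross_entropy(seg_logits, seg_labels)

first_deviation, second_deviation = torch.tensor(1.2), torch.tensor(0.7)
total = 1.0 * first_deviation + 1.0 * second_deviation + 1.0 * third_deviation
```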
In this embodiment, adding the semantic segmentation network makes the features predicted by the neural network carry better target position information, so background information can be suppressed and target information enhanced precisely, providing accurate prior information for the subsequent process.
In practical application, some of the collected images can be put into a verification image set. When the neural network converges, the converged network is verified with the verification image set; if the verification result does not meet requirements, the structure of each sub-network in the neural network is changed and the network is retrained until the verification result meets requirements. Typically, the verification image set is 1/4 the size of the training image set.
The image to be identified in the embodiment of the invention can be a pipeline image, and the corresponding target category is a pipeline defect category.
The pipeline defect category in the embodiment of the invention can comprise one or any combination of the following: blind joint, deformation, misconnection, wall residue, penetration, corrosion, scum, scaling, undulation, tree root, disjointing, shedding, obstacle, stagger, deposition, leakage, rupture. Of course, the classification of pipe defects in the embodiments of the present invention is not limited thereto, and other pipe defects or classification of defects similar to pipes are covered by the scope of the present claims.
Fig. 6 is a schematic structural diagram of an object class identification apparatus according to an embodiment of the present invention, where the apparatus mainly includes: a feature extraction module 61, a region suggestion module 62, a region of interest alignment module 63, a target detection module 64, an adaptive global average pooling processing module 65, a feature fusion module 66, and a category identification module 67, wherein:
the feature extraction module 61 is configured to perform feature extraction on the image to be identified to obtain a first feature vector.
And the region suggesting module 62 is configured to detect each foreground region and each background region in the image to be recognized according to the first feature vector obtained by the feature extracting module 61.
A region-of-interest alignment module 63, configured to find, according to the corresponding regions in the image to be identified of each foreground region and each background region detected by the region suggestion module 62, each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector obtained by the feature extraction module 61, and respectively perform first interpolation processing on each first sub-feature vector to obtain second sub-feature vectors of a first fixed size; and to find, according to the corresponding region of each target region in the image detected by the target detection module 64, each first sub-feature vector corresponding to each target region in the first feature vector obtained by the feature extraction module 61, and splice the first sub-feature vectors corresponding to all target regions into a third feature vector.
And the target detection module 64 is configured to perform target detection according to each second sub-feature vector obtained by the region-of-interest alignment module 63, so as to obtain each target region in the image to be identified.
The adaptive global average pooling processing module 65 is configured to perform adaptive global average pooling processing on the first feature vector obtained by the feature extracting module 61 to obtain a fourth feature vector; and the dimension of the fourth feature vector is the same as that of the third feature vector.
And the feature fusion module 66 is configured to superimpose the fourth feature vector obtained by the adaptive global average pooling processing module 65 and the third feature vector obtained by the region of interest aligning module 63 to obtain a fifth feature vector.
And the category identification module 67 is configured to perform target category identification according to the fifth feature vector obtained by the feature fusion module 66, so as to obtain a target category included in the image to be identified.
Fig. 7 is a schematic structural diagram of a target class recognition neural network training device according to an embodiment of the present invention, where the device mainly includes: an image acquisition module 71, a feature extraction module 72, a region suggestion module 73, a region of interest alignment module 74, a target detection module 75, an adaptive global average pooling processing module 76, a feature fusion module 77, a category identification module 78, and an adjustment module 79, wherein:
the image acquisition module 71 is configured to acquire a training image set, and label each labeled target area and a corresponding labeled target category in each frame of training image.
And the feature extraction module 72 is configured to sequentially extract a frame of training image from the training image set and input the frame of training image to the backbone network of the neural network for feature extraction, so as to obtain a first feature vector of the input training image.
A region suggestion module 73, configured to input the first feature vector to a region suggestion network of the neural network, so as to detect each foreground region and each background region in the input training image.
A region-of-interest alignment module 74, configured to search, according to corresponding regions of each foreground region and each background region in the input training image, each first sub-feature vector corresponding to each foreground region and each background region in the first feature vector, and perform first interpolation processing on each first sub-feature vector corresponding to each foreground region and each background region, respectively, to obtain each second sub-feature vector of a first fixed size; according to the corresponding region of each target region in the input training image obtained by the target detection module 75, each first sub-feature vector corresponding to each target region is found in the first feature vector, and the first sub-feature vectors corresponding to all target regions are spliced into a third feature vector.
The target detection module 75 is configured to input each second sub-feature vector to a target detection network of the neural network, so as to obtain each detection target area and a detection target category of each detection target area in the input training image.
An adaptive global average pooling processing module 76, configured to perform adaptive global average pooling processing on the first feature vector to obtain a fourth feature vector; and the dimension of the fourth feature vector is the same as that of the third feature vector.
And a feature fusion module 77, configured to superimpose the fourth feature vector and the third feature vector to obtain a fifth feature vector.
And a class identification module 78, configured to input the fifth feature vector to a target classification network of the neural network, so as to obtain each detection target class included in the input training image.
An adjusting module 79, configured to calculate, according to the detection target regions and the detection target categories of the detection target regions obtained by the target detecting module 75, and the labeling target regions and the corresponding labeling target categories labeled in the input training image, by using a preset first loss function, so as to obtain a first prediction deviation; calculating by using a preset second loss function according to each detection target category contained in the input training image and each labeled target category of each labeled target region labeled in the input training image obtained by the category identification module 78 to obtain a second prediction deviation; carrying out weighted summation on the first prediction deviation and the second prediction deviation, and adjusting parameters of the neural network according to the weighted summation; when the neural network converges, the neural network at that time is taken as the neural network to be finally used.
Embodiments of the present invention further provide a computer program product, which includes a computer program or instructions, and when the computer program or instructions are executed by a processor, the steps of the object class identification method described in any of the above embodiments are implemented.
Embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when executed by a processor, perform the steps of the target class identification method described above. In practical applications, the computer-readable medium may be included in each device/apparatus/system of the above embodiments, or may exist separately without being assembled into the device/apparatus/system.
According to embodiments disclosed herein, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, without limiting the scope of the present disclosure. In the embodiments disclosed herein, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
An embodiment of the present invention further provides an electronic device. Fig. 8 shows a schematic structural diagram of the electronic device according to an embodiment of the present invention. Specifically:
the electronic device may include a processor 81 with one or more processing cores, a memory 82 comprising one or more computer-readable storage media, and a computer program stored in the memory and executable on the processor. The above-described target class identification method is implemented when the program stored in the memory 82 is executed by the processor 81.
Specifically, in practical applications, the electronic device may further include a power supply 83, an input/output unit 84, and the like. Those skilled in the art will appreciate that the configuration shown in fig. 8 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
the processor 81 is the control center of the electronic device; it connects the various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 82 and by calling the data stored in the memory 82, thereby monitoring the electronic device as a whole.
The memory 82 may be used to store software programs and modules, i.e., the computer-readable storage media described above. The processor 81 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 82. The memory 82 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the electronic device, and the like. In addition, the memory 82 may include high-speed random-access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 82 may also include a memory controller to provide the processor 81 with access to the memory 82.
The electronic device further comprises a power supply 83 for supplying power to each component. The power supply 83 may be logically connected to the processor 81 through a power management system, so that charging, discharging, and power consumption management are handled by the power management system. The power supply 83 may further include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other such components.
The electronic device may also include an input/output unit 84, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input/output unit 84 may also be used to display information entered by or provided to the user, as well as various graphical user interfaces, which may be composed of graphics, text, icons, video, and any combination thereof.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the present application. All such combinations fall within the scope of the present disclosure, provided they do not depart from the spirit and teachings of the present application.
The principles and embodiments of the present invention are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present invention, and are not intended to limit the present application. It will be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles, spirit, and scope of the invention, and that all such modifications, equivalents, and improvements are intended to be protected by the claims.

Claims (10)

1. A target class identification method, characterized in that the method comprises:
performing feature extraction on an image to be recognized to obtain a first feature vector;
detecting each foreground region and each background region in the image according to the first feature vector;
searching the first feature vector for each first sub-feature vector corresponding to each foreground region and each background region, according to the corresponding regions of the foreground regions and the background regions in the image;
performing first interpolation processing on each first sub-feature vector corresponding to each foreground region and each background region, respectively, to obtain second sub-feature vectors of a first fixed size;
performing target detection according to the second sub-feature vectors to obtain each target region in the image;
searching the first feature vector for each first sub-feature vector corresponding to each target region, according to the corresponding region of each target region in the image, and splicing the first sub-feature vectors corresponding to all target regions into a third feature vector;
performing adaptive global average pooling on the first feature vector to obtain a fourth feature vector, wherein the dimension of the fourth feature vector is the same as that of the third feature vector;
superimposing the fourth feature vector and the third feature vector to obtain a fifth feature vector; and
identifying the target category according to the fifth feature vector to obtain each target category contained in the image.
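By way of illustration only (not part of the claim), the first interpolation processing of claim 1 is commonly realized with RoI-Align-style bilinear resampling; the sketch below assumes a 1024x1024 image, a stride-16 feature map, and a 7x7 fixed size, none of which the claim prescribes:

    import torch
    from torchvision.ops import roi_align

    first_feature = torch.randn(1, 256, 64, 64)            # first feature vector
    regions = torch.tensor([[0., 100., 120., 300., 360.],  # [batch_idx, x1, y1, x2, y2]
                            [0., 400.,  80., 560., 240.]]) # in image coordinates

    # First interpolation processing: each first sub-feature vector is
    # resampled to a first fixed size (7x7, chosen arbitrarily here).
    second_sub_features = roi_align(first_feature, regions,
                                    output_size=(7, 7), spatial_scale=64 / 1024)
    print(second_sub_features.shape)                       # torch.Size([2, 256, 7, 7])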
2. The method according to claim 1, wherein performing feature extraction on the image to be recognized to obtain the first feature vector comprises:
inputting the image to be recognized into a backbone network of a neural network for feature extraction;
detecting each foreground region and each background region in the image according to the first feature vector comprises:
inputting the first feature vector into a region suggestion network of the neural network to detect each foreground region and each background region in the image;
performing target detection according to the second sub-feature vectors comprises:
inputting the second sub-feature vectors into a target detection network of the neural network for target detection; and
identifying the target category according to the fifth feature vector comprises:
inputting the fifth feature vector into a target classification network of the neural network for target category identification.
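By way of illustration only (not part of the claim), the decomposition of claim 2 into four sub-networks may be sketched structurally as follows; every layer choice and size is a placeholder, since the claim names the sub-networks but not their architectures:

    import torch.nn as nn

    class TargetClassIdentificationNet(nn.Module):
        """Structural sketch only; all layers are illustrative placeholders."""
        def __init__(self, num_classes: int = 17):
            super().__init__()
            # Backbone network: extracts the first feature vector.
            self.backbone = nn.Sequential(nn.Conv2d(3, 256, 3, padding=1), nn.ReLU())
            # Region suggestion network: scores foreground vs. background per location.
            self.region_proposal = nn.Conv2d(256, 2, 1)
            # Target detection network: predicts a box (4 values) plus a class per region.
            self.detection_head = nn.Linear(256 * 7 * 7, 4 + num_classes)
            # Target classification network: maps the fifth feature vector to categories.
            self.classification_head = nn.Linear(1024, num_classes)

    net = TargetClassIdentificationNet()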
3. The method of claim 2, wherein the backbone network, the region suggestion network, the target detection network, and the target classification network of the neural network are obtained by a training process comprising:
acquiring a training image set, and labeling each labeled target region and its corresponding labeled target category in each frame of training image;
sequentially taking a frame of training image from the training image set and inputting it into the backbone network of the neural network for feature extraction, to obtain a first feature vector of the input training image;
inputting the first feature vector into the region suggestion network of the neural network to detect each foreground region and each background region in the input training image;
searching the first feature vector for each first sub-feature vector corresponding to each foreground region and each background region, according to the corresponding regions of the foreground regions and the background regions in the input training image;
performing first interpolation processing on each first sub-feature vector corresponding to each foreground region and each background region, respectively, to obtain second sub-feature vectors of a first fixed size;
inputting the second sub-feature vectors into the target detection network of the neural network to obtain each detection target region in the input training image and the detection target category of each detection target region;
calculating a first prediction deviation using a preset first loss function, according to the detection target regions and their detection target categories obtained by the target detection network and the labeled target regions and corresponding labeled target categories labeled in the input training image;
searching the first feature vector for each first sub-feature vector corresponding to each detection target region, according to the corresponding region of each detection target region in the input training image, and splicing the first sub-feature vectors corresponding to all detection target regions into a third feature vector;
performing adaptive global average pooling on the first feature vector to obtain a fourth feature vector, wherein the dimension of the fourth feature vector is the same as that of the third feature vector;
superimposing the fourth feature vector and the third feature vector to obtain a fifth feature vector;
inputting the fifth feature vector into the target classification network of the neural network to obtain each detection target category contained in the input training image;
calculating a second prediction deviation using a preset second loss function, according to the detection target categories contained in the input training image obtained by the target classification network and the labeled target categories of the labeled target regions labeled in the input training image;
performing weighted summation of the first prediction deviation and the second prediction deviation, and adjusting parameters of the neural network according to the weighted sum; and
when the neural network converges, taking the neural network at that time as the neural network to be finally used.
4. The method according to claim 1, wherein after the first sub-feature vectors corresponding to all target regions are spliced into the third feature vector and before the fourth feature vector is superimposed on the third feature vector, the method further comprises:
performing self-attention enhancement processing on the third feature vector to obtain a self-attention coefficient for each feature value in the third feature vector, and multiplying each feature value in the third feature vector by its self-attention coefficient to obtain a self-attention-enhanced feature vector of the third feature vector;
wherein superimposing the fourth feature vector and the third feature vector comprises:
superimposing the fourth feature vector and the self-attention-enhanced feature vector of the third feature vector.
5. The method of claim 3, wherein labeling each labeled target region and its corresponding labeled target category in each frame of training image further comprises:
labeling the contour of each labeled target in each frame of training image;
and wherein, after searching the first feature vector for each first sub-feature vector corresponding to each foreground region and each background region, and before performing the weighted summation of the first prediction deviation and the second prediction deviation, the method further comprises:
performing second interpolation processing on the first sub-feature vectors corresponding to the foreground regions and the background regions, respectively, to obtain sixth sub-feature vectors of a second fixed size;
inputting the sixth sub-feature vectors into a semantic segmentation network of the neural network to obtain the contour and category of each detection target in the input training image;
calculating a third prediction deviation using a preset third loss function, according to the contour and detection target category of each detection target in the input training image obtained by the semantic segmentation network and the contour and labeled target category of each labeled target labeled in the input training image;
wherein the weighted summation of the first prediction deviation and the second prediction deviation comprises:
performing weighted summation of the first prediction deviation, the second prediction deviation, and the third prediction deviation.
6. The method of claim 1, wherein the image to be recognized is a pipe image and the target category is a pipe defect category.
7. The method of claim 6, wherein the pipe defect category comprises one or any combination of the following: blind joint, deformation, misconnection, wall residue, penetration, corrosion, scum, scaling, undulation, tree root, disjointing, shedding, obstacle, stagger, deposition, leakage, rupture.
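For reference, the seventeen categories of claim 7 could be held in a simple label list such as the hypothetical one below (the index order is arbitrary and not fixed by the claim):

    # Hypothetical label map for the pipe defect categories of claim 7.
    PIPE_DEFECT_CLASSES = [
        "blind joint", "deformation", "misconnection", "wall residue",
        "penetration", "corrosion", "scum", "scaling", "undulation",
        "tree root", "disjointing", "shedding", "obstacle", "stagger",
        "deposition", "leakage", "rupture",
    ]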
8. A target class identification device, characterized in that the device comprises:
a feature extraction module, configured to perform feature extraction on an image to be recognized to obtain a first feature vector;
a region suggestion module, configured to detect each foreground region and each background region in the image to be recognized according to the first feature vector;
a region-of-interest alignment module, configured to search the first feature vector for each first sub-feature vector corresponding to each foreground region and each background region according to the corresponding regions of the foreground regions and the background regions in the image to be recognized, and perform first interpolation processing on each first sub-feature vector corresponding to each foreground region and each background region, respectively, to obtain second sub-feature vectors of a first fixed size; and to search the first feature vector for each first sub-feature vector corresponding to each target region according to the corresponding region, in the image to be recognized, of each target region detected by the target detection module, and splice the first sub-feature vectors corresponding to all target regions into a third feature vector;
a target detection module, configured to perform target detection according to the second sub-feature vectors to obtain each target region in the image to be recognized;
an adaptive global average pooling processing module, configured to perform adaptive global average pooling on the first feature vector to obtain a fourth feature vector, wherein the dimension of the fourth feature vector is the same as that of the third feature vector;
a feature fusion module, configured to superimpose the fourth feature vector and the third feature vector to obtain a fifth feature vector; and
a class identification module, configured to identify the target category according to the fifth feature vector to obtain each target category contained in the image to be recognized.
9. A neural network training apparatus for target class identification, characterized in that the apparatus comprises:
an image acquisition module, configured to acquire a training image set and label each labeled target region and its corresponding labeled target category in each frame of training image;
a feature extraction module, configured to sequentially take a frame of training image from the training image set and input it into a backbone network of the neural network for feature extraction, to obtain a first feature vector of the input training image;
a region suggestion module, configured to input the first feature vector into a region suggestion network of the neural network to detect each foreground region and each background region in the input training image;
a region-of-interest alignment module, configured to search the first feature vector for each first sub-feature vector corresponding to each foreground region and each background region according to the corresponding regions of the foreground regions and the background regions in the input training image, and perform first interpolation processing on each first sub-feature vector corresponding to each foreground region and each background region, respectively, to obtain second sub-feature vectors of a first fixed size; and to search the first feature vector for each first sub-feature vector corresponding to each detection target region according to the corresponding region, in the input training image, of each detection target region obtained by the target detection module, and splice the first sub-feature vectors corresponding to all detection target regions into a third feature vector;
a target detection module, configured to input the second sub-feature vectors into a target detection network of the neural network to obtain each detection target region in the input training image and the detection target category of each detection target region;
an adaptive global average pooling processing module, configured to perform adaptive global average pooling on the first feature vector to obtain a fourth feature vector, wherein the dimension of the fourth feature vector is the same as that of the third feature vector;
a feature fusion module, configured to superimpose the fourth feature vector and the third feature vector to obtain a fifth feature vector;
a class identification module, configured to input the fifth feature vector into a target classification network of the neural network to obtain each detection target category contained in the input training image; and
an adjusting module, configured to: calculate a first prediction deviation using a preset first loss function, according to the detection target regions and their detection target categories obtained by the target detection module and the labeled target regions and corresponding labeled target categories labeled in the input training image; calculate a second prediction deviation using a preset second loss function, according to the detection target categories contained in the input training image obtained by the class identification module and the labeled target categories of the labeled target regions labeled in the input training image; perform weighted summation of the first prediction deviation and the second prediction deviation, and adjust parameters of the neural network according to the weighted sum; and, when the neural network converges, take the neural network at that time as the neural network to be finally used.
10. A non-transitory computer readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of the target class identification method of any one of claims 1 to 7.
CN202111652406.4A 2021-12-31 2021-12-31 Target class identification method and device and readable storage medium Active CN114004963B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111652406.4A CN114004963B (en) 2021-12-31 2021-12-31 Target class identification method and device and readable storage medium

Publications (2)

Publication Number Publication Date
CN114004963A true CN114004963A (en) 2022-02-01
CN114004963B CN114004963B (en) 2022-03-29

Family

ID=79932322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111652406.4A Active CN114004963B (en) 2021-12-31 2021-12-31 Target class identification method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN114004963B (en)


Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10636148B1 (en) * 2016-05-20 2020-04-28 Ccc Information Services Inc. Image processing system to detect contours of an object in a target object image
CN108268814A * 2016-12-30 2018-07-10 广东精点数据科技股份有限公司 Face recognition method and device based on fuzzy fusion of global and local features
CN106897730A * 2016-12-30 2017-06-27 陕西师范大学 SAR target model recognition method based on fused classification information and locality preserving projections
CN108509891A (en) * 2018-03-27 2018-09-07 斑马网络技术有限公司 Image labeling method, device, storage medium and electronic equipment
CN110580487A (en) * 2018-06-08 2019-12-17 Oppo广东移动通信有限公司 Neural network training method, neural network construction method, image processing method and device
US20200401812A1 (en) * 2018-07-13 2020-12-24 Tencent Technology (Shenzhen) Company Limited Method and system for detecting and recognizing target in real-time video, storage medium, and device
CN109165644A (en) * 2018-07-13 2019-01-08 北京市商汤科技开发有限公司 Object detection method and device, electronic equipment, storage medium, program product
CN109784386A * 2018-12-29 2019-05-21 天津大学 Method for assisting object detection with semantic segmentation
CN109886933A * 2019-01-25 2019-06-14 腾讯科技(深圳)有限公司 Medical image recognition method, apparatus and storage medium
CN110516670A * 2019-08-26 2019-11-29 广西师范大学 Object detection method based on scene-level suggestion and regional self-attention module
WO2021056705A1 (en) * 2019-09-23 2021-04-01 平安科技(深圳)有限公司 Method for detecting damage to outside of human body on basis of semantic segmentation network, and related device
CN111091140A (en) * 2019-11-20 2020-05-01 南京旷云科技有限公司 Object classification method and device and readable storage medium
CN111640125A (en) * 2020-05-29 2020-09-08 广西大学 Mask R-CNN-based aerial photograph building detection and segmentation method and device
CN111833306A (en) * 2020-06-12 2020-10-27 北京百度网讯科技有限公司 Defect detection method and model training method for defect detection
CN111881849A (en) * 2020-07-30 2020-11-03 Oppo广东移动通信有限公司 Image scene detection method and device, electronic equipment and storage medium
CN112257758A (en) * 2020-09-27 2021-01-22 浙江大华技术股份有限公司 Fine-grained image recognition method, convolutional neural network and training method thereof
CN112149693A (en) * 2020-10-16 2020-12-29 上海智臻智能网络科技股份有限公司 Training method of contour recognition model and detection method of target object
CN113705293A (en) * 2021-02-26 2021-11-26 腾讯科技(深圳)有限公司 Image scene recognition method, device, equipment and readable storage medium
CN112990432A (en) * 2021-03-04 2021-06-18 北京金山云网络技术有限公司 Target recognition model training method and device and electronic equipment
CN112699855A (en) * 2021-03-23 2021-04-23 腾讯科技(深圳)有限公司 Image scene recognition method and device based on artificial intelligence and electronic equipment
CN113780270A (en) * 2021-03-23 2021-12-10 京东鲲鹏(江苏)科技有限公司 Target detection method and device
CN113762049A (en) * 2021-05-11 2021-12-07 腾讯科技(深圳)有限公司 Content identification method and device, storage medium and terminal equipment
CN113269257A (en) * 2021-05-27 2021-08-17 中山大学孙逸仙纪念医院 Image classification method and device, terminal equipment and storage medium

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
BENJAMIN BISCHKE et al.: "Global-Local Feature Fusion for Image Classification of Flood Affected Roads from Social Multimedia", MediaEval *
WANG J et al.: "Collaborative learning for weakly supervised object detection", arXiv *
XUELING WEI et al.: "Medical hyperspectral image classification based on end-to-end fusion deep neural network", IEEE Transactions on Instrumentation and Measurement *
YAO H et al.: "Coarse-to-Fine Description for Fine-Grained Visual Categorization", IEEE Transactions on Image Processing *
YIN Hong et al.: "Flower image classification with selective convolutional feature fusion", Journal of Image and Graphics *
LI Xiangxia et al.: "Deep learning methods for fine-grained image classification", Journal of Frontiers of Computer Science and Technology *
YANG Dan et al.: "Fine-grained image classification algorithm based on attention mechanism", Journal of Southwest University of Science and Technology *
ZHAO Haoru et al.: "Research on fine-grained image classification algorithm based on RPN and B-CNN", Computer Applications and Software *
GUO Fan et al.: "YOLOv3: Traffic sign detection network based on attention mechanism", Journal on Communications *

Also Published As

Publication number Publication date
CN114004963B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
KR102008973B1 (en) Apparatus and Method for Detection defect of sewer pipe based on Deep Learning
CN112581463B (en) Image defect detection method and device, electronic equipment, storage medium and product
CN109858367B (en) Visual automatic detection method and system for worker through supporting unsafe behaviors
CN107808133B (en) Unmanned aerial vehicle line patrol-based oil and gas pipeline safety monitoring method and system and software memory
CN110264444B (en) Damage detection method and device based on weak segmentation
CN112446870B (en) Pipeline damage detection method, device, equipment and storage medium
Biasutti et al. Lu-net: An efficient network for 3d lidar point cloud semantic segmentation based on end-to-end-learned 3d features and u-net
CN110992349A (en) Underground pipeline abnormity automatic positioning and identification method based on deep learning
CN102682428B (en) Fingerprint image computer automatic mending method based on direction fields
CN113822880A (en) Crack identification method based on deep learning
CN109085174A (en) Display screen peripheral circuit detection method, device, electronic equipment and storage medium
Moradi et al. Real-time defect detection in sewer closed circuit television inspection videos
CN111462140B (en) Real-time image instance segmentation method based on block stitching
CN113962951B (en) Training method and device for detecting segmentation model, and target detection method and device
CN112198170A (en) Detection method for identifying water drops in three-dimensional detection of outer surface of seamless steel pipe
Fan et al. Application of YOLOv5 neural network based on improved attention mechanism in recognition of Thangka image defects
Peng et al. Research on oil leakage detection in power plant oil depot pipeline based on improved YOLO v5
Rayhana et al. Automated defect-detection system for water pipelines based on CCTV inspection videos of autonomous robotic platforms
CN114120086A (en) Pavement disease recognition method, image processing model training method, device and electronic equipment
CN109102486B (en) Surface defect detection method and device based on machine learning
CN114004963B (en) Target class identification method and device and readable storage medium
CN113469938A (en) Pipe gallery video analysis method and system based on embedded front-end processing server
CN114004838B (en) Target class identification method, training method and readable storage medium
Chen et al. Deep learning based underground sewer defect classification using a modified RegNet
CN116432078A (en) Building electromechanical equipment monitoring system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant