CN115471499A - Image target detection and segmentation method, system, storage medium and electronic equipment - Google Patents


Info

Publication number
CN115471499A
CN115471499A (application CN202211281775.1A)
Authority
CN
China
Prior art keywords
image
original
training image
training
target
Prior art date
Legal status
Pending
Application number
CN202211281775.1A
Other languages
Chinese (zh)
Inventor
袁铭康
李叶
许乐乐
徐金中
郭丽丽
马忠松
金山
Current Assignee
Technology and Engineering Center for Space Utilization of CAS
Original Assignee
Technology and Engineering Center for Space Utilization of CAS
Priority date
Filing date
Publication date
Application filed by Technology and Engineering Center for Space Utilization of CAS filed Critical Technology and Engineering Center for Space Utilization of CAS
Priority to CN202211281775.1A priority Critical patent/CN115471499A/en
Publication of CN115471499A publication Critical patent/CN115471499A/en

Classifications

    • G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06T 7/12 — Segmentation; edge-based segmentation
    • G06N 3/08 — Neural networks; learning methods
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06T 2207/20021 — Dividing image into blocks, subimages or windows
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06V 2201/07 — Target detection


Abstract

The invention relates to an image target detection and segmentation method, system, storage medium and electronic device, comprising: training a preset deep learning model for image target detection and segmentation based on a plurality of original training images to obtain a target deep learning model; and inputting an image to be detected into the target deep learning model to obtain a target prediction result of target detection and segmentation for the image to be detected. By training the improved deep learning model, the invention enhances the model's ability to represent image targets and improves the precision of target detection and segmentation of objects in an image.

Description

Image target detection and segmentation method, system, storage medium and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image target detection and segmentation method, system, storage medium, and electronic device.
Background
With the development and maturation of computer vision technology, it has been widely applied in various fields. Target detection and segmentation is one of the important problems in computer vision research and an important basis for understanding the high-level semantic features of an image; its task is to return the rectangular bounding-box coordinates or regions of one or more specific objects in a given image. Existing image target detection and segmentation algorithms can generally be classified into two categories. One is the two-stage model, such as Faster R-CNN, which extracts candidate regions in a separate stage: it first screens out candidate regions in which objects may exist from the input image, determines whether a target exists in each candidate region, and then outputs the target class and position features or segmentation region. The other is the one-stage model, such as YOLO, which does not extract candidate regions separately and directly takes the image as input to obtain the object classes and corresponding position features or segmentation regions present in the image. However, both kinds of algorithms suffer from low precision in target detection and segmentation.
Therefore, it is desirable to provide a technical solution to solve the above technical problems.
Disclosure of Invention
In order to solve the technical problem, the invention provides an image target detection and segmentation method, an image target detection and segmentation system, a storage medium and electronic equipment.
The technical scheme of the image target detection and segmentation method is as follows:
S1, training a preset deep learning model for image target detection and segmentation based on a plurality of original training images to obtain a target deep learning model;
S2, inputting the image to be detected into the target deep learning model to obtain a target prediction result of target detection and segmentation of the image to be detected.
The image target detection and segmentation method has the following beneficial effects:
By training the improved deep learning model, the method of the invention enhances the model's ability to represent image targets and improves the precision of target detection and segmentation of objects in an image.
On the basis of the scheme, the image target detection and segmentation method can be further improved as follows.
Further, the preset deep learning model includes: an original backbone network, an original neck network, and a plurality of original header networks; before S1, further comprising:
s01, labeling each original training image by adopting at least one labeling mode to obtain at least one labeled training image corresponding to each original training image;
the S1 comprises:
s11, inputting any original training image into the original backbone network for multi-scale image feature extraction to obtain a first training image corresponding to any original training image;
s12, inputting a first training image corresponding to any original training image into the original neck network for image feature extraction to obtain a second training image corresponding to any original training image;
s13, respectively inputting a second training image corresponding to any original training image into each original head network for prediction to obtain a training prediction result of any original training image in each original head network until a training prediction result of each original training image in each original head network is obtained;
And S14, performing loss calculation on all training prediction results corresponding to each original training image against its at least one labeled training image, optimizing the preset deep learning model according to the loss calculation result, taking the optimized model as the preset deep learning model, and returning to step S11 for iterative training; when the preset deep learning model converges, the optimized model obtained at convergence is determined as the target deep learning model.
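The S11–S14 procedure amounts to a standard iterative optimization: forward pass, loss against the labeled training images, parameter update, and a convergence check. A minimal sketch under stated assumptions, using a toy one-parameter "model" with squared-error loss in place of the patent's networks and losses (all names here are illustrative, not from the patent):

```python
def train(w, images, labels, lr=0.1, tol=1e-6, max_iter=1000):
    """Iterate S11-S14: forward pass, loss calculation, optimization,
    and return to the forward pass until the loss change falls below tol."""
    prev_loss = float("inf")
    for _ in range(max_iter):
        preds = [w * x for x in images]  # toy stand-in for backbone/neck/heads
        loss = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(images)
        if abs(prev_loss - loss) < tol:  # convergence check (S14)
            break
        grad = sum(2 * (w * x - y) * x
                   for x, y in zip(images, labels)) / len(images)
        w -= lr * grad                   # optimize, then return to S11
        prev_loss = loss
    return w
```

For example, `train(0.0, [1.0, 2.0], [2.0, 4.0])` converges to a weight near 2.0, the value minimizing the toy loss.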
Further, the original backbone network comprises: the output end of each first convolution layer is correspondingly connected with the input end of one first down-sampling layer; the S11 comprises:
inputting any original training image to a first convolution layer of the original backbone network, sequentially passing through all the first convolution layers and all the first down-sampling layers, and performing multi-scale image feature extraction on any original training image to obtain a first training image corresponding to any original training image.
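The backbone's alternation of convolution and down-sampling layers can be illustrated with a toy sketch; `conv2d` and `downsample` below are minimal stand-ins (valid convolution and stride-2 subsampling), not the patent's actual layers:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution -- stand-in for one 'first convolution layer'."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def downsample(img, factor=2):
    """Stride-2 subsampling -- stand-in for one 'first down-sampling layer'."""
    return img[::factor, ::factor]

# Alternating conv -> downsample stages yield features at multiple scales.
img = np.arange(64, dtype=float).reshape(8, 8)
k = np.ones((3, 3)) / 9.0
f1 = downsample(conv2d(img, k))                                # scale-1 features
f2 = downsample(conv2d(f1, k)) if min(f1.shape) >= 3 else f1   # scale-2 features
```

Each stage halves the spatial resolution, which is what makes the extracted features multi-scale.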
Further, the original neck network comprises: the output end of each second convolution layer is correspondingly connected with the input end of one first up-sampling layer; the S12 includes:
Inputting the first training image corresponding to any original training image to the first of the second convolution layers of the original neck network, passing it sequentially through all the second convolution layers and all the first up-sampling layers, and performing image feature extraction on it to obtain a second training image corresponding to that original training image.
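The neck's convolution-then-up-sampling stage can be sketched as follows; `conv1x1` and `upsample` are hypothetical minimal stand-ins (point-wise scaling and nearest-neighbour up-sampling), not the patent's layers:

```python
import numpy as np

def conv1x1(feat, weight):
    """Point-wise (1x1) convolution -- stand-in for one 'second convolution layer'."""
    return weight * feat

def upsample(feat, factor=2):
    """Nearest-neighbour up-sampling -- stand-in for one 'first up-sampling layer'."""
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

# A convolution stage followed by an up-sampling stage restores spatial resolution.
feat = np.array([[1.0, 2.0], [3.0, 4.0]])
second = upsample(conv1x1(feat, 0.5))
```

This mirrors FPN-style necks, where low-resolution backbone features are transformed and up-sampled back toward the input resolution.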
Further, the plurality of original header networks comprises: presetting a target classification head network, a target detection head network, an image segmentation head network and a central skeleton head network; the training prediction result of any original training image comprises: a first training prediction result, a second training prediction result, a third training prediction result, and a fourth training prediction result; the step of inputting the second training image corresponding to any original training image into each original head network respectively for prediction to obtain a training prediction result of any original training image in each original head network includes:
inputting a second training image corresponding to any original training image into the preset target classification head network for prediction to obtain a first training prediction result obtained by performing target classification on any original training image;
inputting a second training image corresponding to any original training image into the preset target detection head network for prediction to obtain a second training prediction result obtained by performing target detection on any original training image;
inputting a second training image corresponding to any original training image into the preset image segmentation head network for prediction to obtain a third training prediction result obtained by performing image segmentation on any original training image;
inputting a second training image corresponding to any original training image into the preset central skeleton head network for prediction, and obtaining a fourth training prediction result obtained by performing central skeleton extraction on any original training image.
The beneficial effect of adopting the further technical scheme is that: by adding the central skeleton network into the head network of the deep learning model, the target detection head network and the image segmentation head network can be helped to acquire more characteristics of object forms, so that the accuracy of object detection and image segmentation is improved.
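The four parallel heads all consume the same second training image (the shared neck features). A toy sketch, with random linear maps standing in for the real head sub-networks (all names, shapes, and the seed are illustrative assumptions):

```python
import numpy as np

HEAD_NAMES = ("classification", "detection", "segmentation", "central_skeleton")

def make_heads(feat_dim=8, out_dim=4, seed=0):
    """Four parallel head networks over the shared neck output; random
    linear maps stand in for the real head sub-networks."""
    rng = np.random.default_rng(seed)
    return {name: rng.standard_normal((out_dim, feat_dim)) for name in HEAD_NAMES}

def predict(heads, features):
    """Each head produces its own training prediction from the same features."""
    return {name: w @ features for name, w in heads.items()}

preds = predict(make_heads(), np.ones(8))  # one prediction per head
```

Because every head reads the same features, gradients from the central skeleton head can shape the shared representation used by the detection and segmentation heads.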
Further, the step of labeling any original training image to obtain at least one labeled training image corresponding to any original training image includes:
labeling each object in any original training image based on the class of the object to obtain a first labeled training image containing labeled class information of each object;
labeling each object in any original training image based on the position of the object to obtain a second labeling training image containing the position information of each object;
masking each object in any original training image to obtain a third labeling training image containing mask information of each object;
and acquiring the central skeleton of each object in the third annotation training image corresponding to any original training image, and arranging all the central skeletons according to a preset arrangement sequence to obtain a fourth annotation training image.
The beneficial effect of adopting the further technical scheme is that: by extracting the central skeleton of each object instance in the training image and expressing it as a point array arranged in a set order, the object retains features related to its shape; this enhances the model's ability to represent image targets, makes it easier for the model to learn the feature relationships of the object, and improves the precision of object detection and segmentation.
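The four annotations produced for one object can be gathered into a single record; the field names below are illustrative, not the patent's:

```python
import numpy as np

def annotate(obj_class, bbox, mask, skeleton_points):
    """Collect the four annotations of one object: labeled class,
    position (bounding box), mask, and ordered central-skeleton points."""
    return {
        "class": obj_class,                       # first labeled training image
        "bbox": tuple(bbox),                      # second: position information
        "mask": np.asarray(mask, dtype=bool),     # third: mask information
        "skeleton": np.asarray(skeleton_points),  # fourth: head-to-tail point array
    }

ann = annotate("worm", (0, 0, 4, 2), [[1, 1], [1, 1]], [(0, 1), (2, 1), (4, 1)])
```

One such record per object supplies the supervision targets for the four head networks respectively.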
Further, the target prediction result comprises: the target detection result of the image to be detected and the image segmentation result of the image to be detected.
The technical scheme of the image target detection and segmentation system is as follows:
the method comprises the following steps: a processing module and an operation module;
the processing module is used for: training a preset deep learning model for image target detection and segmentation based on a plurality of original training images to obtain a target deep learning model;
the operation module is used for: and inputting the image to be detected into the target deep learning model to obtain a target prediction result of target detection and segmentation of the image to be detected.
The image target detection and segmentation system has the following beneficial effects:
By training the improved deep learning model, the system of the invention enhances the model's ability to represent image targets and improves the precision of target detection and segmentation of objects in an image.
On the basis of the scheme, the image target detection and segmentation system can be further improved as follows.
Further, the preset deep learning model includes: an original backbone network, an original neck network, and a plurality of original header networks; before the processing module, the method further comprises the following steps: a labeling module;
the labeling module is used for: labeling each original training image by adopting at least one labeling mode to obtain at least one labeled training image corresponding to each original training image;
The processing module comprises: a first processing module, a second processing module, a third processing module, and a fourth processing module.
The first processing module is configured to: inputting any original training image into the original backbone network for multi-scale image feature extraction to obtain a first training image corresponding to any original training image;
the second processing module is configured to: inputting a first training image corresponding to any original training image into the original neck network for image feature extraction to obtain a second training image corresponding to any original training image;
the third processing module is configured to: inputting a second training image corresponding to any original training image into each original head network respectively for prediction to obtain a training prediction result of any original training image in each original head network until obtaining a training prediction result of each original training image in each original head network respectively;
The fourth processing module is configured to: perform loss calculation based on all training prediction results corresponding to each original training image and its at least one labeled training image, optimize the preset deep learning model according to the loss calculation result, take the optimized model as the preset deep learning model, and return to invoke the first processing module for iterative training until the preset deep learning model converges; the optimized model obtained at convergence is determined as the target deep learning model.
Further, the original backbone network comprises: the output end of each first convolution layer is correspondingly connected with the input end of one first down-sampling layer; the first processing module is specifically configured to:
inputting any original training image to a first convolution layer of the original backbone network, sequentially passing through all the first convolution layers and all the first down-sampling layers, and performing multi-scale image feature extraction on any original training image to obtain a first training image corresponding to any original training image.
Further, the original neck network comprises: the output end of each second convolution layer is correspondingly connected with the input end of one first up-sampling layer; the second processing module is specifically configured to:
inputting the first training image corresponding to any original training image to a first second convolution layer of the original neck network, sequentially passing through all second convolution layers and all first up-sampling layers, and performing image feature extraction on the first training image corresponding to any original training image to obtain a second training image corresponding to any original training image.
Further, the plurality of original header networks comprises: presetting a target classification head network, a target detection head network, an image segmentation head network and a central skeleton head network; the training prediction result of any original training image comprises: a first training prediction result, a second training prediction result, a third training prediction result, and a fourth training prediction result;
the third processing module is specifically configured to:
inputting a second training image corresponding to any original training image into the preset target classification head network for prediction to obtain a first training prediction result obtained by performing target classification on any original training image;
inputting a second training image corresponding to any original training image into the preset target detection head network for prediction to obtain a second training prediction result obtained by performing target detection on any original training image;
inputting a second training image corresponding to any original training image into the preset image segmentation head network for prediction to obtain a third training prediction result obtained by performing image segmentation on any original training image;
inputting a second training image corresponding to any original training image into the preset central skeleton head network for prediction, and obtaining a fourth training prediction result obtained by performing central skeleton extraction on any original training image.
The beneficial effect of adopting the further technical scheme is that: by adding the central skeleton network into the head network of the deep learning model, the target detection head network and the image segmentation head network can be helped to acquire more characteristics of object forms, so that the accuracy of object detection and image segmentation is improved.
Further, the labeling module is specifically configured to:
labeling each object in any original training image based on the class of the object to obtain a first labeled training image containing labeled class information of each object;
labeling each object in any original training image based on the position of the object to obtain a second labeled training image containing the position information of each object;
masking each object in any original training image to obtain a third labeling training image containing mask information of each object;
and acquiring the central skeleton of each object in the third labeling training image corresponding to any original training image, and arranging all the central skeletons according to a preset arrangement sequence to obtain a fourth labeling training image.
The beneficial effect of adopting the further technical scheme is that: by extracting the central skeleton of each object instance in the training image and expressing it as a point array arranged in a set order, the object retains features related to its shape; this enhances the model's ability to represent image targets, makes it easier for the model to learn the feature relationships of the object, and improves the precision of object detection and segmentation.
Further, the target prediction result comprises: the target detection result of the image to be detected and the image segmentation result of the image to be detected.
The technical scheme of the storage medium of the invention is as follows:
the storage medium has stored therein instructions which, when read by a computer, cause the computer to perform the steps of the image object detection and segmentation method according to the invention.
The technical scheme of the electronic equipment is as follows:
The electronic equipment comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, causes the computer to perform the steps of the image target detection and segmentation method according to the invention.
Drawings
FIG. 1 is a schematic flow chart of a method for detecting and segmenting an image target according to an embodiment of the present invention;
FIG. 2 is a structural diagram of a preset deep learning model in the image target detection and segmentation method according to the embodiment of the present invention;
fig. 3 is a schematic structural diagram of an image target detection and segmentation system according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, the image target detecting and segmenting method according to the embodiment of the present invention includes the following steps:
s1, training a preset deep learning model for image target detection and segmentation based on a plurality of original training images to obtain a target deep learning model.
The original training images are images containing strip-shaped (elongated) objects, for example worms, vines, bridges, and trees.
The preset deep learning model comprises three parts: a backbone network, a neck network, and head networks. It is an untrained deep learning model that can be used to perform target detection and segmentation on objects (especially strip-shaped objects) in an image. The target deep learning model is the trained deep learning model.
Specifically, a plurality of original training images each containing at least one strip-shaped object are obtained, and each original training image is input into the preset deep learning model for iterative training, continuously improving the model's ability to represent image targets, until the preset deep learning model converges and the target deep learning model for image target detection and segmentation is obtained.
S2, inputting the image to be detected into the target deep learning model to obtain a target prediction result of target detection and segmentation of the image to be detected.
The image to be detected is an arbitrarily selected image. The target prediction result comprises: the target detection result of the image to be detected and the image segmentation result of the image to be detected.
It should be noted that the target detection result is the category and position of each object in the image to be detected, obtained by performing target detection on the image. The image segmentation result is the image region corresponding to each object, obtained by performing image segmentation on the image to be detected.
Preferably, the preset deep learning model comprises: an original backbone network, an original neck network, and a plurality of original header networks.
In the present embodiment, the model structure of the preset deep learning model is as shown in fig. 2.
The original backbone network is an untrained backbone network that can be used to extract multi-scale features of an image; it may adopt a mature backbone network such as a ResNet50 or ResNet101 network. In this embodiment, the original backbone network includes: a plurality of first convolution layers and a plurality of first down-sampling layers.
The original neck network is an untrained neck network that can be used to extract image features; it may adopt, for example, an FPN network. In this embodiment, the original neck network comprises: at least one second convolution layer and at least one first up-sampling layer.
Wherein the original head network is an untrained head network that can be used to predict the image. The plurality of original header networks includes: the method comprises the steps of presetting a target classification head network, presetting a target detection head network, presetting an image segmentation head network and presetting a center skeleton head network.
Before S1, further comprising:
and S01, labeling each original training image by adopting at least one labeling mode to obtain at least one labeled training image corresponding to each original training image.
The at least one labeling mode comprises: labeling the category of an object in the image, labeling the position of the object in the image, labeling a mask of the object in the image, and extracting the central skeleton of the labeled object mask in the image.
Specifically, the step of labeling any original training image to obtain at least one labeled training image corresponding to any original training image includes:
and labeling each object in any original training image based on the class of the object to obtain a first labeled training image containing labeled class information of each object in any original training image.
The annotation category information includes, but is not limited to, rectangular frames enclosing the objects, where the rectangular frames corresponding to each category are displayed in different colors. The first labeled training image is an image containing the rectangular frame corresponding to each object.
And labeling each object in any original training image based on the position of the object to obtain a second labeled training image containing the position information of each object.
The position information includes, but is not limited to, the position of each object in the image. The second labeled training image may be the same image as, or a different image from, the one carrying the annotation category information; this is not limited here.
And masking each object in any original training image to obtain a third labeling training image containing mask information of each object.
The process of masking an image is prior art and is not described here in detail.
And acquiring the central skeleton of each object in the third labeling training image corresponding to any original training image, and arranging all the central skeletons according to a preset arrangement sequence to obtain a fourth labeling training image.
Specifically, the mask-processed third labeling training image is used to extract the central skeleton of each object in it, and the extracted central skeletons of all objects are represented as point arrays arranged in a preset order. A point array may consist of equidistant or non-equidistant points on the centre line; for a strip-shaped object, the head and tail of the object have an order. The preset arrangement order may be from head to tail or from tail to head, according to the actual situation.
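Resampling a centre line into an ordered point array can be sketched as follows; `equidistant_points` is a hypothetical helper illustrating equidistant sampling along a head-to-tail polyline, not the patent's algorithm:

```python
import numpy as np

def equidistant_points(polyline, n):
    """Resample a head-to-tail centre line into n equidistant points,
    preserving the head-to-tail order of the strip-shaped object."""
    pts = np.asarray(polyline, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seg)])        # cumulative arc length
    targets = np.linspace(0.0, cum[-1], n)               # equidistant arc positions
    out = np.empty((n, pts.shape[1]))
    for k, t in enumerate(targets):
        i = min(np.searchsorted(cum, t, side="right") - 1, len(seg) - 1)
        r = (t - cum[i]) / seg[i] if seg[i] > 0 else 0.0  # fraction within segment
        out[k] = pts[i] + r * (pts[i + 1] - pts[i])
    return out

pts = equidistant_points([(0, 0), (2, 0), (4, 0)], 5)    # head-to-tail order kept
```

Because the points are emitted in arc-length order, the resulting array encodes the object's head-to-tail direction as well as its shape.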
It should be noted that the above only describes the process of labeling one original training image; the remaining original training images may all be labeled by the same process, which is not repeated here.
The S1 comprises:
s11, inputting any original training image into the original backbone network for multi-scale image feature extraction, and obtaining a first training image corresponding to any original training image.
The first training image is the image obtained by performing multi-scale image feature extraction on the original training image.
In this embodiment, the original backbone network takes two first convolution layers and two first down-sampling layers as an example. The output end of each first convolution layer is correspondingly connected with the input end of one first down-sampling layer.
Specifically, any original training image is input to the first of the first convolution layers of the original backbone network and passes sequentially through the first convolution layer, the first down-sampling layer, the second convolution layer, and the second down-sampling layer, so that multi-scale image features of the original training image are extracted and the first training image corresponding to that original training image is obtained.
It should be noted that the process of extracting image features through convolutional and down-sampling layers is prior art and is not described in detail here.
And S12, inputting the first training image corresponding to any original training image into the original neck network for image feature extraction to obtain a second training image corresponding to any original training image.
The second training image is the image obtained by performing image feature extraction on the first training image through the original neck network.
In this embodiment, the original neck network is exemplified by one second convolutional layer and one first up-sampling layer, with the output end of the second convolutional layer connected to the input end of the first up-sampling layer.
Specifically, the first training image corresponding to any original training image is input to the second convolutional layer of the original neck network and then passes through the first up-sampling layer, so that image features are extracted and the second training image corresponding to that original training image is obtained.
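A minimal single-channel sketch of the neck's conv-then-upsample step, under the same simplifying assumptions as before (naive 3x3 convolution, nearest-neighbour upsampling, a pass-through kernel for the demo); this is illustrative only, not the patented network.

```python
import numpy as np

def conv3x3(x, kernel):
    """Naive 'same'-padded single-channel 3x3 convolution (second convolutional layer)."""
    padded = np.pad(x, 1)
    return np.array([[np.sum(padded[i:i + 3, j:j + 3] * kernel)
                      for j in range(x.shape[1])] for i in range(x.shape[0])])

def upsample2x(x):
    """First up-sampling layer: 2x nearest-neighbour upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def neck(feat, kernel):
    """Hypothetical neck: second convolutional layer feeding the up-sampling layer."""
    return upsample2x(conv3x3(feat, kernel))

identity = np.zeros((3, 3))
identity[1, 1] = 1.0                       # pass-through kernel, for the demo only
feat = np.arange(16.0).reshape(4, 4)       # stand-in for a first training image
second = neck(feat, identity)              # "second training image" feature map
print(second.shape)
```

The neck thus reverses the backbone's resolution reduction (4 → 8 here), restoring spatial detail before the feature map is handed to the head networks.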
It should be noted that the process of extracting image features through convolutional and up-sampling layers is prior art and is not described in detail here.
S13, inputting the second training image corresponding to any original training image into each original head network respectively for prediction, and obtaining a training prediction result of any original training image in each original head network until obtaining a training prediction result of each original training image in each original head network respectively.
Wherein the training prediction result of any original training image comprises: a first training prediction result, a second training prediction result, a third training prediction result, and a fourth training prediction result.
Specifically, a second training image corresponding to any original training image is input into the preset target classification head network for prediction, and a first training prediction result obtained by performing target classification on any original training image is obtained.
The preset target classification head network comprises at least one first full connection layer and is used for carrying out target classification on the image.
Inputting a second training image corresponding to any original training image into the preset target detection head network for prediction, and obtaining a second training prediction result obtained by performing target detection on any original training image.
The preset target detection head network comprises at least one second full connection layer and is used for carrying out target detection on the image.
Inputting a second training image corresponding to any original training image into the preset image segmentation head network for prediction, and obtaining a third training prediction result obtained by performing image segmentation on any original training image.
The preset image segmentation head network comprises at least one third convolution layer and is used for carrying out image segmentation on the image.
Inputting a second training image corresponding to any original training image into the preset central skeleton head network for prediction, and obtaining a fourth training prediction result obtained by performing central skeleton extraction on any original training image.
The preset central skeleton head network comprises at least one fourth convolution layer and at least one third full-connection layer and is used for extracting the central skeleton of the image.
It should be noted that both the preset center skeleton head network and the preset target detection head network predict points, and therefore the same network structure may be adopted.
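The four parallel heads above can be sketched as follows. The output shapes (3 classes, one box, 5 skeleton points) and the use of a threshold as a segmentation stand-in are illustrative assumptions; as noted, the detection and skeleton heads both regress points, so both are shown as fully connected layers over the shared feature map.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    """Fully connected layer applied to the flattened feature map."""
    return x.ravel() @ w + b

feat = rng.standard_normal((8, 8))   # stand-in for a second training image
n = feat.size

# All head shapes below are illustrative assumptions, not the patent's dimensions.
w_cls, b_cls = rng.standard_normal((n, 3)), np.zeros(3)      # 3 object classes
w_det, b_det = rng.standard_normal((n, 4)), np.zeros(4)      # box (x, y, w, h)
w_skel, b_skel = rng.standard_normal((n, 10)), np.zeros(10)  # 5 ordered points

cls_scores = fc(feat, w_cls, b_cls)                # target classification head (FC layer)
box = fc(feat, w_det, b_det)                       # target detection head (FC layer)
seg_mask = (feat > 0).astype(float)                # segmentation head stand-in (conv omitted)
skeleton = fc(feat, w_skel, b_skel).reshape(5, 2)  # central skeleton head: ordered (y, x) points

print(cls_scores.shape, box.shape, seg_mask.shape, skeleton.shape)
```

The four outputs correspond one-to-one to the first through fourth training prediction results compared against the four labeled training images during loss calculation.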
And S14, performing loss calculation based on all training prediction results corresponding to each original training image and the at least one labeled training image, and optimizing the preset deep learning model according to the loss calculation result; taking the optimized preset deep learning model as the preset deep learning model and returning to step S11 for iterative training until the preset deep learning model converges; and determining the optimized preset deep learning model at convergence as the target deep learning model.
Convergence of the preset deep learning model means that the error between the predicted value and the true value obtained through the model is smaller than a preset threshold, which can be set according to user requirements.
The loss calculation process is prior art. For example, with cross-entropy loss, the true value and the model's predicted value are substituted into the loss function to compute the difference between them; the lower the loss value, the better the prediction effect of the deep learning model.
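For concreteness, a minimal cross-entropy calculation between a one-hot ground-truth label and two candidate predicted distributions; the class probabilities are made up for the demo.

```python
import numpy as np

def cross_entropy(pred_probs, true_onehot, eps=1e-12):
    """Cross-entropy between the labeled truth and the model's predicted distribution."""
    return float(-np.sum(true_onehot * np.log(pred_probs + eps)))

truth = np.array([0.0, 1.0, 0.0])   # ground truth: class 1 (one-hot)
good = np.array([0.1, 0.8, 0.1])    # confident, correct prediction
bad = np.array([0.6, 0.2, 0.2])     # incorrect prediction

# the better prediction yields the lower loss value
print(cross_entropy(good, truth), cross_entropy(bad, truth))
```

The same comparison is run for each head's output against the corresponding labeled training image, and the per-head losses drive the optimization step.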
The target deep learning model comprises: a target backbone network, a target neck network, and a plurality of target head networks; the plurality of target head networks includes a trained target classification head network, a trained target detection head network, a trained image segmentation head network, and a trained central skeleton head network.
According to the technical scheme, the improved deep learning model is trained, so that the expression capability of the model on the image target is enhanced, and the target detection and segmentation precision of the object in the image is improved.
As shown in fig. 3, the image target detecting and segmenting system 200 according to the embodiment of the present invention includes: a processing module 210 and an execution module 220;
the processing module 210 is configured to: training a preset deep learning model for image target detection and segmentation based on a plurality of original training images to obtain a target deep learning model;
the operation module 220 is configured to: and inputting the image to be detected into the target deep learning model to obtain a target prediction result of target detection and segmentation of the image to be detected.
Preferably, the preset deep learning model comprises: an original backbone network, an original neck network, and a plurality of original head networks; the system further comprises a labeling module that operates before the processing module 210;
the labeling module is used for: labeling each original training image by adopting at least one labeling mode to obtain at least one labeled training image corresponding to each original training image;
the processing module 210 includes: a first processing module 211, a second processing module 212, a third processing module 213, and a fourth processing module 214;
the first processing module 211 is configured to: inputting any original training image into the original backbone network to perform multi-scale image feature extraction to obtain a first training image corresponding to any original training image;
the second processing module 212 is configured to: inputting a first training image corresponding to any original training image into the original neck network for image feature extraction to obtain a second training image corresponding to any original training image;
the third processing module 213 is configured to: inputting a second training image corresponding to any original training image into each original head network respectively for prediction to obtain a training prediction result of any original training image in each original head network until a training prediction result of each original training image in each original head network is obtained;
the fourth processing module 214 is configured to: perform loss calculation based on all training prediction results corresponding to each original training image and the at least one labeled training image, optimize the preset deep learning model according to the loss calculation result, take the optimized preset deep learning model as the preset deep learning model and call the first processing module again for iterative training until the preset deep learning model converges, and determine the optimized preset deep learning model at convergence as the target deep learning model.
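The optimize-until-convergence loop of the fourth processing module can be sketched abstractly as below. The one-parameter toy "model", the gradient step, the learning rate, and the threshold are all assumptions for illustration; only the loop structure (predict, compute loss, stop when loss falls below a preset threshold, otherwise optimize and repeat) reflects the scheme described here.

```python
def train(model, images, labels, loss_fn, optimize, threshold=0.01, max_iters=100):
    """Hypothetical iterative training loop: optimize until loss drops below threshold."""
    for _ in range(max_iters):
        preds = [model(x) for x in images]
        loss = sum(loss_fn(p, y) for p, y in zip(preds, labels)) / len(images)
        if loss < threshold:   # convergence: error smaller than the preset threshold
            break
        model = optimize(model, loss)
    return model

def make_model(w):
    """Toy one-parameter 'model' predicting y = w * x (stand-in for the deep model)."""
    def predict(x):
        return w * x
    predict.w = w
    return predict

xs, ys = [1.0, 2.0], [2.0, 4.0]          # toy data whose true parameter is w = 2
loss_fn = lambda p, y: (p - y) ** 2      # squared error as the loss function

def optimize(m, loss):
    # crude gradient step for the toy squared-error model above
    grad = sum(2 * (m.w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return make_model(m.w - 0.1 * grad)

trained = train(make_model(0.0), xs, ys, loss_fn, optimize)
print(trained.w)
```

In the actual system the loop body is the first through third processing modules (backbone, neck, heads) and the loss aggregates all four heads' prediction results against the labeled training images.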
Preferably, the original backbone network comprises a plurality of first convolutional layers and a plurality of first down-sampling layers, the output end of each first convolutional layer being connected to the input end of one first down-sampling layer; the first processing module 211 is specifically configured to:
inputting any original training image to a first convolution layer of the original backbone network, sequentially passing through all the first convolution layers and all the first down-sampling layers, and performing multi-scale image feature extraction on any original training image to obtain a first training image corresponding to any original training image.
Preferably, the original neck network comprises a plurality of second convolutional layers and a plurality of first up-sampling layers, the output end of each second convolutional layer being connected to the input end of one first up-sampling layer; the second processing module 212 is specifically configured to:
inputting the first training image corresponding to any original training image to a first second convolution layer of the original neck network, sequentially passing through all second convolution layers and all first up-sampling layers, and performing image feature extraction on the first training image corresponding to any original training image to obtain a second training image corresponding to any original training image.
Preferably, the plurality of original header networks comprises: presetting a target classification head network, a target detection head network, an image segmentation head network and a central skeleton head network; the training prediction result of any original training image comprises: a first training prediction result, a second training prediction result, a third training prediction result, and a fourth training prediction result;
the third processing module 213 is specifically configured to:
inputting a second training image corresponding to any original training image into the preset target classification head network for prediction to obtain a first training prediction result obtained by performing target classification on any original training image;
inputting a second training image corresponding to any original training image into the preset target detection head network for prediction to obtain a second training prediction result obtained by performing target detection on any original training image;
inputting a second training image corresponding to any original training image into the preset image segmentation head network for prediction to obtain a third training prediction result obtained by performing image segmentation on any original training image;
inputting a second training image corresponding to any original training image into the preset central skeleton head network for prediction, and obtaining a fourth training prediction result obtained by performing central skeleton extraction on any original training image.
Preferably, the labeling module is specifically configured to:
labeling each object in any original training image based on the class of the object to obtain a first labeling training image containing labeling class information of each object;
labeling each object in any original training image based on the position of the object to obtain a second labeled training image containing the position information of each object;
masking each object in any original training image to obtain a third labeling training image containing mask information of each object;
and acquiring the central skeleton of each object in the third labeling training image corresponding to any original training image, and arranging all the central skeletons according to a preset arrangement sequence to obtain a fourth labeling training image.
Preferably, the target prediction result comprises: the target detection result of the image to be detected and the image segmentation result of the image to be detected.
According to the technical scheme, the improved deep learning model is trained, so that the expression capability of the model on the image target is enhanced, and the target detection and segmentation precision of the object in the image is improved.
For the steps by which each parameter and module in the image target detection and segmentation system 200 of this embodiment implements its corresponding function, refer to the parameters and steps in the above embodiments of the image target detection and segmentation method, which are not repeated here.
An embodiment of the present invention provides a storage medium storing instructions. When a computer reads the instructions, it executes the steps of the image target detection and segmentation method described above; for details, refer to the parameters and steps in the above method embodiments, which are not repeated here.
Computer storage media such as: flash disks, portable hard disks, and the like.
An electronic device provided in an embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the electronic device performs the steps of the image target detection and segmentation method described above; for details, refer to the parameters and steps in the above method embodiments, which are not repeated here.
Those skilled in the art will appreciate that the present invention may be embodied as methods, systems, storage media and electronic devices.
Thus, the present invention may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software, which may be referred to herein generally as a "circuit," "module," or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied therein. Any combination of one or more computer-readable media may be employed. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An image target detection and segmentation method is characterized by comprising the following steps:
s1, training a preset deep learning model for image target detection and segmentation based on a plurality of original training images to obtain a target deep learning model;
s2, inputting the image to be detected into the target deep learning model to obtain a target prediction result of target detection and segmentation of the image to be detected.
2. The image target detection and segmentation method according to claim 1, wherein the predetermined deep learning model includes: an original backbone network, an original neck network, and a plurality of original header networks; before S1, further comprising:
s01, labeling each original training image by adopting at least one labeling mode to obtain at least one labeled training image corresponding to each original training image;
the S1 comprises:
s11, inputting any original training image into the original backbone network to perform multi-scale image feature extraction to obtain a first training image corresponding to any original training image;
s12, inputting a first training image corresponding to any original training image into the original neck network for image feature extraction to obtain a second training image corresponding to any original training image;
s13, respectively inputting a second training image corresponding to any original training image into each original head network for prediction to obtain a training prediction result of any original training image in each original head network until a training prediction result of each original training image in each original head network is obtained;
and S14, loss calculation is carried out on all training prediction results corresponding to each original training image and at least one kind of labeled training image, the preset deep learning model is obtained and optimized according to the loss calculation result, the optimized preset deep learning model is used as the preset deep learning model and returns to the step S11 for iterative training, and when the preset deep learning model converges, the optimized preset deep learning model corresponding to the preset deep learning model converging is determined as the target deep learning model.
3. The image object detection and segmentation method of claim 2, wherein the original backbone network comprises a plurality of first convolutional layers and a plurality of first down-sampling layers, the output end of each first convolutional layer being connected to the input end of one first down-sampling layer; S11 comprises:
inputting any original training image to a first convolution layer of the original backbone network, sequentially passing through all the first convolution layers and all the first down-sampling layers, and performing multi-scale image feature extraction on any original training image to obtain a first training image corresponding to any original training image.
4. The image target detection and segmentation method of claim 2, wherein the original neck network comprises a plurality of second convolutional layers and a plurality of first up-sampling layers, the output end of each second convolutional layer being connected to the input end of one first up-sampling layer; S12 comprises:
inputting the first training image corresponding to any original training image to a first second convolution layer of the original neck network, sequentially passing through all second convolution layers and all first up-sampling layers, and performing image feature extraction on the first training image corresponding to any original training image to obtain a second training image corresponding to any original training image.
5. The image object detection and segmentation method of claim 2 wherein the plurality of original header networks comprises: presetting a target classification head network, a target detection head network, an image segmentation head network and a central skeleton head network; the training prediction result of any original training image comprises: a first training prediction result, a second training prediction result, a third training prediction result, and a fourth training prediction result; the step of inputting the second training image corresponding to any original training image into each original head network respectively for prediction to obtain a training prediction result of any original training image in each original head network includes:
inputting a second training image corresponding to any original training image into the preset target classification head network for prediction to obtain a first training prediction result obtained by performing target classification on any original training image;
inputting a second training image corresponding to any original training image into the preset target detection head network for prediction to obtain a second training prediction result obtained by performing target detection on any original training image;
inputting a second training image corresponding to any original training image into the preset image segmentation head network for prediction to obtain a third training prediction result obtained by performing image segmentation on any original training image;
inputting a second training image corresponding to any original training image into the preset central skeleton head network for prediction, and obtaining a fourth training prediction result obtained by performing central skeleton extraction on any original training image.
6. The image target detecting and segmenting method according to claim 2, wherein the step of labeling any original training image to obtain at least one labeled training image corresponding to the any original training image includes:
labeling each object in any original training image based on the class of the object to obtain a first labeling training image containing labeling class information of each object;
labeling each object in any original training image based on the position of the object to obtain a second labeled training image containing the position information of each object;
masking each object in any original training image to obtain a third labeling training image containing mask information of each object;
and acquiring the central skeleton of each object in the third annotation training image corresponding to any original training image, and arranging all the central skeletons according to a preset arrangement sequence to obtain a fourth annotation training image.
7. The image target detection and segmentation method of claim 1, wherein the target prediction result comprises: the target detection result of the image to be detected and the image segmentation result of the image to be detected.
8. An image object detection and segmentation system, comprising: a processing module and an operation module;
the processing module is used for: training a preset deep learning model for image target detection and segmentation based on a plurality of original training images to obtain a target deep learning model;
the operation module is used for: and inputting the image to be detected into the target deep learning model to obtain a target prediction result of target detection and segmentation of the image to be detected.
9. A storage medium having stored therein instructions which, when read by a computer, cause the computer to execute the image object detection and segmentation method according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, causes the computer to perform the image object detection and segmentation method according to any one of claims 1 to 7.
CN202211281775.1A 2022-10-19 2022-10-19 Image target detection and segmentation method, system, storage medium and electronic equipment Pending CN115471499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211281775.1A CN115471499A (en) 2022-10-19 2022-10-19 Image target detection and segmentation method, system, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115471499A true CN115471499A (en) 2022-12-13

Family

ID=84337267

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022083123A1 (en) * 2020-10-19 2022-04-28 北京捷通华声科技股份有限公司 Certificate positioning method
CN114677596A (en) * 2022-05-26 2022-06-28 之江实验室 Remote sensing image ship detection method and device based on attention model
CN114782680A (en) * 2022-05-13 2022-07-22 北京地平线信息技术有限公司 Training method and device of target detection model, and target detection method and device
CN114937086A (en) * 2022-07-19 2022-08-23 北京鹰瞳科技发展股份有限公司 Training method and detection method for multi-image target detection and related products
CN114943682A (en) * 2022-02-25 2022-08-26 清华大学 Method and device for detecting anatomical key points in three-dimensional angiography image
CN114998582A (en) * 2022-05-10 2022-09-02 深圳市第二人民医院(深圳市转化医学研究院) Coronary artery blood vessel segmentation method, device and storage medium
CN115115947A (en) * 2022-07-14 2022-09-27 云南大学 Remote sensing image detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination