CN112070793A - Target extraction method and device - Google Patents

Target extraction method and device

Info

Publication number
CN112070793A
Authority
CN
China
Prior art keywords
image
target
mask
point
foreground
Prior art date
Legal status
Pending
Application number
CN202010952509.1A
Other languages
Chinese (zh)
Inventor
王晓茹
徐培容
曲昭伟
张珩
谷嘉航
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202010952509.1A
Publication of CN112070793A

Classifications

    • G06T 7/194: Image analysis; Segmentation; Edge detection involving foreground-background segmentation
    • G06F 18/214: Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: Pattern recognition; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06T 7/66: Image analysis; Analysis of geometric attributes of image moments or centre of gravity
    • G06V 10/28: Image preprocessing; Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06T 2207/10024: Image acquisition modality; Color image
    • G06T 2207/20081: Special algorithmic details; Training; Learning
    • G06T 2207/20084: Special algorithmic details; Artificial neural networks [ANN]

Abstract

The application provides a target extraction method and a target extraction device. The method comprises the following steps: acquiring an image and position information of a target to be extracted, the position information comprising position information of foreground points and position information of background points; calculating a foreground binary image according to the position information of the foreground points and a background binary image according to the position information of the background points; performing channel combination on the image, the foreground binary image and the background binary image; inputting the channel-combined image into a trained target semantic segmentation model to obtain a mask for distinguishing the target from non-target content in the image, the target semantic segmentation model being obtained by at least replacing Xception-65 in the Deeplab v3+ semantic segmentation network with the residual network ResNet-101; and extracting the target from the image according to the mask and the image. The target extraction method improves both the speed and the accuracy of target extraction.

Description

Target extraction method and device
Technical Field
The present application relates to the field of image processing, and in particular, to a method and an apparatus for extracting a target.
Background
In some application scenarios, a user needs to extract a desired object from an image. For convenience of description, the object that the user needs to extract from the image is referred to as the target, and the content of the image other than the target is referred to as non-target. Typically, the user clicks a preset first number of location points (referred to as foreground points for convenience of description) inside the target in the image; the foreground points are used to represent the target. Meanwhile, the user clicks a preset second number of location points (referred to as background points for convenience of description) on the non-target part of the image; the background points are used to represent the non-target.
At present, the target is extracted by processing the image, together with the foreground points and background points of the target to be extracted, with a semantic segmentation model.
However, both the accuracy and the speed of current target extraction are low.
Disclosure of Invention
The application provides a target extraction method and a target extraction device, and aims to solve the problems of low precision and low speed of target extraction.
In order to achieve the above object, the present application provides the following technical solutions:
the application provides a target extraction method, which comprises the following steps:
acquiring an image and position information of a target to be extracted; the position information comprises position information of foreground points and position information of background points;
calculating a foreground binary image according to the position information of the foreground point;
calculating a background binary image according to the position information of the background points;
performing channel combination on the image, the foreground binary image and the background binary image to obtain an image after channel combination;
inputting the channel-combined image into a trained target semantic segmentation model to obtain a mask for distinguishing the target from non-target content in the image; the target semantic segmentation model is obtained by at least replacing Xception-65 in the Deeplab v3+ semantic segmentation network with the residual network ResNet-101;
and extracting the target in the image according to the mask and the image.
Optionally, the calculating a foreground binary image according to the position information of the foreground point includes:
drawing, in the image, a circle of a preset radius centered on each foreground point, so as to obtain the circle corresponding to each foreground point;
and setting the pixel value of the pixel point in the area where the circle corresponding to each foreground point is located as 1, and setting the pixel value of the pixel point outside the circle corresponding to the foreground point as 0 to obtain the foreground binary image.
Optionally, the calculating a background binary image according to the position information of the background point includes:
drawing, in the image, a circle of the preset radius centered on each background point, so as to obtain the circle corresponding to each background point;
setting the pixel value of the pixel point in the area where the circle corresponding to each background point is located as 1, and setting the pixel value of the pixel point outside the circle corresponding to the background point as 0 to obtain the background binary image.
Optionally, the number of channels of the final output network of the target semantic segmentation network is 2;
the mask comprises a first mask and a second mask; the value of any position point in the first mask represents the probability that the pixel point which is the same as the position point in the image belongs to the target; and the value of any position point in the second mask represents the probability that the pixel point which is the same as the position point in the image belongs to the non-target.
The extracting the target in the image according to the mask and the image comprises:
comparing the values of the target position points in the first mask and the second mask to obtain a comparison result; the target position points are the same position points;
in the case that the comparison result indicates: under the condition that the value of the target position point in the first mask is larger than that of the target position point in the second mask, taking a pixel point indicated by the target position point in the image as a target pixel point;
and taking the area formed by the target pixel points in the image as the target.
Optionally, the image is an RGB image; the number of input channels of the residual network ResNet-101 is 5.
Optionally, the training process of the target semantic segmentation model includes:
acquiring a training sample and a sample label; any one training sample is obtained by combining an RGB original image, a foreground binary image and a background binary image through a channel; the sample label of the training sample is a 2-channel mask for distinguishing a target from a non-target;
training the target semantic segmentation model by adopting the training sample, the sample label and a target loss function to obtain a trained target semantic segmentation model; the target loss function is the sum of a preset mark loss function and a cross entropy; the expression of the mark loss function is:
$$\mathrm{LabelLoss}=\frac{1}{W\times H}\sum_{i=1}^{W}\sum_{j=1}^{H}\Bigl[S_{1}\bigl(S_{1}-\sigma(L)\bigr)^{2}+S_{0}\bigl(S_{0}-\sigma(L)\bigr)^{2}\Bigr]$$
where LabelLoss represents the mark loss function value, $\sigma(\cdot)$ is the sigmoid function used to limit the network output to the range $(0,1)$, $L$ represents the non-target mask in the result mask output by the target semantic segmentation model for any input training sample, $S_1$ represents the foreground binary image in the training sample, $S_0$ represents the background binary image in the training sample, $W$ represents the width of the RGB original image, and $H$ represents the height of the RGB original image, with all operations taken element-wise at position $(i,j)$.
Optionally, the obtaining of the training sample and the sample label includes:
acquiring a training example sample; any one of the training example samples includes: the RGB original image and a single-class mask; the single-class mask is a single-channel mask used for dividing the RGB original image into a target and a non-target;
selecting foreground points and background points for masks of a single category in each training example sample respectively to obtain a single-channel mask after marking corresponding to each example sample;
respectively generating a foreground binary image and a background binary image according to the foreground points and the background points of the marked single-channel mask to obtain a foreground binary image and a background binary image of each example sample;
respectively carrying out channel combination on the RGB original image in each example sample and the corresponding foreground binary image and background binary image to obtain an image after channel combination corresponding to each example sample;
respectively updating each single-class mask in the training example sample into a 2-channel mask through a preset function to obtain a single-class 2-channel mask;
and taking the image obtained by combining the channels corresponding to any one example sample as a training sample, and taking the 2-channel mask of a single class in the example sample as a sample label of the training sample.
Optionally, the obtaining a training example sample includes:
obtaining a semantic segmentation training data set; the semantic segmentation training data set comprises the RGB original image and a mask label; the mask label is a single-channel mask for distinguishing different areas in the RGB original image;
respectively determining a single type of mask corresponding to each area in the mask label; the single-category mask corresponding to any one area in the mask labels is obtained by setting the value of the position point of the area to be 1 and the values of other position points to be 0;
and the RGB original image and each single-class mask respectively form the training example sample.
The present application further provides a target extraction apparatus, including:
the acquisition module is used for acquiring an image and position information of a target to be extracted; the position information comprises position information of foreground points and position information of background points;
the first calculation module is used for calculating a foreground binary image according to the position information of the foreground point;
the second calculation module is used for calculating a background binary image according to the position information of the background point;
the combination module is used for carrying out channel combination on the image, the foreground binary image and the background binary image to obtain an image after channel combination;
the input module is used for inputting the channel-combined image into a trained target semantic segmentation model to obtain a mask for distinguishing the target from non-target content in the image; the target semantic segmentation model is obtained by at least replacing Xception-65 in the Deeplab v3+ semantic segmentation network with the residual network ResNet-101;
and the extraction module is used for extracting the target in the image according to the mask and the image.
Optionally, the first calculating module is configured to calculate a foreground binary image according to the position information of the foreground point, and includes:
the first calculation module is specifically configured to make a circle with a preset radius by taking each foreground point as a center of a circle in the image, so as to obtain a circle corresponding to each foreground point;
and setting the pixel value of the pixel point in the area where the circle corresponding to each foreground point is located as 1, and setting the pixel value of the pixel point outside the circle corresponding to the foreground point as 0 to obtain the foreground binary image.
Optionally, the second calculating module is configured to calculate a background binary image according to the position information of the background point, and includes:
the second calculation module is specifically configured to take each background point as a circle center and take the preset radius as a circle in the image to obtain a circle corresponding to each background point;
setting the pixel value of the pixel point in the area where the circle corresponding to each background point is located as 1, and setting the pixel value of the pixel point outside the circle corresponding to the background point as 0 to obtain the background binary image.
Optionally, the number of channels of the final output network of the target semantic segmentation network is 2;
the mask comprises a first mask and a second mask; the value of any position point in the first mask represents the probability that the pixel point which is the same as the position point in the image belongs to the target; and the value of any position point in the second mask represents the probability that the pixel point which is the same as the position point in the image belongs to the non-target.
The extracting module is configured to extract the target in the image according to the mask and the image, and includes:
the extraction module is specifically configured to compare values of target position points in the first mask and the second mask to obtain a comparison result; the target position points are the same position points;
in the case that the comparison result indicates: under the condition that the value of the target position point in the first mask is larger than that of the target position point in the second mask, taking a pixel point indicated by the target position point in the image as a target pixel point;
and taking the area formed by the target pixel points in the image as the target.
Optionally, the image is an RGB image; the number of input channels of the residual network ResNet-101 is 5.
Optionally, the method further includes: the training module is used for the training process of the target semantic segmentation model and comprises the following steps:
acquiring a training sample and a sample label; any one training sample is obtained by combining an RGB original image, a foreground binary image and a background binary image through a channel; the sample label of the training sample is a 2-channel mask for distinguishing a target from a non-target;
training the target semantic segmentation model by adopting the training sample, the sample label and a target loss function to obtain a trained target semantic segmentation model; the target loss function is the sum of a preset mark loss function and a cross entropy; the expression of the mark loss function is:
$$\mathrm{LabelLoss}=\frac{1}{W\times H}\sum_{i=1}^{W}\sum_{j=1}^{H}\Bigl[S_{1}\bigl(S_{1}-\sigma(L)\bigr)^{2}+S_{0}\bigl(S_{0}-\sigma(L)\bigr)^{2}\Bigr]$$
where LabelLoss represents the mark loss function value, $\sigma(\cdot)$ is the sigmoid function used to limit the network output to the range $(0,1)$, $L$ represents the non-target mask in the result mask output by the target semantic segmentation model for any input training sample, $S_1$ represents the foreground binary image in the training sample, $S_0$ represents the background binary image in the training sample, $W$ represents the width of the RGB original image, and $H$ represents the height of the RGB original image, with all operations taken element-wise at position $(i,j)$.
Optionally, the training module is configured to obtain a training sample and a sample label, and includes:
the training module is specifically used for acquiring a training example sample; any one of the training example samples includes: the RGB original image and a single-class mask; the single-class mask is a single-channel mask used for dividing the RGB original image into a target and a non-target;
selecting foreground points and background points for masks of a single category in each training example sample respectively to obtain a single-channel mask after marking corresponding to each example sample;
respectively generating a foreground binary image and a background binary image according to the foreground points and the background points of the marked single-channel mask to obtain a foreground binary image and a background binary image of each example sample;
respectively carrying out channel combination on the RGB original image in each example sample and the corresponding foreground binary image and background binary image to obtain an image after channel combination corresponding to each example sample;
respectively updating each single-class mask in the training example sample into a 2-channel mask through a preset function to obtain a single-class 2-channel mask;
and taking the image obtained by combining the channels corresponding to any one example sample as a training sample, and taking the 2-channel mask of a single class in the example sample as a sample label of the training sample.
Optionally, the training module is configured to obtain a training example sample, and includes:
the training module is specifically used for acquiring a semantic segmentation training data set; the semantic segmentation training data set comprises the RGB original image and a mask label; the mask label is a single-channel mask for distinguishing different areas in the RGB original image;
respectively determining a single type of mask corresponding to each area in the mask label; the single-category mask corresponding to any one area in the mask labels is obtained by setting the value of the position point of the area to be 1 and the values of other position points to be 0;
and the RGB original image and each single-class mask respectively form the training example sample.
In the target extraction method and the target extraction device, an image and position information of a target to be extracted are obtained, wherein the position information comprises position information of foreground points and position information of background points; calculating a foreground binary image according to the position information of the foreground point, and calculating a background binary image according to the position information of the background point; because the information content of the binary image is small, the image, the foreground binary image and the background binary image are subjected to channel combination, and the information content of the obtained image after channel combination is small; furthermore, the image after channel combination is input into the trained target semantic segmentation model, so that the processing speed of the trained semantic segmentation model can be improved.
Because the target semantic segmentation model is obtained by at least replacing Xception-65 in the Deeplab v3+ semantic segmentation network with the residual network ResNet-101, and ResNet-101 can extract more semantic information than Xception-65, the mask output by the target semantic segmentation model provided by the application has higher accuracy. Furthermore, according to the mask, the target extracted from the image also has higher accuracy.
In summary, the speed and accuracy of extracting the target are improved by the target extraction method provided by the application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a training process of a target semantic segmentation model disclosed in an embodiment of the present application;
fig. 2 is a flowchart of a target extraction method disclosed in an embodiment of the present application;
fig. 3 is a schematic diagram of an architecture of a target extraction process disclosed in an embodiment of the present application;
fig. 4(a) is a schematic image of an object to be extracted according to an embodiment of the present application;
FIG. 4(b) is a schematic diagram of an extraction result disclosed in the embodiment of the present application;
fig. 4(c) is a schematic image of another object to be extracted disclosed in the embodiment of the present application;
FIG. 4(d) is a schematic diagram of another extraction result disclosed in the embodiment of the present application;
fig. 5 is a schematic structural diagram of a target extraction device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the embodiment of the present application, the image type suitable for the extraction target may be an RGB image, or may be another type of image.
Fig. 1 is a schematic diagram of a training process of a target semantic segmentation model according to an embodiment of the present disclosure.
In this embodiment, the target semantic segmentation model is obtained by improving the Deeplab v3+ semantic segmentation network. Specifically, Xception-65 in the Deeplab v3+ semantic segmentation network is replaced by the residual network ResNet-101, the number of input channels of the residual network ResNet-101 is modified to 5, and the number of channels of the final output network of the Deeplab v3+ semantic segmentation network (after the backbone replacement) is modified to 2, so as to obtain the target semantic segmentation model. The specific structures of the residual network ResNet-101 and of the Deeplab v3+ semantic segmentation network are prior art and are not described herein again.
In practice, when the image is not an RGB image but another type of image, the input channel number of the residual network ResNet-101 is modified to a target channel number, where the target channel number is the total channel number obtained by adding 2 to the image channel number.
It should be further noted that, in this embodiment, the number of channels of the final output network of the Deeplab v3+ semantic segmentation network whose backbone has been replaced by the residual network ResNet-101 is modified to 2; that is, the target semantic segmentation model outputs a 2-channel mask, and the target in the image is determined through the two-channel mask. In practice, the number of channels of the final output network of the target semantic segmentation model may also be 1, i.e. the target semantic segmentation model outputs a single-channel mask and the target in the image is determined through that single-channel mask. In this embodiment, the training process of the target semantic segmentation model is introduced by taking the 2-channel output as an example.
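To make this construction concrete, the following sketch assembles a segmentation model with a ResNet-101 backbone, 5 input channels and a 2-channel output. It is an illustration only: the patent's own implementation uses TensorFlow 1.13.1 (see the tooling notes later in this description), whereas the sketch uses torchvision's DeepLabV3 with a ResNet-101 backbone as a stand-in (torchvision does not ship the v3+ decoder), so all module names below are assumptions rather than the patent's code.

```python
# Illustrative sketch only: torchvision's DeepLabV3 + ResNet-101 is used as a
# stand-in for the Deeplab v3+ network described in the patent.
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet101

def build_target_segmentation_model(in_channels: int = 5, num_classes: int = 2) -> nn.Module:
    # 2 output channels: one mask channel for the target, one for the non-target.
    model = deeplabv3_resnet101(num_classes=num_classes)
    # Widen the first convolution of the ResNet-101 backbone so that it accepts
    # RGB (3) + foreground binary image (1) + background binary image (1) = 5 channels.
    model.backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                                     stride=2, padding=3, bias=False)
    return model

if __name__ == "__main__":
    net = build_target_segmentation_model()
    x = torch.randn(1, 5, 513, 513)        # channel-combined input image
    out = net(x)["out"]                    # (1, 2, 513, 513): the 2-channel result mask
    print(out.shape)
```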
In this embodiment, in the process of training the target semantic segmentation model, an Adam optimizer may be adopted, the batch size (batch size) may be set to 4, the learning rate may be set to 0.0001, and the value of the preset radius may be set to 5. Of course, in practice, other optimizers and parameters may also be used to train the target semantic segmentation model, and the specific content of the optimizers and parameters is not limited in this embodiment. The specific training process may include the steps of:
s101, obtaining a semantic segmentation training data set.
In this step, the semantic segmentation training data set may be a Pascal VOC2012 semantic segmentation data set, and certainly, in practice, other semantic segmentation data sets may also be adopted, which is described in this embodiment by taking the Pascal VOC2012 semantic segmentation data set as an example.
The semantic segmentation model training dataset comprises: the image processing method comprises RGB original images and mask labels, wherein one RGB original image corresponds to one mask label, and the mask label of any frame of RGB original image is obtained through manual marking.
For any frame of RGB original image, the RGB original image is composed of multiple regions, the mask label corresponding to the RGB original image is a single-channel mask for distinguishing different regions in the RGB original image, namely the mask label corresponding to the RGB original image comprises multiple regions, wherein one region in the mask label corresponds to one region in the RGB original image. The number of the regional categories in the RGB original image in the semantic segmentation training data set is determined in advance.
For example, a frame of RGB original image is composed of 21 regions, and the corresponding mask label of the RGB original image includes 21 regions, wherein one region may be represented by 1 number.
S102, determining the masks of the single type corresponding to each area in the mask labels respectively.
For each frame of RGB original image in the semantic segmentation training data set, the single-class mask corresponding to each region in its mask label is determined.
Taking an RGB original image of any frame as an example, a single-type mask corresponding to any one region in a mask label corresponding to the RGB original image is obtained by setting the value of the position point of the region to 1, and setting the values of the other position points to 0.
Assume that the frame of RGB original image includes 21 regions; the mask label corresponding to this frame then includes 21 different regions, represented by different numbers. Taking the region represented by the number "7" in the mask label as an example, the values of the position points whose value is "7" are set to 1 and the values of all other position points are set to 0, thereby obtaining the single-class mask corresponding to that region. In this step, 21 single-class masks are therefore obtained.
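As a concrete illustration of this step (variable names and the region number are purely illustrative), the single-class mask can be produced with a NumPy comparison:

```python
import numpy as np

def single_class_mask(mask_label: np.ndarray, region_id: int) -> np.ndarray:
    # Positions belonging to the given region become 1, all other positions become 0.
    return (mask_label == region_id).astype(np.uint8)

# e.g. the single-class mask of the region encoded by the number 7 in the mask label:
# sheep_mask = single_class_mask(mask_label, 7)
```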
S103, combining the RGB original image and each single-class mask respectively to form a training example sample.
Taking the RGB original image including 21 regions as an example, in this step the frame of RGB original image is combined with each of the 21 single-class masks, i.e. 21 training example samples are obtained.
The above-mentioned purposes of S101 to S103 are: training example samples are obtained. Wherein any one of the training example samples comprises: an RGB original image and a single-class mask; the single class mask is a single channel mask for dividing the RGB original image into a target and a non-target.
And S104, selecting foreground points and background points for the masks of the single category in each training example sample respectively to obtain the marked single-channel masks corresponding to each example sample respectively.
In this step, the first location point (belonging to the target and representing the target) that simulates the user click is called the foreground point. The second location point (belonging to and representing a non-object) that simulates a user click is called a background point. Taking the RGB image including 21 regions as an example, 21 marked masks are obtained in this step.
In this step, foreground points and background points may be randomly selected.
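The text only requires that the points be chosen at random; one possible sampling sketch, with illustrative names, is:

```python
import numpy as np

def sample_points(single_mask: np.ndarray, n_fg: int, n_bg: int, rng=None):
    # Randomly pick foreground points inside the single-class mask and
    # background points outside it (simulated user clicks).
    rng = np.random.default_rng() if rng is None else rng
    fg_coords = np.argwhere(single_mask == 1)
    bg_coords = np.argwhere(single_mask == 0)
    fg_points = fg_coords[rng.choice(len(fg_coords), size=n_fg, replace=False)]
    bg_points = bg_coords[rng.choice(len(bg_coords), size=n_bg, replace=False)]
    return fg_points, bg_points               # arrays of (row, col) coordinates
```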
And S105, respectively generating a foreground binary image and a background binary image according to the marked foreground points and background points of the single-channel mask corresponding to each training example sample, so as to obtain the foreground binary image and the background binary image of each example sample.
In this embodiment, the principle of generating a foreground binary image and the principle of generating a background binary image for the foreground point and the background point in the marked mask corresponding to each training example sample are the same, and for convenience of description, the principle of generating a foreground binary image and a background binary image according to the foreground point and the background point in the marked mask corresponding to the training example sample is described by taking the marked mask corresponding to any one training example sample as an example:
in the RGB original image in the training example sample, respectively taking each foreground point as the center of a circle and making a circle by a preset radius to obtain a circle corresponding to each foreground point; and setting the pixel value of the pixel point of the area where the circle corresponding to each foreground point is located as 1, and setting the pixel value of the pixel point outside the circle corresponding to the foreground point as 0 to obtain a foreground binary image.
Similarly, in the RGB original image in the training example sample, each background point is taken as a circle center, and a circle is made with the preset radius, so as to obtain a circle corresponding to each background point; and setting the pixel value of the pixel point in the area where the circle corresponding to each background point is located as 1, and setting the pixel value of the pixel point outside the circle corresponding to the background point as 0 to obtain a background binary image.
Specifically, the background binary image and the foreground binary image may be calculated according to the following formula (1):
$$S_{t}(i,j)=\frac{1}{2}\Bigl(1+\operatorname{sgn}\bigl(r-\min_{p\in\mathcal{P}_{t}}d\bigl((i,j),p\bigr)\bigr)\Bigr),\qquad t\in\{0,1\}\qquad(1)$$
When $t=0$, $S_{0}(i,j)$ denotes the pixel value of the pixel indicated by $(i,j)$ in the background binary image of the marked mask, $\mathcal{P}_{0}$ denotes the set of background points in the marked mask, and $\min_{p\in\mathcal{P}_{0}}d\bigl((i,j),p\bigr)$ denotes the minimum of the distances between the pixel indicated by $(i,j)$ and each background point. $r$ denotes the preset radius value, and $\operatorname{sgn}$ denotes the sign function.
The distance between the pixel indicated by $(i,j)$ and any background point may be the Euclidean distance, calculated according to the following formula (2):
$$d(A,B)=\sqrt{(A_{x}-B_{x})^{2}+(A_{y}-B_{y})^{2}}\qquad(2)$$
where $A$ denotes the pixel indicated by $(i,j)$ in formula (1), $B$ denotes any background point in $\mathcal{P}_{0}$, and $(A_{x},A_{y})$ and $(B_{x},B_{y})$ are their pixel coordinates.
When $t=1$, $S_{1}(i,j)$ denotes the pixel value of the pixel indicated by $(i,j)$ in the foreground binary image of the marked mask, $\mathcal{P}_{1}$ denotes the set of foreground points in the marked mask, and $\min_{p\in\mathcal{P}_{1}}d\bigl((i,j),p\bigr)$ denotes the minimum of the distances between the pixel indicated by $(i,j)$ and each foreground point. The distance between the pixel indicated by $(i,j)$ and any foreground point may likewise be the Euclidean distance of formula (2), which is not described herein again.
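For illustration, a brute-force NumPy sketch of formulas (1) and (2) follows; it favours clarity over speed, and the function name is an assumption:

```python
import numpy as np

def click_map(height: int, width: int, points, radius: float = 5.0) -> np.ndarray:
    # Formula (1): a pixel is 1 when its Euclidean distance (formula (2)) to the
    # nearest clicked point is at most the preset radius, and 0 otherwise.
    binary = np.zeros((height, width), dtype=np.uint8)
    if len(points) == 0:
        return binary
    rows, cols = np.mgrid[0:height, 0:width]
    coords = np.stack([rows, cols], axis=-1).astype(np.float32)       # (H, W, 2)
    pts = np.asarray(points, dtype=np.float32)                        # (K, 2) as (row, col)
    dists = np.linalg.norm(coords[:, :, None, :] - pts[None, None, :, :], axis=-1)
    binary[dists.min(axis=-1) <= radius] = 1
    return binary

# foreground_map = click_map(H, W, fg_points, radius=5)
# background_map = click_map(H, W, bg_points, radius=5)
```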
And S106, respectively carrying out channel combination on the RGB original image in each example sample and the corresponding foreground binary image and background binary image to obtain an image after channel combination corresponding to each example sample.
In this step, the channel combination of the RGB original image in each example sample and the corresponding foreground binary image and background binary image is performed in the same manner, and the channel combination of the RGB original image of any example sample and the corresponding foreground binary image and background binary image is used as an example for introduction.
Since the RGB original image is a 3-channel image, the foreground binary image corresponding to the RGB original image is a single-channel image, and the background binary image corresponding to the RGB original image is a single-channel image, in this step, the RGB original image, the corresponding foreground binary image, and the background binary image are subjected to channel combination to obtain a 5-channel image.
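In array terms, this channel combination is simply a depth-wise stack; a minimal sketch with illustrative names:

```python
import numpy as np

def combine_channels(rgb: np.ndarray, fg_map: np.ndarray, bg_map: np.ndarray) -> np.ndarray:
    # (H, W, 3) RGB image + two (H, W) binary images -> (H, W, 5) channel-combined image.
    return np.dstack([rgb, fg_map, bg_map])
```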
S107, respectively updating the mask of each single type in the training example sample into a 2-channel mask through a preset function, and obtaining the 2-channel mask of the single type.
Specifically, the implementation of converting each single-class mask in the training example sample into a 2-channel mask through the preset function is prior art and is not described herein again. In the 2-channel mask, the value of each position point in one channel represents the probability that the corresponding pixel in the RGB original image belongs to the target, and the value of each position point in the other channel represents the probability that the corresponding pixel belongs to the non-target.
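The preset function itself is not named in the text; a common choice that satisfies the description is one-hot encoding of the single-class mask along a new channel axis. The sketch below makes that assumption, and the channel order is also an assumption:

```python
import numpy as np

def to_two_channel_mask(single_mask: np.ndarray) -> np.ndarray:
    # Assumed implementation of the preset function:
    # channel 0 = target probability, channel 1 = non-target probability.
    target = (single_mask == 1).astype(np.float32)
    return np.stack([target, 1.0 - target], axis=0)     # shape (2, H, W)
```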
And S108, taking the image obtained by combining the channels corresponding to any example sample as a training sample, and taking the 2-channel mask of a single category in the example sample as a sample label of the training sample.
The purpose of the above S101 to S108 is: acquiring a training sample and a sample label; any training sample is obtained by combining an RGB original image, a foreground binary image and a background binary image through a channel; the sample label of the training sample is a 2-channel mask used to distinguish between targets and non-targets.
S109, training the target semantic segmentation model by adopting the training samples, the sample labels and the target loss function to obtain the trained target semantic segmentation model.
In this step, the training process of the target semantic segmentation model may include: for any training sample input into the target semantic segmentation model, the target semantic segmentation model outputs a result mask of the training sample, where the number of channels of the result mask is 2; a loss function value is then calculated between the non-target mask in the result mask and the sample label of the training sample by adopting the target loss function, and the parameters of the target semantic segmentation model are adjusted according to the loss function value, so as to finally obtain the trained target semantic segmentation model.
In this embodiment, the target loss function is the sum of the mark loss function and the cross entropy.
Wherein, the expression of the mark loss function is shown in the following formula (3):
$$\mathrm{LabelLoss}=\frac{1}{W\times H}\sum_{i=1}^{W}\sum_{j=1}^{H}\Bigl[S_{1}\bigl(S_{1}-\sigma(L)\bigr)^{2}+S_{0}\bigl(S_{0}-\sigma(L)\bigr)^{2}\Bigr]\qquad(3)$$
where LabelLoss represents the mark loss function value; $\sigma(\cdot)$ is the sigmoid function, used to limit the network output to the range $(0,1)$; $L$ represents the non-target mask in the result mask output by the target semantic segmentation model, i.e. the channel of the result mask that represents the probability of background points; $S_1$ represents the foreground binary image; $S_0$ represents the background binary image; $W$ represents the width of the image; and $H$ represents the height of the image. All operations are taken element-wise at position $(i,j)$.
$\bigl(S_1-\sigma(L)\bigr)^2$ squares each element of the difference matrix between the foreground binary image and the sigmoid-bounded non-target mask in the result mask, giving a first squared matrix; $S_1\bigl(S_1-\sigma(L)\bigr)^2$ multiplies each element of the first squared matrix by the corresponding element of the foreground binary image, giving a first product matrix.
In the same way, $\bigl(S_0-\sigma(L)\bigr)^2$ squares each element of the difference matrix between the background binary image and the non-target mask, giving a second squared matrix; $S_0\bigl(S_0-\sigma(L)\bigr)^2$ multiplies each element of the second squared matrix by the corresponding element of the background binary image, giving a second product matrix.
In this embodiment, the cross entropy is a cross entropy between the result mask and the sample label of the training sample, and it should be noted that in the process of calculating the cross entropy, two masks, namely a target mask and a non-target mask, in the result mask output by the target semantic segmentation model are used.
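For concreteness, a sketch of the combined target loss is given below in PyTorch (the patent's own TensorFlow code is not reproduced); the channel order of the result mask and the placement of the sigmoid follow the assumptions made for formula (3) above:

```python
import torch
import torch.nn.functional as F

def label_loss(result_mask: torch.Tensor, fg_map: torch.Tensor, bg_map: torch.Tensor) -> torch.Tensor:
    # Formula (3); result_mask: (N, 2, H, W), channel 1 assumed to be the non-target channel.
    l = torch.sigmoid(result_mask[:, 1])                    # bound the non-target output to (0, 1)
    per_pixel = fg_map * (fg_map - l) ** 2 + bg_map * (bg_map - l) ** 2
    return per_pixel.mean()                                 # (1 / (W * H)) * sum, averaged over the batch

def target_loss(result_mask: torch.Tensor, label_2ch: torch.Tensor,
                fg_map: torch.Tensor, bg_map: torch.Tensor) -> torch.Tensor:
    # Target loss = mark loss + cross entropy between the result mask and the sample label.
    ce = F.cross_entropy(result_mask, label_2ch.argmax(dim=1))
    return label_loss(result_mask, fg_map, bg_map) + ce
```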
Fig. 2 is a method for extracting an object according to an embodiment of the present application, including the following steps:
s201, obtaining an image and position information of a target to be extracted.
In this step, the image of the target to be extracted is an RGB image, and the position information is a foreground point and a background point marked on the image of the target to be extracted by the user, where the foreground point belongs to the target to be extracted and the background point belongs to a non-target.
In order to more intuitively show the execution flow of the embodiment, an architectural diagram of a target extraction flow is provided in the embodiment of the present application, as shown in fig. 3.
The image and the position information of the target to be extracted acquired in this step correspond to the image indicated by "image 1" in fig. 3.
S202, calculating a foreground binary image according to the position information of the foreground point, and calculating a background binary image according to the position information of the background point.
The specific implementation manner of this step may refer to S105, which is not described herein again.
The foreground binary image obtained in this step corresponds to "image 3" in fig. 3, and the background binary image corresponds to "image 4" in fig. 3.
And S203, carrying out channel combination on the image, the foreground binary image and the background binary image to obtain an image after channel combination.
In this step, the image of the target to be extracted, the foreground binary image of the target to be extracted, and the background binary image of the target to be extracted are subjected to channel combination to obtain an image after channel combination corresponding to the image of the target to be extracted.
It should be noted that, this step is only to combine channels of different images, and does not modify information in the images.
In this step, the image, the foreground binary image and the background binary image are subjected to channel combination, and corresponding to fig. 3, "image 2", "image 3" and "image 4" are subjected to channel combination.
And S204, inputting the image after the channel combination into the trained target semantic segmentation model to obtain a mask.
In this step, the trained target semantic segmentation model is the target semantic segmentation model obtained in S109 in the embodiment corresponding to fig. 1.
In this embodiment, the resulting mask is used to distinguish between objects and non-objects in the image. In this step, the obtained mask may be a single-channel mask or a 2-channel mask. Specifically, the number of channels of the mask depends on how to train the target semantic segmentation model, and the number of channels of the mask obtained in this step is not limited in this embodiment.
The target semantic segmentation model of this step corresponds to the target semantic segmentation model in fig. 3. This step results in a mask corresponding to "image 5" in FIG. 3.
S205, extracting the target in the image of the target to be extracted according to the mask and the image of the target to be extracted.
Optionally, in this step, taking the mask as a 2-channel mask as an example, specifically, the 2-channel mask is a first mask and a second mask, where a value taken from any position point in the first mask indicates a probability that a pixel point in the image that is the same as the position point belongs to the target; the value of any position point in the second mask represents the probability that the pixel point in the image same as the position point belongs to the non-target.
Specifically, extracting the target in the image of the target to be extracted according to the mask and the image of the target to be extracted includes the following steps a1 to A3:
and A1, comparing the values of the target position points in the first mask and the second mask to obtain a comparison result.
In this step, a target position point refers to the same position point taken in both the first mask and the second mask; that is, each target position point corresponds to a pair formed by a position point in the first mask and the position point at the same location in the second mask.
In this step, the values of each target location point in the first mask and the second mask are compared, and one target location point corresponds to one comparison result.
A2, in the comparison: and under the condition that the value of the target position point in the first mask is larger than that of the target position point in the second mask, taking the pixel point indicated by the target position point in the image of the target to be extracted as a target pixel point.
If the value of the target position point in the first mask is larger than its value in the second mask, the pixel indicated by the target position point in the image of the target to be extracted belongs to the target. For convenience of description, this pixel is referred to as a target pixel point; that is, the target pixel point belongs to the target.
And A3, taking a region formed by target pixel points in the image of the target to be extracted as a target, and obtaining the image only comprising the target.
S205 this step corresponds to the process of "image 2" and "image 5" to "image 6" in fig. 3.
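Steps A1 to A3 amount to a per-pixel comparison of the two mask channels followed by masking the image; a NumPy sketch with illustrative names:

```python
import numpy as np

def extract_target(image: np.ndarray, first_mask: np.ndarray, second_mask: np.ndarray) -> np.ndarray:
    # A1 + A2: a pixel is a target pixel where its value in the first mask (target
    # probability) is larger than its value in the second mask (non-target probability).
    target_pixels = first_mask > second_mask
    # A3: keep only the region formed by the target pixels; everything else is zeroed.
    result = np.zeros_like(image)
    result[target_pixels] = image[target_pixels]
    return result
```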
In the present embodiment, a variety of programming languages and tools may be used for implementation. Specifically, the programming language may be Python, and the interpreter version may be 3.6.8. The deep learning framework may be TensorFlow (either the CPU or the GPU version), and the version may be 1.13.1. The tools used to determine the background binary image and the foreground binary image may include NumPy, OpenCV-Python and h5py. The visualization may employ Matplotlib.
The beneficial effects of this embodiment include:
the beneficial effects are that:
acquiring an image and position information of a target to be extracted, wherein the position information comprises position information of foreground points and position information of background points; calculating a foreground binary image according to the position information of the foreground point, and calculating a background binary image according to the position information of the background point; on one hand, the foreground binary image and the background binary image are calculated, so that the calculation speed is increased, and the target extraction speed is increased; on the other hand, because the information content of the binary image is small, the image, the foreground binary image and the background binary image are subjected to channel combination, and the information content of the obtained image after channel combination is small; furthermore, the image after channel combination is input into the trained target semantic segmentation model, so that the processing speed of the trained semantic segmentation model can be improved.
Since the target semantic segmentation model in this embodiment is obtained by at least replacing Xception-65 in the Deeplab v3+ semantic segmentation network with the residual network ResNet-101, and ResNet-101 can extract more semantic information than Xception-65, the mask output by the target semantic segmentation model provided by this embodiment has higher accuracy. Furthermore, according to the mask, the target extracted from the image also has higher accuracy.
In summary, the speed and the accuracy of extracting the target by the target extracting method provided by the embodiment are improved.
Second beneficial effect:
in this embodiment, the loss function used in the training process of the model includes a labeled loss function, and the labeled loss function can improve the convergence speed of the training model, and further can improve the training speed of the model.
Fig. 4 is a schematic diagram of an experimental result provided in an embodiment of the present application. Specifically, fig. 4(a), 4(b), 4(c), and 4(d) are included.
Fig. 4(a) is an image of an object to be extracted, where two sheep are present in the image, one sheep on the right side is the object to be extracted, a foreground point is marked in a region where the sheep on the right side is located, and a background point is marked in a region outside the sheep on the right side. Fig. 4(b) is a schematic diagram of the result obtained by the target extraction method according to the embodiment of the present application.
Fig. 4(c) is another image of the target to be extracted, where two sheep exist in the image, one sheep on the left side is the target to be extracted, a foreground point is marked in the region where the sheep on the left side is located, and a background point is marked in the region outside the sheep on the left side. Fig. 4(d) is a schematic diagram of the result obtained by the target extraction method according to the embodiment of the present application.
Fig. 5 is a schematic structural diagram of a target extraction apparatus according to an embodiment of the present application, including: an acquisition module 501, a first calculation module 502, a second calculation module 503, a combination module 504, an input module 505, and an extraction module 506. Wherein:
an obtaining module 501, configured to obtain an image and position information of a target to be extracted; the position information includes position information of foreground points and position information of background points.
The first calculating module 502 is configured to calculate a foreground binary image according to the position information of the foreground point.
And a second calculating module 503, configured to calculate a background binary image according to the position information of the background point.
And the combining module 504 is configured to perform channel combination on the image, the foreground binary image, and the background binary image to obtain an image after channel combination.
An input module 505, configured to input the channel-combined image into the trained target semantic segmentation model to obtain a mask for distinguishing the target from non-target content in the image; the target semantic segmentation model is obtained by at least replacing Xception-65 in the Deeplab v3+ semantic segmentation network with the residual network ResNet-101.
And an extracting module 506, configured to extract an object in the image according to the mask and the image.
Optionally, the first calculating module 502 is configured to calculate a foreground binary image according to the position information of the foreground point, and includes:
the first calculating module 502 is specifically configured to make a circle with a preset radius by taking each foreground point as a center of the circle in the image, so as to obtain a circle corresponding to each foreground point; and setting the pixel value of the pixel point of the area where the circle corresponding to each foreground point is located as 1, and setting the pixel value of the pixel point outside the circle corresponding to the foreground point as 0 to obtain a foreground binary image.
Optionally, the second calculating module 503 is configured to calculate a background binary image according to the position information of the background point, and includes:
the second calculating module 503 is specifically configured to take each background point as a circle center and take a preset radius as a circle in the image to obtain a circle corresponding to each background point; and setting the pixel value of the pixel point in the area where the circle corresponding to each background point is located as 1, and setting the pixel value of the pixel point outside the circle corresponding to the background point as 0 to obtain a background binary image.
Optionally, the number of channels of the final output network of the target semantic segmentation network is 2;
the mask comprises a first mask and a second mask; the value taking of any position point in the first mask represents the probability that the pixel point which is the same as the position point in the image belongs to the target; the value of any position point in the second mask represents the probability that the pixel point in the image same as the position point belongs to the non-target.
An extracting module 506, configured to extract an object in the image according to the mask and the image, including:
an extracting module 506, specifically configured to compare values of the target location points in the first mask and the second mask to obtain a comparison result; the target position points are the same position points; the comparison results show that: under the condition that the value of the target position point in the first mask is larger than that of the target position point in the second mask, taking the pixel point indicated by the target position point in the image as a target pixel point; and taking the area formed by the target pixel points in the image as a target.
Optionally, the image is an RGB image; the number of input channels of the residual network ResNet-101 is 5.
Optionally, the method further includes: the training module is used for the training process of the target semantic segmentation model and comprises the following steps: acquiring a training sample and a sample label; any training sample is obtained by combining an RGB original image, a foreground binary image and a background binary image through a channel; the sample label of the training sample is a 2-channel mask for distinguishing a target from a non-target;
training a target semantic segmentation model by adopting the training sample, the sample label and the target loss function to obtain a trained target semantic segmentation model; the target loss function is the sum of a preset mark loss function and the cross entropy; the expression for the mark loss function is:
$$\mathrm{LabelLoss}=\frac{1}{W\times H}\sum_{i=1}^{W}\sum_{j=1}^{H}\Bigl[S_{1}\bigl(S_{1}-\sigma(L)\bigr)^{2}+S_{0}\bigl(S_{0}-\sigma(L)\bigr)^{2}\Bigr]$$
where LabelLoss represents the mark loss function value, $\sigma(\cdot)$ is the sigmoid function used to limit the network output to the range $(0,1)$, $L$ represents the non-target mask in the result mask output by the target semantic segmentation model for any input training sample, $S_1$ represents the foreground binary image in the training sample, $S_0$ represents the background binary image in the training sample, $W$ represents the width of the RGB original image, and $H$ represents the height of the RGB original image, with all operations taken element-wise at position $(i,j)$.
Optionally, the training module is configured to obtain a training sample and a sample label, and includes:
the training module is specifically used for acquiring a training example sample; any one of the training example samples includes: an RGB original image and a single-class mask; the single-class mask is a single-channel mask used for dividing the RGB original image into a target and a non-target;
selecting foreground points and background points on the single-class mask in each training example sample, to obtain a marked single-channel mask corresponding to each example sample;
generating a foreground binary image and a background binary image from the foreground points and the background points of each marked single-channel mask, to obtain the foreground binary image and the background binary image of each example sample;
respectively carrying out channel combination on the RGB original image in each example sample and the corresponding foreground binary image and background binary image to obtain an image after channel combination corresponding to each example sample;
converting the single-class mask in each training example sample into a 2-channel mask through a preset function, to obtain a single-class 2-channel mask;
and taking the image after the channel combination corresponding to any example sample as a training sample, and taking the 2-channel mask of a single category in the example sample as a sample label of the training sample.
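A compact sketch of this sample-construction step, reusing the points_to_binary_map helper from the earlier sketch; the random point selection, the number of points and the channel layout are illustrative assumptions, not prescribed by the filing:

```python
import numpy as np

def build_training_sample(rgb_image, single_class_mask, num_fg=3, num_bg=3, radius=5):
    """Turn one example sample (RGB image + single-class mask) into a training pair.

    Returns a 5-channel input (RGB + foreground map + background map) and a
    2-channel label (channel 0 = target, channel 1 = non-target).
    """
    h, w = single_class_mask.shape

    # Illustrative point selection: sample marked points inside / outside the mask.
    fg_coords = np.argwhere(single_class_mask == 1)
    bg_coords = np.argwhere(single_class_mask == 0)
    fg_points = [(x, y) for y, x in fg_coords[np.random.choice(len(fg_coords), num_fg)]]
    bg_points = [(x, y) for y, x in bg_coords[np.random.choice(len(bg_coords), num_bg)]]

    fg_map = points_to_binary_map(fg_points, h, w, radius)   # foreground binary image
    bg_map = points_to_binary_map(bg_points, h, w, radius)   # background binary image

    # Channel combination: RGB (3) + foreground map (1) + background map (1) = 5 channels.
    sample = np.dstack([rgb_image, fg_map, bg_map])

    # "Preset function": expand the single-class mask into a 2-channel target/non-target label.
    label = np.stack([single_class_mask, 1 - single_class_mask], axis=0)

    return sample, label
```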
Optionally, the training module is configured to obtain a training example sample as follows:
the training module is specifically used for acquiring a semantic segmentation training data set; the semantic segmentation training data set comprises RGB original images and mask labels; the mask label is a single-channel mask for distinguishing different areas in the RGB original image;
determining a single-class mask corresponding to each area in the mask label, wherein the single-class mask corresponding to any area in the mask label is obtained by setting the values of the position points within the area to 1 and the values of the other position points to 0;
the RGB original image and each single-class mask form a training example sample.
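A small sketch of how such single-class masks could be derived from a multi-region mask label, assuming the label encodes each area as a distinct integer value (reserving 0 for background is a common but assumed convention):

```python
import numpy as np

def split_into_single_class_masks(mask_label):
    """Yield (area_value, single_class_mask) pairs from a single-channel mask label.

    For each area present in the label, positions inside the area are set to 1
    and all other positions to 0.
    """
    for area_value in np.unique(mask_label):
        if area_value == 0:          # assumed background value, skipped here
            continue
        single_class_mask = (mask_label == area_value).astype(np.uint8)
        yield area_value, single_class_mask
```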
The functions described in the methods of the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such an understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of target extraction, comprising:
acquiring an image and position information of a target to be extracted; the position information comprises position information of foreground points and position information of background points;
calculating a foreground binary image according to the position information of the foreground point;
calculating a background binary image according to the position information of the background points;
performing channel combination on the image, the foreground binary image and the background binary image to obtain an image after channel combination;
inputting the image after the channel combination into a trained target semantic segmentation model to obtain a mask for distinguishing a target from a non-target in the image; the target semantic segmentation model is obtained by at least replacing Xception-65 in a Deeplab v3+ semantic segmentation network with a residual network ResNet-101;
and extracting the target in the image according to the mask and the image.
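Read as a procedure, claim 1 maps onto a short inference routine. The sketch below is only an illustration: it assumes the points_to_binary_map and extract_target helpers from the earlier sketches, and a trained PyTorch segmentation model that returns its prediction under an "out" key (torchvision's convention); none of this is language of the claim.

```python
import numpy as np
import torch

def extract_by_clicks(model, rgb_image, fg_points, bg_points, radius=5):
    """Pipeline: binary maps -> channel combination -> segmentation model -> mask -> target."""
    h, w, _ = rgb_image.shape
    fg_map = points_to_binary_map(fg_points, h, w, radius)   # foreground binary image
    bg_map = points_to_binary_map(bg_points, h, w, radius)   # background binary image

    # Channel combination of the image with the two binary images (5 channels in total),
    # rearranged to the (1, 5, H, W) layout expected by a PyTorch segmentation network.
    stacked = np.dstack([rgb_image.astype(np.float32) / 255.0, fg_map, bg_map])
    inp = torch.from_numpy(stacked).permute(2, 0, 1).unsqueeze(0).float()

    with torch.no_grad():
        mask = model(inp)["out"][0].numpy()                  # (2, H, W) two-channel mask

    # Reuse the comparison step sketched earlier to keep only target pixels.
    extracted, _ = extract_target(rgb_image, mask)
    return extracted
```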
2. The method of claim 1, wherein the computing a foreground binary image from the location information of the foreground points comprises:
drawing, in the image, a circle of a preset radius centered on each foreground point respectively, to obtain a circle corresponding to each foreground point;
and setting the pixel value of each pixel point within the circle corresponding to each foreground point to 1, and setting the pixel value of each pixel point outside the circles corresponding to the foreground points to 0, to obtain the foreground binary image.
3. The method according to claim 1, wherein the calculating a background binary image according to the position information of the background point comprises:
drawing, in the image, a circle of the preset radius centered on each background point respectively, to obtain a circle corresponding to each background point;
setting the pixel value of each pixel point within the circle corresponding to each background point to 1, and setting the pixel value of each pixel point outside the circles corresponding to the background points to 0, to obtain the background binary image.
4. The method according to claim 1, wherein the number of channels of the final output network of the target semantic segmentation network is 2;
the mask comprises a first mask and a second mask; the value of any position point in the first mask represents the probability that the pixel point which is the same as the position point in the image belongs to the target; the value of any position point in the second mask represents the probability that the pixel point which is the same as the position point in the image belongs to the non-target;
the extracting the target in the image according to the mask and the image comprises:
comparing the values of the target position points in the first mask and the second mask to obtain a comparison result; the target position points are the same position points;
in a case that the comparison result indicates that the value of the target position point in the first mask is larger than the value of the target position point in the second mask, taking a pixel point indicated by the target position point in the image as a target pixel point;
and taking the area formed by the target pixel points in the image as the target.
5. The method of claim 4, wherein the image is an RGB image; the number of input channels of the residual network ResNet-101 is 5.
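The filing does not spell out the model construction in code. As a rough stand-in, the sketch below builds a DeepLabV3 head on a ResNet-101 backbone with torchvision (DeepLabV3 rather than the full DeepLab v3+ decoder) and widens the first convolution to accept the 5-channel input; the weights=None keyword, the layer names and the 513x513 crop size assume a recent torchvision release:

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet101

def build_model():
    # 2 output channels: first mask (target) and second mask (non-target).
    model = deeplabv3_resnet101(weights=None, num_classes=2)

    # Widen the stem so the network accepts RGB (3) + foreground map (1) + background map (1).
    model.backbone.conv1 = nn.Conv2d(5, 64, kernel_size=7, stride=2, padding=3, bias=False)
    return model

if __name__ == "__main__":
    model = build_model().eval()
    dummy = torch.randn(1, 5, 513, 513)        # channel-combined input
    out = model(dummy)["out"]                  # (1, 2, 513, 513) two-channel mask
    print(out.shape)
```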
6. The method of claim 5, wherein the training process for the target semantic segmentation model comprises:
acquiring a training sample and a sample label; any one training sample is obtained by combining an RGB original image, a foreground binary image and a background binary image through a channel; the sample label of the training sample is a 2-channel mask for distinguishing a target from a non-target;
training the target semantic segmentation model by using the training sample, the sample label and a target loss function to obtain a trained target semantic segmentation model; the target loss function is the sum of a preset label loss function and a cross-entropy loss;
the expression of the label loss function is given as an equation image in the original filing (images FDA0002677488710000021 and FDA0002677488710000022), wherein LabelLoss represents the value of the label loss function, L represents the non-target mask in the result mask output by the target semantic segmentation model for any input training sample, S1 represents the foreground binary image in the training sample, S0 represents the background binary image in the training sample, W represents the width of the RGB original image, and H represents the height of the RGB original image.
7. The method of claim 6, wherein the obtaining of the training sample and the sample label comprises:
acquiring a training example sample; any one of the training example samples includes: the RGB original image and a single-class mask; the single-class mask is a single-channel mask used for dividing the RGB original image into a target and a non-target;
selecting foreground points and background points on the single-class mask in each training example sample, to obtain a marked single-channel mask corresponding to each example sample;
generating a foreground binary image and a background binary image from the foreground points and the background points of each marked single-channel mask, to obtain the foreground binary image and the background binary image of each example sample;
respectively carrying out channel combination on the RGB original image in each example sample and the corresponding foreground binary image and background binary image to obtain an image after channel combination corresponding to each example sample;
converting the single-class mask in each training example sample into a 2-channel mask through a preset function, to obtain a single-class 2-channel mask;
and taking the image obtained by combining the channels corresponding to any one example sample as a training sample, and taking the 2-channel mask of a single class in the example sample as a sample label of the training sample.
8. The method of claim 7, wherein the obtaining training instance samples comprises:
obtaining a semantic segmentation training data set; the semantic segmentation training data set comprises the RGB original image and a mask label; the mask label is a single-channel mask for distinguishing different areas in the RGB original image;
determining a single-class mask corresponding to each area in the mask label, wherein the single-class mask corresponding to any one area in the mask label is obtained by setting the values of the position points within the area to 1 and the values of the other position points to 0;
and the RGB original image and each single-class mask respectively form the training example sample.
9. An object extraction device, comprising:
the acquisition module is used for acquiring an image and position information of a target to be extracted; the position information comprises position information of foreground points and position information of background points;
the first calculation module is used for calculating a foreground binary image according to the position information of the foreground point;
the second calculation module is used for calculating a background binary image according to the position information of the background point;
the combination module is used for carrying out channel combination on the image, the foreground binary image and the background binary image to obtain an image after channel combination;
the input module is used for inputting the image after the channel combination into a trained target semantic segmentation model to obtain a mask for distinguishing a target from a non-target in the image; the target semantic segmentation model is obtained by at least replacing Xception-65 in a Deeplab v3+ semantic segmentation network with a residual network ResNet-101;
and the extraction module is used for extracting the target in the image according to the mask and the image.
10. The apparatus of claim 9, wherein the first computing module is configured to compute a foreground binary image according to the position information of the foreground point, and comprises:
the first calculation module is specifically configured to make a circle with a preset radius by taking each foreground point as a center of a circle in the image, so as to obtain a circle corresponding to each foreground point;
and setting the pixel value of each pixel point within the circle corresponding to each foreground point to 1, and setting the pixel value of each pixel point outside the circles corresponding to the foreground points to 0, to obtain the foreground binary image.
CN202010952509.1A 2020-09-11 2020-09-11 Target extraction method and device Pending CN112070793A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010952509.1A CN112070793A (en) 2020-09-11 2020-09-11 Target extraction method and device

Publications (1)

Publication Number Publication Date
CN112070793A true CN112070793A (en) 2020-12-11

Family

ID=73696509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010952509.1A Pending CN112070793A (en) 2020-09-11 2020-09-11 Target extraction method and device

Country Status (1)

Country Link
CN (1) CN112070793A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781584A (en) * 2021-01-14 2021-12-10 北京沃东天骏信息技术有限公司 Method and device for taking color of picture
CN114782460A (en) * 2022-06-21 2022-07-22 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110660066A (en) * 2019-09-29 2020-01-07 Oppo广东移动通信有限公司 Network training method, image processing method, network, terminal device, and medium
CN111260666A (en) * 2020-01-19 2020-06-09 上海商汤临港智能科技有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN111428726A (en) * 2020-06-10 2020-07-17 中山大学 Panorama segmentation method, system, equipment and storage medium based on graph neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAORU WANG et al.: "Image Object Extraction Based on Semantic Segmentation and Label Loss", IEEE Access *

Similar Documents

Publication Publication Date Title
US9697444B2 (en) Convolutional-neural-network-based classifier and classifying method and training methods for the same
CN110490081B (en) Remote sensing object interpretation method based on focusing weight matrix and variable-scale semantic segmentation neural network
WO2022001623A1 (en) Image processing method and apparatus based on artificial intelligence, and device and storage medium
CN106980856B (en) Formula identification method and system and symbolic reasoning calculation method and system
CN111046784A (en) Document layout analysis and identification method and device, electronic equipment and storage medium
CN106202030B (en) Rapid sequence labeling method and device based on heterogeneous labeling data
CN109740515B (en) Evaluation method and device
CN108681735A (en) Optical character recognition method based on convolutional neural networks deep learning model
CN111523622B (en) Method for simulating handwriting by mechanical arm based on characteristic image self-learning
CN113094478B (en) Expression reply method, device, equipment and storage medium
CN111210402A (en) Face image quality scoring method and device, computer equipment and storage medium
CN111325750A (en) Medical image segmentation method based on multi-scale fusion U-shaped chain neural network
CN111639717A (en) Image character recognition method, device, equipment and storage medium
CN112070793A (en) Target extraction method and device
CN111401099A (en) Text recognition method, device and storage medium
CN110443235B (en) Intelligent paper test paper total score identification method and system
CN112508966B (en) Interactive image segmentation method and system
CN113705468A (en) Digital image identification method based on artificial intelligence and related equipment
CN115984875B (en) Stroke similarity evaluation method and system for hard-tipped pen regular script copy work
CN109389173B (en) M-CNN-based test paper score automatic statistical analysis method and device
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
CN116311322A (en) Document layout element detection method, device, storage medium and equipment
CN115512340A (en) Intention detection method and device based on picture
CN113486680A (en) Text translation method, device, equipment and storage medium
CN113177543A (en) Certificate identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201211