CN116895003A - Target object segmentation method, device, computer equipment and storage medium


Info

Publication number: CN116895003A (granted publication: CN116895003B)
Application number: CN202311148049.7A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 李佩霞, 陈宇, 张如高, 虞正华
Applicant and assignee: Suzhou Moshi Intelligent Technology Co., Ltd.
Priority: CN202311148049.7A
Legal status: Granted; active

Classifications

    • G06V 10/774: Image or video recognition or understanding using pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/26: Image preprocessing; segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/765: Recognition using pattern recognition or machine learning; classification, e.g. of video objects, using rules for classification or partitioning the feature space
    • G06V 10/82: Recognition using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of computer technology and discloses a target object segmentation method, apparatus, computer device and storage medium. The method comprises: acquiring a first object set with pixel labels and a second object set without pixel labels; inputting the first object set into a plurality of groups of networks for iterative training; performing noise filtering with the second type network of each group of networks to obtain a third object set; performing pixel classification on the data without pixel labels with the second type network of each group of networks; inputting the training data output after processing by one group of networks into the first type networks of the other groups of networks for training; after a certain number of training iterations, updating the parameters of the second type network with the parameters of the first type network in each group of networks; and repeating the steps of noise filtering, model training and parameter updating iteratively until the models converge and training ends.

Description

Target object segmentation method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular to a target object segmentation method and apparatus, a computer device, and a storage medium.
Background
In recent years, artificial intelligence has developed rapidly and machine learning technology has matured. With this development, image segmentation has emerged: it is the technique and process of dividing an image into a number of specific regions with unique properties and extracting the objects of interest. Currently, images are typically segmented by training an artificial-intelligence image segmentation model, that is, segmentation training is performed on a large number of annotated images. This requires a large amount of manual annotation at high cost. Moreover, current processing of noisy annotations mainly targets image-level classification tasks; when the amount of annotated data is insufficient, the model struggles to reach adequate accuracy, and large-scale manual annotation introduces more noisy labels, which degrades the accuracy of the model.
Disclosure of Invention
In view of the above, the present invention provides a target object segmentation method, apparatus, computer device and storage medium, so as to solve the problem that segmentation of the target object is not accurate enough.
In a first aspect, the present invention provides a method for segmenting a target object, including:
acquiring initial sample data, wherein the initial sample data comprises a first object set and a second object set, the first object set is an object set with a pixel label, and the second object set is an object set without a pixel label;
inputting each object in the first object set into each network in a plurality of groups of pre-constructed networks respectively for training, and obtaining a plurality of groups of trained optimized networks, wherein each group of optimized networks comprises a first type network and a second type network;
in the current polling period, inputting a first object set into a second class network in a first group of optimized networks to perform noise filtering to obtain a third object set, wherein the first group of optimized networks are any one group of optimized networks in a plurality of groups of optimized networks;
when the noise filtering processing is carried out and the object containing noise exists, constructing a fourth object set by the second object set and the object containing noise, wherein the fourth object set is a label-free object set;
inputting the fourth object set into a second class network in the first group of optimized networks to obtain a fifth object set, wherein the objects in the fifth object set all have pixel pseudo tags;
inputting the third object set and the fifth object set into the first type network of a second group of optimized networks among the plurality of groups of optimized networks for training to obtain a first training result, and inputting a sixth object set and a seventh object set obtained by the second group of optimized networks into the first type network of the first group of optimized networks for training to obtain a second training result, wherein the second group of optimized networks is any group of optimized networks other than the first group of optimized networks among the plurality of groups of optimized networks, the sixth object set is an object set with pixel labels, and the seventh object set is an object set with pixel pseudo labels;
updating parameters of a second type network in the first group of optimized networks by using the first training result, and after updating parameters of the second type network in the second group of optimized networks by using the second training result, if all the first type networks and/or the second type networks in the plurality of groups of optimized networks are determined to not reach preset conditions, training of the next polling period is performed;
or stopping training after determining that all the first-class networks and the second-class networks in the plurality of groups of optimized networks reach preset conditions;
and completing the segmentation operation on the target object by using the second class network in each group of optimized networks.
By the above method, an initial first object set with pixel labels and a second object set without pixel labels are acquired, and the first object set is input into the pre-constructed groups of networks for a certain number of training iterations, so that each group of networks reaches a certain training accuracy; each group of networks includes a first type network and a second type network. Because the pixel labels may contain noisy annotations, the second type network of each group is used for noise filtering, which allows the noisily annotated data to be identified to a large extent; the noisily annotated pixels and images are processed to obtain the third object set, so that the accuracy of the training samples is greatly improved, and the fourth object set is obtained. Since the second type network has already undergone a certain number of training iterations, its output classification has a certain accuracy and can provide pixel pseudo labels for the unlabeled data. The training data output after processing by one group of networks are then input into the first type networks of the other groups for training, and after a certain number of training iterations the parameters of the first type network are used to update the parameters of the second type network in each group. The steps of noise filtering, model training and parameter updating are repeated iteratively until the models converge. In this way, even when only a small amount of annotated data is available or the annotations contain noise, the quality of the training samples is continuously improved through the iterations, and the accuracy of the network segmentation model is further improved.
In an alternative embodiment, during the current polling period, inputting the first object set into the second type network in the first group of optimized networks for noise filtering, and obtaining the third object set includes:
in the current polling period, inputting a first object set into a first type network in a first group of optimized networks, extracting features of the first objects in the first object set, and obtaining pixel features corresponding to each pixel in the first objects, wherein the first objects are any one object in the first object set;
generating a pixel data set according to the pixel characteristics respectively corresponding to each pixel of all the objects and the pixel labels respectively corresponding to each pixel;
performing noise estimation operation on the pixel data set, and determining a noise proportion mean value corresponding to the pixel data set;
inputting the first object into a second class network in a first group of optimized networks to predict, and obtaining a prediction classification corresponding to each pixel in the first object;
determining a confidence mean value of the first object set according to the prediction classification and the pixel labels corresponding to each pixel in all the objects of the first object set;
determining a noise filtering threshold according to the noise proportion average value and the confidence coefficient average value;
And carrying out noise filtering on the first object set according to the noise filtering threshold value to obtain a third object set.
By the method, for each object in the first object set, firstly, extracting the characteristics of the pixel level, generating a pixel data set according to the pixel characteristics and the labels corresponding to the pixels, performing noise estimation operation on the pixel data set, determining a noise proportion mean value corresponding to the pixel data set, determining a confidence coefficient mean value of the first object set according to the confidence coefficient of each pixel, and determining a noise filtering threshold value according to the noise proportion mean value and the confidence coefficient mean value; and carrying out noise filtering on the first object set according to the noise filtering threshold value to obtain a third object set, so that noise labels in the first object set can be accurately identified, and the sample quality is improved.
In an alternative embodiment, generating a pixel data set according to the pixel characteristics respectively corresponding to each pixel of all objects and the pixel labels respectively corresponding to each pixel includes:
respectively carrying out random sampling processing on pixels corresponding to each type of pixel labels in the first object to obtain a target pixel set corresponding to the first pixels;
And forming a pixel data set by the target pixel sets respectively corresponding to all the objects.
In an optional implementation manner, performing noise estimation operation on the pixel data set to determine a noise proportion mean value corresponding to the pixel data set includes:
performing noise estimation operation on the pixel data set, and determining a first probability and a second probability corresponding to each type of pixel label respectively, wherein the first probability is a probability that the first type of pixel label is a true value, the second probability is a probability that the first type of pixel label is a given true value when the first type of pixel label is the true value, and the first type of pixel label is any type of pixel label;
determining the noise proportion of the first type pixel labels according to the first probability and the second probability of the first type pixel labels;
and determining a noise proportion average value according to the noise proportions of all classes of pixel labels.
In an alternative embodiment, determining the confidence mean of the first object set according to the prediction classification and the pixel label corresponding to each pixel in all the objects of the first object set, includes:
determining the confidence of a first pixel according to each prediction classification of the first pixel in the first object and the pixel label of the first object, wherein the first pixel is any pixel in the first object;
Determining the confidence level of the first object according to the confidence levels of all pixels in the first object;
and determining a confidence mean according to the confidence degrees of all the objects in the first object set.
In an alternative embodiment, noise filtering is performed on the first object set according to a noise filtering threshold value, to obtain a third object set, including:
when the confidence coefficient of the first pixel in the first object is determined to be larger than the noise filtering threshold value, marking the first pixel as noise marking;
deleting all pixel labels of pixels marked as noise marks in the first object to obtain an updated object;
and combining all the updated objects and the objects which do not have noise labels in the first object set into a third object set.
In an alternative embodiment, the first training result includes a first parameter set, the second training result includes a second parameter set, the parameters of the second type of network in the first set of optimized networks are updated by using the first training result, and the parameters of the second type of network in the second set of optimized networks are updated by using the second training result, specifically including:
performing a moving average operation on the first parameter set to obtain a third parameter set;
Replacing parameters of the second type network in the first group of optimized networks with a third parameter set;
the method comprises the steps of,
performing a moving average operation on the second parameter set to obtain a fourth parameter set;
the parameters of the second class of networks in the second set of optimized networks are replaced with a fourth set of parameters.
By the method, the noise of data in the parameters can be reduced in a moving average mode, and the trend and the characteristics of the data are better reflected, so that the updated parameters of the second type network are different from those of the first type network, and the parameters can be optimized to a certain extent.
In a second aspect, the present invention provides a segmentation apparatus for a target object, including:
the system comprises an acquisition module, a storage module and a storage module, wherein the acquisition module is used for acquiring initial sample data, the initial sample data comprises a first object set and a second object set, the first object set is an object set with a pixel label, and the second object set is an object set without a pixel label;
the first training module is used for respectively inputting each object in the first object set into each network in the pre-constructed multiple groups of networks to train and obtaining multiple groups of trained optimized networks, wherein each group of optimized networks comprises a first type network and a second type network;
The noise filtering module is used for inputting the first object set into a second class network in a first group of optimized networks to perform noise filtering in the current polling period to obtain a third object set, wherein the first group of optimized networks are any one group of optimized networks in a plurality of groups of optimized networks;
the construction module is used for constructing a fourth object set from the second object set and the objects containing noise when the objects containing noise exist after noise filtering processing is executed, wherein the fourth object set is a label-free object set;
the processing module is used for inputting the fourth object set into a second class network in the first group of optimized networks to obtain a fifth object set, wherein objects in the fifth object set all have pixel pseudo tags;
the second training module is used for inputting the third object set and the fifth object set into the first type network of a second group of optimized networks among the plurality of groups of optimized networks for training to obtain a first training result, and inputting a sixth object set and a seventh object set obtained by the second group of optimized networks into the first type network of the first group of optimized networks for training to obtain a second training result, wherein the second group of optimized networks is any group of optimized networks other than the first group of optimized networks, the sixth object set is an object set with pixel labels, and the seventh object set is an object set with pixel pseudo labels;
The updating module is used for updating parameters of the second type network in the first group of optimized networks by using the first training result, and after updating parameters of the second type network in the second group of optimized networks by using the second training result, if all the first type networks and/or the second type networks in the plurality of groups of optimized networks are determined to not reach preset conditions, training of the next polling period is carried out; or stopping training after determining that all the first-class networks and the second-class networks in the plurality of groups of optimized networks reach preset conditions;
and the segmentation module is used for completing the segmentation operation of the target object by utilizing the second type network in each group of optimized networks.
In a third aspect, the present invention provides a computer device, comprising a memory storing computer instructions and a processor, wherein the processor executes the computer instructions so as to perform the target object segmentation method according to the first aspect or any one of its corresponding embodiments.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the method for segmenting a target object according to the first aspect or any of its corresponding embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a segmentation method in the third related art described herein;
FIG. 2 is a flow chart of a method of partitioning a target object according to an embodiment of the invention;
FIG. 3 is a flow chart of a segmentation method of another target object according to an embodiment of the invention;
FIG. 4 is a flow diagram of a method of noise filtering according to an embodiment of the invention;
fig. 5 is a schematic diagram comparing the effect of a target object segmentation method according to an embodiment of the present invention with that of the target object segmentation method of the third related art;
FIG. 6 is a block diagram of a target object segmentation apparatus according to an embodiment of the invention;
fig. 7 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Currently, image segmentation is typically performed by training an artificial-intelligence image segmentation model and applying it to images.
In the first related art, cross learning can be used to reduce the influence of noisy training data on the performance of an image classification network. Two randomly initialized neural networks screen noise samples for each other: the clean samples screened by the first network are used as training samples for the second network, which is trained with a supervised loss function, while the remaining samples are treated as noise samples and an unsupervised loss function is used to train the second network. The training samples screened by the second network are used to train the first network in the same way. A Gaussian mixture model can be used for the filtering of noise samples.
This method is suited to image-level classification tasks; when it is applied to a pixel-level panoptic segmentation task, the amount of computation grows exponentially. Furthermore, the approach cannot make use of large amounts of unlabeled data. Finally, randomly initializing the parameters of two neural networks with the same structure cannot maximize the diversity of the networks.
In the second related art, an early-stopping strategy may be adopted to reduce the impact of noisy training data on the performance of an image classification network. Based on the phenomenon that, during training, a deep network first learns the information in clean data and only then fits the noisy data, a method is provided for estimating the point in time at which the network starts fitting the noisy data; at that point, training is stopped and the labels of the training data are corrected with the currently trained network, thereby reducing the noise in the annotations. However, this method is designed for image classification tasks and cannot be used directly for pixel-level panoptic segmentation; it requires all training data to be annotated, so it cannot be used in semi-supervised scenarios; and it must have enough training data that the network trained before fitting the noisy data can reach a certain accuracy.
In the third related art, a semi-supervised panoptic segmentation method based on cross learning may be employed, in which two "teacher" networks jointly predict pseudo labels for the unlabeled data to prevent the networks from fitting errors. As shown in Fig. 1, two teacher networks, a student network, the unlabeled data and the labeled data are used together; there is no module for handling noisy data, the two teacher networks adopt the same network structure, and different trained networks are obtained only through differences in random initialization. This method relies on completely accurate annotated training data. In practical application scenarios, the annotated data often contains noise, and when this approach is trained with noisy data its performance is significantly impacted.
As described above, the existing related art cannot adequately guarantee the accuracy of model training and of image segmentation when only a small amount of annotated data is available or the annotations contain noise.
Based on this, according to an embodiment of the present invention, there is provided a segmentation method embodiment of a target object, it is to be noted that the steps shown in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that herein.
In this embodiment, a method for partitioning a target object is provided, which may be used in the above-mentioned computer device, and fig. 2 is a flowchart of a method for partitioning a target object according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:
step S101, obtaining initial sample data, where the initial sample data includes a first object set and a second object set.
Specifically, the first object set is an object set with a pixel tag, and the second object set is an object set without a pixel tag.
Step S102, each object in the first object set is respectively input into each network in the pre-constructed multiple groups of networks to train, multiple groups of trained optimized networks are obtained, and each group of optimized networks comprises a first type network and a second type network.
Specifically, each of the pre-constructed groups of networks includes a first type of network and a second type of network. The structure of the first type of network and the second type of network may be different. And respectively inputting each object in the first object set into a first type network and a second type network in each group of networks to train for a certain number of times, so that a model in each group of networks can reach a certain training precision, and obtaining a plurality of groups of trained optimized networks, wherein each group of optimized networks comprises the first type network and the second type network.
In an alternative example, two groups of networks are taken as an example for ease of explanation. Two learning groups with different network structures are constructed, as shown in fig. 3. Each group includes a first type network and a second type network, where the first type network may be a student network and the second type network may be a teacher network; that is, each of the two groups includes a teacher network and a student network, and the two groups have different network structures: one group may use a residual network (ResNet-50) and the other an improved pyramid vision transformer network (PVTv2-B2). During training iterations 1 to N, the four networks each perform initial training with the data in the first object set, and the loss function may be the cross-entropy loss (Cross-Entropy Loss, CE_Loss). After N iterations the two teacher networks stop training and the two student networks continue subsequent training. Through this step, two groups of trained optimized networks can be obtained.
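For illustration only, the following is a minimal sketch of how the two learning groups in this example could be set up in PyTorch; the ResNet-50 segmentation head from torchvision, the class count and the helper names are assumptions, and a PVTv2-B2 based model would be plugged in the same way.

```python
# Minimal sketch (assumptions noted above), not the patent's implementation.
import copy
from torchvision.models.segmentation import fcn_resnet50

NUM_CLASSES = 21  # assumed number of segmentation classes (e.g. Pascal VOC)

def build_group(make_net):
    """One learning group: a first-type (student) and a second-type (teacher) network."""
    student = make_net()              # first type network, trained by back-propagation
    teacher = copy.deepcopy(student)  # second type network, later refreshed from the student
    return {"student": student, "teacher": teacher}

# Group 1: a ResNet-50 based segmentation network.
group1 = build_group(lambda: fcn_resnet50(num_classes=NUM_CLASSES, weights=None))
# Group 2 would use a structurally different backbone, e.g. a PVTv2-B2 segmentation
# model (not provided by torchvision, so only indicated here):
# group2 = build_group(lambda: PVTv2B2Segmenter(num_classes=NUM_CLASSES))
```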
Step S103, in the current polling period, the first object set is input into a second type network in the first group of optimized networks to perform noise filtering, and a third object set is obtained.
Specifically, the first set of optimized networks is any one of a plurality of sets of optimized networks. The first object set is input into a second type network in any group of optimized networks to perform noise filtering, a third object set is obtained, the first object set can be input into the second type network in each group of optimized networks to perform noise filtering, and after the data after noise filtering are processed, object sets corresponding to each group of networks respectively are obtained.
In an alternative example, for example, a pre-trained noise filter may be added to the teacher network, and noise filtering is performed on the data in the first object set by using the noise filter of each teacher network, and after data processing is performed on the noise filtering result, a corresponding object set is output, that is, an updated labeled data set is obtained.
Step S104, when the object containing noise exists after the noise filtering processing is executed, the second object set and the object containing noise are constructed into a fourth object set.
Specifically, the fourth object set is a label-free object set. When the noise filtering processing is performed and the objects containing noise exist, merging all the objects containing noise into the second object set to construct a fourth object set, namely acquiring the updated label-free object set.
Step S105, inputting the fourth object set into the second class network in the first group of optimized networks to obtain a fifth object set.
Specifically, the objects in the fifth object set each have a pixel pseudo label. The updated label-free data set is input into the second type network in the first group of optimized networks, and the pixels of each object in the fourth object set are classified using the initially trained parameters of the second type network, so as to obtain the fifth object set with pixel pseudo labels.
In an alternative example, the fourth object set is input into the two teacher networks, and the object sets with pixel pseudo labels output by the two teacher networks are obtained respectively.
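As a sketch of this pseudo-labelling step, the snippet below assumes a torchvision-style segmentation model whose forward pass returns a dict with an "out" entry of shape [1, C, H, W]; the use of hard argmax pseudo labels is likewise an assumption rather than the patent's prescribed choice.

```python
import torch

@torch.no_grad()
def make_pseudo_labels(teacher, image):
    """image: [3, H, W] tensor from the label-free (fourth) object set."""
    logits = teacher(image.unsqueeze(0))["out"]   # [1, C, H, W], torchvision-style output
    probs = torch.softmax(logits, dim=1)[0]       # [C, H, W] per-pixel class scores
    return probs.argmax(dim=0), probs             # pixel pseudo-label map and soft scores
```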
Step S106, inputting the third object set and the fifth object set into the first type network of the second group of the plurality of optimized networks for training, obtaining a first training result, and inputting the sixth object set and the seventh object set obtained by the second group of optimized networks into the first type network of the first optimized network for training, obtaining a second training result.
Specifically, the second group of optimized networks is any group of optimized networks other than the first group among the plurality of groups, the sixth object set is an object set with pixel labels, and the seventh object set is an object set with pixel pseudo labels. The processing of steps S103 to S105 is also performed on the other groups of networks among the plurality of groups of optimized networks. The sixth and seventh object sets obtained by the second group of optimized networks are acquired; the third and fifth object sets obtained by the first group are input into the first type network of the second group for training, and the sixth and seventh object sets obtained by the second group are input into the first type network of the first group for training. That is, the updated labeled data and unlabeled data obtained by each group of networks are input into the other networks for cross training, and the training results of each group are obtained respectively. Because the network structure of each group differs, this has the advantage that multiple paths with different network structures prevent erroneous fitting, and the different structures provide greater prediction complementarity, thereby improving the accuracy of the prediction results.
In an alternative example, if there are three groups of optimized networks A, B and C, the labeled object set and the pseudo-label object set obtained by the A network may be input into the B network, those obtained by the B network may be input into the C network, and those obtained by the C network may be input into the A network; multi-group cross-learning training proceeds in the same way, the principle being that the object sets obtained by one group of networks are used for training the other groups of networks.
In an alternative example, such as the learning groups with two networks of different structures shown in fig. 3, the labeled data filtered for noise by the teacher network of the first learning group and the pseudo-label data generated by that teacher network are input into the student network of the second learning group for a certain number of training iterations; the loss function for the labeled data and the loss function for the pseudo-label data may differ, for example the loss function for the labeled data may be CE_Loss and the loss function for the pseudo-label data may be the consistency loss (Consistency Loss, abbreviated Con_Loss). Likewise, the labeled data filtered for noise by the teacher network of the second learning group and the pseudo-label data generated by that teacher network are input into the student network of the first learning group for a certain number of training iterations, again with possibly different loss functions for the labeled and pseudo-label data. After the preset number of iterations, the training results of the first and second student networks are obtained.
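A rough sketch of one such cross-training step is given below; the ignore value 255 for pixels whose labels were removed as noise, the dict-style model output and the KL form of the consistency term Con_Loss are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

IGNORE = 255  # assumed marker for pixels whose labels were removed as noise

def ce_loss(student_logits, labels):
    # CE_Loss on the noise-filtered labeled data from the other learning group
    return F.cross_entropy(student_logits, labels, ignore_index=IGNORE)

def con_loss(student_logits, teacher_probs):
    # Con_Loss: soft-target consistency toward the other group's teacher pseudo labels
    return F.kl_div(F.log_softmax(student_logits, dim=1), teacher_probs,
                    reduction="batchmean")

def cross_training_step(student, optimizer, labeled_batch, pseudo_batch):
    imgs_l, labels = labeled_batch   # labeled data filtered by the OTHER group's teacher
    imgs_u, probs = pseudo_batch     # pseudo labels produced by the OTHER group's teacher
    loss = ce_loss(student(imgs_l)["out"], labels) + con_loss(student(imgs_u)["out"], probs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```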
Step S107, updating parameters of the second class network in the first group of optimized networks by using the first training result, and after updating parameters of the second class network in the second group of optimized networks by using the second training result, if it is determined that all the first class networks and/or the second class networks in the plurality of groups of optimized networks do not reach the preset condition, performing the training of the next polling period.
Specifically, the training result is the set of network parameters of the first type network after the preset number of iterations. The first training result is used to update the parameters of the second type network in the first group of optimized networks, and the second training result is used to update the parameters of the second type network in the second group of optimized networks. Whether the first type networks and the second type networks reach the preset condition is then judged, and if not, training of the next polling period is performed.
In an alternative example, for example, in the network learning group with two paths of different structures shown in fig. 3, parameters of the first teacher network are updated by using parameters of the first student network, parameters of the second teacher network are updated by using parameters of the second student network, and then it is determined whether the loss functions of the two student networks reach a convergence state or fall slowly, if not, the next polling cycle training is performed, that is, the steps of noise filtering, cross training and parameter updating are performed again.
Alternatively,
and step S108, if all the first type networks and the second type networks in the plurality of groups of optimized networks reach the preset conditions, stopping training.
Specifically, after all the first-class networks and the second-class networks in the plurality of groups of optimized networks are determined to reach preset conditions, training is stopped, and a plurality of groups of trained networks are obtained.
Step S109, the second type network in each group of optimized networks is utilized to complete the image segmentation operation.
Specifically, the second type network in each group of optimized networks is utilized to segment the image, and the segmentation average value of the second type networks can be utilized to determine the final segmentation result of the target object.
According to the target object segmentation method provided by the invention, an initial first object set with pixel labels and a second object set without pixel labels are acquired, and the first object set is input into the pre-constructed groups of networks for a certain number of training iterations, so that each group of networks reaches a certain training accuracy; each group of networks includes a first type network and a second type network. For the noisy annotations possibly contained in the pixel labels, the second type network of each group is used for noise filtering, which allows the noisily annotated data to be identified to a large extent; the noisily annotated pixels and images are processed to obtain the third object set, greatly improving the accuracy of the training samples, and the fourth object set is obtained. Because the second type network has undergone a certain number of training iterations, its output classification already has a certain accuracy and can provide pixel pseudo labels for the unlabeled data. The training data output after processing by one group of networks are then input into the first type networks of the other groups for training; after a certain number of training iterations the parameters of the first type network are used to update the parameters of the second type network in each group, and the steps of noise filtering, model training and parameter updating are repeated until the models converge. Even when only a small amount of annotated data is available or the annotations contain noise, the quality of the training samples is thus continuously improved, and the accuracy of the network segmentation model is further improved.
In an alternative example, for example, in a network learning group having two different structures as shown in fig. 3, the target object is segmented by using two teacher networks, and the final segmentation result is determined by using the segmentation mean of the two teacher networks.
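To summarize steps S103 to S108 for the two-group example, the structural sketch below strings the sub-steps together for one polling period; the helper callables (noise_filter, pseudo_label, train_student, ema_update, converged) are placeholders for the operations described above, not a real API, and the assumption that each teacher is refreshed from its own group's student follows the example of fig. 3.

```python
def polling_cycle(groups, labeled_set, unlabeled_set,
                  noise_filter, pseudo_label, train_student, ema_update, converged):
    """groups = {"g1": {"student": ..., "teacher": ...}, "g2": {...}}; data sets are lists."""
    filtered, pseudo = {}, {}
    for g in groups:                                              # steps S103-S105
        clean, noisy = noise_filter(groups[g]["teacher"], labeled_set)
        filtered[g] = clean                                       # third object set
        pseudo[g] = pseudo_label(groups[g]["teacher"], unlabeled_set + noisy)  # fifth set
    # Step S106: cross training - each student learns from the OTHER group's data.
    train_student(groups["g1"]["student"], filtered["g2"], pseudo["g2"])
    train_student(groups["g2"]["student"], filtered["g1"], pseudo["g1"])
    # Step S107: each teacher's parameters are refreshed from its own group's student.
    for g in groups:
        ema_update(groups[g]["teacher"], groups[g]["student"])
    # Steps S107/S108: continue polling until all student networks have converged.
    return all(converged(groups[g]["student"]) for g in groups)
```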
In an alternative embodiment, during the current polling period, the first object set is input to the second type of network in the first group of optimized networks for noise filtering, and the third object set is obtained, which includes the method steps as shown in fig. 4:
in step S401, in the current polling period, a first object set is input into a first type network in a first group of optimized networks, and feature extraction is performed on a first object in the first object set, so as to obtain pixel features corresponding to each pixel in the first object.
Specifically, the first object is any object in the first object set.
In an alternative example, the first object is input into a pre-trained neural network in the first type network to extract the features of the pixels in the image. The extracted feature map has size W × H × C, where W is the width feature of the image, H is the height feature of the image, and C is the classification feature dimension of the pixels in the image; that is, each pixel in the image has a C-dimensional classification feature.
Step S402, a pixel data set is generated according to the pixel characteristics respectively corresponding to each pixel of all objects and the pixel labels respectively corresponding to each pixel.
Specifically, the pixel data set is generated from the pixel features corresponding to each pixel and the pixel labels corresponding to each pixel.
In an alternative embodiment, generating a pixel data set according to the pixel characteristics respectively corresponding to each pixel of all objects and the pixel labels respectively corresponding to each pixel includes:
respectively carrying out random sampling processing on pixels corresponding to each type of pixel labels in the first object to obtain a target pixel set corresponding to the first pixels;
and forming a pixel data set by the target pixel sets respectively corresponding to all the objects.
In particular, since image segmentation is a pixel-level classification task, if every pixel of every image in the training set were treated as a training sample, the amount of computation in the subsequent noise filtering process would be enormous, and such a high computational cost cannot be borne by an ordinary graphics processing unit (Graphics Processing Unit, GPU for short). Therefore, the pixels corresponding to each class of pixel label can be randomly sampled to obtain the target pixel set corresponding to the first pixel, and the target pixel sets corresponding to all objects together form the pixel data set. In this way the comprehensiveness of the data is guaranteed while the computational load is reduced and the computational efficiency is guaranteed.
In an alternative example, each class of label in all objects of the first object set may be randomly sampled to construct a representative data set. First, for a first object, all pixels (i, j) whose pixel label belongs to a given class are collected, where i denotes the i-th row and j denotes the j-th column, and M of them are randomly sampled, M generally being smaller than the number of pixels of that label class. The pixel feature corresponding to each sampled pixel is then looked up in the feature map, and the pair (x_k, y_k) is placed into the representative data set, where k denotes the k-th entry of the representative data set.
Repeating the above steps for all images in the training set yields the representative data set {(x_k, y_k)}, where x_k is the pixel feature of the k-th pixel and y_k is the pixel label of the k-th pixel.
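The per-class random sampling just described can be sketched as follows; the tensor layout ([C, H, W] features, [H, W] labels), the value of M and the PyTorch indexing style are assumptions for illustration.

```python
import torch

def sample_representative_pixels(feat, label, m_per_class=256):
    """feat: [C, H, W] per-pixel features; label: [H, W] pixel labels of one object."""
    feats, labels = [], []
    for c in label.unique():
        idx = (label == c).nonzero(as_tuple=False)           # all (i, j) with label c
        pick = idx[torch.randperm(len(idx))[:m_per_class]]   # random sample, M <= class size
        feats.append(feat[:, pick[:, 0], pick[:, 1]].t())    # pixel features x_k
        labels.append(label[pick[:, 0], pick[:, 1]])         # pixel labels y_k
    return torch.cat(feats), torch.cat(labels)

# Concatenating the outputs over all labeled images yields the representative
# data set {(x_k, y_k)} used for the noise estimation below.
```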
Step S403, performing noise estimation operation on the pixel data set, and determining a noise proportion average value corresponding to the pixel data set.
Specifically, noise estimation operation is performed on the pixel data set, noise characteristics of each pixel classification in the pixel data set are determined, and then a noise proportion average value of the pixel data set is determined.
In an alternative embodiment, the step S403 specifically includes the following method steps:
and a step a1, performing noise estimation operation on the pixel data set, and determining a first probability and a second probability corresponding to each type of pixel labels respectively.
Specifically, the first probability is a probability that the first type of pixel tag is a true value, and the second probability is a probability that the first type of pixel tag is a given true value when the first type of pixel tag is a true value, and the first type of pixel tag is any type of pixel tag.
And a step a2, determining the noise proportion of the first type pixel labels according to the first probability and the second probability of the first type pixel labels.
Specifically, in an alternative example, a high-order consensus (High-Order Consensuses, abbreviated HOC) method may be used for noise estimation. For each class of pixel label, a first probability P(y = c) and a second probability P(y~ = c | y = c) can be obtained, where y denotes the true value and y~ denotes the given pixel label; the noise ratio of that class of pixel label is then obtained from the two probabilities, as shown in the following formula:

ε_c = 1 − P(y = c | y~ = c)   (Equation 1)

where ε_c is the noise ratio of pixel label class c, and P(y = c | y~ = c) can be obtained from P(y = c) and P(y~ = c | y = c) by Bayes' theorem.
And a3, determining a noise proportion average value according to the noise proportions of all classes of pixel labels.
Specifically, the average value of the noise ratios of all the class pixel labels is used as the noise ratio average value of the pixel data set, and because the pixel data set is obtained by randomly sampling from the first object set, the noise ratio average value of the pixel data set can be used as the noise ratio average value of the first object set.
It should be noted that, for pixel class c, the noise ratio determined with the first type network in the first group of optimized networks may be denoted ε_c^1, the noise ratio determined with the first type network in the second group of optimized networks ε_c^2, and similarly the noise ratio determined with the first type network in the N-th group of optimized networks ε_c^N. That is, because the network structure of each group of optimized networks may be different, the noise ratios output for the same pixel class may also be different.
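Given per-class estimates of the two probabilities (which the HOC estimation would supply), the noise-ratio mean can be computed as sketched below; the reconstruction of Equation 1 and the use of the empirical label frequency for Bayes' theorem are assumptions.

```python
import numpy as np

def noise_ratio_mean(p_true, p_given, p_label):
    """p_true[c]  = P(y = c)            (first probability)
       p_given[c] = P(y~ = c | y = c)   (second probability)
       p_label[c] = empirical frequency of label c, used for Bayes' theorem."""
    p_clean = p_given * p_true / np.clip(p_label, 1e-8, None)  # P(y = c | y~ = c)
    eps = 1.0 - np.clip(p_clean, 0.0, 1.0)                      # per-class noise ratio
    return float(eps.mean())                                    # noise-ratio mean
```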
Step S404, inputting the first object into a second class network in the first group of optimized networks for prediction, and obtaining the prediction classification corresponding to each pixel in the first object.
In an alternative example, such as the learning groups with two groups of networks shown in fig. 3, an object is randomly selected from the first object set, for example the e-th image. The image is input into the teacher network of the first group to obtain the prediction classification result p1_e, and at the same time the image is input into the teacher network of the second group to obtain the prediction classification result p2_e; each prediction classification result contains the prediction classification of every pixel.
Step S405, determining a confidence mean value of the first object set according to the prediction classification and the pixel label corresponding to each pixel in all the objects of the first object set.
Specifically, according to the prediction classification and the pixel label corresponding to each pixel in all the objects of the first object set, the confidence coefficient of each pixel is determined, and then the confidence coefficient mean value of the first object set can be determined.
In an alternative embodiment, the step S405 specifically includes the following method steps:
step b1, determining the confidence of the first pixel according to each prediction classification of the first pixel in the first object and the pixel label of the first object.
Specifically, the first pixel is any pixel in the first object. The confidence of the first pixel is determined based on the prediction classifications of the second class network outputs in each set of optimized networks of the first pixel and the pixel labels (given true values) of the first pixel.
In an alternative example, taking two learning groups as an example, the confidence of the f-th pixel of the e-th image can be expressed by the following formula:

conf(x_{e,f}) = L_CE(p1_{e,f}, y_{e,f}) + L_CE(p2_{e,f}, y_{e,f}) + a · L_KL(p1_{e,f}, p2_{e,f})   (Equation 2)

where y_{e,f} is the pixel label of the f-th pixel, i.e. the given true value; a is a weight coefficient; p1_{e,f} is the prediction classification result of the f-th pixel of the e-th image output by the teacher network of the first learning group; p2_{e,f} is the prediction classification result of the f-th pixel of the e-th image output by the teacher network of the second learning group; x_{e,f} is the f-th pixel of the e-th image; L_CE is the cross-entropy loss function (Cross Entropy Loss); and L_KL is the divergence loss function (Kullback-Leibler Divergence Loss).
And b2, determining the confidence of the first object according to the confidence of all pixels in the first object.
Specifically, in an alternative example, the average value of the confidence values of all the pixels of the first object may be taken as the confidence of the first object; of course, other reasonable ways of determining the confidence, such as the variance, may also be adopted.
And b3, determining a confidence mean value according to the confidence degrees of all the objects in the first object set.
Specifically, in an alternative example, an average value of the confidence degrees of all the objects in the first object set may be used as the confidence average value of the first object set.
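A sketch of this confidence computation is given below, following the reconstruction of Equation 2 above; the exact way the cross-entropy and divergence terms are combined, and the tensor shapes, are assumptions.

```python
import torch
import torch.nn.functional as F

def pixel_confidence(p1, p2, y, a=1.0):
    """p1, p2: [C, H, W] teacher predictions (probabilities); y: [H, W] pixel labels."""
    log1 = p1.clamp_min(1e-8).log()
    log2 = p2.clamp_min(1e-8).log()
    ce1 = F.nll_loss(log1.unsqueeze(0), y.unsqueeze(0), reduction="none")[0]  # CE vs label
    ce2 = F.nll_loss(log2.unsqueeze(0), y.unsqueeze(0), reduction="none")[0]
    kl = (p1 * (log1 - log2)).sum(dim=0)          # KL divergence between the two teachers
    return ce1 + ce2 + a * kl                     # [H, W]: one confidence value per pixel

def object_confidence(p1, p2, y, a=1.0):
    return pixel_confidence(p1, p2, y, a).mean()  # mean over all pixels of one object

# The confidence mean of the first object set is then the mean of object_confidence
# over all labeled objects.
```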
Step S406, determining a noise filtering threshold according to the noise proportion average value and the confidence average value.
Specifically, the noise ratio mean and the confidence mean may be substituted into the following formula to determine the noise filtering threshold:

τ = b · conf_mean · ε_mean   (Equation 3)

where τ is the noise filtering threshold, b is a constant, conf_mean is the confidence mean of the first object set, ε_mean is the noise ratio mean of the first object set, and · denotes multiplication.
Step S407, performing noise filtering on the first object set according to the noise filtering threshold value to obtain a third object set.
Specifically, after the noise filtering threshold is obtained, pixels with confidence coefficient greater than the noise filtering threshold in the first object set are processed, and a third object set is obtained.
In an alternative embodiment, step 407 above specifically includes the following method steps:
and c1, marking the first pixel as noise labeling when the confidence coefficient of the first pixel in the first object is determined to be larger than the noise filtering threshold value.
And c2, deleting all pixel labels of pixels marked as noise marks in the first object to obtain an updated object.
And c3, combining all updated objects and the objects without noise labels in the first object set into a third object set.
Specifically, the confidence of each pixel in the first object is compared with a noise filtering threshold, and when the confidence of the pixel is determined to be greater than the noise filtering threshold, the label of the pixel is marked as noise label. And deleting all labels of pixels marked as noise labels for each object in the first object set, acquiring updated objects, and forming a third object set by all the updated objects and other objects without noise labels in the first object set.
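Putting Equation 3 and the filtering rule together, a sketch of the filtering step might look like the following; the product form of the threshold, the value of b and the use of an ignore value for deleted labels are assumptions.

```python
import torch

IGNORE = 255  # assumed marker for "pixel label deleted as noise annotation"

def noise_threshold(conf_mean, noise_ratio_mean, b=2.0):
    return b * conf_mean * noise_ratio_mean       # Equation 3 as reconstructed above

def filter_object(conf_map, label, threshold):
    """conf_map: [H, W] per-pixel confidence; label: [H, W] pixel labels."""
    filtered = label.clone()
    noisy = conf_map > threshold                  # pixels marked as noise annotations
    filtered[noisy] = IGNORE                      # delete their pixel labels
    return filtered, bool(noisy.any())

# Updated objects plus noise-free objects form the third object set; every object
# that contained noise is additionally added, without its labels, to the fourth set.
```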
By the method, for each object in the first object set, firstly, extracting the characteristics of the pixel level, generating a pixel data set according to the pixel characteristics and the labels corresponding to the pixels, performing noise estimation operation on the pixel data set, determining a noise proportion mean value corresponding to the pixel data set, determining a confidence coefficient mean value of the first object set according to the confidence coefficient of each pixel, and determining a noise filtering threshold value according to the noise proportion mean value and the confidence coefficient mean value; and carrying out noise filtering on the first object set according to the noise filtering threshold value to obtain a third object set, so that noise labels in the first object set can be accurately identified, and the sample quality is improved.
In an alternative embodiment, the first training result includes a first parameter set, the second training result includes a second parameter set, the parameters of the second type network model in the first set of optimized networks are updated by using the first training result, and the parameters of the second type network model in the second set of optimized networks are updated by using the second training result, specifically including the following method steps:
step d1, carrying out a moving average operation on the first parameter set to obtain a third parameter set.
And d2, replacing the parameters of the second type of network in the first group of optimized networks with a third parameter set.
Specifically, for the first group of optimized networks, after the first type network completes the training of one polling period, a set of model parameters is obtained; a moving average operation is performed on these parameters to obtain the third parameter set, and the parameters of the second type network are replaced with the third parameter set.
The method comprises the steps of,
and d3, carrying out a moving average operation on the second parameter set to obtain a fourth parameter set.
And d4, replacing the parameters of the second type network in the second group of optimized networks with a fourth parameter set.
Specifically, for the second group of optimized networks, after the first type network completes the training of one polling period, a set of model parameters is obtained; a moving average operation is performed on these parameters to obtain the fourth parameter set, and the parameters of the second type network are replaced with the fourth parameter set.
That is, for each group of optimized networks, after the first type network completes the training of one polling period, the parameters obtained by the first type network are subjected to a running average operation, and then the parameters of the second type network are updated.
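For illustration only, a minimal PyTorch-style sketch of such a moving-average (EMA) parameter update is given below; the decay factor ema_decay = 0.99 and the function name update_teacher_ema are assumptions, since the patent specifies only that a moving average operation is applied before the replacement.

```python
import torch

@torch.no_grad()
def update_teacher_ema(student: torch.nn.Module,
                       teacher: torch.nn.Module,
                       ema_decay: float = 0.99) -> None:
    """Replace the second-type (teacher) network parameters with a moving
    average of the first-type (student) network parameters.

    ema_decay is an assumed hyper-parameter; the patent does not fix a value.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        # new_teacher = decay * old_teacher + (1 - decay) * student
        t_param.mul_(ema_decay).add_(s_param, alpha=1.0 - ema_decay)
```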
In order to make the method of the present invention clearer, a specific embodiment is also provided, as shown in fig. 3. For convenience of explanation, two groups of networks are taken as an example. Fig. 3 includes two learning groups, LG1 and LG2, where the teacher network and student network of the first learning group are denoted T1 and S1, the teacher network and student network of the second learning group are denoted T2 and S2, Dl denotes the pixel-labeled data, and Du denotes the data without pixel labels. First, the four networks are each trained on the labeled data for a preset number of iterations, with CE_loss as the loss function of all four networks. Then Dl is input into the teacher network T1 of the first learning group for noise filtering to obtain the filtered labeled data Dl1; the noise-labeled data is merged into the unlabeled data and input to T1 to obtain the pseudo-label data set Du1 of the unlabeled data. Likewise, Dl is input into the teacher network T2 of the second learning group for noise filtering to obtain the filtered labeled data Dl2; the noise-labeled data is merged into the unlabeled data and input to T2 to obtain the pseudo-label data set Du2. Dl1 and Du1 are input to the student network S2 of the second learning group, and Dl2 and Du2 are input to the student network S1 of the first learning group, for a certain number of iterations of training, where the loss function for the labeled data is CE_loss and the loss function for the pseudo-labeled data is Con_loss, and back propagation through the teacher networks is suppressed during the iterations. After a certain number of iterations, the parameters of each teacher network are updated from the parameters of the corresponding student network, and the steps of noise filtering, student network training and parameter updating are repeated until the two student networks converge.
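The high-level Python sketch below is offered purely as an illustration of the training loop just described (noise filtering by each teacher, cross-group training of the students, suppressed teacher back-propagation, and moving-average teacher updates). All helper names — noise_filter, pseudo_label, train_student, update_teacher_ema, converged — are placeholders assumed for readability and are passed in as arguments; they are not APIs defined by the patent.

```python
def train_two_learning_groups(T1, S1, T2, S2, Dl, Du,
                              noise_filter, pseudo_label,
                              train_student, update_teacher_ema,
                              converged):
    """Illustrative cross-group training sketch with two learning groups
    (T1, S1) and (T2, S2); Dl is pixel-labeled data, Du is unlabeled data,
    both assumed to be Python lists of samples."""
    while not (converged(S1) and converged(S2)):
        # Each teacher filters the labeled data; noise-marked samples are
        # treated as unlabeled and pseudo-labeled by the same teacher.
        Dl1, noisy1 = noise_filter(T1, Dl)
        Du1 = pseudo_label(T1, Du + noisy1)
        Dl2, noisy2 = noise_filter(T2, Dl)
        Du2 = pseudo_label(T2, Du + noisy2)

        # Cross teaching: each student trains on the other group's outputs
        # (CE_loss on labeled data, Con_loss on pseudo-labeled data); the
        # teachers receive no gradients during these iterations.
        train_student(S2, labeled=Dl1, pseudo=Du1)
        train_student(S1, labeled=Dl2, pseudo=Du2)

        # After a polling period, each teacher is refreshed with a moving
        # average of its student's parameters (see the EMA sketch above).
        update_teacher_ema(S1, T1)
        update_teacher_ema(S2, T2)
```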
The effectiveness of the method provided by the invention is tested on the public data set Pascal VOC 2012 [1], with mean intersection-over-union (mIoU, %) as the evaluation index. Three different types of noise are added to the labeled data: PL denotes pseudo labels generated by the existing weakly supervised segmentation method SEAM (Self-supervised Equivariant Attention Mechanism) [2]; RDE denotes randomly expanding or contracting the ground-truth segmentation labels; SCP denotes randomly changing the label classes in the image. From the test results (Table 1), it can be seen that when there is no noise in the data, the method is comparable in performance to the third related art described above; when noise is introduced into the data, the performance of the third related art drops markedly, whereas the method achieves a clear improvement over it.
Table 1
The trends of the method and of the third related art during testing are shown in fig. 5. Under PL noise, the proportions of noise images and noise pixels in the labeled data set are controlled by changing the number of images into which noise is introduced; on the abscissa of fig. 5, the values without brackets indicate the proportion of noise images among all labeled images, and the values in brackets indicate the proportion of noise pixels among all labeled pixels. The performance of the present algorithm and of the third related art was tested under these different proportions of noise images and noise pixels. The method of the present invention outperforms the method proposed in the third related art at every noise proportion; that is, when noise labels are present, the method of the present invention is significantly superior to the method of the third related art.
The embodiment also provides a target object segmentation apparatus, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a target object segmentation apparatus, as shown in fig. 6, including:
an obtaining module 601, configured to obtain initial sample data, where the initial sample data includes a first object set and a second object set, and the first object set is an object set with a pixel tag, and the second object set is an object set without a pixel tag;
a first training module 602, configured to input each object in the first object set into each network in the pre-constructed multiple groups of networks for training, and obtain multiple groups of trained optimized networks, where each group of optimized networks includes a first type network and a second type network;
the noise filtering module 603 is configured to input, in a current polling period, a first set of objects into a second type of network in a first group of optimized networks to perform noise filtering, and obtain a third set of objects, where the first group of optimized networks is any one of multiple groups of optimized networks;
A construction module 604, configured to construct a fourth object set from the second object set and the object containing noise when the object containing noise exists after the noise filtering process is performed, where the fourth object set is a label-free object set;
the processing module 605 is configured to input the fourth object set into a second class network in the first group of optimized networks, and obtain a fifth object set, where objects in the fifth object set all have pixel pseudo labels;
the second training module 606 is configured to input the third object set and the fifth object set into the first type network of a second group of optimized networks among the multiple groups of optimized networks for training to obtain a first training result, and to input a sixth object set and a seventh object set obtained by the second group of optimized networks into the first type network of the first group of optimized networks for training to obtain a second training result, where the second group of optimized networks is any group of optimized networks other than the first group of optimized networks, the sixth object set is an object set with pixel labels, and the seventh object set is an object set with pixel pseudo labels;
the updating module 607 is configured to update parameters of the second type of network in the first set of optimized networks using the first training result, and perform training of a next polling period if it is determined that all of the first type of network and/or the second type of network in the plurality of sets of optimized networks do not reach a preset condition after updating parameters of the second type of network in the second set of optimized networks using the second training result; or stopping training after determining that all the first-class networks and the second-class networks in the plurality of groups of optimized networks reach preset conditions;
The partitioning module 608 is configured to complete the partitioning operation on the target object by using the second class network in each set of optimized networks.
In some alternative embodiments, the noise filtering module 603 includes:
the feature extraction unit is configured to input, in the current polling period, the first object set into the second type network in the first group of optimized networks and to perform feature extraction on a first object in the first object set, obtaining the pixel features respectively corresponding to each pixel in the first object, where the first object is any one object in the first object set;
the generating unit is used for generating a pixel data set according to the pixel characteristics respectively corresponding to each pixel of all the objects and the pixel labels respectively corresponding to each pixel;
the noise estimation unit is used for carrying out noise estimation operation on the pixel data set and determining a noise proportion average value corresponding to the pixel data set;
the prediction unit is used for inputting the first object into a second class network in the first group of optimized networks to perform prediction, and obtaining prediction classification corresponding to each pixel in the first object;
the determining unit is used for determining a confidence mean value of the first object set according to the prediction classification and the pixel label corresponding to each pixel in all the objects of the first object set; determining a noise filtering threshold according to the noise proportion average value and the confidence coefficient average value;
And the processing unit is used for carrying out noise filtering on the first object set according to the noise filtering threshold value to obtain a third object set.
In some alternative embodiments, the generating unit includes:
the random sampling subunit is used for respectively carrying out random sampling processing on pixels corresponding to each type of pixel labels in the first object to obtain a target pixel set corresponding to the first pixels;
and the composing subunit is used for composing the target pixel sets respectively corresponding to all the objects into a pixel data set.
In some alternative embodiments, the noise estimation unit includes:
the noise estimation subunit is used for carrying out noise estimation operation on the pixel data set, and determining a first probability and a second probability corresponding to each type of pixel label respectively, wherein the first probability is the probability that the first type of pixel label is a true value, the second probability is the probability that the first type of pixel label is a given true value when the first type of pixel label is the true value, and the first type of pixel label is any type of pixel label;
the first determining subunit is used for determining the noise proportion of the first type pixel labels according to the first probability and the second probability of the first type pixel labels; and determining a noise proportion average value according to the noise proportions of all classes of pixel labels.
In some alternative embodiments, the determining unit comprises:
a second determining subunit, configured to determine a confidence level of a first pixel according to each prediction classification of the first pixel in the first object and a pixel label of the first object, where the first pixel is any one pixel in the first object; determining the confidence level of the first object according to the confidence levels of all pixels in the first object; and determining a confidence mean according to the confidence degrees of all the objects in the first object set.
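As an aside, a small NumPy sketch of one plausible reading of this confidence aggregation is given below: the per-pixel confidence is taken to be the predicted probability assigned to that pixel's labeled class, which is an assumption on our part since the patent does not pin down the exact definition; the function name and array layout are likewise illustrative.

```python
import numpy as np

def object_set_confidence_mean(prob_maps, label_maps):
    """Estimate the confidence mean of an object set.

    prob_maps:  list of (C, H, W) arrays of predicted class probabilities,
                one per object, from the second-type network.
    label_maps: list of (H, W) integer pixel-label arrays, one per object.

    Assumption: a pixel's confidence is the predicted probability of its
    labeled class, an object's confidence is the mean over its pixels, and
    the set confidence mean is the mean over objects.
    """
    object_confidences = []
    for probs, labels in zip(prob_maps, label_maps):
        h_idx, w_idx = np.indices(labels.shape)
        pixel_conf = probs[labels, h_idx, w_idx]  # probability of labeled class
        object_confidences.append(pixel_conf.mean())
    return float(np.mean(object_confidences))
```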
In some alternative embodiments, the processing unit includes:
a labeling subunit configured to label the first pixel as a noise label when it is determined that the confidence level of the first pixel in the first object is greater than the noise filtering threshold;
a deleting subunit, configured to delete pixel labels of all pixels marked as noise labels in the first object, and obtain an updated object;
and the combining subunit is used for combining all the updated objects and the objects without noise labels in the first object set into a third object set.
In some alternative embodiments, the update module 607 includes:
the first moving average unit is used for carrying out moving average operation on the first parameter set and obtaining a third parameter set;
a first replacing unit, configured to replace the parameters of the second type network in the first group of optimized networks with the third parameter set;
and,
the second moving average unit is used for carrying out moving average operation on the second parameter set and obtaining a fourth parameter set;
and the second replacing unit is used for replacing the parameters of the second type of network in the second group of optimized networks with the fourth parameter set.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The target object segmentation apparatus in this embodiment is presented in the form of functional units, where a unit may be an ASIC (Application Specific Integrated Circuit), a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above functionality.
The embodiment of the invention also provides a computer device provided with the target object segmentation apparatus shown in fig. 6.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 7, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 7.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform a method for implementing the embodiments described above.
The memory 20 may include a storage program area and a storage object area, wherein the storage program area may store an operating system, at least one application program required for a function; the storage object region may store objects created according to the use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the above embodiments of the present invention may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code originally stored on a remote storage medium or a non-transitory machine-readable storage medium and downloaded over a network to be stored on a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also comprise a combination of the above kinds of memory. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated in the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (10)

1. A method of segmenting a target object, the method comprising:
acquiring initial sample data, wherein the initial sample data comprises a first object set and a second object set, wherein the first object set is an object set with a pixel label, and the second object set is an object set without a pixel label;
inputting each object in the first object set into each network in a plurality of pre-constructed networks respectively for training, and obtaining a plurality of trained optimized networks, wherein each optimized network comprises a first type network and a second type network;
in the current polling period, inputting the first object set into a second class network in a first group of optimized networks to perform noise filtering to obtain a third object set, wherein the first group of optimized networks are any one group of optimized networks in the plurality of groups of optimized networks;
When the noise filtering processing is executed and then the object containing noise exists, constructing a fourth object set by the second object set and the object containing noise, wherein the fourth object set is a label-free object set;
inputting the fourth object set into a second class network in the first group of optimized networks to obtain a fifth object set, wherein objects in the fifth object set all have pixel pseudo tags;
inputting the third object set and the fifth object set into a first type network of a second group of optimizing networks in the plurality of groups of optimizing networks for training, obtaining a first training result, inputting a sixth object set and a seventh object set obtained by the second group of optimizing networks into the first type network in the first group of optimizing networks for training, and obtaining a second training result, wherein the second group of optimizing networks is any one of the optimizing networks except the first group of optimizing networks in the plurality of groups of optimizing networks, the sixth object set is an object set with pixel labels, and the seventh object set is an object set with pixel pseudo labels;
updating parameters of a second type network in the first group of optimized networks by using the first training result, and after updating parameters of the second type network in the second group of optimized networks by using the second training result, if all the first type networks and/or the second type networks in the plurality of groups of optimized networks are determined to not reach preset conditions, training of the next polling period is performed;
Or stopping training after determining that all the first-class networks and the second-class networks in the plurality of groups of optimized networks reach preset conditions;
and completing the segmentation operation on the target object by using the second class network in each group of optimized networks.
2. The method of claim 1, wherein inputting the first set of objects into a second class of network in the first set of optimized networks for noise filtering during the current polling period to obtain a third set of objects comprises:
in the current polling period, inputting the first object set into a second class network in a first group of optimized networks, extracting features of first objects in the first object set, and obtaining pixel features corresponding to each pixel in the first objects, wherein the first objects are any one object in the first object set;
generating a pixel data set according to the pixel characteristics respectively corresponding to each pixel of all the objects and the pixel labels respectively corresponding to each pixel;
performing noise estimation operation on the pixel data set, and determining a noise proportion mean value corresponding to the pixel data set;
inputting the first object into a second class network in a first group of optimized networks to predict, and obtaining prediction classification corresponding to each pixel in the first object;
Determining a confidence mean value of the first object set according to the prediction classification and the pixel label corresponding to each pixel in all the objects of the first object set;
determining a noise filtering threshold according to the noise proportion average value and the confidence coefficient average value;
and carrying out noise filtering on the first object set according to the noise filtering threshold value to obtain the third object set.
3. The method according to claim 2, wherein generating the pixel data set according to the pixel characteristics respectively corresponding to each pixel of all objects and the pixel labels respectively corresponding to each pixel comprises:
respectively carrying out random sampling processing on pixels corresponding to each type of pixel labels in a first object to obtain a target pixel set corresponding to the first pixels;
and respectively forming target pixel sets corresponding to all the objects into the pixel data set.
4. A method according to claim 3, wherein said performing a noise estimation operation on said set of pixel data to determine a noise ratio average corresponding to said set of pixel data comprises:
performing noise estimation operation on the pixel data set, and determining a first probability and a second probability corresponding to each type of pixel label respectively, wherein the first probability is a probability that the first type of pixel label is a true value, the second probability is a probability that the first type of pixel label is a given true value when the first type of pixel label is a true value, and the first type of pixel label is any type of pixel label;
Determining the noise proportion of the first type pixel labels according to the first probability and the second probability of the first type pixel labels;
and determining the average value of the noise proportion according to the noise proportions of all classes of pixel labels.
5. The method according to claim 2, wherein determining the confidence mean of the first object set according to the prediction classification and the pixel label corresponding to each pixel in all objects of the first object set, respectively, comprises:
determining the confidence of a first pixel in the first object according to each prediction classification of the first pixel and a pixel label of the first object, wherein the first pixel is any pixel in the first object;
determining the confidence degree of the first object according to the confidence degrees of all pixels in the first object;
and determining the confidence mean according to the confidence of all the objects in the first object set.
6. The method of claim 3, wherein noise filtering the first set of objects according to the noise filtering threshold to obtain the third set of objects comprises:
marking the first pixel as a noise label when the confidence of the first pixel in the first object is determined to be greater than the noise filtering threshold;
Deleting all pixel labels of pixels marked as noise marks in the first object to obtain an updated object;
and combining all updated objects and the objects which do not have noise marks in the first object set into the third object set.
7. A method according to claim 3, wherein the first training result comprises a first set of parameters, the second training result comprises a second set of parameters, the parameters of the second type of network in the first set of optimized networks are updated with the first training result, and the parameters of the second type of network in the second set of optimized networks are updated with the second training result, comprising:
performing a moving average operation on the first parameter set to obtain a third parameter set;
replacing parameters of the second type network in the first group of optimized networks with a third parameter set;
and,
performing a moving average operation on the second parameter set to obtain a fourth parameter set;
and replacing the parameters of the second type of network in the second group of optimized networks with the fourth parameter set.
8. A segmentation apparatus for a target object, the apparatus comprising:
The system comprises an acquisition module, a storage module and a storage module, wherein the acquisition module is used for acquiring initial sample data, the initial sample data comprises a first object set and a second object set, the first object set is an object set with a pixel label, and the second object set is an object set without a pixel label;
the first training module is used for respectively inputting each object in the first object set into each network in the pre-constructed multiple groups of networks to train and obtaining multiple groups of trained optimized networks, and each group of optimized networks comprises a first type network and a second type network;
the noise filtering module is used for inputting the first object set into a second class network in a first group of optimized networks to perform noise filtering in a current polling period to obtain a third object set, wherein the first group of optimized networks are any one group of optimized networks in the plurality of groups of optimized networks;
the construction module is used for constructing a fourth object set from the second object set and the object containing noise when the object containing noise exists after noise filtering processing is executed, wherein the fourth object set is a label-free object set;
the processing module is used for inputting the fourth object set into a second class network in the first group of optimized networks to obtain a fifth object set, wherein objects in the fifth object set are provided with pixel pseudo tags;
The second training module is used for inputting the third object set and the fifth object set into a first type network of a second group of optimizing networks in the plurality of groups of optimizing networks to train, obtaining a first training result, inputting a sixth object set and a seventh object set obtained by the second group of optimizing networks into the first type network in the first group of optimizing networks to train, obtaining a second training result, wherein the second group of optimizing networks is any one of the optimizing networks except the first group of optimizing networks in the plurality of groups of optimizing networks, the sixth object set is an object set with pixel labels, and the seventh object set is an object set with pixel pseudo labels;
the updating module is used for updating parameters of a second type network in the first group of optimized networks by using the first training result, and after updating parameters of the second type network in the second group of optimized networks by using the second training result, if all the first type networks and/or the second type networks in the plurality of groups of optimized networks are determined to not reach preset conditions, training of the next polling period is carried out; or stopping training after determining that all the first-class networks and the second-class networks in the plurality of groups of optimized networks reach preset conditions;
And the segmentation module is used for completing the segmentation operation of the target object by utilizing the second type network in each group of optimized networks.
9. A computer device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method of segmenting a target object according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the segmentation method of the target object according to any one of claims 1 to 7.
CN202311148049.7A 2023-09-07 2023-09-07 Target object segmentation method, device, computer equipment and storage medium Active CN116895003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311148049.7A CN116895003B (en) 2023-09-07 2023-09-07 Target object segmentation method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116895003A true CN116895003A (en) 2023-10-17
CN116895003B CN116895003B (en) 2024-01-30

Family

ID=88311076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311148049.7A Active CN116895003B (en) 2023-09-07 2023-09-07 Target object segmentation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116895003B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205694A (en) * 2021-03-26 2022-10-18 北京沃东天骏信息技术有限公司 Image segmentation method, device and computer readable storage medium
CN115115829A (en) * 2022-05-06 2022-09-27 腾讯医疗健康(深圳)有限公司 Medical image segmentation method, device, equipment, storage medium and program product
CN115409991A (en) * 2022-11-02 2022-11-29 苏州魔视智能科技有限公司 Target identification method and device, electronic equipment and storage medium
CN116468746A (en) * 2023-03-27 2023-07-21 华东师范大学 Bidirectional copy-paste semi-supervised medical image segmentation method
CN116309571A (en) * 2023-05-18 2023-06-23 中国科学院自动化研究所 Three-dimensional cerebrovascular segmentation method and device based on semi-supervised learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG XUAN et al.: "DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1-10 *

Also Published As

Publication number Publication date
CN116895003B (en) 2024-01-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant