WO2023207389A1 - Data processing method and apparatus, program product, computer device, and medium - Google Patents

Data processing method and apparatus, program product, computer device, and medium

Info

Publication number
WO2023207389A1
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
image
prediction
feature
neural network
Prior art date
Application number
PCT/CN2023/081603
Other languages
French (fr)
Chinese (zh)
Inventor
徐哲 (Zhe Xu)
卢东焕 (Donghuan Lu)
郑冶枫 (Yefeng Zheng)
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2023207389A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 - Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7784 - Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/84 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data processing method, device, program product, computer equipment and medium.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • Embodiments of the present application provide a data processing method, device, program product, computer equipment and medium, which can improve the accuracy of the trained predictive neural network; subsequently, the trained predictive neural network can also be used to accurately segment feature areas in images.
  • the embodiment of this application provides a data processing method, which includes:
  • acquiring a first image and a second image containing a target object, wherein the image area where the target object is located in the first image is the first feature area, and the image area where the target object is located in the second image is the second feature area;
  • the first image is input into the prediction neural network to obtain a first prediction result;
  • the first prediction result includes: first prediction pixel information indicating whether each pixel point of the first image belongs to the first feature area;
  • the second image is input into the prediction neural network to obtain a second prediction result;
  • the second prediction result includes: second prediction pixel information indicating whether each pixel point of the second image belongs to the second feature area;
  • classification prediction is performed on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels in the second image that belong to the first category and the pixels that belong to the second category; wherein the pixels of the first category are pixels whose label information in the second image is predicted by the auxiliary neural network to be correct, and the pixels of the second category are pixels whose label information in the second image is predicted by the auxiliary neural network to be incorrect;
  • according to the first prediction result, the second prediction result, and the classification prediction result, the network parameters of the prediction neural network are optimized to obtain a trained prediction neural network.
  • the trained prediction neural network is used to perform image segmentation on the target image.
  • the embodiment of the present application also provides a data processing device, which includes:
  • an acquisition module configured to acquire a first image and a second image containing a target object; wherein the image area where the target object is located in the first image is the first feature area, and the image area where the target object is located in the second image is the second feature area;
  • a first processing module configured to input the first image into a prediction neural network to obtain a first prediction result; the first prediction result includes: first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area;
  • a second processing module configured to input the second image into the prediction neural network to obtain a second prediction result; the second prediction result includes: second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area;
  • a classification module configured to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels in the second image that belong to the first category and the pixels that belong to the second category; wherein the pixels of the first category are pixels whose label information in the second image is predicted by the auxiliary neural network to be correct, and the pixels of the second category are pixels whose label information in the second image is predicted by the auxiliary neural network to be incorrect;
  • the optimization module is used to optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network.
  • the trained prediction neural network is used to perform image segmentation on the target image.
  • An embodiment of the present application also provides a computer device, including a memory and a processor.
  • the memory stores a computer program.
  • when executed by the processor, the computer program causes the processor to execute the method in the above aspect of the present application.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • the computer program includes program instructions which, when executed by a processor, cause the processor to perform the method of the above-mentioned aspect.
  • Embodiments of the present application also provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method provided in the various optional implementations of the above aspect.
  • Figure 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of a model training scenario provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart of a data processing method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a network training scenario provided by an embodiment of the present application.
  • Figure 5 is a schematic flowchart of determining classification prediction results provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of a scenario for determining classification results provided by an embodiment of the present application.
  • Figure 7 is a schematic flowchart of determining a prediction deviation provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a model training scenario provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • This application involves artificial intelligence related technologies.
  • when performing machine learning on a model, a large amount of sample data is often required, and this sample data often varies in quality; therefore, how to train a model more accurately with sample data of varying quality has become a problem to be solved.
  • the machine learning involved in the embodiments of this application mainly refers to how to train a predictive neural network, and then use the predictive neural network obtained through training to accurately segment the feature areas in the image.
  • Figure 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • the network architecture may include a server 200 and a terminal device cluster.
  • the terminal device cluster may include one or more terminal devices. There will be no limit on the number of terminal devices here.
  • multiple terminal devices may specifically include terminal device 100a, terminal device 101a, terminal device 102a,..., terminal device 103a; as shown in Figure 1, terminal device 100a, terminal device 101a, terminal device 102a,... , the terminal device 103a can all have a network connection with the server 200, so that each terminal device can perform data interaction with the server 200 through the network connection.
  • the server 200 shown in Figure 1 can be an independent server, a server cluster composed of multiple servers, or a distributed system, and can also provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms, etc.
  • Terminal devices can be: smart phones, tablets, laptops, desktop computers, smart TVs and other smart terminals. The following takes the communication between the terminal device 100a and the server 200 as an example to provide a detailed description of the embodiment of the present application.
  • FIG 2 is a schematic diagram of a model training scenario provided by an embodiment of the present application.
  • the server 200 can be used to train a student model (which can also be called a model to be trained).
  • the process can be as follows: the server 200 can obtain sample data for model training, and the sample data can include a small number of samples labeled by experts and a large number of samples labeled by non-experts.
  • any sample data can be image data containing several pixels, and each pixel in any image data has label information; the image data can contain a target object, and the label information of any pixel is used to indicate whether that pixel belongs to the target object in the image.
  • the image area where the target object is located in the image can be called a feature area.
  • the server 200 can input sample data marked by experts and sample data marked by non-experts into the student model, and input sample data marked by non-experts into the teacher model (which can also be called a trained model, used to assist the training of the student model).
  • the student model can generate a mask for each pixel in the sample data labeled by experts, and based on the mask, generate predicted pixel information for judging whether each pixel in the sample data labeled by experts belongs to the target object.
  • the student model can also generate a mask for each pixel in the sample data labeled by non-experts, and based on the mask, generate predicted pixel information for judging whether each pixel in the sample data labeled by non-experts belongs to the target object.
  • the teacher model can generate a mask for each pixel in the sample data labeled by non-experts, and then, based on this mask, obtain the feature distance between each pixel in the sample data labeled by non-experts and the target prototype/background prototype.
  • the target prototype can be used to represent the characteristics of the target object in the sample data.
  • the background prototype can be used to represent the characteristics of the background image of the target object in the sample data.
  • the teacher model can use the feature distances and the label information of each pixel in the sample data labeled by non-experts to determine whether the label information of each such pixel is correctly or incorrectly labeled, and provide the judgment results to the student model.
  • the student model can use the teacher model's judgment results (i.e., the correctly labeled pixels and incorrectly labeled pixels in the sample data labeled by non-experts) together with the predicted pixel information for the sample data (including the predicted pixel information of each pixel in the expert-labeled sample data and the predicted pixel information of each pixel in the non-expert-labeled sample data) to generate a prediction deviation, and correct the network parameters of the student model based on this prediction deviation to obtain the trained student model.
  • the server 200 can use the trained student model to segment the target object in the image.
  • the server 200 can provide the segmentation result to the terminal device 100a.
  • the terminal device 100a can display the segmentation result on the terminal interface, so that relevant technical personnel can analyze it.
  • the sample data marked by experts may be the first image described below;
  • the sample data marked by non-experts may be the second image described below;
  • the mask may be the mask area described below;
  • the student model may be the predictive neural network described below;
  • the teacher model may be the auxiliary neural network described below;
  • the predicted pixel information of the student model for each pixel in the sample data may be included in the first predicted pixel information and the second predicted pixel information described below;
  • the above-mentioned target prototype may be the target center feature described below;
  • the above-mentioned background prototype may be the background center feature described below. For the specific process of training the student model through the teacher model, refer to the descriptions in the embodiments corresponding to Figure 3 and Figure 5 below.
  • in the embodiment of this application, the teacher model determines, based on the target prototype and background prototype of the target object, whether the label information of each pixel in the sample data labeled by non-experts is correctly or incorrectly labeled; based on this judgment, the student model can perform differential training on the pixels in the non-expert-labeled sample data, while the expert-labeled sample data supervises the training of the student model, improving the training accuracy of the student model and thereby training an accurate student model.
  • Figure 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • the execution subject in the embodiment of this application may be a computer device or a computer device cluster composed of multiple computer devices.
  • the computer device can be a server or a terminal device.
  • the execution subjects in the embodiments of the present application are collectively referred to as computer devices as an example.
  • the method may include:
  • Step S101 obtain a first image and a second image containing a target object, wherein the image area where the target object is located in the first image is the first feature area, and the image area where the target object is located in the second image is the second feature area.
  • the computer device can obtain the first image and the second image.
  • the number of the first image and the second image is determined according to the actual application scenario, and there is no limitation on this.
  • the first image and the second image are used as sample data for training a prediction neural network.
  • both the first image and the second image may contain a target object
  • the target object may be any object that needs to be segmented from the image data.
  • the target object may be determined according to the actual application scenario.
  • the method provided by this application can be applied to any image segmentation scene, which may be a two-dimensional segmentation scene or a three-dimensional segmentation scene.
  • the display form of the target object in the first image and the display form of the target object in the second image may be different; for example, the object category may be different, the object category may be the same but the posture different, or the environment in which the object is located may be different, etc.
  • the target object may be the left ventricle, and both the first image and the second image may contain images of the left ventricle, but the images of the left ventricle contained in the first image and the images of the left ventricle contained in the second image may be different.
  • the image area where the target object in an image is located can be called a feature area; further, the image area where the target object is located in the first image can be called the first feature area, and the image area where the target object is located in the second image can be called the second feature area.
  • Both the first image and the second image may contain several pixels.
  • the first image and the second image may be two-dimensional images, and the pixels in the first image and the second image may be two-dimensional
  • the target object can be any object that needs to be segmented from a two-dimensional image.
  • the object can be specifically determined according to the actual application scenario.
  • the target object can be an object whose local structural features are highly correlated or similar; for example, the target object can be a plant, whose overall texture and structural characteristics in two-dimensional images are relatively similar.
  • the first image and the second image may be three-dimensional images, and the pixels in the first image and the second image may be three-dimensional (in this case The pixels in the first image and the second image can also be called voxels).
  • the target object can be any object that needs to be segmented from the three-dimensional image.
  • the specific object can also be determined according to the actual application scenario.
  • for example, the embodiments of this application can be applied to medical image segmentation scenarios.
  • in this case, the target object can be an object whose local structural features are relatively correlated or similar; for example, the target object can be a human organ (which can be called a part) that needs to be segmented from three-dimensional image data, and the organ (part) can be any organ, such as the left ventricle.
  • the supervision data set for the above-mentioned first image can be called first supervision data.
  • the first supervision data is used to indicate whether each pixel point in the first image belongs to the first characteristic area.
  • the first supervision data is used to indicate whether each pixel in the first image belongs to the target object.
  • the first supervision data may include: label information of each pixel in the first image.
  • the mark information of each pixel in the first image is used to respectively indicate whether each pixel belongs to the target object in the first image or to the background image of the target object in the first image.
  • the label information of each pixel in the first image is used to indicate whether each pixel belongs to the first feature area or to the area of the first image outside the first feature area (i.e., the background image of the first feature area).
  • the background image of the target object in the first image may also be called the background image of the first feature area in the first image.
  • the supervision data set for the above-mentioned second image can be called second supervision data.
  • the second supervision data is used to indicate whether each pixel point in the second image belongs to the second feature area.
  • the second supervision data is used to indicate whether each pixel in the second image belongs to the target object.
  • the second supervision data may include: label information of each pixel in the second image. Wherein, the label information of each pixel point in the second image is used to respectively indicate whether each pixel point belongs to the target object in the second image or belongs to the background image of the target object in the second image.
  • the label information of each pixel in the second image is used to indicate whether each pixel belongs to the second feature area or to the area of the second image outside the second feature area (i.e., the background image of the second feature area).
  • the background image of the target object in the second image may also be called the background image of the second feature area in the second image.
  • the label information of any pixel is used to indicate the affiliation between the pixel and the target object in the image; the affiliation can be that the pixel belongs to the target object (that is, the pixel is among the pixels contained in the image of the target object), or that the pixel does not belong to the target object (that is, the pixel is not among the pixels contained in the image of the target object).
  • the above-mentioned first image may be sample data with high-quality annotations, and the above-mentioned second image may be sample data with low-quality annotations; this may be reflected in the accuracy of the first supervision data set for the first image (i.e., the accuracy of the labeling information of each pixel in the first image) being higher than the accuracy of the second supervision data set for the second image (i.e., the accuracy of the labeling information of each pixel in the second image), where this accuracy can be understood in a subjective sense.
  • for example, the first image may be sample data marked by experts, that is, the first supervision data of the first image may be labeled by professionals in the technical field; the second image may be sample data marked by non-experts, that is, the second supervision data of the second image may be labeled by persons outside the technical field.
  • the first image and the second image may be image data containing organs that need to be segmented; the label information of the pixels in the first image may be marked by professionals in the medical field, while the label information of the pixels in the second image may be marked by amateurs, so the accuracy of the label information of the pixels in the first image will usually be higher than that of the pixels in the second image.
  • the cost of obtaining a large number of high-quality annotated samples is very high, especially in the field of medical imaging, which relies on expert knowledge; therefore, to save the cost of obtaining samples, the number of first images in this application can be small, while the number of second images can be large.
  • This application can effectively utilize a small amount of high-quality annotation data (such as the first image) and a large amount of low-quality annotation data (such as the second image).
  • the label information of a pixel (such as the label information of a pixel in the first image or in the second image) can be recorded as 0 or 1; if the label information of a pixel is 0, it indicates that the pixel does not belong to the target object in the image; conversely, if the label information of a pixel is 1, it indicates that the pixel belongs to the target object in the image.
  • if the label information of a certain pixel in the first image is 0, it can indicate that the pixel does not belong to the first feature area in the first image.
  • if the label information of a certain pixel in the first image is 1, it can indicate that the pixel belongs to the first feature area in the first image.
  • if the label information of a certain pixel in the second image is 0, it can indicate that the pixel does not belong to the second feature area in the second image.
  • if the label information of a certain pixel in the second image is 1, it can indicate that the pixel belongs to the second feature area in the second image.
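  • As a minimal illustration of this 0/1 encoding (the shape and values below are hypothetical), the supervision data of an image can be stored as a binary label map:

```python
# Hypothetical 4x4 image's supervision data as a binary label map:
# 1 = the pixel is annotated as belonging to the target object (feature area),
# 0 = the pixel is annotated as background.
import numpy as np

label_map = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
], dtype=np.int64)
```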
  • Step S102 input the first image into the prediction neural network to obtain a first prediction result; the first prediction result includes: first prediction pixel information indicating whether each pixel point of the first image belongs to the first feature area.
  • specifically, the computer device can call the prediction neural network to predict the first image, that is, to predict the affiliation between each pixel in the first image and the target object; the prediction result of the prediction neural network for each pixel in the first image is called the first prediction result.
  • the first prediction result includes: first predicted pixel information of each pixel in the first image, used to indicate whether the corresponding pixel belongs to the first feature area.
  • the first predicted pixel information of each pixel may include: the probability that the pixel belongs to the target object in the first image (which may be called the target probability), and the probability that the pixel does not belong to the target object in the first image (that is, belongs to the background image of the target object in the first image), which may be called the background probability; the sum of the target probability and the background probability corresponding to any pixel in the first image can be 1.
  • the process of generating the first prediction result of the first image through the prediction neural network may include: the computer device may generate the mask area of each pixel in the first image through the prediction neural network, where the mask area of a pixel may refer to the area used to select the main features of that pixel.
  • the computer device can predict the target probability that each pixel point in the first image belongs to the target object and the background probability that it does not belong to the target object based on the characteristics of each pixel point in the corresponding mask area in the first image through the prediction neural network, In this way, the first predicted pixel information of each pixel in the first image is obtained.
  • the first predicted pixel information of any pixel in the first image includes the target probability that the pixel belongs to the target object (such as the probability that the image features in the pixel's mask area belong to the target object) and the background probability that the pixel does not belong to the target object but belongs to the background image of the target object (such as the probability that the image features in the pixel's mask area belong to the background image of the target object); the first predicted pixel information of all pixels in the first image together constitutes the first prediction result.
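  • The sketch below shows one way such per-pixel target/background probabilities could be produced; the tiny two-layer convolutional network is a stand-in for illustration only, not the patent's actual prediction neural network:

```python
# A minimal per-pixel two-class (background/target) prediction sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPredictionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(16, 2, 1)  # channel 0: background, channel 1: target

    def forward(self, x):
        logits = self.head(self.features(x))   # (B, 2, H, W)
        return F.softmax(logits, dim=1)        # per-pixel probabilities, summing to 1

net = TinyPredictionNet()
first_image = torch.randn(1, 1, 64, 64)        # hypothetical input image
probs = net(first_image)
background_prob, target_prob = probs[:, 0], probs[:, 1]
# For every pixel, target probability + background probability = 1.
assert torch.allclose(background_prob + target_prob, torch.ones_like(target_prob))
```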
  • Step S103 input the second image into the prediction neural network to obtain a second prediction result;
  • the second prediction result includes: second prediction pixel information respectively indicating whether each pixel point of the second image belongs to the second feature area.
  • specifically, the computer device can call the prediction neural network to predict the second image, that is, to predict the affiliation between each pixel in the second image and the target object; the prediction result of the prediction neural network for each pixel in the second image is called the second prediction result.
  • the second prediction result includes: second predicted pixel information indicating whether each pixel point in the second image belongs to the second feature area.
  • the second predicted pixel information of each pixel may include: the target probability that the pixel belongs to the target object in the second image, and the probability that the pixel does not belong to the target object in the second image (that is, belongs to the background image of the target object in the second image), i.e., the background probability; the sum of the target probability and the background probability corresponding to any pixel in the second image can be 1.
  • the process of generating the second prediction result of the second image through the prediction neural network may include: the computer device may generate the mask area of each pixel in the second image through the prediction neural network, where the mask area of a pixel may refer to the area used to select the main features of that pixel.
  • specifically, through the prediction neural network, the computer device can predict, based on the features of each pixel within its corresponding mask area in the second image, the target probability that the pixel belongs to the target object and the background probability that it does not; in this way, the second predicted pixel information of each pixel in the second image is obtained.
  • the second predicted pixel information of any pixel in the second image includes the target probability that the pixel belongs to the target object (such as the probability that the image features in the pixel's mask area belong to the target object) and the background probability that the pixel does not belong to the target object but belongs to the background image of the target object (such as the probability that the image features in the pixel's mask area belong to the background image of the target object).
  • the second predicted pixel information of each pixel in the second image can constitute the second prediction result.
  • the process by which the above-mentioned prediction neural network predicts the predicted pixel information of the pixels in the first image or the second image is the same as the process by which the following auxiliary neural network predicts the predicted pixel information of the pixels in the second image.
  • Step S104 perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels belonging to the first classification and the pixels belonging to the second classification in the second image.
  • in the embodiments of the present application, the auxiliary neural network can be used to determine which pixels in the second image are correctly labeled (that is, have accurate label information) and which pixels are incorrectly labeled (that is, have inaccurate label information), and then allow the predictive neural network to perform differential training on the correctly labeled pixels and incorrectly labeled pixels in the second image.
  • the pixels whose labels in the second image are predicted by the auxiliary neural network to be correct can be called pixels of the first category; that is, the pixels of the first category include the pixels predicted by the auxiliary neural network to have correct label information in the second image.
  • the pixels whose labels in the second image are predicted by the auxiliary neural network to be wrong can be called pixels of the second category; that is, the pixels of the second category include the pixels predicted by the auxiliary neural network to have incorrect label information in the second image.
  • specifically, the computer device can call the auxiliary neural network to generate, based on the second image, the area center feature of the second feature area (which can be understood as the object center feature of the target object in the second image) and the pixel features of each pixel in the second image.
  • the area center feature may include the target center feature of the second feature area and the background center feature of the second feature area.
  • the target center feature is used to characterize the structural features of the target object in the second image (that is, the structural features of the image within the second feature area, such as the texture structure, color structure, and edge structure of the target object in the second image); in other words, the target center feature can be used to represent the features of the target object in the second image.
  • the background center feature is used to characterize the structural features of the background image of the target object in the second image (that is, the structural features of the image outside the second feature area, such as the texture structure, color structure, and edge structure of that background image); in other words, the background center feature can be used to represent the features of the background image of the target object in the second image.
  • the target center feature of the target object is obtained by integrating the pixel features of the pixels predicted to belong to the target object in the second image, and the background center feature of the target object is obtained by integrating the pixel features of the pixels predicted to belong to the background image of the target object in the second image.
  • the pixel features of each pixel in the second image generated by the above-mentioned auxiliary neural network can be relatively accurate intermediate features generated by the auxiliary neural network (which can be determined based on experimental experience), and the pixel feature of any pixel in the second image can be used to represent the structural features of that pixel.
  • the computer device can obtain the classification prediction result for the pixels in the second image through the above-generated region center features, pixel features of each pixel in the second image, and label information of each pixel in the second image.
  • the classification prediction result is used to indicate pixels belonging to the first category (ie, correctly labeled pixels) and pixels belonging to the second category (ie, incorrectly labeled pixels) on the second image.
  • the pixels in the second image can be divided into two categories through the auxiliary neural network, one category is the pixel points of the first category, and the other category is the pixel points of the second category.
  • the pixels of the first category include pixels with accurate label information (that is, correct labeling) in the second image predicted by the auxiliary neural network
  • the pixels of the second category include the pixels in the second image predicted by the auxiliary neural network to not have accurate label information (i.e., to be incorrectly labeled).
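  • The sketch below illustrates one way this prototype-based separation could be implemented, assuming the center features are mean-pooled pixel features and that a nearest-prototype rule decides which class a pixel's appearance supports; the patent only states that feature distances to the target/background center features are used, so the exact rule here is an assumption:

```python
# Prototype-based label separation sketch (assumes both classes are
# predicted non-empty so the mean-pooled prototypes are well defined).
import torch

def classify_labels(pixel_feats, pred_target_mask, labels):
    """pixel_feats: (N, C) pixel features of the second image's pixels
    pred_target_mask: (N,) bool, pixels the auxiliary network predicts as target
    labels: (N,) {0,1} annotated label information (possibly noisy)
    returns: (N,) bool, True where the label is judged correctly annotated."""
    target_center = pixel_feats[pred_target_mask].mean(dim=0)       # target center feature
    background_center = pixel_feats[~pred_target_mask].mean(dim=0)  # background center feature

    d_target = (pixel_feats - target_center).norm(dim=1)
    d_background = (pixel_feats - background_center).norm(dim=1)

    # A pixel "looks like" the target when it is nearer the target prototype;
    # its annotation is judged correct when the label agrees with appearance.
    looks_like_target = d_target < d_background
    return looks_like_target == labels.bool()
```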
  • the embodiments of the present application can continuously iteratively train the predictive neural network through several first images and several second images.
  • during this process, the network parameters of the auxiliary neural network will also be iteratively updated using the updated (that is, optimized) network parameters of the prediction neural network.
  • the specific principle of iteratively updating the network parameters of the auxiliary neural network by predicting the network parameters after iterative update (ie, iterative optimization) of the neural network can be found in the description in step S105 below.
  • Step S105 according to the first prediction result, the second prediction result, and the classification prediction result, optimize the network parameters of the prediction neural network to obtain a trained prediction neural network; the trained prediction neural network is used to perform image segmentation on the target image.
  • specifically, based on the first prediction result, the second prediction result, and the classification prediction result, the computer device may generate the final prediction deviation (which can be called the prediction deviation) of the prediction neural network's predicted pixel information for each pixel (including each pixel in the first image and each pixel in the second image); the prediction deviation represents the deviation between the predicted pixel information of each pixel predicted by the prediction neural network and the label information of that pixel, and can also be understood as the prediction loss of the prediction neural network.
  • the computer equipment can perform network parameter optimization based on the above prediction deviation.
  • the network parameter optimization includes any one of parameter optimization of the predictive neural network and parameter optimization of the auxiliary neural network, or a combination of both.
  • each iterative update of the predictive neural network will generate the corresponding prediction deviation through the above process.
  • the prediction deviations generated during each round of iterative training are used to continuously and iteratively update (i.e., iteratively correct or optimize) the network parameters of the predictive neural network.
  • when the updating of the network parameters of the predictive neural network is completed, the trained predictive neural network (which can be called the predictive neural network after parameter optimization) can be obtained; the trained predictive neural network includes network parameters that have been corrected (that is, optimized).
  • the completion of updating the network parameters of the predictive neural network may mean that the network parameters have been updated to a convergence state, or that the number of iterative updates (i.e., iterative training rounds) of the network parameters has reached a certain threshold, which can be set according to the actual application scenario.
  • Figure 4 is a schematic diagram of a network training scenario provided by an embodiment of the present application.
  • the first image contains multiple pixels, the first image has first supervision data, and the first supervision data includes the label information of each pixel in the first image; similarly, in the second image Also containing a plurality of pixels, the second image has second supervision data, and the second supervision data includes label information of each pixel in the second image.
  • the computer device can call the auxiliary neural network to generate a classification prediction result for each pixel point in the second image.
  • the classification prediction result includes a classification result for each pixel point in the second image.
  • the classification result of any pixel in the second image indicates whether the label information of that pixel is correctly or incorrectly labeled, that is, whether the pixel belongs to the first category or the second category.
  • the correctly marked pixels in the second image can be regarded as the pixels of the first category, and the incorrectly marked pixels in the second image can be regarded as the pixels of the second category. Therefore, the prediction neural network can perform differential training on the pixels in the first image, the pixels of the first category in the second image, and the pixels of the second category in the second image, thereby obtaining a trained prediction neural network.
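  • A sketch of such differential training, under the assumption that second-category (mislabeled) pixels are simply down-weighted in the loss; the patent requires only that the two categories be treated differently, so the weighting scheme is illustrative:

```python
# Prediction-deviation sketch: full supervision on the expert-labeled first
# image, weighted supervision on the non-expert-labeled second image.
import torch
import torch.nn.functional as F

def prediction_deviation(logits1, labels1, logits2, labels2, correct_mask,
                         wrong_weight=0.0):
    """logits*: (N, 2) per-pixel logits; labels*: (N,) int64 in {0, 1};
    correct_mask: (N,) bool from the auxiliary network's classification."""
    loss1 = F.cross_entropy(logits1, labels1)  # expert-labeled pixels: full weight
    per_pixel = F.cross_entropy(logits2, labels2, reduction="none")
    # First-category pixels get weight 1, second-category pixels get wrong_weight.
    weights = correct_mask.float() + (~correct_mask).float() * wrong_weight
    loss2 = (weights * per_pixel).sum() / weights.sum().clamp(min=1e-8)
    return loss1 + loss2
```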
  • in addition, the network parameters of the predictive neural network will also be passed to the auxiliary neural network, so that during the training of the predictive neural network, the auxiliary neural network, whose parameters are continuously optimized, can determine the classification results of each pixel in the second image more accurately, which in turn enables more accurate training of the predictive neural network.
  • the predictive neural network can be understood as a student network (i.e., a student model), and the auxiliary neural network can be understood as a teacher network (i.e., a teacher model).
  • This application can use a design similar to the Mean-Teacher (MT) architecture to update the network parameters of the auxiliary neural network from the network parameters of the predictive neural network, because MT's weighted-average self-ensembling strategy can effectively improve the stability and smoothness of the intermediate feature representations and the final predictions; this is well suited to the label separation strategy based on feature prototypes (the above-mentioned target center feature characterizes the feature prototype of the target object), i.e., distinguishing whether the label information of each pixel in the second image is correctly or incorrectly labeled, because it yields a more stable and smooth feature space (such as the feature space composed of the pixel features of the pixels in the second image). The process can be shown as the following formula (1):
  • $\tilde{\theta}_t = \alpha\,\tilde{\theta}_{t-1} + (1 - \alpha)\,\theta_t$  (1)
  • where t and t-1 denote the number of iterative training rounds of the predictive neural network (which can also be understood as the number of iterative optimizations of the auxiliary neural network's parameters), with t denoting the t-th round and t-1 the (t-1)-th round; $\tilde{\theta}_t$ denotes the network parameters of the auxiliary neural network after the t-th round, $\tilde{\theta}_{t-1}$ denotes the network parameters of the auxiliary neural network after the (t-1)-th round, and $\theta_t$ denotes the network parameters of the predictive neural network after the t-th round; $\alpha$ denotes the EMA (exponential moving average) decay rate, which can be set to 0.99.
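  • A minimal sketch of the EMA update in formula (1), applied after each training iteration of the predictive (student) network; it assumes `teacher` and `student` share the same architecture:

```python
# Mean-Teacher-style parameter update: each teacher parameter becomes an
# exponential moving average of the corresponding student parameter.
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):  # alpha: EMA decay rate
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```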
  • in the embodiment of this application, the network parameters of the auxiliary neural network can be iteratively updated from the network parameters of the predictive neural network after each iterative update, so that the features the predictive neural network has learned from the first image and the second image are passed to the auxiliary neural network; the auxiliary neural network can then judge the classification results of the pixels in the second image more accurately during each iteration of training.
  • the trained predictive neural network can be used to segment the target object in the image data.
  • the trained predictive neural network can identify the pixels belonging to the target object in the image.
  • the identified pixels of the target object can then be used to segment the image area where the target object is located (i.e., the feature area) from the image.
  • the computer device can obtain a target image.
  • the target image can include a target object.
  • the image area where the target object is located in the target image can be called a target feature area.
  • the target image can be any image containing a target object that needs to be segmented.
  • specifically, the computer device can call the trained predictive neural network (i.e., the predictive neural network with optimized parameters) to predict the target image, that is, to predict the affiliation between each pixel in the target image and the target object, and obtain a target prediction result for the target image.
  • the target prediction result is used to determine whether each pixel in the target image belongs to the target feature area; the target prediction result includes the predicted pixel information of each pixel in the target image.
  • the predicted pixel information of any pixel in the target image may include the target probability that the pixel belongs to the target object in the target image and the background probability that the pixel does not belong to the target object in the target image; the background probability is the probability that the pixel belongs to the background image of the target object in the target image.
  • the computer device can regard the pixels in the target image whose target probability is greater than their background probability as the identified pixels belonging to the target object (that is, the identified pixels belonging to the target feature area); therefore, the identified pixels belonging to the target object can be segmented from the target image, which achieves segmentation of the target object in the target image, i.e., image segmentation of the target feature area in the target image.
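  • A minimal sketch of this inference rule; for two classes, comparing the target probability against the background probability per pixel is just an argmax over the two channels:

```python
# Segment the target feature area from a target image with the trained network.
import torch

@torch.no_grad()
def segment(trained_net, target_image):
    probs = trained_net(target_image)       # (B, 2, H, W); channel 1 = target
    target_prob, background_prob = probs[:, 1], probs[:, 0]
    return target_prob > background_prob    # boolean mask of the target feature area
```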
  • since high-quality annotation data (such as the first image) is difficult to obtain and usually requires experts to annotate, while low-quality annotation data (such as the second image) is relatively easy to obtain, the amount of high-quality labeled data is typically very small and the amount of low-quality labeled data very large; the method provided by the embodiments of this application can solve the problem of inaccurate model (network) training in this scenario.
  • the method provided by the embodiments of this application can separate and learn the samples in a mixed sample set composed of a small amount of high-quality annotated data and a large amount of low-quality annotated data, thereby accurately learning the true characteristics of the samples and training an accurate model (such as the predictive neural network).
  • for images of the same type of segmentation area (that is, the area where the target object is located), the classification results of the pixels in the low-quality second image can be accurately judged through the feature prototype of the target object (such as the target center feature); the prediction neural network is then trained differentially on the different categories of pixels in the second image and, combined with the high-quality first image as supervised training data, trained jointly; this can improve the training effect of the predictive neural network and thus train a more accurate predictive neural network. Through the trained, accurate predictive neural network, accurate segmentation of target objects in images can also be achieved.
  • Embodiments of the present application can acquire a first image with a first feature area and a second image with a second feature area; predict the first image through a prediction neural network to obtain a first prediction result, which includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area; predict the second image through the prediction neural network to obtain a second prediction result, which includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area; classify the pixels of the second image through an auxiliary neural network to obtain a classification prediction result, which indicates the pixels of the second image belonging to the first category and those belonging to the second category; and perform network parameter optimization based on the first prediction result, the second prediction result, and the classification prediction result.
  • the network parameter optimization includes parameter optimization of the prediction neural network, parameter optimization of the auxiliary neural network, or a combination of both. It can be seen that the method proposed in the embodiments of the present application can use the auxiliary neural network to classify the pixels in the second image, and then use this classification to optimize the parameters of the predictive neural network or the auxiliary neural network; this can improve the accuracy of the parameter optimization of the predictive neural network, and subsequently the predictive neural network with optimized parameters can accurately segment the feature areas in images.
  • Figure 5 is a schematic flowchart of determining a classification prediction result provided by an embodiment of the present application.
  • the execution subject in the embodiment of the present application may be the same as the execution subject in Figure 3 above.
  • the method may include:
  • Step S201 Generate regional center features of the second feature region and pixel features of each pixel in the second image based on the second image through an auxiliary neural network.
  • the computer device can input the second image into the auxiliary neural network to perform feature learning on the second image, and thereby generate pixel features for each pixel in the second image.
  • the auxiliary neural network may include multiple convolutional layers for feature learning on the second image; the pixel feature of each pixel may therefore be the smoothed feature of that pixel generated by the penultimate convolutional layer among the multiple convolutional layers, because experiments have shown that the smoothed features generated by the penultimate convolutional layer are better features.
  • the computer device can also predict the mask area of each pixel in the second image through an auxiliary neural network.
  • the mask area is used to select the main features of each pixel in the second image.
  • the computer device can also generate a prediction accuracy index of the mask area of each pixel in the second image.
  • the prediction accuracy index reflects the uncertainty of the generated mask area of each pixel in the second image; as the name suggests, the prediction accuracy index of the mask area of any pixel represents how accurate that pixel's mask area is.
  • this application can perform Bayesian approximation through Monte Carlo dropout (MC dropout) to generate the prediction accuracy index of the mask area of each pixel in the second image.
  • the process can be:
  • the computer device can randomly drop (i.e., dropout) the network parameters of the auxiliary neural network (which can be called neurons) K times, thereby obtaining K deformation networks of the auxiliary neural network; K is a positive integer, and its specific value can be determined according to the actual application scenario.
  • each discarding of network parameters is performed on the auxiliary neural network with its complete network parameters, and each deformation network is obtained by randomly discarding the network parameters of the auxiliary neural network; randomly discarding the network parameters can mean randomly setting some network parameters of the auxiliary neural network to 0; the network parameters set to 0 are the discarded network parameters and play no role in the subsequent prediction process.
  • the network parameters of the auxiliary neural network are randomly discarded to obtain the deformation networks mainly so that the prediction accuracy index of each pixel's mask area can subsequently be generated through the deformation networks; the above-mentioned pixel features and mask areas of each pixel are predicted by the auxiliary neural network without discarding any network parameters.
  • any pixel in the second image can be taken as a target pixel; since the process of obtaining the prediction accuracy index of the mask area is the same for every pixel, obtaining the prediction accuracy index of the mask area of the target pixel is used as an example for explanation.
  • the computer device can separately predict, through each deformation network, the predicted pixel information for the target pixel according to the mask area of the target pixel (i.e., obtain the predicted pixel information based on the image features at the mask area of the target pixel in the second image). The predicted pixel information of a pixel predicted by a deformation network can be called deformation prediction pixel information, and any deformation network can predict one piece of deformation prediction pixel information for the target pixel.
  • the above process can be understood as performing K forward stochastic inferences on the target pixel: the K deformation networks perform K softmax predictions on the target pixel to obtain each piece of deformation prediction pixel information for the target pixel.
  • any piece of deformation prediction pixel information may include the probability, predicted by the corresponding deformation network, that the target pixel belongs to the second feature area in the second image (which can be called the first prediction probability, i.e., the probability that the target pixel belongs to the target object in the second image), and the probability, predicted by the corresponding deformation network, that the target pixel does not belong to the second feature area in the second image (which can be called the second prediction probability, i.e., the probability that the target pixel belongs to the background image of the target object in the second image).
  • the sum of the first prediction probability and the second prediction probability may be 1.
  • the background image of the target object in the second image refers to the image in the second image other than the image of the target object.
  • the computer device can determine the prediction accuracy index of the mask area of the target pixel point based on the K deformation prediction pixel information obtained by the above K deformation networks. For details, please refer to the following description.
  • since any piece of deformation prediction pixel information includes a first prediction probability, the K pieces of deformation prediction pixel information include K first prediction probabilities in total. The computer device can obtain the standard deviation of the K first prediction probabilities and use this standard deviation as the target prediction accuracy index of the target pixel; the target prediction accuracy index indicates the accuracy of predicting that the target pixel belongs to the target object.
  • similarly, any piece of deformation prediction pixel information includes a second prediction probability, so the K pieces of deformation prediction pixel information include K second prediction probabilities in total. The computer device can obtain the standard deviation of the K second prediction probabilities and use it as the background prediction accuracy index of the target pixel; the background prediction accuracy index indicates the accuracy of predicting that the target pixel belongs to the background image of the target object.
  • both the target prediction accuracy index and the background prediction accuracy index for the target pixel can be used as the prediction accuracy index of the mask area of the target pixel.
  • the computer device can obtain the prediction accuracy index of the mask area of each pixel in the second image in the same manner as the prediction accuracy index of the mask area of the target pixel.
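  • As an illustration of the above procedure, the following is a minimal sketch in PyTorch, assuming the auxiliary neural network is a two-class segmentation model containing dropout layers; the names (aux_net, second_image) and the channel order (background, target) are illustrative assumptions, not names from the embodiments.

```python
import torch

def mc_dropout_uncertainty(aux_net, second_image, K=8):
    """Estimate per-pixel prediction accuracy indices via Monte Carlo dropout.

    Each of the K stochastic forward passes plays the role of one
    "deformation network": dropout randomly zeroes different neurons, and
    the standard deviation of the K softmax outputs per pixel is used as
    the prediction accuracy index.
    """
    aux_net.train()  # keep dropout active so each pass drops different neurons
    probs = []
    with torch.no_grad():
        for _ in range(K):  # K forward stochastic inferences
            logits = aux_net(second_image)            # assumed shape (1, 2, H, W)
            probs.append(torch.softmax(logits, dim=1))
    probs = torch.stack(probs)                         # (K, 1, 2, H, W)
    target_index = probs[:, :, 1].std(dim=0)           # std of the K first prediction probabilities
    background_index = probs[:, :, 0].std(dim=0)       # std of the K second prediction probabilities
    aux_net.eval()
    return target_index, background_index
```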
  • the computer device can also use the above-mentioned auxiliary neural network (the auxiliary neural network here refers to the network with complete network parameters; the above K deformation networks can be obtained by randomly discarding its network parameters) to predict, according to the generated mask area of each pixel in the second image, a third prediction result for the second image. The third prediction result may include third predicted pixel information of each pixel of the second image, and the third predicted pixel information of each pixel indicates whether that pixel belongs to the second feature area.
  • the third predicted pixel information of each pixel includes the probability, predicted by the auxiliary neural network, that the pixel belongs to the second feature area in the second image (which can be called the target probability), and the probability, predicted by the auxiliary neural network, that the pixel does not belong to the second feature area in the second image (which can be called the background probability).
  • the computer device can generate the regional center features of the target object based on the pixel features of each pixel in the second image generated by the auxiliary neural network, the prediction accuracy index of the mask area of each pixel in the second image, and the third prediction result; the process is described below.
  • the computer device can obtain, from the pixels contained in the second image, the pixels whose prediction accuracy index of the corresponding mask area is greater than an index threshold (the index threshold can be set according to the actual application scenario) as evaluation pixels; that is, evaluation pixels are pixels in the second image whose prediction accuracy index of the corresponding mask area is greater than the index threshold. The number of evaluation pixels may be at least one.
  • the target pixel is again used as an example. Since the prediction accuracy index of the mask area of the target pixel includes the target prediction accuracy index and the background prediction accuracy index, the prediction accuracy index of the mask area of the target pixel being greater than the index threshold can mean that both the target prediction accuracy index and the background prediction accuracy index of the mask area of the target pixel are greater than the index threshold; that is, when both are greater than the index threshold, the target pixel can be used as an evaluation pixel.
  • the computer device can obtain several pixels that can be used as evaluation pixels among the pixels contained in the second image, thereby obtaining at least one evaluation pixel.
  • the computer device can generate the regional center features of the target object based on the pixel features of the at least one evaluation pixel and the third predicted pixel information of the at least one evaluation pixel:
  • for each evaluation pixel in the at least one evaluation pixel, if the third predicted pixel information of the evaluation pixel indicates that it belongs to the second feature area (that is, the target probability in the third predicted pixel information is greater than the background probability), the evaluation pixel is used as a target evaluation pixel of the second feature area (that is, a target evaluation pixel of the target object). In other words, a target evaluation pixel is a pixel that the auxiliary neural network predicts belongs to the target object in the second image and whose prediction accuracy index of the corresponding mask area is greater than the index threshold.
  • otherwise, if the third predicted pixel information of the evaluation pixel indicates that it does not belong to the second feature area (that is, the background probability is greater than the target probability), the evaluation pixel is used as a background evaluation pixel of the second feature area (that is, a background evaluation pixel of the target object). In other words, a background evaluation pixel is a pixel that the auxiliary neural network predicts does not belong to the target object in the second image and whose prediction accuracy index of the corresponding mask area is greater than the index threshold.
  • the computer device can generate the target center feature of the second feature area based on the pixel features of the target evaluation pixels and the target probability in the third predicted pixel information of the target evaluation pixels (that is, the probability of belonging to the target object).
  • the target center feature is used to represent the structural features of the target object in the second image, that is, the structural features of the image in the second feature area.
  • the target center feature can be recorded as q_obj. As shown in the following formula (2), the target center feature q_obj can be:
  • q_obj = Σ_{a=1..A} (p_a · e_a) / Σ_{a=1..A} p_a (2)
  • the label information of each pixel in the second image (0 or 1, where 0 means not belonging to the target object and 1 means belonging to the target object) can be contained in the same label vector, and the pixel features of each pixel in the second image generated by the auxiliary neural network can be contained in the same feature matrix, where one row of the feature matrix can represent the pixel features of one pixel. The target center feature can therefore be generated based on operations between the feature matrix and the label vector.
  • the dimensions of the pixel features of each pixel generated by the auxiliary neural network are usually different from the dimensions of the above label vector. Therefore, the pixel features of the target evaluation pixels can be upsampled through linear interpolation (if the pixels are three-dimensional, trilinear interpolation can be used) to raise the dimension of the pixel features of the target evaluation pixels to the same dimension as the above label vector.
  • in formula (2), A represents the total number of target evaluation pixels, and a is less than or equal to A; e_a represents the pixel feature obtained after the dimension of the pixel feature of the a-th target evaluation pixel generated by the auxiliary neural network is raised to the same dimension as the above label vector (that is, the pixel feature of the a-th target evaluation pixel after the dimension is raised); p_a represents the probability, in the third predicted pixel information of the a-th target evaluation pixel, that the a-th target evaluation pixel belongs to the target object in the second image (i.e., the target probability). By introducing the target probability of each target evaluation pixel when obtaining the target center feature, the different contributions of the target evaluation pixels to the target center feature can be reflected: the greater the target probability of a target evaluation pixel, the greater its contribution weight to generating the target center feature.
  • similarly, the computer device can generate the background center feature of the second feature area based on the pixel features of the background evaluation pixels and the background probability in the third predicted pixel information of the background evaluation pixels (that is, the probability of not belonging to the target object).
  • the background center feature is used to represent the structural features of the background image of the target object in the second image, that is, the structural features of the image in the second image other than the image of the second feature area.
  • the background center feature can be recorded as q_bg. As shown in the following formula (3), the background center feature q_bg can be:
  • q_bg = Σ_{b=1..B} (p_b · e_b) / Σ_{b=1..B} p_b (3)
  • in formula (3), B represents the total number of background evaluation pixels, and b is less than or equal to B; e_b represents the pixel feature obtained after the dimension of the pixel feature of the b-th background evaluation pixel generated by the auxiliary neural network is raised to the same dimension as the above label vector (that is, the pixel feature of the b-th background evaluation pixel after the dimension is raised); p_b represents the probability, in the third predicted pixel information of the b-th background evaluation pixel, that the b-th background evaluation pixel belongs to the background image of the target object in the second image (i.e., the background probability). By introducing the background probability of each background evaluation pixel when obtaining the background center feature, the different contributions of the background evaluation pixels to the background center feature can be reflected: the greater the background probability of a background evaluation pixel, the greater its contribution weight to generating the background center feature.
  • the computer device can use the above target center feature (which can be understood as the target center feature of the target object in the second image) and the background center feature (which can be understood as the background center feature of the target object in the second image) together as the regional center features of the second feature area, which can also be understood as the object center features of the target object.
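  • The center-feature computation can be sketched as follows, assuming the pixel features have already been upsampled to label resolution and that eval_mask marks the evaluation pixels; the probability-weighted averaging follows formulas (2) and (3), and all tensor names are illustrative.

```python
import torch

def region_center_features(features, target_prob, eval_mask):
    """Probability-weighted prototypes following formulas (2) and (3).

    features:    (C, H, W) pixel features e, already upsampled to label resolution
    target_prob: (H, W) target probability from the third predicted pixel information
    eval_mask:   (H, W) bool, True where the prediction accuracy index exceeds the threshold
    """
    bg_prob = 1.0 - target_prob
    is_target = eval_mask & (target_prob > bg_prob)   # target evaluation pixels
    is_bg = eval_mask & (target_prob <= bg_prob)      # background evaluation pixels

    flat = features.flatten(1)                        # (C, H*W)

    w_obj = (target_prob * is_target).flatten()       # weights p_a (zero off-mask)
    q_obj = (flat * w_obj).sum(dim=1) / w_obj.sum().clamp(min=1e-8)

    w_bg = (bg_prob * is_bg).flatten()                # weights p_b
    q_bg = (flat * w_bg).sum(dim=1) / w_bg.sum().clamp(min=1e-8)
    return q_obj, q_bg                                # target / background center features
```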
  • Step S202 Determine the classification prediction result based on the regional center feature, the pixel feature of each pixel in the second image, and the second supervision data.
  • the computer device can determine the classification result of each pixel in the second image based on the generated regional center features of the target object, the pixel features of each pixel in the second image, and the label information of each pixel in the second image (i.e., the second supervision data). The classification result of any pixel in the second image can be that the pixel belongs to the first category (that is, the label information of the pixel is correctly labeled) or that the pixel belongs to the second category (that is, the label information of the pixel is incorrectly labeled).
  • taking the target pixel as an example again, the computer device can obtain the feature distance between the pixel feature of the target pixel and the target center feature, which can be called the first feature distance; the computer device can also obtain the feature distance between the pixel feature of the target pixel and the background center feature, which can be called the second feature distance.
  • the first feature distance can be recorded as d_obj and the second feature distance as d_bg. As shown in the following formulas (4) and (5), the first feature distance d_obj and the second feature distance d_bg can be:
  • d_obj = ‖e_m − q_obj‖_2 (4)
  • d_bg = ‖e_m − q_bg‖_2 (5)
  • in the formulas, e_m represents the pixel feature obtained after the dimension of the pixel feature of the target pixel generated by the auxiliary neural network is raised to the same dimension as the above label vector (that is, the pixel feature of the target pixel after the dimension is raised).
  • q obj represents the above-mentioned target center feature
  • q bg represents the above-mentioned background center feature.
  • ‖·‖_2 represents the second norm (L2 norm).
  • if the first feature distance is greater than the second feature distance (indicating that the target pixel is more likely to belong to the background image of the target object in the second image), and the label information of the target pixel in the second supervision data indicates that the target pixel belongs to the target object in the second image (that is, a pixel belonging to the second feature area), then it can be determined that the classification result of the target pixel indicates that the target pixel belongs to the second category.
  • if the first feature distance is greater than the second feature distance, and the label information of the target pixel in the second supervision data indicates that the target pixel does not belong to the target object in the second image (that is, a pixel not belonging to the second feature area, i.e., belonging to the background image of the target object in the second image), then it can be determined that the classification result of the target pixel indicates that the target pixel belongs to the first category.
  • if the first feature distance is smaller than the second feature distance (indicating that the target pixel is more likely to belong to the target object in the second image), and the label information of the target pixel in the second supervision data indicates that the target pixel belongs to the target object in the second image, it can be determined that the classification result of the target pixel indicates that the target pixel belongs to the first category.
  • if the first feature distance is smaller than the second feature distance, and the label information of the target pixel in the second supervision data indicates that the target pixel does not belong to the target object in the second image (that is, it belongs to the background image of the target object in the second image), it can be determined that the classification result of the target pixel indicates that the target pixel belongs to the second category.
  • in other words, if the feature type toward which the pixel feature of the target pixel is biased (such as the feature type of the target object or the feature type of the background image of the target object) is inconsistent with the feature type indicated by the label information of the target pixel (for example, one is the feature type of the target object and the other is the feature type of the background image of the target object), the label information of the target pixel can be considered incorrectly labeled; that is, the classification result of the target pixel indicates that the target pixel belongs to the second category in the second image. Conversely, if the feature type toward which the pixel feature of the target pixel is biased is consistent with the feature type indicated by the label information of the target pixel (for example, both are the feature type of the target object, or both are the feature type of the background image of the target object), the label information of the target pixel can be considered correctly labeled; that is, the classification result of the target pixel indicates that the target pixel belongs to the first category in the second image.
  • the classification prediction result for the second image can be obtained through the classification result of each pixel point in the second image, and the classification prediction result includes the classification result of each pixel point in the second image.
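  • The distance-based label separation described above can be sketched as follows, continuing the assumed tensors of the previous sketch; labels is assumed to hold the (possibly noisy) 0/1 annotation of each pixel.

```python
import torch

def separate_labels(features, q_obj, q_bg, labels):
    """Label separation by feature distance, following formulas (4) and (5).

    Returns True where the annotation is judged correct (first category)
    and False where it is judged mislabeled (second category).
    """
    flat = features.flatten(1)                         # (C, H*W)
    d_obj = (flat - q_obj[:, None]).norm(dim=0)        # first feature distance per pixel
    d_bg = (flat - q_bg[:, None]).norm(dim=0)          # second feature distance per pixel
    looks_like_target = d_obj < d_bg                   # feature leans toward the target object
    labeled_target = labels.flatten() > 0
    first_category = looks_like_target == labeled_target  # consistent label -> clean
    return first_category.view(labels.shape)
```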
  • FIG. 6 is a schematic diagram of a scenario for determining a classification result provided by an embodiment of the present application.
  • the pixel points in the second image may include pixel point 1 to pixel point W, where W is a positive integer, and the specific value of W is determined according to the actual application scenario.
  • the computer device can obtain the feature distance between the pixel feature of each pixel in the second image and the target center feature, including the feature distance between the pixel feature of pixel 1 and the target center feature (i.e., first feature distance 1), the feature distance between the pixel feature of pixel 2 and the target center feature (i.e., first feature distance 2), the feature distance between the pixel feature of pixel 3 and the target center feature (i.e., first feature distance 3), ..., and the feature distance between the pixel feature of pixel W and the target center feature (i.e., first feature distance W).
  • similarly, the feature distance between the pixel feature of each pixel and the background center feature can be obtained, including the feature distance between the pixel feature of pixel 1 and the background center feature (i.e., second feature distance 1), the feature distance between the pixel feature of pixel 2 and the background center feature (i.e., second feature distance 2), the feature distance between the pixel feature of pixel 3 and the background center feature (i.e., second feature distance 3), ..., and the feature distance between the pixel feature of pixel W and the background center feature (i.e., second feature distance W).
  • the computer device can obtain the classification result of pixel 1 according to first feature distance 1, the label information of pixel 1, and second feature distance 1; obtain the classification result of pixel 2 according to first feature distance 2, the label information of pixel 2, and second feature distance 2; obtain the classification result of pixel 3 according to first feature distance 3, the label information of pixel 3, and second feature distance 3; ...; and obtain the classification result of pixel W according to first feature distance W, the label information of pixel W, and second feature distance W.
  • through the regional center features, this application can accurately determine the classification result of each pixel in the second image.
  • the embodiments of this application focus on using feature prototypes (which can be reflected by the regional center features), with the assistance of the mean teacher model (that is, the above-mentioned auxiliary neural network), to learn features that are more robust to noisy labels and to perform label separation.
  • the networks involved may adopt, for example, V-Net (an image segmentation network), U-Net (a semantic segmentation network), DenseNet (a dense connection network), or ResNet (a residual network).
  • FIG. 7 is a schematic flowchart of determining a prediction deviation provided by an embodiment of the present application.
  • the execution subject in the embodiment of the present application may be the same as the execution subject in Figure 3 above.
  • the method may include:
  • Step S301 Generate a first prediction deviation of the prediction neural network based on the first prediction result and the first supervision data of the first image.
  • the computer device may generate the cross-entropy loss and the image segmentation loss (Dice loss) of the prediction neural network for the first image based on the first predicted pixel information of each pixel of the first image in the first prediction result and the label information of each pixel in the first supervision data, and then obtain the prediction loss of the prediction neural network for the first image through the cross-entropy loss and the image segmentation loss. This prediction loss can be called the first prediction deviation.
  • the cross-entropy loss of the prediction neural network for the first image can be recorded as L_s1. As shown in the following formula (6), the cross-entropy loss L_s1 is:
  • L_s1 = −(1/N) · Σ_{i=1..N} [ y_true.i · log(y_pred.i) + (1 − y_true.i) · log(1 − y_pred.i) ] (6)
  • y_true.i represents the label information of the i-th pixel in the first image, that is, y_true.i represents the true label of the i-th pixel; i is less than or equal to N, and N can be the total number of pixels in the first image. If the label information of the i-th pixel truly represents that the i-th pixel belongs to the target object in the first image, then y_true.i can be equal to 1; otherwise, that is, if the label information of the i-th pixel truly represents that the i-th pixel does not belong to the target object in the first image, y_true.i can be equal to 0.
  • y_pred.i represents the probability, in the first predicted pixel information of the i-th pixel predicted by the prediction neural network, that the i-th pixel belongs to the target object (i.e., the target probability).
  • the image segmentation loss of the prediction neural network for the first image can be recorded as L_Dice1. As shown in the following formula (7), the image segmentation loss L_Dice1 is:
  • L_Dice1 = 1 − (2 · Σ_{i=1..N} y_true.i · y_pred.i) / (Σ_{i=1..N} y_true.i + Σ_{i=1..N} y_pred.i) (7)
  • in formula (7), y_true.i represents the label information of the i-th pixel in the first image, y_true.i is 1 or 0, and y_pred.i represents the probability, in the first predicted pixel information of the i-th pixel, that the i-th pixel belongs to the target object (i.e., the target probability).
  • the first prediction deviation of the above prediction neural network can be recorded as L_HQ. As shown in the following formula (8), the first prediction deviation L_HQ is the sum of the cross-entropy loss L_s1 and the image segmentation loss L_Dice1:
  • L_HQ = L_s1 + L_Dice1 (8)
  • the first prediction deviation L HQ of the prediction neural network for the first image can play a role in forward supervision training of the prediction neural network.
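  • A minimal sketch of the first prediction deviation, mirroring formulas (6) to (8); the flattened tensors and the smoothing constant eps are illustrative assumptions.

```python
import torch

def first_prediction_deviation(y_pred, y_true, eps=1e-8):
    """L_HQ = L_s1 (cross-entropy) + L_Dice1 (Dice loss).

    y_pred: (N,) predicted target probabilities for the first image's pixels
    y_true: (N,) labels, 1 = target object, 0 = background
    """
    ce = -(y_true * torch.log(y_pred + eps)
           + (1 - y_true) * torch.log(1 - y_pred + eps)).mean()
    dice = 1 - (2 * (y_true * y_pred).sum() + eps) / (y_true.sum() + y_pred.sum() + eps)
    return ce + dice
```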
  • Step S302 Generate a second prediction deviation of the prediction neural network based on the second predicted pixel information of the pixels belonging to the first category in the second image and the second supervision data of the second image.
  • the label information of the pixels of the first category in the second supervision data can be called the first label information; that is, the first label information includes the preset label information, in the second supervision data, indicating whether the pixels belonging to the first category in the second image belong to the second feature area.
  • the computer device may generate the prediction loss of the prediction neural network for the pixels of the first category based on the second predicted pixel information of the pixels belonging to the first category in the second image and the first label information; this prediction loss can be called the second prediction deviation.
  • the computer device may generate the cross-entropy loss and the image segmentation loss (Dice loss) of the prediction neural network for the pixels of the first category based on the second predicted pixel information of each pixel of the first category in the second image and the label information of each pixel of the first category, and thereby obtain the second prediction deviation of the prediction neural network for the pixels of the first category.
  • the cross-entropy loss of the prediction neural network for the pixels of the first category can be recorded as L_s2. As shown in the following formula (9), the cross-entropy loss L_s2 is:
  • L_s2 = −(1/M) · Σ_{j=1..M} [ y_true.j · log(y_pred.j) + (1 − y_true.j) · log(1 − y_pred.j) ] (9)
  • y_true.j represents the label information of the j-th pixel among the pixels of the first category, that is, y_true.j represents the true label of the j-th pixel; j is less than or equal to M, and M can be the total number of pixels belonging to the first category. If the label information of the j-th pixel truly represents that the j-th pixel belongs to the target object in the second image, then y_true.j can be equal to 1; otherwise, that is, if the label information of the j-th pixel truly represents that the j-th pixel does not belong to the target object in the second image, y_true.j can be equal to 0.
  • y_pred.j represents the probability, in the second predicted pixel information of the j-th pixel, that the j-th pixel belongs to the target object (i.e., the target probability).
  • the image segmentation loss of the prediction neural network for the pixels of the first category can be recorded as L_Dice2. As shown in the following formula (10), the image segmentation loss L_Dice2 is:
  • L_Dice2 = 1 − (2 · Σ_{j=1..M} y_true.j · y_pred.j) / (Σ_{j=1..M} y_true.j + Σ_{j=1..M} y_pred.j) (10)
  • in formula (10), y_true.j represents the label information of the j-th pixel, y_true.j is 1 or 0, and y_pred.j represents the probability, in the second predicted pixel information of the j-th pixel, that the j-th pixel belongs to the target object (i.e., the target probability).
  • the second prediction deviation of the above prediction neural network can be recorded as L_ls. As shown in the following formula (11), the second prediction deviation L_ls can be the sum of the cross-entropy loss L_s2 and the image segmentation loss L_Dice2:
  • L_ls = L_s2 + L_Dice2 (11)
  • the second prediction deviation L_ls obtained above is the prediction loss of the prediction neural network for the correctly labeled pixels (that is, the pixels of the first category).
  • Step S303 Generate a third prediction deviation of the prediction neural network based on the second prediction pixel information of the pixels belonging to the second category in the second image and the second supervision data of the second image.
  • the label information of the pixels of the second category in the second supervision data can be called the second label information; that is, the second label information includes the preset label information, in the second supervision data, indicating whether the pixels belonging to the second category in the second image belong to the second feature area.
  • the computer device may generate the prediction loss of the prediction neural network for the pixels of the second category based on the second predicted pixel information of the pixels belonging to the second category in the second image and the second label information; this prediction loss can be called the third prediction deviation.
  • the third prediction deviation of the prediction neural network for the pixels of the second category can be obtained through an entropy minimization loss; that is, through the third prediction deviation, the pixels of the second category can exert a smaller training influence (smaller entropy) on the prediction neural network.
  • the third prediction deviation can be recorded as L_ent. As shown in the following formula (12), the third prediction deviation L_ent can be:
  • L_ent = −(1/G) · Σ_{g=1..G} [ F_obj.g · log(F_obj.g) + F_bg.g · log(F_bg.g) ] (12)
  • F_obj.g represents the probability, in the second predicted pixel information of the g-th pixel of the second category, that the g-th pixel belongs to the target object in the second image (i.e., the target probability); F_bg.g represents the probability, in the second predicted pixel information of the g-th pixel, that the g-th pixel belongs to the background image of the target object in the second image (i.e., the background probability); G is the total number of pixels of the second category, and g is less than or equal to G.
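  • A minimal sketch of the entropy minimization term of formula (12); probs is assumed to hold the two-class probabilities of the second-category pixels.

```python
import torch

def entropy_minimization_loss(probs, eps=1e-8):
    """L_ent: mean entropy of the predictions on second-category pixels.

    probs: (G, 2) per-pixel [F_bg.g, F_obj.g] probabilities; minimizing this
    keeps the mislabeled pixels' training influence small while still
    encouraging confident predictions on them.
    """
    return -(probs * torch.log(probs + eps)).sum(dim=1).mean()
```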
  • Step S304 Generate the prediction deviation of the prediction neural network based on the first prediction deviation, the second prediction deviation and the third prediction deviation.
  • the computer device can generate the final prediction loss of the prediction neural network based on the first prediction deviation, the second prediction deviation, and the third prediction deviation obtained above; this prediction loss is also the prediction deviation of the prediction neural network (i.e., the final prediction deviation of the prediction neural network).
  • the prediction deviation refers to the deviation of the predicted pixel information generated by the prediction neural network for pixels (including the pixels of the first image and the pixels of the second image).
  • the computer device can obtain the prediction loss of the prediction neural network for the second image based on the second prediction deviation of the prediction neural network for the pixels of the first category and the third prediction deviation of the prediction neural network for the pixels of the second category obtained above; this loss can be called the comprehensive prediction deviation of the prediction neural network for the second image, and, for example, can be the sum of the second prediction deviation and the third prediction deviation. The comprehensive prediction deviation can be recorded as L_LQ.
  • the computer device also obtains a weighting coefficient for the comprehensive prediction deviation, weights the comprehensive prediction deviation according to the weighting coefficient to obtain the weighted comprehensive prediction deviation, and then generates the final prediction deviation of the prediction neural network (that is, the final prediction loss value) according to the first prediction deviation and the weighted comprehensive prediction deviation.
  • the weighting coefficient of the comprehensive prediction deviation can be given by a Gaussian ramp-up function that increases as the training time (number of training iterations) increases. Since the prediction neural network can be trained for multiple iterations, in the t-th iterative training process of the prediction neural network, the weighting coefficient of the comprehensive prediction deviation can be recorded as λ(t). As shown in the following formula (14), the weighting coefficient λ(t) can be:
  • λ(t) = e^(−5 · (1 − t / t_max)²) (14)
  • t_max represents the preset maximum number of iterative training times of the prediction neural network, which can be called the maximum number of iterations; e represents the natural constant.
  • the method of obtaining the weighting coefficient for the comprehensive prediction deviation in the current iterative training process of the prediction neural network can be: the computer device obtains the number of iterations for which the network parameters of the prediction neural network have currently been iteratively corrected (i.e., the current iterative training, which can be called the current iteration number), and obtains the preset maximum number of iterations for iteratively correcting the network parameters of the prediction neural network; the computer device can then substitute the current iteration number for t in the above formula (14) and the maximum iteration number for t_max in the above formula (14), thereby obtaining the weighting coefficient for the comprehensive prediction deviation in the current iterative training process.
  • the prediction loss (i.e., the prediction deviation) of the prediction neural network can be recorded as L_z. As shown in the following formula (15), L_z can be:
  • L_z = L_HQ + λ(t) · L_LQ (15)
  • in the formula, L_HQ is the first prediction deviation in the t-th iterative training process, L_LQ is the comprehensive prediction deviation in the t-th iterative training process, and the obtained L_z is the prediction loss in the t-th iterative training process.
  • the network parameters of the predictive neural network can be iteratively optimized through the prediction deviation L z obtained during each training process to obtain the predictive neural network with final parameter optimization completed (ie, the trained predictive neural network).
  • the smaller the iterative training number t, the smaller the weighting coefficient of the comprehensive prediction deviation; the larger t, the larger the weighting coefficient. This reduces the training interference of the second image on the prediction neural network at the beginning of training (for example, when t is relatively small); as training progresses, the prediction neural network becomes more and more accurate, so a larger weighting coefficient can be used to increase the training effect of the second image on the prediction neural network, which can improve the training accuracy of the prediction neural network.
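  • Putting the weighting together, the following sketch shows a Gaussian ramp-up weight and the final loss combination; the constant 5 in the ramp-up follows the commonly used mean-teacher schedule and is an assumption where the exact constant is not shown above.

```python
import math

def rampup_weight(t, t_max):
    """Gaussian ramp-up lambda(t): near 0 early in training, approaching 1 at t_max."""
    t = min(t, t_max)
    return math.exp(-5.0 * (1.0 - t / t_max) ** 2)

def total_prediction_deviation(l_hq, l_ls, l_ent, t, t_max):
    """L_z = L_HQ + lambda(t) * L_LQ, with L_LQ = L_ls + L_ent (comprehensive deviation)."""
    return l_hq + rampup_weight(t, t_max) * (l_ls + l_ent)
```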
  • Figure 8 is a schematic diagram of a model training scenario provided by an embodiment of the present application.
  • the computer device may generate a first prediction deviation based on the first image through a prediction neural network.
  • the computer device may also use an auxiliary neural network to label and separate the pixels in the second image, that is, to distinguish the pixels in the second image into pixels of the first category and pixels of the second category.
  • the computer device can generate a second prediction deviation based on the pixel points of the first classification through a prediction neural network, and generate a third prediction deviation based on the pixel points of the second classification.
  • the computer device can generate a comprehensive prediction deviation for the second image based on the second prediction deviation and the third prediction deviation, and can weight the comprehensive prediction deviation according to the weighting coefficient to obtain a weighted comprehensive prediction deviation.
  • the computer device can obtain the final prediction loss of the prediction neural network (i.e., the above-mentioned prediction deviation L_z) based on the first prediction deviation and the weighted comprehensive prediction deviation, and the network parameters of the prediction neural network can be optimized and corrected based on this prediction deviation to obtain the trained prediction neural network (that is, the prediction neural network after parameter optimization).
  • the first image with high-quality label information can be used to perform supervised training of the prediction neural network, and for the second image with low-quality label information, both the pixels predicted to be correctly labeled (i.e., the pixels of the first category) and the pixels predicted to be mislabeled (i.e., the pixels of the second category) can participate in training the prediction neural network; therefore, a very accurate prediction neural network can be trained.
  • the embodiments of the present application can perform differential learning on mixed-quality sample data (including the first image and the second image); that is, label-isolated learning of mixed-quality sample data can be implemented to fully learn the correct features of the sample data and thereby train an accurate prediction neural network.
  • the embodiments of this application also conducted experiments on the provided method.
  • the left atrium (LA) segmentation data set was used for the experiment.
  • the left atrial segmentation data set provides 100 3D magnetic resonance images (which can be understood as the three-dimensional first image) with expert labels (which can be understood as the label information of the pixels in the first image).
  • the resolution of the images can be 0.625×0.625×0.625 mm³. All images are cropped to the center of the heart region and normalized to zero mean and unit variance. In order to simulate actual scenarios, the embodiments of this application study an extreme setting and a common soft setting.
  • in the extreme setting, a small number of samples are used as HQ label information (i.e., samples with high-quality label information, which can be understood as first images), and the soft setting uses 8 (10%) samples as HQ label information.
  • the remaining samples are treated as non-expert low-quality annotated data (i.e., samples with low-quality label information, which can be understood as second images), and these samples are processed through commonly used simulated label corruption schemes, including random erosion and dilation of 3-15 voxels.
  • the experimental framework uses an NVIDIA GeForce RTX 3090 GPU (graphics processing unit) with 24 GB of memory, and is implemented in Python (a computer programming language) and PyTorch (an open-source machine learning library).
  • the learning rate is initialized to 0.01 and decays by a power of 0.9 after each step.
  • this application randomly crops 112×112×80 voxel blocks as network input, applies standard data augmentation, including random cropping, flipping, and rotation, and uses a sliding-window strategy with a stride of 18×18×4 voxels in the testing phase.
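  • The learning-rate schedule described above can be read as the common polynomial ("poly") decay; a one-line sketch under that assumption:

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial ("poly") decay: starts at base_lr (0.01 here) and decays with power 0.9."""
    return base_lr * (1.0 - step / max_steps) ** power
```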
  • the embodiments of this application use four indicators for comprehensive evaluation, including: Dice (an image segmentation evaluation indicator), Jaccard (a data mining indicator), ASD (average surface distance), and 95HD (95% Hausdorff distance, an evaluation indicator for medical image segmentation).
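  • For reference, the two overlap indicators can be sketched as follows on binary masks (assumed non-empty); ASD and 95HD require surface-distance computations (available in libraries such as MedPy) and are omitted here.

```python
import numpy as np

def dice_and_jaccard(pred, gt):
    """Overlap metrics on binary masks; higher is better for both."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return dice, jaccard
```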
  • Rows 2 to 9 in Table 1 are the experimental data of one experiment, and rows 10 to 17 of Table 1 are the experimental data of another experiment.
  • Set-HQ represents the number of high-quality annotated data used for training
  • Set-LQ represents the number of low-quality annotated data used for training.
  • "HQ-LQ separation?” indicates whether the corresponding method performs separate training on low-quality annotated data and high-quality annotated data. The higher Dice and Jaccard are, the better the effect is, while the smaller ASD and 95HD are, the better the effect is.
  • the values in brackets in Table 1 represent the standard deviation of the indicator under the corresponding method.
  • H-Sup represents supervised training with only high-quality annotated data
  • HL-Sup represents mixed supervised training with high-quality and low-quality annotated data
  • TriNet represents the use of a joint learning framework composed of three networks, which integrates the predictions of two networks to supervise the third network
  • 2RnT represents a two-stage method to improve annotation quality by estimating a confusion matrix for label correction
  • PNL represents the introduction of an image-level label quality assessment module to identify images with clean labels
  • KDEM means that knowledge distillation technology and entropy minimization optimization terms are introduced to train the network
  • Decoupled means that two separate decoders (one corresponding to high-quality annotated data and one corresponding to low-quality annotated data) are used to implicitly solve the coupling problem when training the network.
  • FIG. 9 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the data processing device may be a computer program (including program code) running in a computer device.
  • the data processing device may be an application software.
  • the data processing device may be used to execute corresponding steps in the method provided by the embodiments of the present application.
  • the data processing device 1 may include: an acquisition module 11, a first processing module 12, a second processing module 13, a classification module 14, and an optimization module 15.
  • the acquisition module 11 is used to acquire a first image and a second image containing a target object, wherein the image area where the target object is located in the first image is the first feature area, and the image area where the target object is located in the second image is the second feature area;
  • the first processing module 12 is used to input the first image into a prediction neural network to obtain a first prediction result; the first prediction result includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area;
  • the second processing module 13 is used to input the second image into the prediction neural network to obtain a second prediction result; the second prediction result includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area;
  • the classification module 14 is used to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels in the second image that belong to the first category (pixels with correct label information) and the pixels that belong to the second category (pixels with incorrect label information);
  • the optimization module 15 is used to optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network; the trained prediction neural network is used for image segmentation of target images.
  • the steps involved in the data processing method shown in FIG. 3 may be performed by various modules in the data processing device 1 shown in FIG. 9 .
  • step S101 shown in FIG. 3 can be performed by the acquisition module 11 in FIG. 9
  • step S102 shown in FIG. 3 can be performed by the first processing module 12 in FIG. 9
  • step S103 shown in Figure 3 can be performed by the second processing module 13 in Figure 9
  • the step S104 shown in Figure 3 can be performed by the classification module 14 in Figure 9
  • the step S105 shown in Figure 3 can be performed by the optimization module 15 in Figure 9 .
  • embodiments of the present application can acquire a first image with a first feature area and a second image with a second feature area; predict the first image through a prediction neural network to obtain a first prediction result, the first prediction result including first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area; predict the second image through the prediction neural network to obtain a second prediction result, the second prediction result including second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area; perform classification prediction on the pixels of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate the pixels in the second image that belong to the first category and the pixels that belong to the second category; and optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result.
  • the device proposed in the embodiments of the present application can classify the pixels in the second image through the auxiliary neural network, and can subsequently perform parameter optimization on the prediction neural network based on the classification result of each pixel in the second image obtained through the auxiliary neural network, which can improve the accuracy of parameter optimization of the prediction neural network; the prediction neural network after parameter optimization can also accurately segment the feature areas in images.
  • each module in the data processing device 1 shown in Figure 9 can be separately or entirely combined into one or several units, or one (or several) of the units can be further divided into multiple sub-units with smaller functions, which can implement the same operations without affecting the realization of the technical effects of the embodiments of the present application.
  • the above modules are divided based on logical functions.
  • the function of one module can also be realized by multiple units, or the functions of multiple modules can be realized by one unit.
  • the data processing device 1 may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
  • the data processing device 1 shown in Figure 9 can be constructed, and the data processing method of the embodiments of the present application can be implemented, by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in Figure 3 on a general-purpose computer device, such as a computer, that includes a central processing unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), and other processing elements and storage elements. The above computer program can be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through the computer-readable recording medium, and run therein.
  • the computer device 1000 may include: a processor 1001, a network interface 1004 and a memory 1005.
  • the computer device 1000 may also include: a user interface 1003, and at least one communication bus 1002.
  • the communication bus 1002 is used to realize connection communication between these components.
  • the user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include standard wired interfaces and wireless interfaces (such as WI-FI interfaces).
  • the memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory.
  • the memory 1005 may also be at least one storage device located remotely from the aforementioned processor 1001.
  • memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 can provide network communication functions; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 can be used to call the device control application program stored in the memory 1005 to implement the data processing method described in the above embodiments.
  • the computer device 1000 described in the embodiments of the present application can execute the description of the data processing method in the embodiment corresponding to Figure 3, and can also execute the description of the data processing device 1 in the embodiment corresponding to Figure 9, which will not be repeated here; likewise, the description of the beneficial effects of the same method will not be repeated.
  • the embodiments of the present application also provide a computer-readable storage medium that stores the computer program executed by the aforementioned data processing device 1, and the computer program includes program instructions; when the processor executes the program instructions, it can execute the description of the data processing method in the embodiment corresponding to Figure 3, which will not be repeated here; likewise, the description of the beneficial effects of the same method will not be repeated. For technical details not disclosed in the computer storage medium embodiments involved in this application, please refer to the description of the method embodiments of this application.
  • the above program instructions may be deployed on one computer device for execution, or on multiple computer devices located at one location, or on multiple computer devices distributed at multiple locations and interconnected through a communication network.
  • Multiple computer devices distributed in multiple locations and interconnected through communication networks can form a blockchain network.
  • the above computer-readable storage medium may be an internal storage unit of the data processing device provided in any of the foregoing embodiments or of the above computer device, such as the hard disk or memory of the computer device.
  • the computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
  • the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or is to be output.
  • Embodiments of the present application provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the description of the above data processing method in the embodiment corresponding to Figure 3, which will not be elaborated here.
  • the description of the beneficial effects of using the same method will not be described again.
  • For technical details not disclosed in the computer-readable storage medium embodiments involved in this application please refer to the description of the method embodiments in this application.
  • those skilled in the art should understand that each process and/or block in the method flowcharts and/or structural schematic diagrams, and combinations of processes and/or blocks in the flowcharts and/or schematic diagrams, can be implemented by computer program instructions.
  • these computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.
  • these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.
  • these computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.


Abstract

Disclosed in embodiments of the present application are a data processing method and apparatus, a program product, a computer device, and a medium. The method comprises: acquiring a first image and a second image comprising a target object; generating first prediction pixel information of pixel points in the first image by means of a prediction neural network; generating second prediction pixel information of pixel points in the second image by means of the prediction neural network; classifying and predicting the pixel points of the second image by means of an auxiliary neural network to obtain a classification and prediction result, the classification and prediction result being used for indicating pixel points in the second image belonging to a first class and pixel points belonging to a second class; and performing network parameter optimization on the prediction neural network according to the first prediction pixel information, second prediction pixel information, and classification and prediction result of the pixel points.

Description

Data processing method, apparatus, program product, computer device, and medium
This application claims priority to Chinese patent application No. 202210466331.9, entitled "Data processing method, apparatus, program product, computer device, and medium" and filed with the China Patent Office on April 29, 2022, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a data processing method, apparatus, program product, computer device, and medium.
Background
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance.
Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Summary
Embodiments of the present application provide a data processing method, apparatus, program product, computer device, and medium, which can improve the accuracy of a trained prediction neural network; the trained prediction neural network can subsequently segment feature regions in an image accurately.
An embodiment of the present application provides a data processing method, the method including:
acquiring a first image and a second image containing a target object, where the image region in which the target object is located in the first image is a first feature region, and the image region in which the target object is located in the second image is a second feature region;
inputting the first image into a prediction neural network to obtain a first prediction result, the first prediction result including first predicted pixel information indicating whether each pixel of the first image belongs to the first feature region;
inputting the second image into the prediction neural network to obtain a second prediction result, the second prediction result including second predicted pixel information indicating whether each pixel of the second image belongs to the second feature region;
performing classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate the pixels of the second image belonging to a first class and the pixels belonging to a second class, where the pixels of the first class are pixels in the second image that the auxiliary neural network predicts to have correct label information, and the pixels of the second class are pixels in the second image that the auxiliary neural network predicts to have incorrect label information; and
optimizing network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used to perform image segmentation on a target image.
An embodiment of the present application further provides a data processing apparatus, the apparatus including:
an acquisition module, configured to acquire a first image and a second image containing a target object, where the image region in which the target object is located in the first image is a first feature region, and the image region in which the target object is located in the second image is a second feature region;
a first processing module, configured to input the first image into a prediction neural network to obtain a first prediction result, the first prediction result including first predicted pixel information indicating whether each pixel of the first image belongs to the first feature region;
a second processing module, configured to input the second image into the prediction neural network to obtain a second prediction result, the second prediction result including second predicted pixel information indicating whether each pixel of the second image belongs to the second feature region;
a classification module, configured to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate the pixels of the second image belonging to a first class and the pixels belonging to a second class, where the pixels of the first class are pixels in the second image that the auxiliary neural network predicts to have correct label information, and the pixels of the second class are pixels in the second image that the auxiliary neural network predicts to have incorrect label information; and
an optimization module, configured to optimize network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used to perform image segmentation on a target image.
An embodiment of the present application further provides a computer device, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the method in the above aspect of this application.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method in the above aspect.
An embodiment of the present application further provides a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in the various optional implementations of the above aspect.
Brief Description of the Drawings
Figure 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application;
Figure 2 is a schematic diagram of a model training scenario provided by an embodiment of the present application;
Figure 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application;
Figure 4 is a schematic diagram of a network training scenario provided by an embodiment of the present application;
Figure 5 is a schematic flowchart of determining a classification prediction result provided by an embodiment of the present application;
Figure 6 is a schematic diagram of a scenario for determining classification results provided by an embodiment of the present application;
Figure 7 is a schematic flowchart of determining a prediction deviation provided by an embodiment of the present application;
Figure 8 is a schematic diagram of a model training scenario provided by an embodiment of the present application;
Figure 9 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application;
Figure 10 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in this application will be described clearly and completely below with reference to the accompanying drawings of this application. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
This application involves technologies related to artificial intelligence. Machine learning of a model usually requires a large amount of sample data, and such sample data often varies in quality; therefore, how to train a model more accurately with sample data of varying quality has become an urgent problem to be solved.
The machine learning involved in the embodiments of this application mainly concerns how to train a prediction neural network so that the feature regions in an image can subsequently be segmented accurately by the trained prediction neural network; for details, see the description of the embodiment corresponding to Figure 3 below.
Please refer to Figure 1, which is a schematic structural diagram of a network architecture provided by an embodiment of the present application. As shown in Figure 1, the network architecture may include a server 200 and a terminal device cluster, and the terminal device cluster may include one or more terminal devices; the number of terminal devices is not limited here. As shown in Figure 1, the terminal devices may specifically include terminal device 100a, terminal device 101a, terminal device 102a, ..., terminal device 103a. Each of terminal device 100a, terminal device 101a, terminal device 102a, ..., terminal device 103a may establish a network connection with the server 200, so that each terminal device can exchange data with the server 200 through its network connection.
The server 200 shown in Figure 1 may be an independent server, a server cluster or distributed system composed of multiple servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. A terminal device may be a smart terminal such as a smartphone, a tablet computer, a laptop, a desktop computer, or a smart TV. The following takes the communication between terminal device 100a and the server 200 as an example to describe the embodiments of the present application in detail.
Please refer to Figure 2, which is a schematic diagram of a model training scenario provided by an embodiment of the present application. As shown in Figure 2, the server 200 may be used to train a student model (which may also be called the model to be trained). The process may be as follows: the server 200 may obtain sample data for model training, and the sample data may include a small number of samples labeled by experts and a large number of samples labeled by non-experts. Each piece of sample data may be an image containing several pixels, each pixel in each image has label information, and each piece of sample data may contain the target object. The label information of a pixel indicates whether that pixel belongs to the target object in its image, and the image region in which the target object is located may be called a feature region. The server 200 may input the expert-labeled sample data and the non-expert-labeled sample data into the student model, and input the non-expert-labeled sample data into a teacher model (which may also be called the trained model, used to assist the training of the student model).
The student model may generate a mask for each pixel in the expert-labeled sample data and, based on the mask, generate predicted pixel information for judging whether each pixel in the expert-labeled sample data belongs to the target object; the student model may likewise generate a mask for each pixel in the non-expert-labeled sample data and, based on that mask, generate predicted pixel information for judging whether each pixel in the non-expert-labeled sample data belongs to the target object.
The teacher model, for its part, may generate a mask for each pixel in the non-expert-labeled sample data and, based on this mask, obtain the feature distance between each pixel in the non-expert-labeled sample data and a target prototype/background prototype. The target prototype may be used to represent the features of the target object in the sample data, and the background prototype may be used to represent the features of the background image of the target object in the sample data. The teacher model may then judge, according to these feature distances and the label information of each pixel in the non-expert-labeled sample data, whether the label information of each pixel is correctly labeled or incorrectly labeled, and provide the judgment result to the student model.
Next, the student model can generate a prediction deviation according to the judgment result of the teacher model (i.e., the correctly labeled pixels and incorrectly labeled pixels identified in the non-expert-labeled sample data) and the predicted pixel information for the sample data (including the predicted pixel information of each pixel in the expert-labeled sample data and the predicted pixel information of each pixel in the non-expert-labeled sample data), and correct the network parameters of the student model based on this prediction deviation to obtain the trained student model.
Subsequently, the server 200 may use the trained student model to segment the target object in an image, and may send the segmentation result to terminal device 100a; terminal device 100a may display the segmentation result on the terminal interface for analysis by relevant technical personnel.
Here, the expert-labeled sample data may be the first image described below, the non-expert-labeled sample data may be the second image described below, the above mask may be the mask region described below, the student model may be the prediction neural network described below, the teacher network may be the auxiliary neural network described below, the predicted pixel information generated by the student model for each pixel in the sample data may be contained in the first predicted pixel information and the second predicted pixel information described below, the target prototype may be the target center feature described below, and the background prototype may be the background center feature described below. Therefore, for the specific process of training the student model through the teacher model, see the descriptions of the embodiments corresponding to Figure 3 and Figure 5 below.
With the method provided by the embodiments of this application, the teacher model judges, based on the target prototype and the background prototype of the target object, whether the label information of each pixel in the non-expert-labeled sample data is correctly labeled or incorrectly labeled; based on the judgment result, the student model trains differently on the different pixels of the non-expert-labeled sample data, and the student model is additionally supervised by the expert-labeled sample data, which improves the training accuracy of the student model, so that an accurate student model can be trained.
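For concreteness, the following is a minimal, runnable PyTorch sketch of one such teacher-assisted training iteration. It is an illustration under stated assumptions, not the implementation of the embodiments: the one-layer "networks", the toy data shapes, and the simple agreement test standing in for the prototype-distance separation (described with Figure 3 and Figure 5 below) are all hypothetical.

```python
# A minimal sketch of one teacher-assisted training iteration (toy tensors).
# All names (student, teacher, x1, y1, ...) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy "segmentation networks": 2 per-pixel logit channels (background/target).
student = nn.Conv2d(1, 2, kernel_size=3, padding=1)
teacher = nn.Conv2d(1, 2, kernel_size=3, padding=1)
teacher.load_state_dict(student.state_dict())  # teacher starts as a copy
for p in teacher.parameters():
    p.requires_grad_(False)  # teacher is updated by EMA, not by gradients

optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

# Expert-labeled first image and non-expert-labeled second image (toy data).
x1, y1 = torch.randn(1, 1, 8, 8), torch.randint(0, 2, (1, 8, 8))
x2, y2 = torch.randn(1, 1, 8, 8), torch.randint(0, 2, (1, 8, 8))

# --- one training iteration ---
logits1 = student(x1)                      # first prediction result
logits2 = student(x2)                      # second prediction result

with torch.no_grad():                      # teacher separates the noisy labels
    t_logits2 = teacher(x2)
    correct = (t_logits2.argmax(1) == y2)  # crude stand-in for the prototype-
                                           # distance separation of the patent

loss_clean = F.cross_entropy(logits1, y1)  # full supervision on expert labels
per_pixel = F.cross_entropy(logits2, y2, reduction="none")
loss_noisy = (per_pixel * correct.float()).mean()  # trust "correct" pixels only

loss = loss_clean + loss_noisy
optimizer.zero_grad()
loss.backward()
optimizer.step()

alpha = 0.99                               # EMA decay, see formula (1) below
with torch.no_grad():
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(alpha).add_(sp, alpha=1 - alpha)
```

Note that this sketch simply zero-weights the pixels judged incorrectly labeled; the embodiments may treat the two pixel classes in other differentiated ways.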
Please refer to Figure 3, which is a schematic flowchart of a data processing method provided by an embodiment of the present application. The execution subject in this embodiment of the application may be one computer device or a computer device cluster composed of multiple computer devices. The computer device may be a server or a terminal device. In the following, the execution subjects in the embodiments of this application are collectively referred to as a computer device by way of example. As shown in Figure 3, the method may include:
Step S101: Acquire a first image and a second image containing a target object, where the image region in which the target object is located in the first image is a first feature region, and the image region in which the target object is located in the second image is a second feature region.
In some embodiments, the computer device may acquire the first image and the second image. The numbers of first images and second images are determined by the actual application scenario and are not limited here; the first image and the second image are sample data used for training the prediction neural network.
Both the first image and the second image may contain the target object, which may be any object that needs to be segmented from image data; the target object may be determined by the actual application scenario. The method provided by this application can be applied to any image segmentation scene, which may be two-dimensional or three-dimensional. The display form of the target object in the first image (e.g., a different object category, the same category in a different pose, or a different surrounding environment) may differ from the display form of the target object in the second image. For example, the target object may be the left ventricle: both the first image and the second image may contain an image of the left ventricle, but the image of the left ventricle contained in the first image and the image of the left ventricle contained in the second image may differ.
The image region in which the target object is located in an image may be called a feature region. Further, the image region in which the target object is located in the first image may be called the first feature region, and the image region in which the target object is located in the second image may be called the second feature region.
Both the first image and the second image may contain several pixels.
For example, if the method provided by the embodiments of this application is applied to a two-dimensional image segmentation scene, the first image and the second image may be two-dimensional images whose pixels are two-dimensional. The target object may be any object that needs to be segmented from a two-dimensional image and may be determined by the actual application scenario; it may be an object whose local structural features are highly correlated or similar, for example a plant whose overall texture structure in a two-dimensional image is relatively uniform.
For another example, if the method provided by the embodiments of this application is applied to a three-dimensional image segmentation scene, the first image and the second image may be three-dimensional images whose pixels are three-dimensional (in this case the pixels of the first image and the second image may also be called voxels). The target object may be any object that needs to be segmented from a three-dimensional image and may likewise be determined by the actual application scenario. For example, the embodiments of this application may be applied to a medical image segmentation scene, where the target object may be an object whose local structural features are correlated or similar, such as a human organ (which may also be called a body part) to be segmented from three-dimensional image data; the organ (part) may be any organ, such as the left ventricle.
Furthermore, the supervision data set for the first image may be called first supervision data, and the first supervision data is used to indicate whether each pixel in the first image belongs to the first feature region; in other words, it indicates whether each pixel in the first image belongs to the target object. The first supervision data may include the label information of each pixel in the first image, where the label information of each pixel indicates whether that pixel belongs to the target object in the first image or to the background image of the target object in the first image. In other words, the label information of each pixel in the first image indicates whether the pixel belongs to the first feature region or to the region of the first image other than the first feature region (i.e., the region of the background image of the target object in the first image). The background image of the target object in the first image may also be called the background image of the first feature region in the first image.
Similarly, the supervision data set for the second image may be called second supervision data, and the second supervision data is used to indicate whether each pixel in the second image belongs to the second feature region; in other words, it indicates whether each pixel in the second image belongs to the target object. The second supervision data may include the label information of each pixel in the second image, where the label information of each pixel indicates whether that pixel belongs to the target object in the second image or to the background image of the target object in the second image. In other words, the label information of each pixel in the second image indicates whether the pixel belongs to the second feature region or to the region of the second image other than the second feature region (i.e., the region of the background image of the target object in the second image). The background image of the target object in the second image may also be called the background image of the second feature region in the second image.
In other words, the label information of any pixel indicates the belonging relationship between that pixel and the target object in its image: either the pixel belongs to the target object (i.e., the pixel is among the pixels of the image of the target object in its image), or the pixel does not belong to the target object (i.e., the pixel is not among the pixels of the image of the target object in its image).
It should be noted that the first image may be sample data with high-quality annotation and the second image may be sample data with low-quality annotation. This is reflected in the fact that the accuracy of the first supervision data set for the first image (i.e., the accuracy of the label information of each pixel in the first image) is higher than the accuracy of the second supervision data set for the second image (i.e., the accuracy of the label information of each pixel in the second image); this accuracy may be accuracy in the subjective sense. For example, the first image may be sample data labeled by experts, i.e., the first supervision data of the first image may be labeled by professionals in the technical field, while the second image may be sample data labeled by non-experts, i.e., the second supervision data of the second image may be labeled by persons outside the technical field.
For example, if the method provided by the embodiments of this application is applied to a medical segmentation scene, the first image and the second image may be image data containing an organ to be segmented. The label information of pixels in the first image may be marked by professionals in the medical field, while the label information of pixels in the second image may be marked by amateurs; therefore, the accuracy of the label information of pixels in the first image will usually be higher than that of the label information of pixels in the second image.
It should also be noted that the cost of obtaining a large number of high-quality labeled samples (such as the first image) is very high, and this is especially difficult in the field of medical imaging, which relies on expert knowledge. Therefore, to save the cost of obtaining samples, the first images in this application may be few and the second images many; this application can effectively use a small amount of high-quality labeled data (such as the first images) and a large amount of low-quality labeled data (such as the second images) to train a model (such as the prediction neural network) accurately.
In some embodiments, the label information of a pixel (such as the label information of a pixel in the first image or of a pixel in the second image) may be recorded as 0 or 1. If the label information of a pixel is 0, the pixel does not belong to the target object in its image; conversely, if the label information of a pixel is 1, the pixel belongs to the target object in its image.
For example, if the label information of a certain pixel in the first image is 0, this may indicate that the pixel does not belong to the first feature region of the first image; conversely, if the label information of a certain pixel in the first image is 1, this may indicate that the pixel belongs to the first feature region of the first image.
For another example, if the label information of a certain pixel in the second image is 0, this may indicate that the pixel does not belong to the second feature region of the second image; conversely, if the label information of a certain pixel in the second image is 1, this may indicate that the pixel belongs to the second feature region of the second image.
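A tiny illustration of this 0/1 convention, with toy values only:

```python
# Illustrative only: a 3x3 label mask in the 0/1 convention described above,
# where 1 marks pixels of the target object and 0 marks background pixels.
import torch

labels = torch.tensor([[0, 1, 1],
                       [0, 1, 0],
                       [0, 0, 0]])
print((labels == 1).sum().item(), "pixels labeled as target object")  # 3
```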
Step S102: Input the first image into the prediction neural network to obtain a first prediction result; the first prediction result includes first predicted pixel information respectively indicating whether each pixel of the first image belongs to the first feature region.
In some embodiments, the computer device may call the prediction neural network to perform prediction on the first image, i.e., to predict the belonging relationship between each pixel in the first image and the target object. The prediction result predicted by the prediction neural network for the pixels of the first image may be called the first prediction result. The first prediction result includes the first predicted pixel information of each pixel in the first image, used to indicate whether the corresponding pixel belongs to the first feature region. The first predicted pixel information of each pixel may include the probability that the pixel belongs to the target object in the first image (which may be called the target probability) and the probability that the pixel does not belong to the target object in the first image (i.e., belongs to the background image of the target object in the first image), which may be called the background probability. The sum of the target probability and the background probability corresponding to any pixel in the first image may be 1.
The process of generating the first prediction result of the first image through the prediction neural network may include: the computer device may generate, through the prediction neural network, the mask region of each pixel of the first image within the first image, where the mask region of a pixel may refer to the region used to select the main features of that pixel.
The computer device may then, through the prediction neural network and according to the features of each pixel of the first image within its corresponding mask region, predict the target probability that each pixel of the first image belongs to the target object and the background probability that it does not, thereby obtaining the first predicted pixel information of each pixel in the first image. The first predicted pixel information of any pixel in the first image thus includes the target probability that the pixel belongs to the target object (e.g., the probability that the image features within the pixel's mask region are features of the target object) and the background probability that the pixel belongs not to the target object but to the background image of the target object (e.g., the probability that the image features within the pixel's mask region are features of the background image of the target object).
In summary, the first predicted pixel information of each pixel in the first image constitutes the first prediction result.
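As a small illustration of the probability constraint just described, the following sketch assumes a network head that outputs two logit channels per pixel (one for background, one for target); a softmax over the channel dimension then yields a target probability and a background probability that sum to 1 at every pixel. The two-channel head is an assumption for illustration, not a detail disclosed here.

```python
# A minimal sketch: per-pixel target/background probabilities that sum to 1,
# assuming 2-channel logits (channel 0 = background, channel 1 = target).
import torch
import torch.nn.functional as F

logits = torch.randn(1, 2, 4, 4)            # (batch, 2 classes, H, W)
probs = F.softmax(logits, dim=1)            # normalize over the class channel

target_prob = probs[:, 1]                   # P(pixel belongs to target object)
background_prob = probs[:, 0]               # P(pixel belongs to background)
assert torch.allclose(target_prob + background_prob, torch.ones(1, 4, 4))
```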
Step S103: Input the second image into the prediction neural network to obtain a second prediction result; the second prediction result includes second predicted pixel information respectively indicating whether each pixel of the second image belongs to the second feature region.
Similarly, the computer device may call the prediction neural network to perform prediction on the second image, i.e., to predict the belonging relationship between each pixel in the second image and the target object. The prediction result predicted by the prediction neural network for the pixels of the second image may be called the second prediction result. The second prediction result includes second predicted pixel information indicating whether each pixel in the second image belongs to the second feature region, that is, the second predicted pixel information of each pixel in the second image. The second predicted pixel information of each pixel may include the target probability that the pixel belongs to the target object in the second image and the background probability that the pixel does not belong to the target object in the second image (i.e., belongs to the background image of the target object in the second image); the sum of the target probability and the background probability corresponding to any pixel in the second image may be 1.
The process of generating the second prediction result of the second image through the prediction neural network may include: the computer device may generate, through the prediction neural network, the mask region of each pixel of the second image within the second image, where the mask region of a pixel may refer to the region used to select the main features of that pixel.
The computer device may then, through the prediction neural network and according to the features of each pixel of the second image within its corresponding mask region, predict the target probability that each pixel of the second image belongs to the target object and the background probability that it does not, thereby obtaining the second predicted pixel information of each pixel in the second image. The second predicted pixel information of any pixel in the second image thus includes the target probability that the pixel belongs to the target object (e.g., the probability that the image features within the pixel's mask region are features of the target object) and the background probability that the pixel belongs not to the target object but to the background image of the target object (e.g., the probability that the image features within the pixel's mask region are features of the background image of the target object).
The second predicted pixel information of each pixel in the second image constitutes the second prediction result.
The process by which the prediction neural network predicts the predicted pixel information of pixels in the first image or the second image is the same as the process, described below, by which the auxiliary neural network predicts the predicted pixel information of pixels in the second image.
Step S104: Perform classification prediction on each pixel of the second image through the auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels of the second image belonging to the first class and the pixels belonging to the second class.
In some embodiments, the auxiliary neural network may be used to judge which pixels in the second image are correctly labeled (i.e., have accurate label information) and which pixels are incorrectly labeled (i.e., have inaccurate label information), so that the prediction neural network can then train differently on the correctly labeled pixels and the incorrectly labeled pixels of the second image; for details, see the description below.
Furthermore, the correctly labeled pixels in the second image predicted by the auxiliary neural network may be called pixels of the first class; the pixels of the first class include the pixels in the second image that the auxiliary neural network predicts to have correct label information. Similarly, the pixels with incorrect label information in the second image predicted by the auxiliary neural network may be called pixels of the second class; the pixels of the second class include the pixels in the second image that the auxiliary neural network predicts to have incorrect label information.
The process by which the auxiliary neural network predicts whether each pixel in the second image is correctly labeled or incorrectly labeled may include:
The computer device may call the auxiliary neural network to generate, from the second image, the region center features of the second feature region (which can be understood as the object center features of the target object in the second image) and the pixel features of each pixel in the second image.
The region center features may include the target center feature of the second feature region and the background center feature of the second feature region. The target center feature is used to characterize the structural features of the target object in the second image (that is, the structural features of the image within the second feature region of the second image, such as the texture structure, color structure, and edge structure of the target object in the second image); in other words, the target center feature may be used to represent the features of the target object in the second image. The background center feature is used to characterize the structural features of the background image of the target object in the second image (that is, the structural features of the background image of the second feature region in the second image, such as the texture structure, color structure, and edge structure of the background image of the target object in the second image); in other words, the background center feature may be used to represent the features of the background image of the target object in the second image.
The target center feature of the target object is obtained by aggregating the pixel features of the pixels in the second image that are predicted to belong to the target object, and the background center feature of the target object is obtained by aggregating the pixel features of the pixels in the second image that are predicted to belong to the background image of the target object. The pixel features of each pixel in the second image generated by the auxiliary neural network may be relatively accurate intermediate features generated within the auxiliary neural network (whose accuracy can be judged from experimental experience), and the pixel feature of any pixel in the second image generated by the auxiliary neural network may be used to represent the structural features of that pixel. For the specific generation process of the pixel features of each pixel in the second image and the region center features of the second feature region (including the target center feature and the background center feature), see also the description of the embodiment corresponding to Figure 5 below.
The computer device can then obtain the classification prediction result for the pixels of the second image through the region center features generated above, the pixel features of each pixel in the second image, and the label information of each pixel in the second image. The classification prediction result is used to indicate the pixels of the second image belonging to the first class (i.e., the correctly labeled pixels) and the pixels belonging to the second class (i.e., the incorrectly labeled pixels).
For the specific process of obtaining the classification prediction result for the pixels of the second image through the region center features, the pixel features of each pixel in the second image, and the label information of each pixel in the second image, see also the description of the embodiment corresponding to Figure 5 below.
From the above, it can be seen that the auxiliary neural network can divide the pixels of the second image into two classes: pixels of the first class and pixels of the second class. The pixels of the first class include the pixels in the second image that the auxiliary neural network predicts to have accurate label information (i.e., correctly labeled pixels), and the pixels of the second class include the pixels in the second image that the auxiliary neural network predicts not to have accurate label information (i.e., incorrectly labeled pixels).
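A minimal sketch of this prototype-based separation, under illustrative assumptions (random toy features, Euclidean distance, and a nearest-prototype rule), is given below; the embodiment's exact procedure is the one described with Figure 5.

```python
# A minimal sketch: build a target prototype and a background prototype by
# averaging pixel features, then call a pixel's label "correct" when its
# nearest prototype agrees with that label. All shapes and the decision rule
# are illustrative assumptions.
import torch

torch.manual_seed(0)
H, W, C = 4, 4, 8
features = torch.randn(H, W, C)             # pixel features from the teacher
pred_target = torch.rand(H, W) > 0.5        # teacher's target/background guess
labels = torch.randint(0, 2, (H, W))        # noisy 0/1 labels of second image

# Prototypes: mean feature over predicted target / background pixels.
target_proto = features[pred_target].mean(dim=0)
background_proto = features[~pred_target].mean(dim=0)

# Distance of every pixel feature to each prototype.
d_target = (features - target_proto).norm(dim=-1)
d_background = (features - background_proto).norm(dim=-1)

# A pixel looks like "target" if it is closer to the target prototype.
looks_like_target = (d_target < d_background).long()

first_class = looks_like_target == labels    # predicted correctly labeled
second_class = ~first_class                  # predicted incorrectly labeled
print(first_class.float().mean().item())     # fraction judged correctly labeled
```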
In addition, it should be noted that the embodiments of this application may continuously and iteratively train the prediction neural network with several first images and several second images. During the iterative training of the prediction neural network, the network parameters of the auxiliary neural network are also iteratively updated from the network parameters updated (i.e., optimized) for the prediction neural network. For the specific principle of iteratively updating the network parameters of the auxiliary neural network from the iteratively updated (i.e., iteratively optimized) network parameters of the prediction neural network, see the description in step S105 below.
Step S105: Optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used to perform image segmentation on a target image.
In some embodiments, the computer device may generate, according to the first prediction result, the second prediction result, and the classification prediction result, the final prediction deviation of the prediction neural network's predicted pixel information for each pixel (including each pixel of the first image and each pixel of the second image), which may be called the prediction deviation. The prediction deviation is used to represent the deviation between the predicted pixel information of each pixel predicted by the prediction neural network and the label information of that pixel; the prediction deviation can also be understood as the prediction loss of the prediction neural network.
计算机设备可以根据上述预测偏差进行网络参数优化,该网络参数优化就包括对预测神经网络进行参数优化、对辅助神经网络进行参数优化中的任意一种或者两种的组合。The computer equipment can perform network parameter optimization based on the above prediction deviation. The network parameter optimization includes any one of parameter optimization of the predictive neural network and parameter optimization of the auxiliary neural network, or a combination of both.
其中,由于可以对预测神经网络不断进行迭代更新(即对预测神经网络的网络参数进行迭代优化),因此,对预测神经网络的每一次迭代更新都会通过上述过程生成对应的预测偏差,通过预测神经网络每次迭代训练过程中所产生的预测偏差不断对预测神经网络的网络参数进行迭代更新(即迭代修正或优化),直到对预测神经网络的网络参数更新完成,即可得到训练好的预测神经网络(可以称之为是参数优化后的预测神经网络),该训练好的预测神经网络就包括修正完成(即优化完成)的网络参数。Among them, since the predictive neural network can be continuously iteratively updated (that is, the network parameters of the predictive neural network are iteratively optimized), each iterative update of the predictive neural network will generate the corresponding prediction deviation through the above process. The prediction deviation generated during each iterative training process of the network continuously updates the network parameters of the predictive neural network iteratively (i.e., iteratively corrects or optimizes). Until the network parameters of the predictive neural network are updated, the trained predictive neural network can be obtained. network (which can be called a predictive neural network after parameter optimization). The trained predictive neural network includes network parameters that have been corrected (that is, optimized).
在一些实施例中,对预测神经网络的网络参数更新完成,可以是指对预测神经网络的网络参数更新至收敛状态,或者,是指对预测神经网络的网络参数迭代更新的次数(即迭代训练次数)达到某个次数阈值,该次数阈值可以根据实际应用场景进行设置。In some embodiments, the completion of updating the network parameters of the predictive neural network may refer to updating the network parameters of the predictive neural network to a convergence state, or may refer to the number of iterative updates of the network parameters of the predictive neural network (ie, iterative training). times) reaches a certain number threshold, which can be set according to the actual application scenario.
请参见图4,图4是本申请实施例提供的一种网络训练的场景示意图。如图4所示,第一图像中包含多个像素点,第一图像具有第一监督数据,第一监督数据中包括第一图像中每个像素点的标记信息;同理,第二图像中也包含多个像素点,第二图像具有第二监督数据,第二监督数据中包括第二图像中每个像素点的标记信息。Please refer to Figure 4. Figure 4 is a schematic diagram of a network training scenario provided by an embodiment of the present application. As shown in Figure 4, the first image contains multiple pixels, the first image has first supervision data, and the first supervision data includes the label information of each pixel in the first image; similarly, in the second image Also containing a plurality of pixels, the second image has second supervision data, and the second supervision data includes label information of each pixel in the second image.
计算机设备可以调用辅助神经网络生成针对第二图像中各个像素点的分类预测结果,该分类预测结果包括第二图像中各像素点的分类结果,第二图像中任一个像素点的分类结果就指示了该像素点的标记信息是正确标记的还是错误标记的,即该像素点是属于第一分类的像素点还是属于第二分类的像素点。The computer device can call the auxiliary neural network to generate a classification prediction result for each pixel point in the second image. The classification prediction result includes a classification result for each pixel point in the second image. The classification result of any pixel point in the second image indicates It determines whether the marking information of the pixel is correctly marked or incorrectly marked, that is, whether the pixel belongs to the first category or the second category.
因此,可以将第二图像中正确标记的像素点作为第一分类的像素点,将第二图像中错误标记的像素点作为第二分类的像素点。因此,预测神经网络可以对第一图像中的像素点、第二图像中第一分类的像素点以及第二图像中第二分类的像素点进行区别训练,进而得到训练好的预测神经网络。并且,在对预测神经网络进行训练过程中,也会将预测神经网络的网络参数传递给辅助神经网络,使得在对预测神经网络进行训练过程中,不断进行参数优化的辅助神经网络也可以对第二图像中各个像素点的分类结果进行更准确的判定,通过对第二图像 中各个像素点的分类结果进行更准确的判定也可以实现对预测神经网络更准确的训练。Therefore, the correctly marked pixels in the second image can be regarded as the pixels of the first category, and the incorrectly marked pixels in the second image can be regarded as the pixels of the second category. Therefore, the prediction neural network can perform differential training on the pixels in the first image, the pixels of the first category in the second image, and the pixels of the second category in the second image, thereby obtaining a trained prediction neural network. Moreover, during the training process of the predictive neural network, the network parameters of the predictive neural network will also be passed to the auxiliary neural network, so that during the training process of the predictive neural network, the auxiliary neural network that continuously optimizes parameters can also The classification results of each pixel in the second image can be determined more accurately by classifying the second image More accurate determination of the classification results of each pixel in the image can also enable more accurate training of the predictive neural network.
通过上述可以知道,在对预测神经网络的网络参数进行迭代优化的过程中,也会通过预测神经网络的网络参数来迭代更新辅助神经网络的网络参数,预测神经网络可以理解为是学生网络(即学生模型),辅助神经网络可以理解为是教师网络(即教师模型)。本申请可以采用类似均值教师(Mean-Teacher,MT)架构的设计来通过预测神经网络的网络参数更新辅助神经网络的网络参数,因为MT的权重平均的自集成策略可以有效提高中间特征表示和最终预测的稳定性和平滑度,这非常适合基于特征原型(上述目标中心特征可以表征目标对象的特征原型)的标注分离策略(即区分第二图像中各个像素点的标记信息是正确标记还是错误标记),因为这能够得到更高稳定和平滑的特征空间(如由第二图像中各个像素点的像素特征构成的特征空间),该过程可以如下述公式(1)所示:
From the above, we can know that in the process of iterative optimization of the network parameters of the predictive neural network, the network parameters of the auxiliary neural network will also be iteratively updated through the network parameters of the predictive neural network. The predictive neural network can be understood as a student network (i.e. student model), the auxiliary neural network can be understood as a teacher network (i.e. teacher model). This application can use a design similar to the Mean-Teacher (MT) architecture to update the network parameters of the auxiliary neural network by predicting the network parameters of the neural network, because the weighted average self-integration strategy of MT can effectively improve the intermediate feature representation and final The stability and smoothness of the prediction, which is very suitable for the labeling separation strategy based on the feature prototype (the above-mentioned target center feature can characterize the feature prototype of the target object) (that is, distinguishing whether the labeling information of each pixel in the second image is correctly labeled or incorrectly labeled) ), because this can obtain a more stable and smooth feature space (such as a feature space composed of the pixel features of each pixel in the second image), the process can be shown as the following formula (1):
Here, t and t−1 both index the training iterations of the prediction neural network (which can also be understood as the number of iterative optimizations of the auxiliary neural network's parameters): t denotes the t-th training iteration and t−1 the (t−1)-th. θ̃_t denotes the network parameters of the auxiliary neural network after the t-th training iteration, θ̃_{t−1} denotes its parameters after the (t−1)-th iteration, and θ_t denotes the network parameters of the prediction neural network after the t-th iteration.

Further, α denotes the EMA (exponential moving average) decay rate and can be set to 0.99. Using an exponential-moving-average decay rate smooths the features and the network's predictions across adjacent training iterations, which facilitates generating the feature prototype of the target object (such as the target center feature) and thereby enables robust label separation.
Through the above process, it can be understood that as the network parameters of the prediction neural network are iteratively updated, the network parameters of the auxiliary neural network can be iteratively updated from the prediction network's parameters after each iteration. The features that the prediction neural network learns from the first image and the second image are thus passed to the auxiliary neural network, which in turn allows the auxiliary neural network to judge the classification results of the pixels in the second image more accurately at each training iteration.
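For illustration, the EMA update of formula (1) can be sketched in a few lines of PyTorch-style Python. This is a minimal sketch, not the implementation of this application; the names teacher, student, and alpha are assumptions introduced here.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, alpha: float = 0.99):
    """Update the teacher (auxiliary) network as an exponential moving
    average of the student (prediction) network, per formula (1)."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        # theta_tilde_t = alpha * theta_tilde_{t-1} + (1 - alpha) * theta_t
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```

Calling ema_update(teacher, student) once after each optimizer step on the student reproduces the per-iteration behavior described above.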
In the embodiments of this application, the trained prediction neural network can be used to segment the target object in image data: the trained network identifies the pixels in an image that belong to the target object, and from those identified pixels the image region where the target object is located (i.e., the feature region) can be segmented out of the image.
For example, the computer device can acquire a target image containing a target object; the image region of the target image where the target object is located can be called the target feature region. The target image can be any image from which the target object needs to be segmented.
The computer device can then invoke the trained prediction neural network (i.e., the prediction neural network after parameter optimization) to make predictions on the target image, that is, to predict the relationship between each pixel of the target image and the target object, obtaining a target prediction result for the target image.

The target prediction result is used to judge whether each pixel of the target image belongs to the target feature region, and it includes predicted pixel information for each pixel of the target image. The predicted pixel information of any pixel can include the target probability that the pixel belongs to the target object in the target image and the background probability that it does not, the background probability being the probability that the pixel belongs to the background image of the target object in the target image.
The computer device can therefore take the pixels of the target image whose target probability is greater than their background probability as the identified pixels belonging to the target object (i.e., the identified pixels of the target feature region), and segment those identified pixels out of the target image. This segments the target object from the target image, which is the segmentation of the image of the target feature region within the target image.
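As a concrete illustration of this inference step, the following is a minimal sketch assuming a model that outputs two per-pixel channels (channel 0 = background, channel 1 = target) which are softmaxed into probabilities; the function name, tensor shapes, and channel order are assumptions, not details fixed by this application.

```python
import torch

def segment_target(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Return a boolean mask of pixels whose predicted target probability
    exceeds their background probability. `image` is (C, H, W)."""
    model.eval()
    with torch.no_grad():
        probs = model(image.unsqueeze(0)).softmax(dim=1).squeeze(0)  # (2, H, W)
    return probs[1] > probs[0]  # True where the pixel is assigned to the target
```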
In some scenarios, high-quality annotated data (such as the first image) is hard to obtain and usually requires expert annotation, whereas low-quality annotated data (such as the second image) is comparatively easy to obtain. High-quality annotated data is therefore scarce while low-quality annotated data is plentiful. The method provided by the embodiments of this application addresses the inaccurate model (network) training that arises in this scenario: it can separate and learn from samples in a mixture consisting of a small amount of high-quality annotated data and a large amount of low-quality annotated data, accurately learn the true characteristics of the samples, and thereby train an accurate model (such as the prediction neural network).
In the embodiments of this application, since segmentation regions of the same class (i.e., regions where objects of the target object's class are located) tend to have highly correlated structural features, the feature prototype of the target object (such as the target center feature) allows the classification results of the pixels in the low-quality second image to be judged accurately. The prediction neural network is subsequently trained differently on the differently classified pixels of the second image, combined with the high-quality first image as supervised training data. Training on both improves the training effect and yields a more accurate prediction neural network, and the accurately trained prediction neural network in turn enables accurate segmentation of the target object in images.
The embodiments of this application can acquire a first image having a first feature region and a second image having a second feature region; predict the first image with the prediction neural network to obtain a first prediction result, which includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature region; predict the second image with the prediction neural network to obtain a second prediction result, which includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature region; and classify the pixels of the second image with the auxiliary neural network to obtain a classification prediction result, which indicates the pixels of the second image belonging to the first category and those belonging to the second category. Network parameter optimization is then performed according to the first prediction result, the second prediction result, and the classification prediction result; this optimization includes parameter optimization of the prediction neural network, parameter optimization of the auxiliary neural network, or a combination of the two. It can be seen that the method proposed by the embodiments of this application classifies the pixels of the second image through the auxiliary neural network and then uses those per-pixel classification results to optimize the parameters of the prediction neural network or the auxiliary neural network. This improves the accuracy of the parameter optimization of the prediction neural network, and the parameter-optimized prediction neural network can subsequently segment the feature regions in images accurately.
Please refer to Figure 5, which is a schematic flowchart of determining a classification prediction result provided by an embodiment of this application. The execution subject in this embodiment can be the same as that in Figure 3 above. As shown in Figure 5, the method can include:
Step S201: Generate, through the auxiliary neural network and based on the second image, the region center features of the second feature region and the pixel features of the pixels in the second image.
In some embodiments, the computer device can input the second image into the auxiliary neural network to perform feature learning on the second image and thereby generate the pixel feature of each pixel in the second image.
In some embodiments, the auxiliary neural network can contain multiple convolutional layers for feature learning on the second image. The pixel feature of each pixel can then be the smoothed feature of that pixel produced by the penultimate convolutional layer, since experiments show that the smoothed pixel features generated by the penultimate convolutional layer perform comparatively well.
Furthermore, the computer device can predict, through the auxiliary neural network, the mask region of each pixel in the second image; the mask region is the region used to select the main features of each pixel in the second image. The computer device can also generate a prediction accuracy index for the mask region of each pixel in the second image. This index reflects the uncertainty of the generated mask regions: as the name suggests, the prediction accuracy index of any pixel's mask region characterizes the accuracy of that pixel's mask region.
In some embodiments, this application can perform Bayesian approximation through Monte Carlo dropout to generate the prediction accuracy index of the mask region of each pixel in the second image. The process can be as follows.
The computer device can perform K rounds of random dropout on the network parameters (which can be called neurons) of the auxiliary neural network, obtaining K variant networks of the auxiliary neural network, where K is a positive integer whose value can be determined by the actual application scenario. Each round of parameter dropout is performed on the auxiliary neural network with its complete network parameters, and each variant network is obtained by one round of random dropout on the auxiliary network's parameters. Randomly dropping the auxiliary network's parameters can mean randomly setting some of them to 0; the parameters set to 0 are the dropped parameters and play no role in the subsequent prediction.
It should be understood that the random dropout of the auxiliary network's parameters to obtain its variant networks serves mainly to generate, through those variant networks, the prediction accuracy indices of the pixels' mask regions; the pixel features and the mask regions of the pixels described above are predicted by the auxiliary neural network without any parameter dropout.
Any pixel in the second image can be denoted as the target pixel. Since the prediction accuracy index of the mask region is obtained in the same way for every pixel, obtaining the index for the target pixel's mask region is described here as an example.
The computer device can use each variant network in turn to predict the predicted pixel information of the target pixel according to the target pixel's mask region (i.e., according to the image features at the target pixel's mask region in the second image). The predicted pixel information produced by a variant network can be called variant predicted pixel information, and each variant network predicts one item of variant predicted pixel information for the target pixel. This process can be understood as performing K forward stochastic inferences on the target pixel, letting the K variant networks make K softmax (logistic regression) predictions for the target pixel so as to obtain its items of variant predicted pixel information.
Each item of variant predicted pixel information can include the probability, predicted by the corresponding variant network, that the target pixel belongs to the second feature region of the second image (which can be called the first predicted probability, i.e., the target probability of belonging to the target object in the second image), and the probability, predicted by the corresponding variant network, that the target pixel does not belong to the second feature region of the second image (which can be called the second predicted probability, i.e., the probability of belonging to a pixel of the background image of the target object in the second image). The sum of the first predicted probability and the second predicted probability can be 1. The background image of the target object in the second image refers to the part of the second image other than the image of the target object.
The computer device can then determine the prediction accuracy index of the target pixel's mask region from the K items of variant predicted pixel information obtained by the K variant networks, as described below.
Since each item of variant predicted pixel information includes one first predicted probability, the K items contain K first predicted probabilities in total. The computer device can take the standard deviation of these K first predicted probabilities as the target prediction accuracy index of the target pixel; this index indicates the accuracy of predicting the target pixel as belonging to the target object.
Likewise, each item of variant predicted pixel information includes one second predicted probability, so the K items contain K second predicted probabilities in total. The computer device can take the standard deviation of these K second predicted probabilities as the background prediction accuracy index of the target pixel; this index indicates the accuracy of predicting the target pixel as belonging to the background image of the target object.
Both the target prediction accuracy index and the background prediction accuracy index of the target pixel can then be used as the prediction accuracy index of the target pixel's mask region. The computer device can obtain the prediction accuracy index of the mask region of every pixel in the second image in the same way as for the target pixel.
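The following is a minimal sketch of this Monte Carlo procedure. It approximates the parameter dropout described above with ordinary activation dropout kept active at inference (a common way to realize MC dropout) and assumes a model emitting two per-pixel channels; all names, shapes, and the choice K=8 are assumptions.

```python
import torch

def mc_dropout_uncertainty(model: torch.nn.Module, image: torch.Tensor, k: int = 8):
    """Run K stochastic forward passes and return the per-pixel standard
    deviations of the target and background probabilities, i.e., the
    target/background prediction accuracy indices described above."""
    model.train()  # keep dropout layers stochastic during inference
    with torch.no_grad():
        # K passes over the same image -> (K, 2, H, W)
        probs = torch.stack(
            [model(image.unsqueeze(0)).softmax(dim=1).squeeze(0) for _ in range(k)]
        )
    target_index = probs[:, 1].std(dim=0)      # std of the K first predicted probabilities
    background_index = probs[:, 0].std(dim=0)  # std of the K second predicted probabilities
    return target_index, background_index
```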
Furthermore, the computer device can use the auxiliary neural network (here meaning the network with its complete parameters; the K variant networks above can be obtained by randomly dropping its parameters) to predict, from the generated mask region of each pixel in the second image, a third prediction result for the second image. The third prediction result can include third predicted pixel information for each pixel of the second image, where each pixel's third predicted pixel information indicates whether that pixel belongs to the second feature region.

Each pixel's third predicted pixel information includes the probability, predicted by the auxiliary neural network, that the pixel belongs to the second feature region of the second image (which can be called the target probability) and the probability, predicted by the auxiliary neural network, that it does not (which can be called the background probability).
The computer device can then generate the region center features of the target object from the pixel features of the pixels in the second image generated by the auxiliary neural network, the prediction accuracy indices of the pixels' mask regions, and the third prediction result, as described below.
From the pixels of the second image, the computer device can take the pixels whose mask-region prediction accuracy index is greater than an index threshold (the threshold can be set according to the actual application scenario) as evaluation pixels; that is, an evaluation pixel is a pixel of the second image whose mask-region prediction accuracy index is greater than the index threshold. There can be at least one evaluation pixel.

Taking the target pixel as an example again: since the prediction accuracy index of the target pixel's mask region comprises the target prediction accuracy index and the background prediction accuracy index, the index being greater than the threshold can mean that both indices exceed the threshold. That is, only when the target pixel's target prediction accuracy index and its background prediction accuracy index are both greater than the index threshold can the target pixel be taken as an evaluation pixel. By this rule the computer device can collect, from the pixels of the second image, all pixels qualifying as evaluation pixels, thereby obtaining the at least one evaluation pixel.
The computer device can then generate the region center features of the target object from the pixel features of the at least one evaluation pixel and the third predicted pixel information of the at least one evaluation pixel:
For each evaluation pixel, if its third predicted pixel information indicates that it belongs to the second feature region (i.e., the target probability in the third predicted pixel information is greater than the background probability), the evaluation pixel is taken as a target evaluation pixel of the second feature region (i.e., a target evaluation pixel of the target object). A target evaluation pixel is thus a pixel that the auxiliary neural network predicts as belonging to the target object in the second image and whose mask-region prediction accuracy index is greater than the index threshold.
Similarly, for each evaluation pixel, if its third predicted pixel information indicates that it does not belong to the second feature region (i.e., the target probability in the third predicted pixel information is less than the background probability), the evaluation pixel is taken as a background evaluation pixel of the second feature region (i.e., a background evaluation pixel of the target object). A background evaluation pixel is thus a pixel that the auxiliary neural network predicts as not belonging to the target object in the second image and whose mask-region prediction accuracy index is greater than the index threshold.
The computer device can then generate the target center feature of the second feature region from the pixel features of the target evaluation pixels and the target probabilities in their third predicted pixel information (i.e., their probabilities of belonging to the target object). The target center feature is used to represent the structural features of the target object in the second image, that is, the structural features of the image of the second feature region in the second image.

Denoting the target center feature as qobj, it can be computed as shown in formula (2):

qobj = ( Σ_{a=1}^{A} p_a·e_a ) / ( Σ_{a=1}^{A} p_a )          (2)
It should be noted that the label information of each pixel in the second image (0 or 1, where 0 means not belonging to the target object and 1 means belonging to it) is usually represented in a single label vector, and the pixel features of the pixels of the second image generated in the auxiliary neural network can likewise be contained in a single feature matrix, in which one row represents one pixel's pixel feature. The auxiliary neural network can therefore generate the target center feature through operations on this feature matrix and this label vector. The dimensionality of the pixel features generated by the auxiliary neural network usually differs from that of the label vector, so when computing the target center feature from the target evaluation pixels' pixel features, those features can be upsampled by linear interpolation (trilinear interpolation if the pixels are three-dimensional) to raise their dimensionality to that of the label vector.
In formula (2), A denotes the total number of target evaluation pixels and a ≤ A; e_a denotes the pixel feature of the a-th target evaluation pixel after its dimensionality has been raised to that of the label vector (i.e., the upsampled pixel feature of the a-th target evaluation pixel); and p_a denotes the target probability in the third predicted pixel information of the a-th target evaluation pixel, i.e., its probability of belonging to the target object in the second image. Introducing the target probabilities of the target evaluation pixels when computing the target center feature reflects their different contributions: the larger a target evaluation pixel's target probability, the larger its contribution weight to generating the target center feature.
In the same way, the computer device can generate the background center feature of the second feature region from the pixel features of the background evaluation pixels and the background probabilities in their third predicted pixel information (i.e., their probabilities of not belonging to the target object). The background center feature is used to represent the structural features of the background image of the target object in the second image, that is, the structural features of the part of the second image other than the image of the second feature region.
Denoting the background center feature as qbg, it can be computed as shown in formula (3):

qbg = ( Σ_{b=1}^{B} p_b·e_b ) / ( Σ_{b=1}^{B} p_b )          (3)
In formula (3), B denotes the total number of background evaluation pixels and b ≤ B; likewise, e_b denotes the pixel feature of the b-th background evaluation pixel after its dimensionality has been raised to that of the label vector (i.e., the upsampled pixel feature of the b-th background evaluation pixel), and p_b denotes the background probability in the third predicted pixel information of the b-th background evaluation pixel, i.e., its probability of belonging to the background image of the target object in the second image. Introducing the background probabilities of the background evaluation pixels when computing the background center feature reflects their different contributions: the larger a background evaluation pixel's background probability, the larger its contribution weight to generating the background center feature.
The computer device can then take the target center feature (which can be understood as the target center feature of the target object in the second image) and the background center feature (which can be understood as the background center feature of the target object in the second image) together as the region center features of the second feature region; the region center features can be understood as the object center features of the target object.
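The prototype computation of formulas (2) and (3) can be sketched as follows, under assumed tensor shapes: features is (D, H, W) holding the upsampled per-pixel features, probs is (2, H, W) from the third prediction result (channel 0 = background, channel 1 = target), accuracy_index is (2, H, W) holding the background and target prediction accuracy indices, and tau is the index threshold; all of these names are assumptions.

```python
import torch

def center_features(features, probs, accuracy_index, tau):
    """Compute q_obj and q_bg as probability-weighted means over the
    evaluation pixels, per formulas (2) and (3)."""
    evaluation = (accuracy_index[0] > tau) & (accuracy_index[1] > tau)
    is_target = probs[1] > probs[0]  # third prediction: target vs. background

    def weighted_center(mask, p):
        w = p[mask]                   # contribution weights of the selected pixels
        f = features[:, mask]         # (D, num_selected)
        return (f * w).sum(dim=1) / w.sum().clamp_min(1e-8)

    q_obj = weighted_center(evaluation & is_target, probs[1])   # formula (2)
    q_bg = weighted_center(evaluation & ~is_target, probs[0])   # formula (3)
    return q_obj, q_bg
```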
Step S202: Determine the classification prediction result based on the region center features, the pixel features of the pixels in the second image, and the second supervision data.
In some embodiments, the computer device can obtain the classification result of each pixel in the second image from the generated region center features of the target object, the pixel features of the pixels in the second image, and the label information of each pixel in the second image (i.e., the label information of each pixel in the second supervision data). The classification result of any pixel in the second image can be either that the pixel belongs to the first category (i.e., its label information is correct) or that it belongs to the second category (i.e., its label information is incorrect).
Since the classification result is determined in the same way for every pixel in the second image, determining the classification result of the target pixel (any pixel of the second image) is again described as an example.
The computer device can obtain the feature distance between the target pixel's pixel feature and the target center feature, which can be called the first feature distance, and the feature distance between the target pixel's pixel feature and the background center feature, which can be called the second feature distance.
Denoting the first feature distance as dobj.m and the second feature distance as dbg.m, they can be computed as shown in formulas (4) and (5):

dobj.m = ‖em − qobj‖2          (4)

dbg.m = ‖em − qbg‖2          (5)
As above, em denotes the pixel feature of the target pixel after its dimensionality has been raised to that of the label vector (i.e., the upsampled pixel feature of the target pixel); qobj denotes the target center feature, qbg denotes the background center feature, and ‖·‖2 denotes the 2-norm.
If the first feature distance is greater than the second feature distance (indicating that the target pixel leans toward the background image of the target object in the second image) and the target pixel's label information in the second supervision data indicates that it belongs to the target object in the second image (i.e., to the second feature region), the target pixel's classification result can be determined to indicate that it belongs to the second category.

If the first feature distance is greater than the second feature distance (indicating that the target pixel leans toward the background image of the target object in the second image) and the target pixel's label information in the second supervision data indicates that it does not belong to the target object in the second image (i.e., it belongs to the background image of the target object), the target pixel's classification result can be determined to indicate that it belongs to the first category.

If the first feature distance is less than the second feature distance (indicating that the target pixel leans toward the target object in the second image) and the target pixel's label information in the second supervision data indicates that it belongs to the target object in the second image, the target pixel's classification result can be determined to indicate that it belongs to the first category.

If the first feature distance is less than the second feature distance (indicating that the target pixel leans toward the target object in the second image) and the target pixel's label information in the second supervision data indicates that it does not belong to the target object in the second image (i.e., it belongs to the background image of the target object), the target pixel's classification result can be determined to indicate that it belongs to the second category.
In summary, if the feature type toward which the target pixel's pixel feature leans (the feature type of the target object or that of the target object's background image) is inconsistent with the feature type indicated by the target pixel's label information (e.g., one is the target object's feature type and the other is the background image's), the target pixel's label information can be considered incorrectly labeled, i.e., its classification result indicates that it belongs to the second category in the second image. Conversely, if the two feature types are consistent (e.g., both are the target object's feature type, or both are the background image's), the target pixel's label information can be considered correctly labeled, i.e., its classification result indicates that it belongs to the first category in the second image.
The classification results of the pixels in the second image together constitute the classification prediction result for the second image; that is, the classification prediction result includes the classification result of each pixel in the second image.
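A minimal sketch of this label-separation rule, reusing the prototypes from the previous sketch; features is (D, H, W), labels is (H, W) with 1 = belongs to the target object and 0 = background, and all names are assumptions.

```python
import torch

def separate_labels(features, labels, q_obj, q_bg):
    """Mark each pixel as first category (True: its label agrees with the
    nearer prototype) or second category (False: its label disagrees)."""
    d_obj = (features - q_obj[:, None, None]).norm(dim=0)  # first feature distance
    d_bg = (features - q_bg[:, None, None]).norm(dim=0)    # second feature distance
    looks_like_target = d_obj < d_bg
    return looks_like_target == (labels == 1)
```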
Please refer to Figure 6, a schematic diagram of a scenario for determining classification results provided by an embodiment of this application. As shown in Figure 6, the pixels of the second image can comprise pixel 1 to pixel W, where W is a positive integer whose value depends on the actual application scenario. The computer device can obtain the feature distance between each pixel's pixel feature and the target center feature: the distance for pixel 1 (first feature distance 1), for pixel 2 (first feature distance 2), for pixel 3 (first feature distance 3), ..., and for pixel W (first feature distance W). It can likewise obtain the feature distance between each pixel's pixel feature and the background center feature: the distance for pixel 1 (second feature distance 1), for pixel 2 (second feature distance 2), for pixel 3 (second feature distance 3), ..., and for pixel W (second feature distance W).

The computer device can then obtain the classification result of pixel 1 from its first feature distance 1, its label information, and its second feature distance 1; the classification result of pixel 2 from its first feature distance 2, its label information, and its second feature distance 2; the classification result of pixel 3 from its first feature distance 3, its label information, and its second feature distance 3; ...; and the classification result of pixel W from its first feature distance W, its label information, and its second feature distance W.
Through the above process, in scenarios where the structural features of the local regions within a segmentation region (such as the regions of the individual pixels within the image region of the target object) are highly correlated (e.g., highly similar) and the noise tolerance is high, this application can use the region center features to judge the classification results of the pixels in the second image accurately.
The embodiments of this application focus on performing label separation with the assistance of the mean-teacher model (i.e., the auxiliary neural network described above), exploiting the property that feature prototypes (embodied by the region center features) make training more robust to noisy labels. Within the model framework of these embodiments, networks such as V-Net (an image segmentation network), U-Net (a semantic segmentation network), DenseNet (a densely connected network), or ResNet (a residual network) can be used for training and prediction.
Please refer to Figure 7, a schematic flowchart of determining prediction deviations provided by an embodiment of this application. The execution subject in this embodiment can be the same as that in Figure 3 above. As shown in Figure 7, the method can include:
Step S301: Generate the first prediction deviation of the prediction neural network from the first prediction result and the first supervision data of the first image.
In some embodiments, the computer device can generate the cross-entropy loss and the image segmentation loss (Dice loss) of the prediction neural network for the first image from the first predicted pixel information of each pixel of the first image in the first prediction result and the label information of each pixel in the first supervision data, and then obtain the prediction loss of the prediction neural network for the first image from these two losses; this prediction loss can be called the first prediction deviation.
The cross-entropy loss of the prediction neural network for the first image can be denoted Ls1 and computed as shown in formula (6):

Ls1 = −(1/N)·Σ_{i=1}^{N} [ ytrue.i·log(ypred.i) + (1 − ytrue.i)·log(1 − ypred.i) ]          (6)
Here, ytrue.i denotes the label information of the i-th pixel in the first image, i.e., the true label of the i-th pixel; i ≤ N, where N can be the total number of pixels in the first image. If the label information of the i-th pixel truly indicates that it belongs to the target object in the first image, ytrue.i can equal 1; otherwise, i.e., if the label information truly indicates that the i-th pixel does not belong to the target object in the first image, ytrue.i can equal 0. ypred.i denotes the probability, in the first predicted pixel information of the i-th pixel predicted by the prediction neural network, that the i-th pixel belongs to the target object (i.e., its target probability).
The image segmentation loss of the prediction neural network for the first image can be denoted LDice1 and computed as shown in formula (7):

LDice1 = 1 − ( 2·Σ_{i=1}^{N} ytrue.i·ypred.i ) / ( Σ_{i=1}^{N} ytrue.i + Σ_{i=1}^{N} ypred.i )          (7)
As above, ytrue.i denotes the label information of the i-th pixel in the first image (1 or 0), and ypred.i denotes the target probability in the first predicted pixel information of the i-th pixel, i.e., the probability that it belongs to the target object.
The first prediction deviation of the prediction neural network can be denoted LHQ; as shown in formula (8), it is the sum of the cross-entropy loss Ls1 and the image segmentation loss LDice1:

LHQ = Ls1 + LDice1          (8)
Since the pixels of the first image carry accurate label information, the first prediction deviation LHQ of the prediction neural network for the first image can provide positive supervised training for the prediction neural network.
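A minimal sketch of this supervised loss, formulas (6)-(8), over flattened tensors; pred holds the per-pixel target probabilities ypred.i, target holds the labels ytrue.i, and eps is an assumed numerical-stability constant not specified by this application.

```python
import torch

def supervised_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8):
    """L_HQ = cross-entropy (6) + Dice loss (7), per formula (8)."""
    ce = -(target * (pred + eps).log()
           + (1 - target) * (1 - pred + eps).log()).mean()                    # formula (6)
    dice = 1 - 2 * (target * pred).sum() / (target.sum() + pred.sum() + eps)  # formula (7)
    return ce + dice                                                          # formula (8)
```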
Step S302: Generate the second prediction deviation of the prediction neural network from the second predicted pixel information of the pixels of the second image that belong to the first category and the second supervision data of the second image.
In some embodiments, the label information of the first-category pixels in the second supervision data can be called the first label information; that is, the first label information includes the preset label information in the second supervision data marking whether the pixels of the second image that belong to the first category belong to the second feature region. The computer device can generate the prediction loss of the prediction neural network for the first-category pixels from the second predicted pixel information of the first-category pixels of the second image and this first label information; this prediction loss can be called the second prediction deviation.
Specifically, the computer device can generate the cross-entropy loss and the image segmentation loss (Dice loss) of the prediction neural network for the first-category pixels from the second predicted pixel information and the label information of each first-category pixel in the second image, and then obtain the second prediction deviation for the first-category pixels from these two losses.
The cross-entropy loss of the prediction neural network for the first-category pixels can be denoted Ls2 and computed as shown in formula (9):

Ls2 = −(1/M)·Σ_{j=1}^{M} [ ytrue.j·log(ypred.j) + (1 − ytrue.j)·log(1 − ypred.j) ]          (9)
Here, ytrue.j denotes the label information of the j-th pixel among the first-category pixels, i.e., its true label; j ≤ M, where M can be the total number of pixels belonging to the first category. If the label information of the j-th pixel truly indicates that it belongs to the target object in the second image, ytrue.j can equal 1; otherwise, i.e., if the label information truly indicates that the j-th pixel does not belong to the target object in the second image, ytrue.j can equal 0. ypred.j denotes the probability, in the second predicted pixel information of the j-th pixel, that the j-th pixel belongs to the target object (i.e., its target probability).
The image segmentation loss of the prediction neural network for the first-category pixels can be denoted LDice2 and computed as shown in formula (10):

LDice2 = 1 − ( 2·Σ_{j=1}^{M} ytrue.j·ypred.j ) / ( Σ_{j=1}^{M} ytrue.j + Σ_{j=1}^{M} ypred.j )          (10)
As above, ytrue.j denotes the label information of the j-th pixel (1 or 0), and ypred.j denotes the target probability in the second predicted pixel information of the j-th pixel, i.e., the probability that it belongs to the target object.
The second prediction deviation of the prediction neural network can be denoted Lls; as shown in formula (11), it is the sum of the cross-entropy loss Ls2 and the image segmentation loss LDice2:

Lls = Ls2 + LDice2          (11)
The second prediction deviation Lls obtained above is the prediction loss of the prediction neural network for the correctly labeled pixels (i.e., the first-category pixels).
Step S303: Generate the third prediction deviation of the prediction neural network from the second predicted pixel information of the pixels of the second image that belong to the second category and the second supervision data of the second image.
In some embodiments, the label information of the second-category pixels in the second supervision data can be called the second label information; that is, the second label information includes the preset label information in the second supervision data marking whether the pixels of the second image that belong to the second category belong to the second feature region. The computer device can generate the prediction loss of the prediction neural network for the second-category pixels from the second predicted pixel information of the second-category pixels of the second image and this second label information; this prediction loss can be called the third prediction deviation.
In the embodiments of this application, since the second-category pixels are the pixels predicted to be incorrectly labeled, the third prediction deviation of the prediction neural network for the second-category pixels can be obtained with an entropy minimization loss; that is, the second-category pixels can still contribute to training the prediction neural network, but through a third prediction deviation of smaller influence (smaller entropy).
The third prediction deviation can be denoted Lent and computed as shown in formula (12):

Lent = −(1/G)·Σ_{g=1}^{G} [ Fobj.g·log(Fobj.g) + Fbg.g·log(Fbg.g) ]          (12)
Here, Fobj.g denotes the probability, in the second predicted pixel information of the g-th pixel among the second-category pixels, that the g-th pixel belongs to the target object in the second image (i.e., its target probability), and Fbg.g denotes the probability, in the second predicted pixel information of the g-th pixel, that it does not belong to the target object in the second image (i.e., its background probability). G is the total number of second-category pixels, and g ≤ G.
As can be seen from the above, the second prediction deviation and the third prediction deviation are computed in different ways, which achieves the purpose of training the prediction neural network differently on the first-category pixels and the second-category pixels of the second image.
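A minimal sketch of the entropy minimization loss of formula (12); probs is assumed to be a (G, 2) tensor whose columns hold Fbg.g and Fobj.g for the G second-category pixels, and eps is an assumed stability constant.

```python
import torch

def entropy_loss(probs: torch.Tensor, eps: float = 1e-8):
    """L_ent = -(1/G) * sum_g [F_obj.g*log F_obj.g + F_bg.g*log F_bg.g]."""
    return -(probs * (probs + eps).log()).sum(dim=1).mean()
```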
Step S304: Generate the prediction deviation of the prediction neural network from the first prediction deviation, the second prediction deviation, and the third prediction deviation.
In some embodiments, the computer device can generate the final prediction loss of the prediction neural network from the first, second, and third prediction deviations obtained above; this prediction loss is the prediction deviation of the prediction neural network (i.e., its final prediction deviation). The prediction deviation refers to the deviation of the prediction neural network's predicted pixel information for the pixels (including the pixels of the first image and those of the second image).
Specifically, the computer device may obtain, based on the second prediction deviation of the prediction neural network for the pixels of the first classification and the third prediction deviation for the pixels of the second classification, the final prediction loss of the prediction neural network for the second image; this prediction loss may be referred to as the comprehensive prediction deviation of the prediction neural network for the second image.
The comprehensive prediction deviation can be denoted L_LQ; as shown in formula (13) below, the comprehensive prediction deviation L_LQ is the sum of the second prediction deviation L_ls and the third prediction deviation L_ent:

L_LQ = L_ls + L_ent          (13)
Furthermore, the computer device also obtains a weighting coefficient for the comprehensive prediction deviation, weights the comprehensive prediction deviation according to this coefficient to obtain a weighted comprehensive prediction deviation, and then generates the final prediction deviation of the prediction neural network (i.e., the final prediction loss value) from the first prediction deviation and the weighted comprehensive prediction deviation.
The weighting coefficient for the comprehensive prediction deviation can be formed from a Gaussian function that ramps up as training time (iteration count) increases. Since the prediction neural network can be trained over multiple iterations, the weighting coefficient for the comprehensive prediction deviation during the t-th training iteration can be denoted λ(t); as shown in formula (14) below, the weighting coefficient λ(t) can be:

λ(t) = e^(−5 · (1 − t/t_max)²)          (14)
Here, t_max denotes the preset maximum number of iterative training rounds for the prediction neural network, which may be referred to as the maximum number of iterations, and e denotes the natural constant.
For example, in this embodiment of the present application, the weighting coefficient for the comprehensive prediction deviation in the current training iteration of the prediction neural network may be obtained as follows: the computer device obtains the iteration count of the current iterative correction of the network parameters of the prediction neural network (which may be called the current iteration count) and the preset maximum number of iterations for iteratively correcting those network parameters; it then substitutes the current iteration count for t and the maximum number of iterations for t_max in the above formula (14), yielding the weighting coefficient for the comprehensive prediction deviation in the current training iteration.
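A minimal sketch of this coefficient, assuming the conventional Gaussian ramp-up form e^(−5(1 − t/t_max)²) (the scaling constant 5 is a common choice in the ramp-up literature and is an assumption here, as the source does not state it):

```python
import math

def rampup_weight(t: int, t_max: int) -> float:
    """Gaussian ramp-up coefficient lambda(t) as in formula (14).

    Assumes the conventional form exp(-5 * (1 - t / t_max) ** 2); the
    constant 5 is a common choice, not stated in the source.
    """
    t = min(t, t_max)  # clamp so the coefficient saturates at 1.0 after t_max
    return math.exp(-5.0 * (1.0 - t / t_max) ** 2)
```

With this form, λ(0) ≈ 0.0067 and λ(t_max) = 1, so the low-quality term is almost ignored early in training and fully weighted at the end.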
Therefore, the prediction loss (i.e., prediction deviation) of the prediction neural network can be denoted L_z; as shown in formula (15) below, the prediction deviation L_z can be:

L_z = L_HQ + λ(t) · L_LQ          (15)
It can be understood that each training iteration of the prediction neural network has a corresponding prediction loss: if the current iteration is the t-th, then L_HQ in formula (15) is the first prediction deviation in the t-th training iteration, L_LQ is the comprehensive prediction deviation in the t-th training iteration, and the resulting L_z is the prediction loss of the t-th training iteration. The network parameters of the prediction neural network can be iteratively optimized using the prediction deviation L_z obtained in each training iteration, so as to obtain the prediction neural network whose parameter optimization is complete (i.e., the trained prediction neural network).
From the above process, it can be understood that the smaller the iteration count t, the smaller the weighting coefficient of the comprehensive prediction deviation, and conversely, the larger t, the larger the coefficient. The purpose is that, at the start of training (when t is small), the training interference from the second image on the prediction neural network is reduced; as t grows, the prediction neural network becomes increasingly accurate, so a larger weighting coefficient can be used to strengthen the training effect of the second image on the prediction neural network, improving its training accuracy.
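Putting formulas (13)-(15) together, one parameter update could be sketched as follows; training_step and its arguments are illustrative placeholders (reusing the rampup_weight sketch above), not names from the source:

```python
def training_step(optimizer, l_hq, l_ls, l_ent, t, t_max):
    """One parameter update with the combined loss of formula (15).

    l_hq  : first prediction deviation, computed on the first (HQ) image
    l_ls  : second prediction deviation, on first-classification pixels
    l_ent : third prediction deviation, on second-classification pixels
    All three are scalar tensors produced by the prediction network's losses.
    """
    l_lq = l_ls + l_ent                            # formula (13)
    loss = l_hq + rampup_weight(t, t_max) * l_lq   # formula (15)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```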
Please refer to Figure 8, a schematic diagram of a model training scenario provided by an embodiment of the present application. As shown in Figure 8, the computer device may generate the first prediction deviation based on the first image through the prediction neural network. The computer device may also use the auxiliary neural network to perform label separation on the pixels in the second image, that is, to divide the pixels in the second image into pixels of the first classification and pixels of the second classification.
The computer device may then generate, through the prediction neural network, the second prediction deviation based on the pixels of the first classification and the third prediction deviation based on the pixels of the second classification. From the second and third prediction deviations it can generate the comprehensive prediction deviation for the second image, and weight that comprehensive prediction deviation according to the weighting coefficient to obtain the weighted comprehensive prediction deviation.
Finally, the computer device can obtain the final prediction loss of the prediction neural network (i.e., the prediction deviation described above) from the first prediction deviation and the weighted comprehensive prediction deviation, and the prediction neural network can correct its network parameters according to this prediction deviation to obtain the trained prediction neural network (i.e., the prediction neural network after parameter optimization).
In this embodiment of the present application, the first image, with its high-quality label information, can be used for supervised training of the prediction neural network. For the second image, with its low-quality label information, both the pixels predicted to be correctly labeled (the pixels of the first classification) and the pixels predicted to be mislabeled (the pixels of the second classification) can participate in training; the correctly labeled pixels simply exert a larger training effect on the prediction neural network, while the mislabeled pixels exert a smaller one. This makes full use of the second image for training, so a very accurate prediction neural network can be trained.
Moreover, the embodiments of the present application enable discriminative learning on mixed-quality sample data (including the first image and the second image), that is, label-isolated learning on mixed-quality sample data, so that the correct features of the sample data are fully learned and an accurate prediction neural network is obtained through training.
In addition, careful experiments were conducted on the provided method, using the left atrium (LA) segmentation dataset. The LA segmentation dataset provides 100 3D magnetic resonance images (which can be understood as three-dimensional first images) with expert labels (which can be understood as the label information of the pixels in the first image). The image resolution is 0.625×0.625×0.625 mm³. All images were cropped to the center of the heart region and normalized to zero mean and unit variance. To simulate practical scenarios, the embodiments of the present application study both an extreme setting and a milder, more common setting.
In the extreme setting, only 2 samples (the minimum HQ-labeled batch size in the code implementation) are used as HQ-labeled data (samples with high-quality label information, understood as first images), while the mild setting uses 8 samples (10%) as HQ-labeled data. The remaining samples are treated as non-expert low-quality labeled data (samples with low-quality label information, understood as second images); these are processed with commonly used simulated label-corruption schemes, including random erosion and dilation of 3-15 voxels.
The experimental framework runs on an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory and is implemented in Python and PyTorch. In all experiments, the same 3D V-Net (a 3D medical image segmentation network based on a fully convolutional design) is used as the backbone for a fair comparison. The network is trained with SGD (weight decay = 0.0001, momentum = 0.9). The batch size is set to 4, comprising 2 high-quality labeled images (first images) and 2 low-quality labeled images (second images). The maximum number of training steps is set to 8000 in all cases. The learning rate is initialized to 0.01 and decays with a power of 0.9 after each step. Patches of 112×112×80 voxels are randomly cropped as network input, standard data augmentation including random cropping, flipping and rotation is applied, and a sliding-window strategy with a stride of 18×18×4 voxels is used in the testing phase.
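A minimal sketch of this optimization setup, assuming the standard polynomial ("poly") schedule lr = lr₀ · (1 − step/max_steps)^0.9 implied by "decays with a power of 0.9 after each step" (the exact schedule form is an assumption):

```python
import torch

max_steps, base_lr = 8000, 0.01
# Stand-in for the 3D V-Net backbone; any nn.Module with parameters works here.
model = torch.nn.Conv3d(in_channels=1, out_channels=2, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=1e-4)

def poly_lr(step: int) -> float:
    # Learning rate decaying with power 0.9 over the 8000 training steps.
    return base_lr * (1.0 - step / max_steps) ** 0.9

for step in range(max_steps):
    for group in optimizer.param_groups:
        group["lr"] = poly_lr(step)
    # ... forward on a batch of 2 HQ + 2 LQ 112x112x80 patches, compute L_z,
    # loss.backward(), optimizer.step()
```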
On this basis, the embodiments of the present application use four metrics for comprehensive evaluation: Dice (an image segmentation overlap metric), Jaccard (a set-overlap metric), ASD (average surface distance) and 95HD (the 95th-percentile Hausdorff distance, a medical image segmentation metric). The experimental data of the embodiments of the present application in the medical scenario are shown in Table 1 below:
Table 1 (experimental comparison; the tabulated values are not reproduced in this text)
Two experiments were conducted in the embodiments of the present application: rows 2-9 of Table 1 report one experiment, and rows 10-17 report the other. Set-HQ denotes the number of high-quality labeled samples used for training, and Set-LQ the number of low-quality labeled samples. "HQ-LQ separation?" indicates whether the corresponding method trains the low-quality and high-quality labeled data separately. Higher Dice and Jaccard indicate better performance, while smaller ASD and 95HD indicate better performance; the values in parentheses in Table 1 are the standard deviations of the corresponding metrics.
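For reference, the two overlap metrics can be computed from binary masks as in the following standard sketch (a textbook formulation, not code from the source):

```python
import numpy as np

def dice_and_jaccard(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Dice and Jaccard overlap of two binary segmentation masks (assumed non-empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return float(dice), float(jaccard)
```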
H-Sup denotes supervised training with only high-quality labeled data; HL-Sup denotes supervised training with a mix of high- and low-quality labeled data; TriNet denotes a joint learning framework composed of three networks, in which the predictions of two networks are integrated to supervise the third; 2RnT denotes a two-stage method that improves label quality by estimating a confusion matrix for label correction; PNL denotes the introduction of an image-level label quality assessment module that identifies images with clean labels to adjust the network; KDEM denotes the introduction of knowledge distillation and an entropy minimization optimization term to train the network; and Decoupled denotes implicit decoupling via two separate decoders (one for high-quality labeled data and one for low-quality labeled data) to train the network.
The experimental data show that, in both experiments, the method provided by the embodiments of the present application achieves the best overall training performance, demonstrating the superiority and robustness of the provided method.
Please refer to Figure 9, a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running in a computer device; for example, the data processing apparatus may be application software and may be used to execute the corresponding steps in the methods provided by the embodiments of the present application. As shown in Figure 9, the data processing apparatus 1 may include: an acquisition module 11, a first processing module 12, a second processing module 13, a classification module 14 and an optimization module 15.
The acquisition module 11 is configured to acquire a first image and a second image containing a target object, where the image area in which the target object is located in the first image is a first feature area, and the image area in which the target object is located in the second image is a second feature area.
The first processing module 12 is configured to input the first image into a prediction neural network to obtain a first prediction result; the first prediction result includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area.
The second processing module 13 is configured to input the second image into the prediction neural network to obtain a second prediction result; the second prediction result includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area.
The classification module 14 is configured to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result indicates the pixels in the second image that belong to a first classification and the pixels that belong to a second classification, where the pixels of the first classification are pixels in the second image predicted by the auxiliary neural network to have correct label information, and the pixels of the second classification are pixels in the second image predicted by the auxiliary neural network to have incorrect label information.
The optimization module 15 is configured to optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result and the classification prediction result, to obtain a trained prediction neural network used for image segmentation of a target image.
According to some embodiments of the present application, the steps involved in the data processing method shown in Figure 3 may be performed by the modules of the data processing apparatus 1 shown in Figure 9. For example, step S101 shown in Figure 3 may be performed by the acquisition module 11 in Figure 9; step S102 by the first processing module 12; step S103 by the second processing module 13; step S104 by the classification module 14; and step S105 by the optimization module 15.
In this embodiment of the present application, a first image having a first feature area and a second image having a second feature area can be acquired; the first image is predicted through a prediction neural network to obtain a first prediction result, which includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area; the second image is predicted through the prediction neural network to obtain a second prediction result, which includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area; classification prediction is performed on the pixels of the second image through an auxiliary neural network to obtain a classification prediction result, which indicates the pixels of the second image belonging to the first classification and those belonging to the second classification; and the network parameters of the prediction neural network are optimized according to the first prediction result, the second prediction result and the classification prediction result. The apparatus proposed in this embodiment can thus classify the pixels in the second image through the auxiliary neural network and subsequently use the per-pixel classification results to optimize the parameters of the prediction neural network; this improves the accuracy of that parameter optimization, and the parameter-optimized prediction neural network can then accurately segment the feature areas in an image.
According to some embodiments of the present application, the modules of the data processing apparatus 1 shown in Figure 9 may be combined, separately or entirely, into one or several units, or one or more of the units may be further split into multiple functionally smaller subunits that implement the same operations, without affecting the technical effects of the embodiments of the present application. The above modules are divided based on logical functions; in practical applications, the function of one module may be implemented by multiple units, or the functions of multiple modules by one unit. In other embodiments of the present application, the data processing apparatus 1 may also include other units; in practical applications, these functions may likewise be implemented with the assistance of, or cooperatively by, multiple other units.
According to some embodiments of the present application, the data processing apparatus 1 shown in Figure 9 can be constructed, and the data processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in Figure 3 on a general-purpose computer device that includes processing and storage elements such as a central processing unit (CPU), a random access storage medium (RAM) and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through that medium, and run therein.
Please refer to Figure 10, a schematic structural diagram of a computer device provided by an embodiment of the present application. As shown in Figure 10, the computer device 1000 may include a processor 1001, a network interface 1004 and a memory 1005; in addition, it may include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and a keyboard, and optionally standard wired and wireless interfaces. The network interface 1004 may include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; it may also be at least one storage device located remotely from the processor 1001. As shown in Figure 10, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and a device control application program.
In the computer device 1000 shown in Figure 10, the network interface 1004 provides network communication functions, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 can call the device control application program stored in the memory 1005 to implement the data processing method described in the above embodiments.
It should be understood that the computer device 1000 described in this embodiment of the present application can carry out the description of the data processing method in the embodiment corresponding to Figure 3 above, and also the description of the data processing apparatus 1 in the embodiment corresponding to Figure 9 above; these will not be repeated here. Likewise, the description of the beneficial effects of the same method will not be repeated.
In addition, it should be pointed out that an embodiment of the present application further provides a computer-readable storage medium storing the computer program executed by the aforementioned data processing apparatus 1, where the computer program includes program instructions. When a processor executes the program instructions, it can carry out the description of the data processing method in the embodiment corresponding to Figure 3 above, so details are not repeated here, nor is the description of the beneficial effects of the same method. For technical details not disclosed in the computer storage medium embodiments of the present application, please refer to the description of the method embodiments of the present application.
As an example, the above program instructions may be deployed and executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected through a communication network; multiple computer devices distributed across multiple sites and interconnected through a communication network can form a blockchain network.
The above computer-readable storage medium may be the data processing apparatus provided in any of the foregoing embodiments or an internal storage unit of the above computer device, such as a hard disk or memory of the computer device. It may also be an external storage device of the computer device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card or flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit of the computer device and an external storage device. It is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been or is to be output.
An embodiment of the present application provides a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the description of the data processing method in the embodiment corresponding to Figure 3 above; details and beneficial-effect descriptions are therefore not repeated here. For technical details not disclosed in the computer-readable storage medium embodiments of the present application, please refer to the description of the method embodiments of the present application.
The terms "first", "second" and the like in the specification, claims and drawings of the embodiments of the present application are used to distinguish different objects, not to describe a specific order. In addition, the term "include" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product or device comprising a series of steps or units is not limited to the listed steps or modules, but optionally further includes steps or modules not listed, or other step units inherent to such processes, methods, apparatuses, products or devices.
Those of ordinary skill in the art can appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of the present application.
The methods and related apparatuses provided by the embodiments of the present application are described with reference to the method flowcharts and/or structural schematic diagrams provided by these embodiments; each process and/or block of the flowcharts and/or schematic diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce an apparatus for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.
What is disclosed above is only a preferred embodiment of the present application and certainly cannot limit the scope of rights of the present application; therefore, equivalent changes made according to the claims of the present application still fall within the scope covered by the present application.

Claims (16)

  1. A data processing method, executed by a computer device, the method comprising:
    obtaining a first image and a second image containing a target object, wherein the image area in which the target object is located in the first image is a first feature area, and the image area in which the target object is located in the second image is a second feature area;
    inputting the first image into a prediction neural network to obtain a first prediction result, the first prediction result comprising first predicted pixel information respectively indicating whether each pixel of the first image belongs to the first feature area;
    inputting the second image into the prediction neural network to obtain a second prediction result, the second prediction result comprising second predicted pixel information respectively indicating whether each pixel of the second image belongs to the second feature area;
    performing classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate pixels in the second image belonging to a first classification and pixels belonging to a second classification, wherein the pixels of the first classification are pixels in the second image predicted by the auxiliary neural network to have correct label information, and the pixels of the second classification are pixels in the second image predicted by the auxiliary neural network to have incorrect label information; and
    performing network parameter optimization on the prediction neural network according to the first prediction result, the second prediction result and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used for image segmentation of a target image.
  2. The method of claim 1, wherein performing network parameter optimization on the prediction neural network comprises:
    obtaining a first prediction deviation of the prediction neural network according to the first prediction result and first supervision data of the first image, the first supervision data comprising label information indicating whether each pixel in the first image belongs to the first feature area;
    obtaining a second prediction deviation of the prediction neural network according to the second predicted pixel information of the pixels belonging to the first classification in the second image and second supervision data of the second image;
    obtaining a third prediction deviation of the prediction neural network according to the second predicted pixel information of the pixels belonging to the second classification in the second image and the second supervision data; and
    performing parameter optimization on the prediction neural network according to the first prediction deviation, the second prediction deviation and the third prediction deviation;
    wherein the second supervision data comprises label information indicating whether each pixel in the second image belongs to the second feature area; and
    the second prediction deviation and the third prediction deviation are obtained through different calculation methods.
  3. The method of claim 2, wherein performing parameter optimization on the prediction neural network through the first prediction deviation, the second prediction deviation and the third prediction deviation comprises:
    determining a comprehensive prediction deviation of the prediction neural network for the second image according to the second prediction deviation and the third prediction deviation;
    obtaining a weighting coefficient for the comprehensive prediction deviation, and weighting the comprehensive prediction deviation according to the weighting coefficient to obtain a weighted comprehensive prediction deviation; and
    performing parameter optimization on the prediction neural network according to the first prediction deviation and the weighted comprehensive prediction deviation.
  4. The method of claim 1, wherein performing classification prediction on each pixel of the second image through the auxiliary neural network to obtain the classification prediction result comprises:
    inputting the second image into the auxiliary neural network to obtain a region center feature of the second feature area and pixel features of the pixels in the second image; and
    based on the region center feature, the pixel features of the pixels in the second image and second supervision data of the second image, determining the pixels with correct label information in the second image as the first classification and the pixels with incorrect label information in the second image as the second classification, the second supervision data comprising label information indicating whether each pixel in the second image belongs to the second feature area.
  5. The method of claim 4, wherein inputting the second image into the auxiliary neural network to obtain the region center feature of the second feature area and the pixel features of the pixels in the second image comprises:
    inputting the second image into the auxiliary neural network to obtain the pixel features of the pixels in the second image;
    predicting a mask area of each pixel in the second image through the auxiliary neural network, and determining a prediction accuracy index of the mask area of each pixel in the second image;
    predicting the second image through the auxiliary neural network based on the mask areas of the pixels in the second image to obtain a third prediction result, the third prediction result comprising third predicted pixel information used to indicate whether each pixel of the second image belongs to the second feature area; and
    generating the region center feature according to the pixel features of the pixels in the second image, the prediction accuracy indices of the mask areas of the pixels in the second image, and the third predicted pixel information.
  6. The method of claim 5, wherein generating the region center feature according to the pixel features of the pixels in the second image, the prediction accuracy indices of the mask areas of the pixels in the second image, and the third predicted pixel information comprises:
    determining pixels in the second image whose corresponding mask-area prediction accuracy index is greater than an index threshold as evaluation pixels, to obtain at least one evaluation pixel; and
    generating the region center feature according to the pixel feature of the at least one evaluation pixel and the third predicted pixel information of the at least one evaluation pixel.
  7. The method of claim 6, wherein generating the region center feature according to the pixel feature of the at least one evaluation pixel and the third predicted pixel information of the at least one evaluation pixel comprises:
    generating a target center feature of the second feature area according to the pixel features of target evaluation pixels of the second feature area and the third predicted pixel information of the target evaluation pixels, the target center feature being used to represent structural features of the image of the second feature area, and the target evaluation pixels being the evaluation pixels, among the at least one evaluation pixel, whose third predicted pixel information indicates that they belong to the second feature area;
    generating a background center feature of the second feature area according to the pixel features of background evaluation pixels of the second feature area and the third predicted pixel information of the background evaluation pixels, the background center feature being used to represent structural features of the background image of the second feature area in the second image, and the background evaluation pixels of the second feature area comprising the evaluation pixels, among the at least one evaluation pixel, whose third predicted pixel information indicates that they do not belong to the second feature area; and
    determining the target center feature and the background center feature as the region center feature.
  8. The method of claim 7, wherein determining the classification prediction result based on the region center feature, the pixel features of the pixels in the second image and the second supervision data comprises:
    performing the following operations for each pixel in the second image:
    obtaining a first feature distance between the pixel feature of the pixel and the target center feature, and obtaining a second feature distance between the pixel feature of the pixel and the background center feature;
    if the first feature distance is greater than the second feature distance and the label information of the pixel in the second supervision data indicates that the pixel does not belong to the second feature area, determining that the pixel belongs to the first classification;
    if the first feature distance is greater than the second feature distance and the label information of the pixel in the second supervision data indicates that the pixel belongs to the second feature area, determining that the pixel belongs to the second classification;
    if the first feature distance is less than the second feature distance and the label information of the pixel in the second supervision data indicates that the pixel belongs to the second feature area, determining that the pixel belongs to the first classification; and
    if the first feature distance is less than the second feature distance and the label information of the pixel in the second supervision data indicates that the pixel does not belong to the second feature area, determining that the pixel belongs to the second classification.
  9. The method of claim 5, wherein determining the prediction accuracy index of the mask area of each pixel in the second image comprises:
    performing the following operations for each pixel in the second image:
    randomly dropping network parameters of the auxiliary neural network K times to obtain K deformed networks of the auxiliary neural network, K being a positive integer;
    obtaining, through each deformed network based on the mask area of the pixel, K pieces of deformation-predicted pixel information for the pixel, each piece of deformation-predicted pixel information comprising information used to indicate whether the pixel belongs to the second feature area; and
    determining the prediction accuracy index of the mask area of the pixel according to the K pieces of deformation-predicted pixel information.
  10. The method of claim 9, wherein each piece of deformation-predicted pixel information comprises a first predicted probability that the pixel belongs to the second feature area and a second predicted probability that the pixel belongs to the background image of the second feature area in the second image; and
    determining the prediction accuracy index of the mask area of the pixel according to the K pieces of deformation-predicted pixel information comprises:
    obtaining the standard deviation among the K first predicted probabilities of the K pieces of deformation-predicted pixel information as a target prediction accuracy index for the pixel;
    obtaining the standard deviation among the K second predicted probabilities of the K pieces of deformation-predicted pixel information as a background prediction accuracy index for the pixel; and
    determining the target prediction accuracy index and the background prediction accuracy index as the prediction accuracy index of the mask area of the pixel.
  11. The method of claim 10, wherein determining the pixels in the second image whose corresponding mask-area prediction accuracy index is greater than the index threshold as evaluation pixels comprises:
    if both the target prediction accuracy index and the background prediction accuracy index are greater than the index threshold, determining the pixel as an evaluation pixel.
  12. The method of claim 1, further comprising:
    obtaining a target image, the target image comprising a target feature area;
    inputting the target image into the trained prediction neural network to obtain a target prediction result, the target prediction result comprising target predicted pixel information indicating whether each pixel of the target image belongs to the target feature area; and
    segmenting the image of the target feature area in the target image based on the target predicted pixel information.
  13. A data processing apparatus, comprising:
    an acquisition module, configured to obtain a first image and a second image containing a target object, wherein the image area in which the target object is located in the first image is a first feature area, and the image area in which the target object is located in the second image is a second feature area;
    a first processing module, configured to input the first image into a prediction neural network to obtain a first prediction result, the first prediction result comprising first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area;
    a second processing module, configured to input the second image into the prediction neural network to obtain a second prediction result, the second prediction result comprising second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area;
    a classification module, configured to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate pixels in the second image belonging to a first classification and pixels belonging to a second classification, wherein the pixels of the first classification are pixels in the second image predicted by the auxiliary neural network to have correct label information, and the pixels of the second classification are pixels in the second image predicted by the auxiliary neural network to have incorrect label information; and
    an optimization module, configured to perform network parameter optimization on the prediction neural network according to the first prediction result, the second prediction result and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used for image segmentation of a target image.
  14. A computer program product, comprising computer program instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-12.
  15. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1-12.
  16. A computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the method of any one of claims 1-12.
PCT/CN2023/081603 2022-04-29 2023-03-15 Data processing method and apparatus, program product, computer device, and medium WO2023207389A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210466331.9 2022-04-29
CN202210466331.9A CN115115828A (en) 2022-04-29 2022-04-29 Data processing method, apparatus, program product, computer device and medium

Publications (1)

Publication Number Publication Date
WO2023207389A1 true WO2023207389A1 (en) 2023-11-02

Family

ID=83326538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081603 WO2023207389A1 (en) 2022-04-29 2023-03-15 Data processing method and apparatus, program product, computer device, and medium

Country Status (2)

Country Link
CN (1) CN115115828A (en)
WO (1) WO2023207389A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115828A (en) * 2022-04-29 2022-09-27 腾讯医疗健康(深圳)有限公司 Data processing method, apparatus, program product, computer device and medium
CN115908823B (en) * 2023-03-09 2023-05-12 南京航空航天大学 Semantic segmentation method based on difficulty distillation

Also Published As

Publication number Publication date
CN115115828A (en) 2022-09-27

Similar Documents

Publication Title
WO2023207389A1 (en) Data processing method and apparatus, program product, computer device, and medium
CN108985334B (en) General object detection system and method for improving active learning based on self-supervision process
CN114332135B (en) Semi-supervised medical image segmentation method and device based on dual-model interactive learning
CN112116090B (en) Neural network structure searching method and device, computer equipment and storage medium
CN110503074A (en) Information labeling method, apparatus, equipment and the storage medium of video frame
US20230153622A1 (en) Method, Apparatus, and Computing Device for Updating AI Model, and Storage Medium
CN110245550B (en) Human face noise data set CNN training method based on total cosine distribution
CN110910391A (en) Video object segmentation method with dual-module neural network structure
CN109447096B (en) Glance path prediction method and device based on machine learning
CN113743474B (en) Digital picture classification method and system based on collaborative semi-supervised convolutional neural network
CN111508000A (en) Deep reinforcement learning target tracking method based on parameter space noise network
CN110826581A (en) Animal number identification method, device, medium and electronic equipment
CN113920170A (en) Pedestrian trajectory prediction method and system combining scene context and pedestrian social relationship and storage medium
CN118196410A (en) Remote sensing image semantic segmentation method, system, equipment and storage medium
CN115861333A (en) Medical image segmentation model training method and device based on doodling annotation and terminal
CN114742840A (en) Image segmentation method and device, terminal equipment and readable storage medium
CN117765432A (en) Motion boundary prediction-based middle school physical and chemical life experiment motion detection method
CN111429414B (en) Artificial intelligence-based focus image sample determination method and related device
CN112668710B (en) Model training, tubular object extraction and data recognition method and equipment
Wu et al. FedA3I: Annotation Quality-Aware Aggregation for Federated Medical Image Segmentation against Heterogeneous Annotation Noise
CN116484868A (en) Cross-domain named entity recognition method and device based on diffusion model generation
CN111160170A (en) Self-learning human behavior identification and anomaly detection method
CN112862840A (en) Image segmentation method, apparatus, device and medium
US20240160842A1 (en) Confidence-based interactable neural-symbolic visual question answering
CN117253097B (en) Semi-supervision domain adaptive image classification method, system, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23794825

Country of ref document: EP

Kind code of ref document: A1