WO2023207389A1 - Data processing method and apparatus, program product, computer device, and medium - Google Patents

Data processing method and apparatus, program product, computer device, and medium

Info

Publication number
WO2023207389A1
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
image
prediction
feature
neural network
Prior art date
Application number
PCT/CN2023/081603
Other languages
French (fr)
Chinese (zh)
Inventor
徐哲 (Zhe Xu)
卢东焕 (Donghuan Lu)
郑冶枫 (Yefeng Zheng)
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2023207389A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 - Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7784 - Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/84 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data processing method, device, program product, computer equipment and medium.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • Embodiments of the present application provide a data processing method, device, program product, computer equipment and medium, which can improve the accuracy of the trained predictive neural network; subsequently, the trained predictive neural network can also be used to accurately segment feature areas in images.
  • the embodiment of this application provides a data processing method, which includes:
  • acquiring a first image and a second image containing a target object, wherein the image area where the target object is located in the first image is the first feature area, and the image area where the target object is located in the second image is the second feature area;
  • the first image is input into the prediction neural network to obtain a first prediction result;
  • the first prediction result includes: first prediction pixel information indicating whether each pixel point of the first image belongs to the first feature area;
  • the second image is input into the prediction neural network to obtain a second prediction result;
  • the second prediction result includes: second prediction pixel information indicating whether each pixel point of the second image belongs to the second feature area;
  • classification prediction is performed on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels in the second image that belong to the first category and the pixels that belong to the second category; wherein the pixels of the first category are pixels whose label information in the second image is predicted by the auxiliary neural network to be correct, and the pixels of the second category are pixels whose label information in the second image is predicted by the auxiliary neural network to be incorrect;
  • according to the first prediction result, the second prediction result, and the classification prediction result, the network parameters of the prediction neural network are optimized to obtain a trained prediction neural network.
  • the trained prediction neural network is used to perform image segmentation on the target image.
  • the embodiment of the present application also provides a data processing device, which includes:
  • an acquisition module configured to acquire a first image and a second image containing a target object; wherein the image area where the target object is located in the first image is the first feature area, and the image area where the target object is located in the second image is the second feature area;
  • a first processing module configured to input the first image into a prediction neural network to obtain a first prediction result; the first prediction result includes: first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area;
  • a second processing module configured to input the second image into the prediction neural network to obtain a second prediction result; the second prediction result includes: second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area;
  • a classification module configured to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels in the second image that belong to the first category and the pixels that belong to the second category; wherein the pixels of the first category are pixels whose label information in the second image is predicted by the auxiliary neural network to be correct, and the pixels of the second category are pixels whose label information in the second image is predicted by the auxiliary neural network to be incorrect;
  • the optimization module is used to optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network.
  • the trained prediction neural network is used to perform image segmentation on the target image.
  • An embodiment of the present application also provides a computer device, including a memory and a processor.
  • the memory stores a computer program.
  • when executed by the processor, the computer program causes the processor to execute the method in the above aspect of the present application.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • the computer program includes program instructions which, when executed by a processor, cause the processor to perform the method of the above-mentioned aspect.
  • Embodiments of the present application also provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method provided in the various optional implementations of the above aspect.
  • Figure 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of a model training scenario provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart of a data processing method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a network training scenario provided by an embodiment of the present application.
  • Figure 5 is a schematic flowchart of determining classification prediction results provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of a scenario for determining classification results provided by an embodiment of the present application.
  • Figure 7 is a schematic flowchart of determining a prediction deviation provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a model training scenario provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • This application involves artificial intelligence related technologies.
  • when performing machine learning on a model, a large amount of sample data is often required, and this sample data often varies in quality; therefore, how to train a model more accurately with sample data of varying quality has become a problem to be solved.
  • the machine learning involved in the embodiments of this application mainly refers to how to train a predictive neural network, and then use the predictive neural network obtained through training to accurately segment the feature areas in the image.
  • Figure 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • the network architecture may include a server 200 and a terminal device cluster.
  • the terminal device cluster may include one or more terminal devices. There will be no limit on the number of terminal devices here.
  • multiple terminal devices may specifically include terminal device 100a, terminal device 101a, terminal device 102a,..., terminal device 103a; as shown in Figure 1, terminal device 100a, terminal device 101a, terminal device 102a,... , the terminal device 103a can all have a network connection with the server 200, so that each terminal device can perform data interaction with the server 200 through the network connection.
  • the server 200 shown in Figure 1 can be an independent server, a server cluster composed of multiple servers, or a distributed system, and can also provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms, etc.
  • Terminal devices can be: smart phones, tablets, laptops, desktop computers, smart TVs and other smart terminals. The following takes the communication between the terminal device 100a and the server 200 as an example to provide a detailed description of the embodiment of the present application.
  • FIG 2 is a schematic diagram of a model training scenario provided by an embodiment of the present application.
  • the server 200 can be used to train a student model (which can also be called a model to be trained).
  • the process can be as follows: the server 200 can obtain sample data for model training, and the sample data can include a small number of samples labeled by experts and a large number of samples labeled by non-experts.
  • any sample data can be image data containing several pixels, and each pixel in any image data has label information; the image data can contain a target object, and the label information of any pixel is used to indicate whether that pixel belongs to the target object in the image.
  • the image area where the target object is located in the image can be called a feature area.
  • the server 200 can input sample data marked by experts and sample data marked by non-experts into the student model, and input sample data marked by non-experts into the teacher model (which can also be called a trained model, used to assist the training of the student model).
  • the student model can generate a mask for each pixel in the sample data labeled by experts, and based on the mask, generate predicted pixel information for judging whether each pixel in the sample data labeled by experts belongs to the target object.
  • the student model can also generate a mask for each pixel in the sample data labeled by non-experts, and based on the mask, generate predicted pixel information for judging whether each pixel in the sample data labeled by non-experts belongs to the target object.
  • the teacher model can generate a mask for each pixel in the sample data labeled by non-experts, and then, based on this mask, obtain the feature distance between each pixel in the sample data labeled by non-experts and the target prototype/background prototype.
  • the target prototype can be used to represent the characteristics of the target object in the sample data.
  • the background prototype can be used to represent the characteristics of the background image of the target object in the sample data.
  • the teacher model can use the feature distances and the label information of each pixel in the sample data labeled by non-experts to determine whether the label information of each such pixel is correctly or incorrectly labeled, and provide the judgment results to the student model.
  • the student model can use the teacher model's judgment results (i.e., the correctly labeled pixels and incorrectly labeled pixels in the sample data labeled by non-experts) together with the predicted pixel information for the sample data (including the predicted pixel information of each pixel in the expert-labeled sample data and the predicted pixel information of each pixel in the non-expert-labeled sample data) to generate a prediction deviation, and correct the network parameters of the student model based on this prediction deviation to obtain the trained student model.
  • the server 200 can use the trained student model to segment the target object in the image.
  • the server 200 can provide the segmentation result to the terminal device 100a.
  • the terminal device 100a can display the segmentation result on the terminal interface, so that relevant technical personnel can analyze it.
  • the sample data marked by experts may be the first image described below;
  • the sample data marked by non-experts may be the second image described below;
  • the mask may be the mask area described below;
  • the student model may be the predictive neural network described below;
  • the teacher model may be the auxiliary neural network described below;
  • the predicted pixel information of the student model for each pixel in the sample data may be included in the first predicted pixel information and the second predicted pixel information described below;
  • the above-mentioned target prototype may be the target center feature described below;
  • the above-mentioned background prototype may be the background center feature described below. For the specific process of training the student model through the teacher model, refer to the descriptions in the embodiments corresponding to Figure 3 and Figure 5 below.
  • in the embodiment of this application, the teacher model determines, based on the target prototype and background prototype of the target object, whether the label information of each pixel in the sample data labeled by non-experts is correctly or incorrectly labeled; based on this judgment, the student model can perform differential training on the pixels in the non-expert-labeled sample data, while the expert-labeled sample data supervises the training of the student model, improving the training accuracy of the student model and thereby training an accurate student model.
  • Figure 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • the execution subject in the embodiment of this application may be a computer device or a computer device cluster composed of multiple computer devices.
  • the computer device can be a server or a terminal device.
  • the execution subjects in the embodiments of the present application are collectively referred to as computer devices as an example.
  • the method may include:
  • Step S101 obtain a first image and a second image containing a target object, wherein the image area where the target object is located in the first image is the first feature area, and the image area where the target object is located in the second image is the second feature area.
  • the computer device can obtain the first image and the second image.
  • the number of the first image and the second image is determined according to the actual application scenario, and there is no limitation on this.
  • the first image and the second image are used as sample data for training a prediction neural network.
  • both the first image and the second image may contain a target object
  • the target object may be any object that needs to be segmented from the image data.
  • the target object may be determined according to the actual application scenario.
  • the method provided by this application can be applied to any image segmentation scene, which may be a two-dimensional segmentation scene or a three-dimensional segmentation scene.
  • the display form of the target object in the first image and the display form of the target object in the second image may be different; for example, the object category may be different, the object category may be the same but the posture different, or the environment in which the object is located may be different, etc.
  • the target object may be the left ventricle, and both the first image and the second image may contain images of the left ventricle, but the images of the left ventricle contained in the first image and the images of the left ventricle contained in the second image may be different.
  • the image area where the target object in an image is located can be called a feature area; further, the image area where the target object is located in the first image can be called the first feature area, and the image area where the target object is located in the second image can be called the second feature area.
  • Both the first image and the second image may contain several pixels.
  • the first image and the second image may be two-dimensional images, and the pixels in the first image and the second image may be two-dimensional
  • the target object can be any object that needs to be segmented from a two-dimensional image.
  • the object can be specifically determined according to the actual application scenario.
  • the target object can be an object whose local structural features are highly correlated or similar; for example, the target object can be a plant, whose overall texture and structural characteristics in two-dimensional images are relatively similar.
  • the first image and the second image may be three-dimensional images, and the pixels in the first image and the second image may be three-dimensional (in this case The pixels in the first image and the second image can also be called voxels).
  • the target object can be any object that needs to be segmented from the three-dimensional image.
  • the specific object can also be determined according to the actual application scenario.
  • for example, the embodiments of this application can be applied to medical image segmentation scenarios.
  • in this case, the target object can be an object whose local structural features are relatively correlated or similar; for example, the target object can be a human organ (which can be called a part) that needs to be segmented from three-dimensional image data, and the organ (part) can be any organ, such as the left ventricle.
  • the supervision data set for the above-mentioned first image can be called first supervision data.
  • the first supervision data is used to indicate whether each pixel point in the first image belongs to the first characteristic area.
  • the first supervision data is used to indicate whether each pixel in the first image belongs to the target object.
  • the first supervision data may include: label information of each pixel in the first image.
  • the mark information of each pixel in the first image is used to respectively indicate whether each pixel belongs to the target object in the first image or to the background image of the target object in the first image.
  • the label information of each pixel in the first image is used to indicate whether each pixel belongs to the first feature area or to the area of the first image outside the first feature area (i.e., the background image of the first feature area).
  • the background image of the target object in the first image may also be called the background image of the first feature area in the first image.
  • the supervision data set for the above-mentioned second image can be called second supervision data.
  • the second supervision data is used to indicate whether each pixel point in the second image belongs to the second feature area.
  • the second supervision data is used to indicate whether each pixel in the second image belongs to the target object.
  • the second supervision data may include: label information of each pixel in the second image. Wherein, the label information of each pixel point in the second image is used to respectively indicate whether each pixel point belongs to the target object in the second image or belongs to the background image of the target object in the second image.
  • the label information of each pixel in the second image is used to indicate whether each pixel belongs to the second feature area or to the area of the second image outside the second feature area (i.e., the background image of the second feature area).
  • the background image of the target object in the second image may also be called the background image of the second feature area in the second image.
  • the label information of any pixel is used to indicate the affiliation between the pixel and the target object in the image; the affiliation can be that the pixel belongs to the target object (that is, the pixel is among the pixels contained in the image of the target object), or that the pixel does not belong to the target object (that is, the pixel is not among the pixels contained in the image of the target object).
  • the above-mentioned first image may be sample data with high-quality annotations, and the above-mentioned second image may be sample data with low-quality annotations; this may be reflected in the accuracy of the first supervision data set for the first image (i.e., the accuracy of the labeling information of each pixel in the first image) being higher than the accuracy of the second supervision data set for the second image (i.e., the accuracy of the labeling information of each pixel in the second image), where this accuracy can be understood in a subjective sense.
  • for example, the first image may be sample data marked by experts, that is, the first supervision data of the first image may be labeled by professionals in the technical field; the second image may be sample data marked by non-experts, that is, the second supervision data of the second image may be labeled by persons outside the technical field.
  • the first image and the second image may be image data containing organs that need to be segmented; the label information of the pixels in the first image may be marked by professionals in the medical field, while the label information of the pixels in the second image may be marked by amateurs, so the accuracy of the label information of the pixels in the first image will usually be higher than that of the pixels in the second image.
  • the cost of obtaining a large number of high-quality annotated samples is very high, especially in the field of medical imaging, which relies on expert knowledge; therefore, to save the cost of obtaining samples, the number of first images in this application can be small, while the number of second images can be large.
  • This application can effectively utilize a small amount of high-quality annotation data (such as the first image) and a large amount of low-quality annotation data (such as the second image).
  • the label information of a pixel (such as the label information of a pixel in the first image or in the second image) can be recorded as 0 or 1; if the label information of a pixel is 0, it indicates that the pixel does not belong to the target object in the image; conversely, if the label information of a pixel is 1, it indicates that the pixel belongs to the target object in the image.
  • if the label information of a certain pixel in the first image is 0, it can indicate that the pixel does not belong to the first feature area in the first image.
  • if the label information of a certain pixel in the first image is 1, it can indicate that the pixel belongs to the first feature area in the first image.
  • if the label information of a certain pixel in the second image is 0, it can indicate that the pixel does not belong to the second feature area in the second image.
  • if the label information of a certain pixel in the second image is 1, it can indicate that the pixel belongs to the second feature area in the second image.
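  • As a minimal illustration of this 0/1 encoding (the shape and values below are hypothetical), the supervision data of an image can be stored as a binary label map:

```python
# Hypothetical 4x4 image's supervision data as a binary label map:
# 1 = the pixel is annotated as belonging to the target object (feature area),
# 0 = the pixel is annotated as background.
import numpy as np

label_map = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
], dtype=np.int64)
```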
  • Step S102 input the first image into the prediction neural network to obtain a first prediction result; the first prediction result includes: first prediction pixel information indicating whether each pixel point of the first image belongs to the first feature area.
  • specifically, the computer device can call the prediction neural network to predict the first image, that is, to predict the affiliation between each pixel in the first image and the target object; the prediction result of the prediction neural network for each pixel in the first image is called the first prediction result.
  • the first prediction result includes: first predicted pixel information of each pixel in the first image, used to indicate whether the corresponding pixel belongs to the first feature area.
  • the first predicted pixel information of each pixel may include: the probability that the pixel belongs to the target object in the first image (which may be called the target probability), and the probability that the pixel does not belong to the target object in the first image (that is, belongs to the background image of the target object in the first image), which may be called the background probability; the sum of the target probability and the background probability corresponding to any pixel in the first image can be 1.
  • the process of generating the first prediction result of the first image through the prediction neural network may include: the computer device may generate the mask area of each pixel in the first image through the prediction neural network, where the mask area of a pixel may refer to the area used to select the main features of that pixel.
  • the computer device can predict the target probability that each pixel point in the first image belongs to the target object and the background probability that it does not belong to the target object based on the characteristics of each pixel point in the corresponding mask area in the first image through the prediction neural network, In this way, the first predicted pixel information of each pixel in the first image is obtained.
  • the first predicted pixel information of any pixel in the first image includes the target probability that the pixel belongs to the target object (such as the probability that the image features in the pixel's mask area belong to the target object) and the background probability that the pixel does not belong to the target object but belongs to the background image of the target object (such as the probability that the image features in the pixel's mask area belong to the background image of the target object); the first predicted pixel information of all pixels in the first image together constitutes the first prediction result.
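  • The sketch below shows one way such per-pixel target/background probabilities could be produced; the tiny two-layer convolutional network is a stand-in for illustration only, not the patent's actual prediction neural network:

```python
# A minimal per-pixel two-class (background/target) prediction sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPredictionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(16, 2, 1)  # channel 0: background, channel 1: target

    def forward(self, x):
        logits = self.head(self.features(x))   # (B, 2, H, W)
        return F.softmax(logits, dim=1)        # per-pixel probabilities, summing to 1

net = TinyPredictionNet()
first_image = torch.randn(1, 1, 64, 64)        # hypothetical input image
probs = net(first_image)
background_prob, target_prob = probs[:, 0], probs[:, 1]
# For every pixel, target probability + background probability = 1.
assert torch.allclose(background_prob + target_prob, torch.ones_like(target_prob))
```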
  • Step S103 input the second image into the prediction neural network to obtain a second prediction result;
  • the second prediction result includes: second prediction pixel information respectively indicating whether each pixel point of the second image belongs to the second feature area.
  • specifically, the computer device can call the prediction neural network to predict the second image, that is, to predict the affiliation between each pixel in the second image and the target object; the prediction result of the prediction neural network for each pixel in the second image is called the second prediction result.
  • the second prediction result includes: second predicted pixel information indicating whether each pixel point in the second image belongs to the second feature area.
  • the second predicted pixel information of each pixel may include: the target probability that the pixel belongs to the target object in the second image, and the probability that the pixel does not belong to the target object in the second image (that is, belongs to the background image of the target object in the second image), i.e., the background probability; the sum of the target probability and the background probability corresponding to any pixel in the second image can be 1.
  • the process of generating the second prediction result of the second image through the prediction neural network may include: the computer device may generate the mask area of each pixel in the second image through the prediction neural network, where the mask area of a pixel may refer to the area used to select the main features of that pixel.
  • specifically, through the prediction neural network, the computer device can predict, based on the features of each pixel within its corresponding mask area in the second image, the target probability that the pixel belongs to the target object and the background probability that it does not; in this way, the second predicted pixel information of each pixel in the second image is obtained.
  • the second predicted pixel information of any pixel in the second image includes the target probability that the pixel belongs to the target object (such as the probability that the image features in the pixel's mask area belong to the target object) and the background probability that the pixel does not belong to the target object but belongs to the background image of the target object (such as the probability that the image features in the pixel's mask area belong to the background image of the target object).
  • the second predicted pixel information of each pixel in the second image can constitute the second prediction result.
  • the process by which the above-mentioned prediction neural network predicts the predicted pixel information of the pixels in the first image or the second image is the same as the process by which the following auxiliary neural network predicts the predicted pixel information of the pixels in the second image.
  • Step S104 perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels belonging to the first classification and the pixels belonging to the second classification in the second image.
  • in the embodiments of the present application, the auxiliary neural network can be used to determine which pixels in the second image are correctly labeled (that is, have accurate label information) and which pixels are incorrectly labeled (that is, have inaccurate label information), and then allow the predictive neural network to perform differential training on the correctly labeled pixels and incorrectly labeled pixels in the second image.
  • the pixels whose labels in the second image are predicted by the auxiliary neural network to be correct can be called pixels of the first category; that is, the pixels of the first category include the pixels predicted by the auxiliary neural network to have correct label information in the second image.
  • the pixels whose labels in the second image are predicted by the auxiliary neural network to be wrong can be called pixels of the second category; that is, the pixels of the second category include the pixels predicted by the auxiliary neural network to have incorrect label information in the second image.
  • specifically, the computer device can call the auxiliary neural network to generate, based on the second image, the area center feature of the second feature area (which can be understood as the object center feature of the target object in the second image) and the pixel features of each pixel in the second image.
  • the area center feature may include the target center feature of the second feature area and the background center feature of the second feature area.
  • the target center feature is used to characterize the structural features of the target object in the second image (that is, the structural features of the image within the second feature area, such as the texture structure, color structure, and edge structure of the target object in the second image); in other words, the target center feature can be used to represent the features of the target object in the second image.
  • the background center feature is used to characterize the structural features of the background image of the target object in the second image (that is, the structural features of the image outside the second feature area, such as the texture structure, color structure, and edge structure of that background image); in other words, the background center feature can be used to represent the features of the background image of the target object in the second image.
  • the target center feature of the target object is obtained by integrating the pixel features of the pixels predicted to belong to the target object in the second image, and the background center feature of the target object is obtained by integrating the pixel features of the pixels predicted to belong to the background image of the target object in the second image.
  • the pixel features of each pixel in the second image generated by the above-mentioned auxiliary neural network can be relatively accurate intermediate features generated by the auxiliary neural network (which can be determined based on experimental experience), and the pixel feature of any pixel in the second image can be used to represent the structural features of that pixel.
  • the computer device can obtain the classification prediction result for the pixels in the second image through the above-generated region center features, pixel features of each pixel in the second image, and label information of each pixel in the second image.
  • the classification prediction result is used to indicate pixels belonging to the first category (ie, correctly labeled pixels) and pixels belonging to the second category (ie, incorrectly labeled pixels) on the second image.
  • the pixels in the second image can be divided into two categories through the auxiliary neural network, one category is the pixel points of the first category, and the other category is the pixel points of the second category.
  • the pixels of the first category include pixels with accurate label information (that is, correct labeling) in the second image predicted by the auxiliary neural network
  • the pixels of the second category include the pixels in the second image predicted by the auxiliary neural network to not have accurate label information (i.e., to be incorrectly labeled).
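  • The sketch below illustrates one way this prototype-based separation could be implemented, assuming the center features are mean-pooled pixel features and that a nearest-prototype rule decides which class a pixel's appearance supports; the patent only states that feature distances to the target/background center features are used, so the exact rule here is an assumption:

```python
# Prototype-based label separation sketch (assumes both classes are
# predicted non-empty so the mean-pooled prototypes are well defined).
import torch

def classify_labels(pixel_feats, pred_target_mask, labels):
    """pixel_feats: (N, C) pixel features of the second image's pixels
    pred_target_mask: (N,) bool, pixels the auxiliary network predicts as target
    labels: (N,) {0,1} annotated label information (possibly noisy)
    returns: (N,) bool, True where the label is judged correctly annotated."""
    target_center = pixel_feats[pred_target_mask].mean(dim=0)       # target center feature
    background_center = pixel_feats[~pred_target_mask].mean(dim=0)  # background center feature

    d_target = (pixel_feats - target_center).norm(dim=1)
    d_background = (pixel_feats - background_center).norm(dim=1)

    # A pixel "looks like" the target when it is nearer the target prototype;
    # its annotation is judged correct when the label agrees with appearance.
    looks_like_target = d_target < d_background
    return looks_like_target == labels.bool()
```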
  • the embodiments of the present application can continuously iteratively train the predictive neural network through several first images and several second images.
  • during this process, the network parameters of the auxiliary neural network will also be iteratively updated using the updated (that is, optimized) network parameters of the prediction neural network.
  • the specific principle of iteratively updating the network parameters of the auxiliary neural network by predicting the network parameters after iterative update (ie, iterative optimization) of the neural network can be found in the description in step S105 below.
  • Step S105 according to the first prediction result, the second prediction result, and the classification prediction result, optimize the network parameters of the prediction neural network to obtain a trained prediction neural network; the trained prediction neural network is used to perform image segmentation on the target image.
  • specifically, based on the first prediction result, the second prediction result, and the classification prediction result, the computer device may generate the final prediction deviation (which can be called the prediction deviation) of the prediction neural network's predicted pixel information for each pixel (including each pixel in the first image and each pixel in the second image); the prediction deviation represents the deviation between the predicted pixel information of each pixel predicted by the prediction neural network and the label information of that pixel, and can also be understood as the prediction loss of the prediction neural network.
  • the computer equipment can perform network parameter optimization based on the above prediction deviation.
  • the network parameter optimization includes any one of parameter optimization of the predictive neural network and parameter optimization of the auxiliary neural network, or a combination of both.
  • each iterative update of the predictive neural network will generate the corresponding prediction deviation through the above process.
  • the prediction deviations generated during each round of iterative training are used to continuously and iteratively update (i.e., iteratively correct or optimize) the network parameters of the predictive neural network.
  • when the updating of the network parameters of the predictive neural network is completed, the trained predictive neural network (which can be called the predictive neural network after parameter optimization) can be obtained; the trained predictive neural network includes network parameters that have been corrected (that is, optimized).
  • the completion of updating the network parameters of the predictive neural network may mean that the network parameters have been updated to a convergence state, or that the number of iterative updates (i.e., iterative training rounds) of the network parameters has reached a certain threshold, which can be set according to the actual application scenario.
  • Figure 4 is a schematic diagram of a network training scenario provided by an embodiment of the present application.
  • the first image contains multiple pixels, the first image has first supervision data, and the first supervision data includes the label information of each pixel in the first image; similarly, in the second image Also containing a plurality of pixels, the second image has second supervision data, and the second supervision data includes label information of each pixel in the second image.
  • the computer device can call the auxiliary neural network to generate a classification prediction result for each pixel point in the second image.
  • the classification prediction result includes a classification result for each pixel point in the second image.
  • the classification result of any pixel in the second image indicates whether the label information of that pixel is correctly or incorrectly labeled, that is, whether the pixel belongs to the first category or the second category.
  • the correctly marked pixels in the second image can be regarded as the pixels of the first category, and the incorrectly marked pixels in the second image can be regarded as the pixels of the second category. Therefore, the prediction neural network can perform differential training on the pixels in the first image, the pixels of the first category in the second image, and the pixels of the second category in the second image, thereby obtaining a trained prediction neural network.
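  • A sketch of such differential training, under the assumption that second-category (mislabeled) pixels are simply down-weighted in the loss; the patent requires only that the two categories be treated differently, so the weighting scheme is illustrative:

```python
# Prediction-deviation sketch: full supervision on the expert-labeled first
# image, weighted supervision on the non-expert-labeled second image.
import torch
import torch.nn.functional as F

def prediction_deviation(logits1, labels1, logits2, labels2, correct_mask,
                         wrong_weight=0.0):
    """logits*: (N, 2) per-pixel logits; labels*: (N,) int64 in {0, 1};
    correct_mask: (N,) bool from the auxiliary network's classification."""
    loss1 = F.cross_entropy(logits1, labels1)  # expert-labeled pixels: full weight
    per_pixel = F.cross_entropy(logits2, labels2, reduction="none")
    # First-category pixels get weight 1, second-category pixels get wrong_weight.
    weights = correct_mask.float() + (~correct_mask).float() * wrong_weight
    loss2 = (weights * per_pixel).sum() / weights.sum().clamp(min=1e-8)
    return loss1 + loss2
```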
  • in addition, the network parameters of the predictive neural network will also be passed to the auxiliary neural network, so that during the training of the predictive neural network, the auxiliary neural network, whose parameters are continuously optimized, can determine the classification results of each pixel in the second image more accurately, which in turn enables more accurate training of the predictive neural network.
  • the predictive neural network can be understood as a student network (i.e., a student model), and the auxiliary neural network can be understood as a teacher network (i.e., a teacher model).
  • This application can use a design similar to the Mean-Teacher (MT) architecture to update the network parameters of the auxiliary neural network from the network parameters of the predictive neural network, because MT's weighted-average self-ensembling strategy can effectively improve the stability and smoothness of the intermediate feature representations and the final predictions; this is well suited to the label separation strategy based on feature prototypes (the above-mentioned target center feature characterizes the feature prototype of the target object), i.e., distinguishing whether the label information of each pixel in the second image is correctly or incorrectly labeled, because it yields a more stable and smooth feature space (such as the feature space composed of the pixel features of the pixels in the second image). The process can be shown as the following formula (1):
  • $\tilde{\theta}_t = \alpha\,\tilde{\theta}_{t-1} + (1 - \alpha)\,\theta_t$  (1)
  • where t and t-1 denote the number of iterative training rounds of the predictive neural network (which can also be understood as the number of iterative optimizations of the auxiliary neural network's parameters), with t denoting the t-th round and t-1 the (t-1)-th round; $\tilde{\theta}_t$ denotes the network parameters of the auxiliary neural network after the t-th round, $\tilde{\theta}_{t-1}$ denotes the network parameters of the auxiliary neural network after the (t-1)-th round, and $\theta_t$ denotes the network parameters of the predictive neural network after the t-th round; $\alpha$ denotes the EMA (exponential moving average) decay rate, which can be set to 0.99.
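  • A minimal sketch of the EMA update in formula (1), applied after each training iteration of the predictive (student) network; it assumes `teacher` and `student` share the same architecture:

```python
# Mean-Teacher-style parameter update: each teacher parameter becomes an
# exponential moving average of the corresponding student parameter.
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):  # alpha: EMA decay rate
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```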
  • in the embodiment of this application, the network parameters of the auxiliary neural network can be iteratively updated from the network parameters of the predictive neural network after each iterative update, so that the features the predictive neural network has learned from the first image and the second image are passed to the auxiliary neural network; the auxiliary neural network can then judge the classification results of the pixels in the second image more accurately during each iteration of training.
  • the trained predictive neural network can be used to segment the target object in the image data.
  • the trained predictive neural network can identify the pixels belonging to the target object in the image.
  • the identified pixels of the target object can then be used to segment the image area where the target object is located (i.e., the feature area) from the image.
  • the computer device can obtain a target image.
  • the target image can include a target object.
  • the image area where the target object is located in the target image can be called a target feature area.
  • the target image can be any image containing a target object that needs to be segmented.
  • specifically, the computer device can call the trained predictive neural network (i.e., the predictive neural network with optimized parameters) to predict the target image, that is, to predict the affiliation between each pixel in the target image and the target object, and obtain a target prediction result for the target image.
  • the target prediction result is used to determine whether each pixel in the target image belongs to the target feature area; the target prediction result includes the predicted pixel information of each pixel in the target image.
  • the predicted pixel information of any pixel in the target image may include the target probability that the pixel belongs to the target object in the target image and the background probability that the pixel does not belong to the target object in the target image; the background probability is the probability that the pixel belongs to the background image of the target object in the target image.
  • the computer device can regard the pixels in the target image whose target probability is greater than their background probability as the identified pixels belonging to the target object (that is, the identified pixels belonging to the target feature area); therefore, the identified pixels belonging to the target object can be segmented from the target image, which achieves segmentation of the target object in the target image, i.e., image segmentation of the target feature area in the target image.
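  • A minimal sketch of this inference rule; for two classes, comparing the target probability against the background probability per pixel is just an argmax over the two channels:

```python
# Segment the target feature area from a target image with the trained network.
import torch

@torch.no_grad()
def segment(trained_net, target_image):
    probs = trained_net(target_image)       # (B, 2, H, W); channel 1 = target
    target_prob, background_prob = probs[:, 1], probs[:, 0]
    return target_prob > background_prob    # boolean mask of the target feature area
```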
  • since high-quality annotation data (such as the first image) is difficult to obtain and usually requires experts to annotate, while low-quality annotation data (such as the second image) is relatively easy to obtain, the amount of high-quality labeled data is typically very small and the amount of low-quality labeled data very large; the method provided by the embodiments of this application can solve the problem of inaccurate model (network) training in this scenario.
  • the method provided by the embodiments of this application can separate and learn the samples in a mixed sample set composed of a small amount of high-quality annotated data and a large amount of low-quality annotated data, thereby accurately learning the true characteristics of the samples and training an accurate model (such as the predictive neural network).
  • for images of the same type of segmentation area (that is, the area where the target object is located), the classification results of the pixels in the low-quality second image can be accurately judged through the feature prototype of the target object (such as the target center feature); the prediction neural network is then trained differentially on the different categories of pixels in the second image and, combined with the high-quality first image as supervised training data, trained jointly; this can improve the training effect of the predictive neural network and thus train a more accurate predictive neural network. Through the trained, accurate predictive neural network, accurate segmentation of target objects in images can also be achieved.
  • Embodiments of the present application can acquire a first image with a first feature area and a second image with a second feature area; predict the first image through a prediction neural network to obtain a first prediction result, which includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area; predict the second image through the prediction neural network to obtain a second prediction result, which includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area; classify the pixels of the second image through an auxiliary neural network to obtain a classification prediction result, which indicates the pixels of the second image belonging to the first category and those belonging to the second category; and perform network parameter optimization based on the first prediction result, the second prediction result, and the classification prediction result.
  • the network parameter optimization includes parameter optimization of the prediction neural network, parameter optimization of the auxiliary neural network, or a combination of both. It can be seen that the method proposed in the embodiments of the present application can use the auxiliary neural network to classify the pixels in the second image, and then use this classification to optimize the parameters of the predictive neural network or the auxiliary neural network; this can improve the accuracy of the parameter optimization of the predictive neural network, and subsequently the predictive neural network with optimized parameters can accurately segment the feature areas in images.
  • Figure 5 is a schematic flowchart of determining a classification prediction result provided by an embodiment of the present application.
  • the execution subject in the embodiment of the present application may be the same as the execution subject in Figure 3 above.
  • the method may include:
  • Step S201 Generate regional center features of the second feature region and pixel features of each pixel in the second image based on the second image through an auxiliary neural network.
  • the computer device can input the second image into the auxiliary neural network to perform feature learning on the second image, and thereby generate pixel features for each pixel in the second image.
  • the auxiliary neural network may include multiple convolutional layers for feature learning on the second image; the pixel feature of each pixel may therefore be the smoothed feature of that pixel generated by the penultimate convolutional layer among the multiple convolutional layers, because experiments have shown that the smoothed features generated by the penultimate convolutional layer are better features.
  • the computer device can also predict the mask area of each pixel in the second image through an auxiliary neural network.
  • the mask area is used to select the main features of each pixel in the second image.
  • the computer device can also generate a prediction accuracy index of the mask area of each pixel in the second image.
  • the prediction accuracy index reflects the uncertainty of the generated mask area of each pixel in the second image; as the name suggests, the prediction accuracy index of the mask area of any pixel represents how accurate that pixel's mask area is.
  • this application can perform Bayesian approximation through Monte Carlo dropout (MC dropout) to generate the prediction accuracy index of the mask area of each pixel in the second image.
  • the process can be:
  • the computer device can randomly drop (i.e., dropout) the network parameters of the auxiliary neural network (which can be called neurons) K times, thereby obtaining K deformation networks of the auxiliary neural network; K is a positive integer, and its specific value can be determined according to the actual application scenario.
  • each discarding of network parameters is performed on the auxiliary neural network with its complete network parameters, and each deformation network is obtained by randomly discarding the network parameters of the auxiliary neural network; randomly discarding the network parameters can mean randomly setting some network parameters of the auxiliary neural network to 0; the network parameters set to 0 are the discarded network parameters and play no role in the subsequent prediction process.
  • the network parameters of the auxiliary neural network are randomly discarded to obtain the deformation networks mainly so that the prediction accuracy index of each pixel's mask area can subsequently be generated through the deformation networks; the above-mentioned pixel features and mask areas of each pixel are predicted by the auxiliary neural network without discarding any network parameters.
  • any pixel in the second image can be taken as a target pixel; since the process of obtaining the prediction accuracy index of the mask area is the same for every pixel, obtaining the prediction accuracy index of the mask area of the target pixel is used as an example for explanation.
  • the computer device can separately predict, through each deformation network, the predicted pixel information for the target pixel according to the mask area of the target pixel (i.e., obtain the predicted pixel information based on the image features at the mask area of the target pixel in the second image). The predicted pixel information of a pixel predicted by a deformation network can be called deformation prediction pixel information, and any deformation network can predict one piece of deformation prediction pixel information for the target pixel.
  • the above process can be understood as performing K forward stochastic inferences on the target pixel: the K deformation networks perform K softmax predictions on the target pixel to obtain each piece of deformation prediction pixel information for the target pixel.
  • any piece of deformation prediction pixel information may include the probability, predicted by the corresponding deformation network, that the target pixel belongs to the second feature area in the second image (which can be called the first prediction probability, i.e., the probability that the target pixel belongs to the target object in the second image), and the probability, predicted by the corresponding deformation network, that the target pixel does not belong to the second feature area in the second image (which can be called the second prediction probability, i.e., the probability that the target pixel belongs to the background image of the target object in the second image).
  • the sum of the first prediction probability and the second prediction probability may be 1.
  • the background image of the target object in the second image refers to the image in the second image other than the image of the target object.
  • the computer device can determine the prediction accuracy index of the mask area of the target pixel point based on the K deformation prediction pixel information obtained by the above K deformation networks. For details, please refer to the following description.
  • since any piece of deformation prediction pixel information includes a first prediction probability, the K pieces of deformation prediction pixel information include K first prediction probabilities in total. The computer device can obtain the standard deviation of the K first prediction probabilities and use this standard deviation as the target prediction accuracy index of the target pixel; the target prediction accuracy index indicates the accuracy of predicting that the target pixel belongs to the target object.
  • similarly, any piece of deformation prediction pixel information includes a second prediction probability, so the K pieces of deformation prediction pixel information include K second prediction probabilities in total. The computer device can obtain the standard deviation of the K second prediction probabilities and use it as the background prediction accuracy index of the target pixel; the background prediction accuracy index indicates the accuracy of predicting that the target pixel belongs to the background image of the target object.
  • both the target prediction accuracy index and the background prediction accuracy index for the target pixel can be used as the prediction accuracy index of the mask area of the target pixel.
  • the computer device can obtain the prediction accuracy index of the mask area of each pixel in the second image in the same manner as the prediction accuracy index of the mask area of the target pixel.
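  • As an illustration of the above procedure, the following is a minimal sketch in PyTorch, assuming the auxiliary neural network is a two-class segmentation model containing dropout layers; the names (aux_net, second_image) and the channel order (background, target) are illustrative assumptions, not names from the embodiments.

```python
import torch

def mc_dropout_uncertainty(aux_net, second_image, K=8):
    """Estimate per-pixel prediction accuracy indices via Monte Carlo dropout.

    Each of the K stochastic forward passes plays the role of one
    "deformation network": dropout randomly zeroes different neurons, and
    the standard deviation of the K softmax outputs per pixel is used as
    the prediction accuracy index.
    """
    aux_net.train()  # keep dropout active so each pass drops different neurons
    probs = []
    with torch.no_grad():
        for _ in range(K):  # K forward stochastic inferences
            logits = aux_net(second_image)            # assumed shape (1, 2, H, W)
            probs.append(torch.softmax(logits, dim=1))
    probs = torch.stack(probs)                         # (K, 1, 2, H, W)
    target_index = probs[:, :, 1].std(dim=0)           # std of the K first prediction probabilities
    background_index = probs[:, :, 0].std(dim=0)       # std of the K second prediction probabilities
    aux_net.eval()
    return target_index, background_index
```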
  • the computer device can also use the above-mentioned auxiliary neural network (the auxiliary neural network here refers to the network with complete network parameters; the above K deformation networks can be obtained by randomly discarding its network parameters) to predict, according to the generated mask area of each pixel in the second image, a third prediction result for the second image. The third prediction result may include third predicted pixel information of each pixel of the second image, and the third predicted pixel information of each pixel indicates whether that pixel belongs to the second feature area.
  • the third predicted pixel information of each pixel includes the probability, predicted by the auxiliary neural network, that the pixel belongs to the second feature area in the second image (which can be called the target probability), and the probability, predicted by the auxiliary neural network, that the pixel does not belong to the second feature area in the second image (which can be called the background probability).
  • the computer device can generate the regional center features of the target object based on the pixel features of each pixel in the second image generated by the auxiliary neural network, the prediction accuracy index of the mask area of each pixel in the second image, and the third prediction result; the process is described below.
  • the computer device can obtain, from the pixels contained in the second image, the pixels whose prediction accuracy index of the corresponding mask area is greater than an index threshold (the index threshold can be set according to the actual application scenario) as evaluation pixels; that is, evaluation pixels are pixels in the second image whose prediction accuracy index of the corresponding mask area is greater than the index threshold. The number of evaluation pixels may be at least one.
  • the target pixel is again used as an example. Since the prediction accuracy index of the mask area of the target pixel includes the target prediction accuracy index and the background prediction accuracy index, the prediction accuracy index of the mask area of the target pixel being greater than the index threshold can mean that both the target prediction accuracy index and the background prediction accuracy index of the mask area of the target pixel are greater than the index threshold; that is, when both are greater than the index threshold, the target pixel can be used as an evaluation pixel.
  • the computer device can obtain several pixels that can be used as evaluation pixels among the pixels contained in the second image, thereby obtaining at least one evaluation pixel.
  • the computer device can generate the regional center features of the target object based on the pixel features of the at least one evaluation pixel and the third predicted pixel information of the at least one evaluation pixel:
  • for each evaluation pixel in the at least one evaluation pixel, if the third predicted pixel information of the evaluation pixel indicates that it belongs to the second feature area (that is, the target probability in the third predicted pixel information is greater than the background probability), the evaluation pixel is used as a target evaluation pixel of the second feature area (that is, a target evaluation pixel of the target object). In other words, a target evaluation pixel is a pixel that the auxiliary neural network predicts belongs to the target object in the second image and whose prediction accuracy index of the corresponding mask area is greater than the index threshold.
  • otherwise, if the third predicted pixel information of the evaluation pixel indicates that it does not belong to the second feature area (that is, the background probability is greater than the target probability), the evaluation pixel is used as a background evaluation pixel of the second feature area (that is, a background evaluation pixel of the target object). In other words, a background evaluation pixel is a pixel that the auxiliary neural network predicts does not belong to the target object in the second image and whose prediction accuracy index of the corresponding mask area is greater than the index threshold.
  • the computer device can generate the target center feature of the second feature area based on the pixel features of the target evaluation pixels and the target probability in the third predicted pixel information of the target evaluation pixels (that is, the probability of belonging to the target object).
  • the target center feature is used to represent the structural features of the target object in the second image, that is, the structural features of the image in the second feature area.
  • the target center feature can be recorded as q_obj. As shown in the following formula (2), the target center feature q_obj can be:
  • q_obj = Σ_{a=1..A} (p_a · e_a) / Σ_{a=1..A} p_a (2)
  • the label information of each pixel in the second image (0 or 1, where 0 means not belonging to the target object and 1 means belonging to the target object) can be contained in the same label vector, and the pixel features of each pixel in the second image generated by the auxiliary neural network can be contained in the same feature matrix, where one row of the feature matrix can represent the pixel features of one pixel. The target center feature can therefore be generated based on operations between the feature matrix and the label vector.
  • the dimensions of the pixel features of each pixel generated by the auxiliary neural network are usually different from the dimensions of the above label vector. Therefore, the pixel features of the target evaluation pixels can be upsampled through linear interpolation (if the pixels are three-dimensional, trilinear interpolation can be used) to raise the dimension of the pixel features of the target evaluation pixels to the same dimension as the above label vector.
  • in formula (2), A represents the total number of target evaluation pixels, and a is less than or equal to A; e_a represents the pixel feature obtained after the dimension of the pixel feature of the a-th target evaluation pixel generated by the auxiliary neural network is raised to the same dimension as the above label vector (that is, the pixel feature of the a-th target evaluation pixel after the dimension is raised); p_a represents the probability, in the third predicted pixel information of the a-th target evaluation pixel, that the a-th target evaluation pixel belongs to the target object in the second image (i.e., the target probability). By introducing the target probability of each target evaluation pixel when obtaining the target center feature, the different contributions of the target evaluation pixels to the target center feature can be reflected: the greater the target probability of a target evaluation pixel, the greater its contribution weight to generating the target center feature.
  • similarly, the computer device can generate the background center feature of the second feature area based on the pixel features of the background evaluation pixels and the background probability in the third predicted pixel information of the background evaluation pixels (that is, the probability of not belonging to the target object).
  • the background center feature is used to represent the structural features of the background image of the target object in the second image, that is, the structural features of the image in the second image other than the image of the second feature area.
  • the background center feature can be recorded as q_bg. As shown in the following formula (3), the background center feature q_bg can be:
  • q_bg = Σ_{b=1..B} (p_b · e_b) / Σ_{b=1..B} p_b (3)
  • in formula (3), B represents the total number of background evaluation pixels, and b is less than or equal to B; e_b represents the pixel feature obtained after the dimension of the pixel feature of the b-th background evaluation pixel generated by the auxiliary neural network is raised to the same dimension as the above label vector (that is, the pixel feature of the b-th background evaluation pixel after the dimension is raised); p_b represents the probability, in the third predicted pixel information of the b-th background evaluation pixel, that the b-th background evaluation pixel belongs to the background image of the target object in the second image (i.e., the background probability). By introducing the background probability of each background evaluation pixel when obtaining the background center feature, the different contributions of the background evaluation pixels to the background center feature can be reflected: the greater the background probability of a background evaluation pixel, the greater its contribution weight to generating the background center feature.
  • the computer device can use the above target center feature (which can be understood as the target center feature of the target object in the second image) and the background center feature (which can be understood as the background center feature of the target object in the second image) together as the regional center features of the second feature area, which can also be understood as the object center features of the target object.
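  • The center-feature computation can be sketched as follows, assuming the pixel features have already been upsampled to label resolution and that eval_mask marks the evaluation pixels; the probability-weighted averaging follows formulas (2) and (3), and all tensor names are illustrative.

```python
import torch

def region_center_features(features, target_prob, eval_mask):
    """Probability-weighted prototypes following formulas (2) and (3).

    features:    (C, H, W) pixel features e, already upsampled to label resolution
    target_prob: (H, W) target probability from the third predicted pixel information
    eval_mask:   (H, W) bool, True where the prediction accuracy index exceeds the threshold
    """
    bg_prob = 1.0 - target_prob
    is_target = eval_mask & (target_prob > bg_prob)   # target evaluation pixels
    is_bg = eval_mask & (target_prob <= bg_prob)      # background evaluation pixels

    flat = features.flatten(1)                        # (C, H*W)

    w_obj = (target_prob * is_target).flatten()       # weights p_a (zero off-mask)
    q_obj = (flat * w_obj).sum(dim=1) / w_obj.sum().clamp(min=1e-8)

    w_bg = (bg_prob * is_bg).flatten()                # weights p_b
    q_bg = (flat * w_bg).sum(dim=1) / w_bg.sum().clamp(min=1e-8)
    return q_obj, q_bg                                # target / background center features
```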
  • Step S202 Determine the classification prediction result based on the regional center feature, the pixel feature of each pixel in the second image, and the second supervision data.
  • the computer device can determine the classification result of each pixel in the second image based on the generated regional center features of the target object, the pixel features of each pixel in the second image, and the label information of each pixel in the second image (i.e., the second supervision data). The classification result of any pixel in the second image can be that the pixel belongs to the first category (that is, the label information of the pixel is correctly labeled) or that the pixel belongs to the second category (that is, the label information of the pixel is incorrectly labeled).
  • taking the target pixel as an example again, the computer device can obtain the feature distance between the pixel feature of the target pixel and the target center feature, which can be called the first feature distance; the computer device can also obtain the feature distance between the pixel feature of the target pixel and the background center feature, which can be called the second feature distance.
  • the first feature distance can be recorded as d_obj and the second feature distance as d_bg. As shown in the following formulas (4) and (5), the first feature distance d_obj and the second feature distance d_bg can be:
  • d_obj = ‖e_m − q_obj‖_2 (4)
  • d_bg = ‖e_m − q_bg‖_2 (5)
  • in the formulas, e_m represents the pixel feature obtained after the dimension of the pixel feature of the target pixel generated by the auxiliary neural network is raised to the same dimension as the above label vector (that is, the pixel feature of the target pixel after the dimension is raised).
  • q obj represents the above-mentioned target center feature
  • q bg represents the above-mentioned background center feature.
  • ‖·‖_2 represents the second norm (L2 norm).
  • if the first feature distance is greater than the second feature distance (indicating that the target pixel is more likely to belong to the background image of the target object in the second image), and the label information of the target pixel in the second supervision data indicates that the target pixel belongs to the target object in the second image (that is, a pixel belonging to the second feature area), then it can be determined that the classification result of the target pixel indicates that the target pixel belongs to the second category.
  • if the first feature distance is greater than the second feature distance, and the label information of the target pixel in the second supervision data indicates that the target pixel does not belong to the target object in the second image (that is, a pixel not belonging to the second feature area, i.e., belonging to the background image of the target object in the second image), then it can be determined that the classification result of the target pixel indicates that the target pixel belongs to the first category.
  • if the first feature distance is smaller than the second feature distance (indicating that the target pixel is more likely to belong to the target object in the second image), and the label information of the target pixel in the second supervision data indicates that the target pixel belongs to the target object in the second image, it can be determined that the classification result of the target pixel indicates that the target pixel belongs to the first category.
  • if the first feature distance is smaller than the second feature distance, and the label information of the target pixel in the second supervision data indicates that the target pixel does not belong to the target object in the second image (that is, it belongs to the background image of the target object in the second image), it can be determined that the classification result of the target pixel indicates that the target pixel belongs to the second category.
  • in other words, if the feature type toward which the pixel feature of the target pixel is biased (such as the feature type of the target object or the feature type of the background image of the target object) is inconsistent with the feature type indicated by the label information of the target pixel (for example, one is the feature type of the target object and the other is the feature type of the background image of the target object), the label information of the target pixel can be considered incorrectly labeled; that is, the classification result of the target pixel indicates that the target pixel belongs to the second category in the second image. Conversely, if the feature type toward which the pixel feature of the target pixel is biased is consistent with the feature type indicated by the label information of the target pixel (for example, both are the feature type of the target object, or both are the feature type of the background image of the target object), the label information of the target pixel can be considered correctly labeled; that is, the classification result of the target pixel indicates that the target pixel belongs to the first category in the second image.
  • the classification prediction result for the second image can be obtained through the classification result of each pixel point in the second image, and the classification prediction result includes the classification result of each pixel point in the second image.
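  • The distance-based label separation described above can be sketched as follows, continuing the assumed tensors of the previous sketch; labels is assumed to hold the (possibly noisy) 0/1 annotation of each pixel.

```python
import torch

def separate_labels(features, q_obj, q_bg, labels):
    """Label separation by feature distance, following formulas (4) and (5).

    Returns True where the annotation is judged correct (first category)
    and False where it is judged mislabeled (second category).
    """
    flat = features.flatten(1)                         # (C, H*W)
    d_obj = (flat - q_obj[:, None]).norm(dim=0)        # first feature distance per pixel
    d_bg = (flat - q_bg[:, None]).norm(dim=0)          # second feature distance per pixel
    looks_like_target = d_obj < d_bg                   # feature leans toward the target object
    labeled_target = labels.flatten() > 0
    first_category = looks_like_target == labeled_target  # consistent label -> clean
    return first_category.view(labels.shape)
```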
  • FIG. 6 is a schematic diagram of a scenario for determining a classification result provided by an embodiment of the present application.
  • the pixel points in the second image may include pixel point 1 to pixel point W, where W is a positive integer, and the specific value of W is determined according to the actual application scenario.
  • the computer device can obtain the feature distance between the pixel feature of each pixel in the second image and the target center feature, including the feature distance between the pixel feature of pixel 1 and the target center feature (i.e., first feature distance 1), the feature distance between the pixel feature of pixel 2 and the target center feature (i.e., first feature distance 2), the feature distance between the pixel feature of pixel 3 and the target center feature (i.e., first feature distance 3), ..., and the feature distance between the pixel feature of pixel W and the target center feature (i.e., first feature distance W).
  • similarly, the feature distance between the pixel feature of each pixel and the background center feature can be obtained, including the feature distance between the pixel feature of pixel 1 and the background center feature (i.e., second feature distance 1), the feature distance between the pixel feature of pixel 2 and the background center feature (i.e., second feature distance 2), the feature distance between the pixel feature of pixel 3 and the background center feature (i.e., second feature distance 3), ..., and the feature distance between the pixel feature of pixel W and the background center feature (i.e., second feature distance W).
  • the computer device can obtain the classification result of pixel 1 according to first feature distance 1, the label information of pixel 1, and second feature distance 1; obtain the classification result of pixel 2 according to first feature distance 2, the label information of pixel 2, and second feature distance 2; obtain the classification result of pixel 3 according to first feature distance 3, the label information of pixel 3, and second feature distance 3; ...; and obtain the classification result of pixel W according to first feature distance W, the label information of pixel W, and second feature distance W.
  • through the regional center features, this application can accurately determine the classification result of each pixel in the second image.
  • the embodiments of this application focus on using feature prototypes (which can be reflected by the regional center features), with the assistance of the mean teacher model (that is, the above-mentioned auxiliary neural network), to learn features that are more robust to noisy labels and to perform label separation.
  • the networks involved may adopt, for example, V-Net (an image segmentation network), U-Net (a semantic segmentation network), DenseNet (a dense connection network), or ResNet (a residual network).
  • FIG. 7 is a schematic flowchart of determining a prediction deviation provided by an embodiment of the present application.
  • the execution subject in the embodiment of the present application may be the same as the execution subject in Figure 3 above.
  • the method may include:
  • Step S301 Generate a first prediction deviation of the prediction neural network based on the first prediction result and the first supervision data of the first image.
  • the computer device may generate the cross-entropy loss and the image segmentation loss (Dice loss) of the prediction neural network for the first image based on the first predicted pixel information of each pixel of the first image in the first prediction result and the label information of each pixel in the first supervision data, and then obtain the prediction loss of the prediction neural network for the first image through the cross-entropy loss and the image segmentation loss. This prediction loss can be called the first prediction deviation.
  • the cross-entropy loss of the prediction neural network for the first image can be recorded as L_s1. As shown in the following formula (6), the cross-entropy loss L_s1 is:
  • L_s1 = −(1/N) · Σ_{i=1..N} [ y_true.i · log(y_pred.i) + (1 − y_true.i) · log(1 − y_pred.i) ] (6)
  • y_true.i represents the label information of the i-th pixel in the first image, that is, y_true.i represents the true label of the i-th pixel; i is less than or equal to N, and N can be the total number of pixels in the first image. If the label information of the i-th pixel truly represents that the i-th pixel belongs to the target object in the first image, then y_true.i can be equal to 1; otherwise, that is, if the label information of the i-th pixel truly represents that the i-th pixel does not belong to the target object in the first image, y_true.i can be equal to 0.
  • y_pred.i represents the probability, in the first predicted pixel information of the i-th pixel predicted by the prediction neural network, that the i-th pixel belongs to the target object (i.e., the target probability).
  • the image segmentation loss of the prediction neural network for the first image can be recorded as L_Dice1. As shown in the following formula (7), the image segmentation loss L_Dice1 is:
  • L_Dice1 = 1 − (2 · Σ_{i=1..N} y_true.i · y_pred.i) / (Σ_{i=1..N} y_true.i + Σ_{i=1..N} y_pred.i) (7)
  • in formula (7), y_true.i represents the label information of the i-th pixel in the first image, y_true.i is 1 or 0, and y_pred.i represents the probability, in the first predicted pixel information of the i-th pixel, that the i-th pixel belongs to the target object (i.e., the target probability).
  • the first prediction deviation of the above prediction neural network can be recorded as L_HQ. As shown in the following formula (8), the first prediction deviation L_HQ is the sum of the cross-entropy loss L_s1 and the image segmentation loss L_Dice1:
  • L_HQ = L_s1 + L_Dice1 (8)
  • the first prediction deviation L HQ of the prediction neural network for the first image can play a role in forward supervision training of the prediction neural network.
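  • A minimal sketch of the first prediction deviation, mirroring formulas (6) to (8); the flattened tensors and the smoothing constant eps are illustrative assumptions.

```python
import torch

def first_prediction_deviation(y_pred, y_true, eps=1e-8):
    """L_HQ = L_s1 (cross-entropy) + L_Dice1 (Dice loss).

    y_pred: (N,) predicted target probabilities for the first image's pixels
    y_true: (N,) labels, 1 = target object, 0 = background
    """
    ce = -(y_true * torch.log(y_pred + eps)
           + (1 - y_true) * torch.log(1 - y_pred + eps)).mean()
    dice = 1 - (2 * (y_true * y_pred).sum() + eps) / (y_true.sum() + y_pred.sum() + eps)
    return ce + dice
```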
  • Step S302 Generate a second prediction deviation of the prediction neural network based on the second predicted pixel information of the pixels belonging to the first category in the second image and the second supervision data of the second image.
  • the label information of the pixels of the first category in the second supervision data can be called the first label information; that is, the first label information includes the preset label information, in the second supervision data, indicating whether the pixels belonging to the first category in the second image belong to the second feature area.
  • the computer device may generate the prediction loss of the prediction neural network for the pixels of the first category based on the second predicted pixel information of the pixels belonging to the first category in the second image and the first label information; this prediction loss can be called the second prediction deviation.
  • the computer device may generate the cross-entropy loss and the image segmentation loss (Dice loss) of the prediction neural network for the pixels of the first category based on the second predicted pixel information of each pixel of the first category in the second image and the label information of each pixel of the first category, and thereby obtain the second prediction deviation of the prediction neural network for the pixels of the first category.
  • the cross-entropy loss of the prediction neural network for the pixels of the first category can be recorded as L_s2. As shown in the following formula (9), the cross-entropy loss L_s2 is:
  • L_s2 = −(1/M) · Σ_{j=1..M} [ y_true.j · log(y_pred.j) + (1 − y_true.j) · log(1 − y_pred.j) ] (9)
  • y_true.j represents the label information of the j-th pixel among the pixels of the first category, that is, y_true.j represents the true label of the j-th pixel; j is less than or equal to M, and M can be the total number of pixels belonging to the first category. If the label information of the j-th pixel truly represents that the j-th pixel belongs to the target object in the second image, then y_true.j can be equal to 1; otherwise, that is, if the label information of the j-th pixel truly represents that the j-th pixel does not belong to the target object in the second image, y_true.j can be equal to 0.
  • y_pred.j represents the probability, in the second predicted pixel information of the j-th pixel, that the j-th pixel belongs to the target object (i.e., the target probability).
  • the image segmentation loss of the prediction neural network for the pixels of the first category can be recorded as L_Dice2. As shown in the following formula (10), the image segmentation loss L_Dice2 is:
  • L_Dice2 = 1 − (2 · Σ_{j=1..M} y_true.j · y_pred.j) / (Σ_{j=1..M} y_true.j + Σ_{j=1..M} y_pred.j) (10)
  • in formula (10), y_true.j represents the label information of the j-th pixel, y_true.j is 1 or 0, and y_pred.j represents the probability, in the second predicted pixel information of the j-th pixel, that the j-th pixel belongs to the target object (i.e., the target probability).
  • the second prediction deviation of the above prediction neural network can be recorded as L_ls. As shown in the following formula (11), the second prediction deviation L_ls can be the sum of the cross-entropy loss L_s2 and the image segmentation loss L_Dice2:
  • L_ls = L_s2 + L_Dice2 (11)
  • the second prediction deviation L_ls obtained above is the prediction loss of the prediction neural network for the correctly labeled pixels (that is, the pixels of the first category).
  • Step S303 Generate a third prediction deviation of the prediction neural network based on the second prediction pixel information of the pixels belonging to the second category in the second image and the second supervision data of the second image.
  • the label information of the pixels of the second category in the second supervision data can be called the second label information; that is, the second label information includes the preset label information, in the second supervision data, indicating whether the pixels belonging to the second category in the second image belong to the second feature area.
  • the computer device may generate the prediction loss of the prediction neural network for the pixels of the second category based on the second predicted pixel information of the pixels belonging to the second category in the second image and the second label information; this prediction loss can be called the third prediction deviation.
  • the third prediction deviation of the prediction neural network for the pixels of the second category can be obtained through an entropy minimization loss; that is, through the third prediction deviation, the pixels of the second category can exert a smaller training influence (smaller entropy) on the prediction neural network.
  • the third prediction deviation can be recorded as L_ent. As shown in the following formula (12), the third prediction deviation L_ent can be:
  • L_ent = −(1/G) · Σ_{g=1..G} [ F_obj.g · log(F_obj.g) + F_bg.g · log(F_bg.g) ] (12)
  • F_obj.g represents the probability, in the second predicted pixel information of the g-th pixel of the second category, that the g-th pixel belongs to the target object in the second image (i.e., the target probability); F_bg.g represents the probability, in the second predicted pixel information of the g-th pixel, that the g-th pixel belongs to the background image of the target object in the second image (i.e., the background probability); G is the total number of pixels of the second category, and g is less than or equal to G.
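  • A minimal sketch of the entropy minimization term of formula (12); probs is assumed to hold the two-class probabilities of the second-category pixels.

```python
import torch

def entropy_minimization_loss(probs, eps=1e-8):
    """L_ent: mean entropy of the predictions on second-category pixels.

    probs: (G, 2) per-pixel [F_bg.g, F_obj.g] probabilities; minimizing this
    keeps the mislabeled pixels' training influence small while still
    encouraging confident predictions on them.
    """
    return -(probs * torch.log(probs + eps)).sum(dim=1).mean()
```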
  • Step S304 Generate the prediction deviation of the prediction neural network based on the first prediction deviation, the second prediction deviation and the third prediction deviation.
  • the computer device can generate the final prediction loss of the prediction neural network based on the first prediction deviation, the second prediction deviation, and the third prediction deviation obtained above; this prediction loss is also the prediction deviation of the prediction neural network (i.e., the final prediction deviation of the prediction neural network).
  • the prediction deviation refers to the deviation of the predicted pixel information generated by the prediction neural network for pixels (including the pixels of the first image and the pixels of the second image).
  • the computer device can obtain the prediction loss of the prediction neural network for the second image based on the second prediction deviation of the prediction neural network for the pixels of the first category and the third prediction deviation of the prediction neural network for the pixels of the second category obtained above; this loss can be called the comprehensive prediction deviation of the prediction neural network for the second image, and, for example, can be the sum of the second prediction deviation and the third prediction deviation. The comprehensive prediction deviation can be recorded as L_LQ.
  • the computer device also obtains a weighting coefficient for the comprehensive prediction deviation, weights the comprehensive prediction deviation according to the weighting coefficient to obtain the weighted comprehensive prediction deviation, and then generates the final prediction deviation of the prediction neural network (that is, the final prediction loss value) according to the first prediction deviation and the weighted comprehensive prediction deviation.
  • the weighting coefficient of the comprehensive prediction deviation can be given by a Gaussian ramp-up function that increases as the training time (number of training iterations) increases. Since the prediction neural network can be trained for multiple iterations, in the t-th iterative training process of the prediction neural network, the weighting coefficient of the comprehensive prediction deviation can be recorded as λ(t). As shown in the following formula (14), the weighting coefficient λ(t) can be:
  • λ(t) = e^(−5 · (1 − t / t_max)²) (14)
  • t_max represents the preset maximum number of iterative training times of the prediction neural network, which can be called the maximum number of iterations; e represents the natural constant.
  • the method of obtaining the weighting coefficient for the comprehensive prediction deviation in the current iterative training process of the prediction neural network can be: the computer device obtains the number of iterations for which the network parameters of the prediction neural network have currently been iteratively corrected (i.e., the current iterative training, which can be called the current iteration number), and obtains the preset maximum number of iterations for iteratively correcting the network parameters of the prediction neural network; the computer device can then substitute the current iteration number for t in the above formula (14) and the maximum iteration number for t_max in the above formula (14), thereby obtaining the weighting coefficient for the comprehensive prediction deviation in the current iterative training process.
  • the prediction loss (i.e., the prediction deviation) of the prediction neural network can be recorded as L_z. As shown in the following formula (15), L_z can be:
  • L_z = L_HQ + λ(t) · L_LQ (15)
  • in the formula, L_HQ is the first prediction deviation in the t-th iterative training process, L_LQ is the comprehensive prediction deviation in the t-th iterative training process, and the obtained L_z is the prediction loss in the t-th iterative training process.
  • the network parameters of the predictive neural network can be iteratively optimized through the prediction deviation L z obtained during each training process to obtain the predictive neural network with final parameter optimization completed (ie, the trained predictive neural network).
  • the smaller the iterative training number t, the smaller the weighting coefficient of the comprehensive prediction deviation; the larger t, the larger the weighting coefficient. This reduces the training interference of the second image on the prediction neural network at the beginning of training (for example, when t is relatively small); as training progresses, the prediction neural network becomes more and more accurate, so a larger weighting coefficient can be used to increase the training effect of the second image on the prediction neural network, which can improve the training accuracy of the prediction neural network.
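  • Putting the weighting together, the following sketch shows a Gaussian ramp-up weight and the final loss combination; the constant 5 in the ramp-up follows the commonly used mean-teacher schedule and is an assumption where the exact constant is not shown above.

```python
import math

def rampup_weight(t, t_max):
    """Gaussian ramp-up lambda(t): near 0 early in training, approaching 1 at t_max."""
    t = min(t, t_max)
    return math.exp(-5.0 * (1.0 - t / t_max) ** 2)

def total_prediction_deviation(l_hq, l_ls, l_ent, t, t_max):
    """L_z = L_HQ + lambda(t) * L_LQ, with L_LQ = L_ls + L_ent (comprehensive deviation)."""
    return l_hq + rampup_weight(t, t_max) * (l_ls + l_ent)
```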
  • Figure 8 is a schematic diagram of a model training scenario provided by an embodiment of the present application.
  • the computer device may generate a first prediction deviation based on the first image through a prediction neural network.
  • the computer device may also use an auxiliary neural network to label and separate the pixels in the second image, that is, to distinguish the pixels in the second image into pixels of the first category and pixels of the second category.
  • the computer device can generate a second prediction deviation based on the pixel points of the first classification through a prediction neural network, and generate a third prediction deviation based on the pixel points of the second classification.
  • the computer device can generate a comprehensive prediction deviation for the second image based on the second prediction deviation and the third prediction deviation, and can weight the comprehensive prediction deviation according to the weighting coefficient to obtain a weighted comprehensive prediction deviation.
  • the computer device can obtain the final prediction loss of the prediction neural network (i.e., the above-mentioned prediction deviation L_z) based on the first prediction deviation and the weighted comprehensive prediction deviation, and the network parameters of the prediction neural network can be optimized and corrected based on this prediction deviation to obtain the trained prediction neural network (that is, the prediction neural network after parameter optimization).
  • the first image with high-quality label information can be used to perform supervised training of the prediction neural network, and for the second image with low-quality label information, both the pixels predicted to be correctly labeled (i.e., the pixels of the first category) and the pixels predicted to be mislabeled (i.e., the pixels of the second category) can participate in training the prediction neural network; therefore, a very accurate prediction neural network can be trained.
  • the embodiments of the present application can perform differential learning on mixed-quality sample data (including the first image and the second image); that is, label-isolated learning of mixed-quality sample data can be implemented to fully learn the correct features of the sample data and thereby train an accurate prediction neural network.
  • the embodiments of this application also conducted experiments on the provided method.
  • the left atrium (LA) segmentation data set was used for the experiment.
  • the left atrial segmentation data set provides 100 3D magnetic resonance images (which can be understood as the three-dimensional first image) with expert labels (which can be understood as the label information of the pixels in the first image).
  • the resolution of the images can be 0.625×0.625×0.625 mm³. All images are cropped to the center of the heart region and normalized to zero mean and unit variance. In order to simulate actual scenarios, the embodiments of this application study an extreme setting and a common soft setting.
  • in the extreme setting, a small number of samples are used as HQ label information (i.e., samples with high-quality label information, which can be understood as first images), and the soft setting uses 8 (10%) samples as HQ label information.
  • the remaining samples are treated as non-expert low-quality annotated data (i.e., samples with low-quality label information, which can be understood as second images), and these samples are processed through commonly used simulated label corruption schemes, including random erosion and dilation of 3-15 voxels.
  • the experimental framework uses an NVIDIA GeForce RTX 3090 GPU (graphics processing unit) with 24 GB of memory, and is implemented in Python (a computer programming language) and PyTorch (an open-source machine learning library).
  • the learning rate is initialized to 0.01 and decays by a power of 0.9 after each step.
  • this application randomly crops 112×112×80 voxel blocks as network input, applies standard data augmentation, including random cropping, flipping, and rotation, and uses a sliding-window strategy with a stride of 18×18×4 voxels in the testing phase.
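  • The learning-rate schedule described above can be read as the common polynomial ("poly") decay; a one-line sketch under that assumption:

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial ("poly") decay: starts at base_lr (0.01 here) and decays with power 0.9."""
    return base_lr * (1.0 - step / max_steps) ** power
```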
  • the embodiments of this application use four indicators for comprehensive evaluation, including: Dice (an image segmentation evaluation indicator), Jaccard (a data mining indicator), ASD (average surface distance), and 95HD (95% Hausdorff distance, an evaluation indicator for medical image segmentation).
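  • For reference, the two overlap indicators can be sketched as follows on binary masks (assumed non-empty); ASD and 95HD require surface-distance computations (available in libraries such as MedPy) and are omitted here.

```python
import numpy as np

def dice_and_jaccard(pred, gt):
    """Overlap metrics on binary masks; higher is better for both."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return dice, jaccard
```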
  • Rows 2 to 9 in Table 1 are the experimental data of one experiment, and rows 10 to 17 of Table 1 are the experimental data of another experiment.
  • Set-HQ represents the number of high-quality annotated data used for training
  • Set-LQ represents the number of low-quality annotated data used for training.
  • "HQ-LQ separation?” indicates whether the corresponding method performs separate training on low-quality annotated data and high-quality annotated data. The higher Dice and Jaccard are, the better the effect is, while the smaller ASD and 95HD are, the better the effect is.
  • the values in brackets in Table 1 represent the standard deviation of the indicator under the corresponding method.
  • H-Sup represents supervised training with only high-quality annotated data
  • HL-Sup represents mixed supervised training with high-quality and low-quality annotated data
  • TriNet represents the use of a joint learning framework composed of three networks, which integrates the predictions of two networks to supervise the third network
  • 2RnT represents a two-stage method to improve annotation quality by estimating a confusion matrix for label correction
  • PNL represents the introduction of an image-level label quality assessment module to identify images with clean labels
  • KDEM means that knowledge distillation technology and entropy minimization optimization terms are introduced to train the network
  • Decoupled means that two separate decoders (one corresponding to high-quality annotated data and one corresponding to low-quality annotated data) are used to implicitly solve the coupling problem when training the network.
  • FIG. 9 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the data processing device may be a computer program (including program code) running in a computer device.
  • the data processing device may be an application software.
  • the data processing device may be used to execute corresponding steps in the method provided by the embodiments of the present application.
  • the data processing device 1 may include: an acquisition module 11, a first processing module 12, a second processing module 13, a classification module 14, and an optimization module 15.
  • the acquisition module 11 is used to acquire a first image and a second image containing a target object, wherein the image area where the target object is located in the first image is the first feature area, and the image area where the target object is located in the second image is the second feature area;
  • the first processing module 12 is used to input the first image into a prediction neural network to obtain a first prediction result; the first prediction result includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area;
  • the second processing module 13 is used to input the second image into the prediction neural network to obtain a second prediction result; the second prediction result includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area;
  • the classification module 14 is used to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels in the second image that belong to the first category (pixels with correct label information) and the pixels that belong to the second category (pixels with incorrect label information);
  • the optimization module 15 is used to optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network; the trained prediction neural network is used for image segmentation of target images.
  • the steps involved in the data processing method shown in FIG. 3 may be performed by various modules in the data processing device 1 shown in FIG. 9 .
  • step S101 shown in FIG. 3 can be performed by the acquisition module 11 in FIG. 9
  • step S102 shown in FIG. 3 can be performed by the first processing module 12 in FIG. 9
  • step S103 shown in Figure 3 can be performed by the second processing module 13 in Figure 9
  • the step S104 shown in Figure 3 can be performed by the classification module 14 in Figure 9
  • the step S105 shown in Figure 3 can be performed by the optimization module 15 in Figure 9 .
  • embodiments of the present application can acquire a first image with a first feature area and a second image with a second feature area; predict the first image through a prediction neural network to obtain a first prediction result, the first prediction result including first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area; predict the second image through the prediction neural network to obtain a second prediction result, the second prediction result including second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area; perform classification prediction on the pixels of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate the pixels in the second image that belong to the first category and the pixels that belong to the second category; and optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result.
  • the device proposed in the embodiments of the present application can classify the pixels in the second image through the auxiliary neural network, and can subsequently perform parameter optimization on the prediction neural network based on the classification result of each pixel in the second image obtained through the auxiliary neural network, which can improve the accuracy of parameter optimization of the prediction neural network; the prediction neural network after parameter optimization can also accurately segment the feature areas in images.
  • each module in the data processing device 1 shown in Figure 9 can be separately or entirely combined into one or several units, or one (or several) of the units can be further divided into multiple sub-units with smaller functions, which can implement the same operations without affecting the realization of the technical effects of the embodiments of the present application.
  • the above modules are divided based on logical functions.
  • the function of one module can also be realized by multiple units, or the functions of multiple modules can be realized by one unit.
  • the data processing device 1 may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
  • the data processing device 1 shown in Figure 9 can be constructed, and the data processing method of the embodiments of the present application can be implemented, by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in Figure 3 on a general-purpose computer device, such as a computer, that includes a central processing unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), and other processing elements and storage elements. The above computer program can be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through the computer-readable recording medium, and run therein.
  • the computer device 1000 may include: a processor 1001, a network interface 1004 and a memory 1005.
  • the computer device 1000 may also include: a user interface 1003, and at least one communication bus 1002.
  • the communication bus 1002 is used to realize connection communication between these components.
  • the user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include standard wired interfaces and wireless interfaces (such as WI-FI interfaces).
  • the memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory.
  • the memory 1005 may also be at least one storage device located remotely from the aforementioned processor 1001.
  • memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 can provide network communication functions; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 can be used to call the device control application program stored in the memory 1005 to implement the data processing method described in the above embodiments.
  • the computer device 1000 described in the embodiments of the present application can execute the description of the data processing method in the embodiment corresponding to Figure 3, and can also execute the description of the data processing device 1 in the embodiment corresponding to Figure 9, which will not be repeated here; likewise, the description of the beneficial effects of the same method will not be repeated.
  • the embodiments of the present application also provide a computer-readable storage medium that stores the computer program executed by the aforementioned data processing device 1, and the computer program includes program instructions; when the processor executes the program instructions, it can execute the description of the data processing method in the embodiment corresponding to Figure 3, which will not be repeated here; likewise, the description of the beneficial effects of the same method will not be repeated. For technical details not disclosed in the computer storage medium embodiments involved in this application, please refer to the description of the method embodiments of this application.
  • the above program instructions may be deployed on one computer device for execution, or on multiple computer devices located at one location, or on multiple computer devices distributed at multiple locations and interconnected through a communication network.
  • Multiple computer devices distributed in multiple locations and interconnected through communication networks can form a blockchain network.
  • the above computer-readable storage medium may be an internal storage unit of the data processing device provided in any of the foregoing embodiments or of the above computer device, such as the hard disk or memory of the computer device.
  • the computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
  • the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or is to be output.
  • Embodiments of the present application provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the description of the above data processing method in the embodiment corresponding to Figure 3, which will not be elaborated here.
  • the description of the beneficial effects of using the same method will not be described again.
  • For technical details not disclosed in the computer-readable storage medium embodiments involved in this application please refer to the description of the method embodiments in this application.
  • those skilled in the art should understand that each process and/or block in the method flowcharts and/or structural schematic diagrams, and combinations of processes and/or blocks in the flowcharts and/or schematic diagrams, can be implemented by computer program instructions.
  • these computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.
  • these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.
  • these computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.


Abstract

Disclosed in embodiments of the present application are a data processing method and apparatus, a program product, a computer device, and a medium. The method comprises: acquiring a first image and a second image comprising a target object; generating first prediction pixel information of pixel points in the first image by means of a prediction neural network; generating second prediction pixel information of pixel points in the second image by means of the prediction neural network; classifying and predicting the pixel points of the second image by means of an auxiliary neural network to obtain a classification and prediction result, the classification and prediction result being used for indicating pixel points in the second image belonging to a first class and pixel points belonging to a second class; and performing network parameter optimization on the prediction neural network according to the first prediction pixel information, second prediction pixel information, and classification and prediction result of the pixel points.

Description

Data processing method, apparatus, program product, computer device, and medium
This application claims priority to Chinese patent application No. 202210466331.9, entitled "Data processing method, apparatus, program product, computer device, and medium" and filed with the China Patent Office on April 29, 2022, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a data processing method, apparatus, program product, computer device, and medium.
Background
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance.
Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Summary
Embodiments of the present application provide a data processing method, apparatus, program product, computer device, and medium, which can improve the accuracy of a trained prediction neural network; the trained prediction neural network can subsequently segment feature regions in an image accurately.
An embodiment of the present application provides a data processing method, the method including:
acquiring a first image and a second image containing a target object, where the image region in which the target object is located in the first image is a first feature region, and the image region in which the target object is located in the second image is a second feature region;
inputting the first image into a prediction neural network to obtain a first prediction result, the first prediction result including first predicted pixel information indicating whether each pixel of the first image belongs to the first feature region;
inputting the second image into the prediction neural network to obtain a second prediction result, the second prediction result including second predicted pixel information indicating whether each pixel of the second image belongs to the second feature region;
performing classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate the pixels of the second image belonging to a first class and the pixels belonging to a second class, where the pixels of the first class are pixels in the second image that the auxiliary neural network predicts to have correct label information, and the pixels of the second class are pixels in the second image that the auxiliary neural network predicts to have incorrect label information; and
optimizing network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used to perform image segmentation on a target image.
An embodiment of the present application further provides a data processing apparatus, the apparatus including:
an acquisition module, configured to acquire a first image and a second image containing a target object, where the image region in which the target object is located in the first image is a first feature region, and the image region in which the target object is located in the second image is a second feature region;
a first processing module, configured to input the first image into a prediction neural network to obtain a first prediction result, the first prediction result including first predicted pixel information indicating whether each pixel of the first image belongs to the first feature region;
a second processing module, configured to input the second image into the prediction neural network to obtain a second prediction result, the second prediction result including second predicted pixel information indicating whether each pixel of the second image belongs to the second feature region;
a classification module, configured to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate the pixels of the second image belonging to a first class and the pixels belonging to a second class, where the pixels of the first class are pixels in the second image that the auxiliary neural network predicts to have correct label information, and the pixels of the second class are pixels in the second image that the auxiliary neural network predicts to have incorrect label information; and
an optimization module, configured to optimize network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used to perform image segmentation on a target image.
An embodiment of the present application further provides a computer device, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the method in the above aspect of this application.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method in the above aspect.
An embodiment of the present application further provides a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in the various optional implementations of the above aspect.
Brief Description of the Drawings
Figure 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application;
Figure 2 is a schematic diagram of a model training scenario provided by an embodiment of the present application;
Figure 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application;
Figure 4 is a schematic diagram of a network training scenario provided by an embodiment of the present application;
Figure 5 is a schematic flowchart of determining a classification prediction result provided by an embodiment of the present application;
Figure 6 is a schematic diagram of a scenario for determining classification results provided by an embodiment of the present application;
Figure 7 is a schematic flowchart of determining a prediction deviation provided by an embodiment of the present application;
Figure 8 is a schematic diagram of a model training scenario provided by an embodiment of the present application;
Figure 9 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application;
Figure 10 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in this application will be described clearly and completely below with reference to the accompanying drawings of this application. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
This application involves technologies related to artificial intelligence. Machine learning of a model usually requires a large amount of sample data, and such sample data often varies in quality; therefore, how to train a model more accurately with sample data of varying quality has become an urgent problem to be solved.
The machine learning involved in the embodiments of this application mainly concerns how to train a prediction neural network so that the feature regions in an image can subsequently be segmented accurately by the trained prediction neural network; for details, see the description of the embodiment corresponding to Figure 3 below.
Please refer to Figure 1, which is a schematic structural diagram of a network architecture provided by an embodiment of the present application. As shown in Figure 1, the network architecture may include a server 200 and a terminal device cluster, and the terminal device cluster may include one or more terminal devices; the number of terminal devices is not limited here. As shown in Figure 1, the terminal devices may specifically include terminal device 100a, terminal device 101a, terminal device 102a, ..., terminal device 103a. Each of terminal device 100a, terminal device 101a, terminal device 102a, ..., terminal device 103a may establish a network connection with the server 200, so that each terminal device can exchange data with the server 200 through its network connection.
The server 200 shown in Figure 1 may be an independent server, a server cluster or distributed system composed of multiple servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. A terminal device may be a smart terminal such as a smartphone, a tablet computer, a laptop, a desktop computer, or a smart TV. The following takes the communication between terminal device 100a and the server 200 as an example to describe the embodiments of the present application in detail.
Please refer to Figure 2, which is a schematic diagram of a model training scenario provided by an embodiment of the present application. As shown in Figure 2, the server 200 may be used to train a student model (which may also be called the model to be trained). The process may be as follows: the server 200 may obtain sample data for model training, and the sample data may include a small number of samples labeled by experts and a large number of samples labeled by non-experts. Each piece of sample data may be an image containing several pixels, each pixel in each image has label information, and each piece of sample data may contain the target object. The label information of a pixel indicates whether that pixel belongs to the target object in its image, and the image region in which the target object is located may be called a feature region. The server 200 may input the expert-labeled sample data and the non-expert-labeled sample data into the student model, and input the non-expert-labeled sample data into a teacher model (which may also be called the trained model, used to assist the training of the student model).
The student model may generate a mask for each pixel in the expert-labeled sample data and, based on the mask, generate predicted pixel information for judging whether each pixel in the expert-labeled sample data belongs to the target object; the student model may likewise generate a mask for each pixel in the non-expert-labeled sample data and, based on that mask, generate predicted pixel information for judging whether each pixel in the non-expert-labeled sample data belongs to the target object.
The teacher model, for its part, may generate a mask for each pixel in the non-expert-labeled sample data and, based on this mask, obtain the feature distance between each pixel in the non-expert-labeled sample data and a target prototype/background prototype. The target prototype may be used to represent the features of the target object in the sample data, and the background prototype may be used to represent the features of the background image of the target object in the sample data. The teacher model may then judge, according to these feature distances and the label information of each pixel in the non-expert-labeled sample data, whether the label information of each pixel is correctly labeled or incorrectly labeled, and provide the judgment result to the student model.
Next, the student model can generate a prediction deviation according to the judgment result of the teacher model (i.e., the correctly labeled pixels and incorrectly labeled pixels identified in the non-expert-labeled sample data) and the predicted pixel information for the sample data (including the predicted pixel information of each pixel in the expert-labeled sample data and the predicted pixel information of each pixel in the non-expert-labeled sample data), and correct the network parameters of the student model based on this prediction deviation to obtain the trained student model.
Subsequently, the server 200 may use the trained student model to segment the target object in an image, and may send the segmentation result to terminal device 100a; terminal device 100a may display the segmentation result on the terminal interface for analysis by relevant technical personnel.
Here, the expert-labeled sample data may be the first image described below, the non-expert-labeled sample data may be the second image described below, the above mask may be the mask region described below, the student model may be the prediction neural network described below, the teacher network may be the auxiliary neural network described below, the predicted pixel information generated by the student model for each pixel in the sample data may be contained in the first predicted pixel information and the second predicted pixel information described below, the target prototype may be the target center feature described below, and the background prototype may be the background center feature described below. Therefore, for the specific process of training the student model through the teacher model, see the descriptions of the embodiments corresponding to Figure 3 and Figure 5 below.
With the method provided by the embodiments of this application, the teacher model judges, based on the target prototype and the background prototype of the target object, whether the label information of each pixel in the non-expert-labeled sample data is correctly labeled or incorrectly labeled; based on the judgment result, the student model trains differently on the different pixels of the non-expert-labeled sample data, and the student model is additionally supervised by the expert-labeled sample data, which improves the training accuracy of the student model, so that an accurate student model can be trained.
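For concreteness, the following is a minimal, runnable PyTorch sketch of one such teacher-assisted training iteration. It is an illustration under stated assumptions, not the implementation of the embodiments: the one-layer "networks", the toy data shapes, and the simple agreement test standing in for the prototype-distance separation (described with Figure 3 and Figure 5 below) are all hypothetical.

```python
# A minimal sketch of one teacher-assisted training iteration (toy tensors).
# All names (student, teacher, x1, y1, ...) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy "segmentation networks": 2 per-pixel logit channels (background/target).
student = nn.Conv2d(1, 2, kernel_size=3, padding=1)
teacher = nn.Conv2d(1, 2, kernel_size=3, padding=1)
teacher.load_state_dict(student.state_dict())  # teacher starts as a copy
for p in teacher.parameters():
    p.requires_grad_(False)  # teacher is updated by EMA, not by gradients

optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

# Expert-labeled first image and non-expert-labeled second image (toy data).
x1, y1 = torch.randn(1, 1, 8, 8), torch.randint(0, 2, (1, 8, 8))
x2, y2 = torch.randn(1, 1, 8, 8), torch.randint(0, 2, (1, 8, 8))

# --- one training iteration ---
logits1 = student(x1)                      # first prediction result
logits2 = student(x2)                      # second prediction result

with torch.no_grad():                      # teacher separates the noisy labels
    t_logits2 = teacher(x2)
    correct = (t_logits2.argmax(1) == y2)  # crude stand-in for the prototype-
                                           # distance separation of the patent

loss_clean = F.cross_entropy(logits1, y1)  # full supervision on expert labels
per_pixel = F.cross_entropy(logits2, y2, reduction="none")
loss_noisy = (per_pixel * correct.float()).mean()  # trust "correct" pixels only

loss = loss_clean + loss_noisy
optimizer.zero_grad()
loss.backward()
optimizer.step()

alpha = 0.99                               # EMA decay, see formula (1) below
with torch.no_grad():
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(alpha).add_(sp, alpha=1 - alpha)
```

Note that this sketch simply zero-weights the pixels judged incorrectly labeled; the embodiments may treat the two pixel classes in other differentiated ways.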
Please refer to Figure 3, which is a schematic flowchart of a data processing method provided by an embodiment of the present application. The execution subject in this embodiment of the application may be one computer device or a computer device cluster composed of multiple computer devices. The computer device may be a server or a terminal device. In the following, the execution subjects in the embodiments of this application are collectively referred to as a computer device by way of example. As shown in Figure 3, the method may include:
Step S101: Acquire a first image and a second image containing a target object, where the image region in which the target object is located in the first image is a first feature region, and the image region in which the target object is located in the second image is a second feature region.
In some embodiments, the computer device may acquire the first image and the second image. The numbers of first images and second images are determined by the actual application scenario and are not limited here; the first image and the second image are sample data used for training the prediction neural network.
Both the first image and the second image may contain the target object, which may be any object that needs to be segmented from image data; the target object may be determined by the actual application scenario. The method provided by this application can be applied to any image segmentation scene, which may be two-dimensional or three-dimensional. The display form of the target object in the first image (e.g., a different object category, the same category in a different pose, or a different surrounding environment) may differ from the display form of the target object in the second image. For example, the target object may be the left ventricle: both the first image and the second image may contain an image of the left ventricle, but the image of the left ventricle contained in the first image and the image of the left ventricle contained in the second image may differ.
The image region in which the target object is located in an image may be called a feature region. Further, the image region in which the target object is located in the first image may be called the first feature region, and the image region in which the target object is located in the second image may be called the second feature region.
Both the first image and the second image may contain several pixels.
For example, if the method provided by the embodiments of this application is applied to a two-dimensional image segmentation scene, the first image and the second image may be two-dimensional images whose pixels are two-dimensional. The target object may be any object that needs to be segmented from a two-dimensional image and may be determined by the actual application scenario; it may be an object whose local structural features are highly correlated or similar, for example a plant whose overall texture structure in a two-dimensional image is relatively uniform.
For another example, if the method provided by the embodiments of this application is applied to a three-dimensional image segmentation scene, the first image and the second image may be three-dimensional images whose pixels are three-dimensional (in this case the pixels of the first image and the second image may also be called voxels). The target object may be any object that needs to be segmented from a three-dimensional image and may likewise be determined by the actual application scenario. For example, the embodiments of this application may be applied to a medical image segmentation scene, where the target object may be an object whose local structural features are correlated or similar, such as a human organ (which may also be called a body part) to be segmented from three-dimensional image data; the organ (part) may be any organ, such as the left ventricle.
Furthermore, the supervision data set for the first image may be called first supervision data, and the first supervision data is used to indicate whether each pixel in the first image belongs to the first feature region; in other words, it indicates whether each pixel in the first image belongs to the target object. The first supervision data may include the label information of each pixel in the first image, where the label information of each pixel indicates whether that pixel belongs to the target object in the first image or to the background image of the target object in the first image. In other words, the label information of each pixel in the first image indicates whether the pixel belongs to the first feature region or to the region of the first image other than the first feature region (i.e., the region of the background image of the target object in the first image). The background image of the target object in the first image may also be called the background image of the first feature region in the first image.
Similarly, the supervision data set for the second image may be called second supervision data, and the second supervision data is used to indicate whether each pixel in the second image belongs to the second feature region; in other words, it indicates whether each pixel in the second image belongs to the target object. The second supervision data may include the label information of each pixel in the second image, where the label information of each pixel indicates whether that pixel belongs to the target object in the second image or to the background image of the target object in the second image. In other words, the label information of each pixel in the second image indicates whether the pixel belongs to the second feature region or to the region of the second image other than the second feature region (i.e., the region of the background image of the target object in the second image). The background image of the target object in the second image may also be called the background image of the second feature region in the second image.
In other words, the label information of any pixel indicates the belonging relationship between that pixel and the target object in its image: either the pixel belongs to the target object (i.e., the pixel is among the pixels of the image of the target object in its image), or the pixel does not belong to the target object (i.e., the pixel is not among the pixels of the image of the target object in its image).
It should be noted that the first image may be sample data with high-quality annotation and the second image may be sample data with low-quality annotation. This is reflected in the fact that the accuracy of the first supervision data set for the first image (i.e., the accuracy of the label information of each pixel in the first image) is higher than the accuracy of the second supervision data set for the second image (i.e., the accuracy of the label information of each pixel in the second image); this accuracy may be accuracy in the subjective sense. For example, the first image may be sample data labeled by experts, i.e., the first supervision data of the first image may be labeled by professionals in the technical field, while the second image may be sample data labeled by non-experts, i.e., the second supervision data of the second image may be labeled by persons outside the technical field.
For example, if the method provided by the embodiments of this application is applied to a medical segmentation scene, the first image and the second image may be image data containing an organ to be segmented. The label information of pixels in the first image may be marked by professionals in the medical field, while the label information of pixels in the second image may be marked by amateurs; therefore, the accuracy of the label information of pixels in the first image will usually be higher than that of the label information of pixels in the second image.
It should also be noted that the cost of obtaining a large number of high-quality labeled samples (such as the first image) is very high, and this is especially difficult in the field of medical imaging, which relies on expert knowledge. Therefore, to save the cost of obtaining samples, the first images in this application may be few and the second images many; this application can effectively use a small amount of high-quality labeled data (such as the first images) and a large amount of low-quality labeled data (such as the second images) to train a model (such as the prediction neural network) accurately.
In some embodiments, the label information of a pixel (such as the label information of a pixel in the first image or of a pixel in the second image) may be recorded as 0 or 1. If the label information of a pixel is 0, the pixel does not belong to the target object in its image; conversely, if the label information of a pixel is 1, the pixel belongs to the target object in its image.
For example, if the label information of a certain pixel in the first image is 0, this may indicate that the pixel does not belong to the first feature region of the first image; conversely, if the label information of a certain pixel in the first image is 1, this may indicate that the pixel belongs to the first feature region of the first image.
For another example, if the label information of a certain pixel in the second image is 0, this may indicate that the pixel does not belong to the second feature region of the second image; conversely, if the label information of a certain pixel in the second image is 1, this may indicate that the pixel belongs to the second feature region of the second image.
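A tiny illustration of this 0/1 convention, with toy values only:

```python
# Illustrative only: a 3x3 label mask in the 0/1 convention described above,
# where 1 marks pixels of the target object and 0 marks background pixels.
import torch

labels = torch.tensor([[0, 1, 1],
                       [0, 1, 0],
                       [0, 0, 0]])
print((labels == 1).sum().item(), "pixels labeled as target object")  # 3
```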
Step S102: Input the first image into the prediction neural network to obtain a first prediction result; the first prediction result includes first predicted pixel information respectively indicating whether each pixel of the first image belongs to the first feature region.
In some embodiments, the computer device may call the prediction neural network to perform prediction on the first image, i.e., to predict the belonging relationship between each pixel in the first image and the target object. The prediction result predicted by the prediction neural network for the pixels of the first image may be called the first prediction result. The first prediction result includes the first predicted pixel information of each pixel in the first image, used to indicate whether the corresponding pixel belongs to the first feature region. The first predicted pixel information of each pixel may include the probability that the pixel belongs to the target object in the first image (which may be called the target probability) and the probability that the pixel does not belong to the target object in the first image (i.e., belongs to the background image of the target object in the first image), which may be called the background probability. The sum of the target probability and the background probability corresponding to any pixel in the first image may be 1.
The process of generating the first prediction result of the first image through the prediction neural network may include: the computer device may generate, through the prediction neural network, the mask region of each pixel of the first image within the first image, where the mask region of a pixel may refer to the region used to select the main features of that pixel.
The computer device may then, through the prediction neural network and according to the features of each pixel of the first image within its corresponding mask region, predict the target probability that each pixel of the first image belongs to the target object and the background probability that it does not, thereby obtaining the first predicted pixel information of each pixel in the first image. The first predicted pixel information of any pixel in the first image thus includes the target probability that the pixel belongs to the target object (e.g., the probability that the image features within the pixel's mask region are features of the target object) and the background probability that the pixel belongs not to the target object but to the background image of the target object (e.g., the probability that the image features within the pixel's mask region are features of the background image of the target object).
In summary, the first predicted pixel information of each pixel in the first image constitutes the first prediction result.
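As a small illustration of the probability constraint just described, the following sketch assumes a network head that outputs two logit channels per pixel (one for background, one for target); a softmax over the channel dimension then yields a target probability and a background probability that sum to 1 at every pixel. The two-channel head is an assumption for illustration, not a detail disclosed here.

```python
# A minimal sketch: per-pixel target/background probabilities that sum to 1,
# assuming 2-channel logits (channel 0 = background, channel 1 = target).
import torch
import torch.nn.functional as F

logits = torch.randn(1, 2, 4, 4)            # (batch, 2 classes, H, W)
probs = F.softmax(logits, dim=1)            # normalize over the class channel

target_prob = probs[:, 1]                   # P(pixel belongs to target object)
background_prob = probs[:, 0]               # P(pixel belongs to background)
assert torch.allclose(target_prob + background_prob, torch.ones(1, 4, 4))
```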
Step S103: Input the second image into the prediction neural network to obtain a second prediction result; the second prediction result includes second predicted pixel information respectively indicating whether each pixel of the second image belongs to the second feature region.
Similarly, the computer device may call the prediction neural network to perform prediction on the second image, i.e., to predict the belonging relationship between each pixel in the second image and the target object. The prediction result predicted by the prediction neural network for the pixels of the second image may be called the second prediction result. The second prediction result includes second predicted pixel information indicating whether each pixel in the second image belongs to the second feature region, that is, the second predicted pixel information of each pixel in the second image. The second predicted pixel information of each pixel may include the target probability that the pixel belongs to the target object in the second image and the background probability that the pixel does not belong to the target object in the second image (i.e., belongs to the background image of the target object in the second image); the sum of the target probability and the background probability corresponding to any pixel in the second image may be 1.
The process of generating the second prediction result of the second image through the prediction neural network may include: the computer device may generate, through the prediction neural network, the mask region of each pixel of the second image within the second image, where the mask region of a pixel may refer to the region used to select the main features of that pixel.
The computer device may then, through the prediction neural network and according to the features of each pixel of the second image within its corresponding mask region, predict the target probability that each pixel of the second image belongs to the target object and the background probability that it does not, thereby obtaining the second predicted pixel information of each pixel in the second image. The second predicted pixel information of any pixel in the second image thus includes the target probability that the pixel belongs to the target object (e.g., the probability that the image features within the pixel's mask region are features of the target object) and the background probability that the pixel belongs not to the target object but to the background image of the target object (e.g., the probability that the image features within the pixel's mask region are features of the background image of the target object).
The second predicted pixel information of each pixel in the second image constitutes the second prediction result.
The process by which the prediction neural network predicts the predicted pixel information of pixels in the first image or the second image is the same as the process, described below, by which the auxiliary neural network predicts the predicted pixel information of pixels in the second image.
Step S104: Perform classification prediction on each pixel of the second image through the auxiliary neural network to obtain a classification prediction result; the classification prediction result is used to indicate the pixels of the second image belonging to the first class and the pixels belonging to the second class.
In some embodiments, the auxiliary neural network may be used to judge which pixels in the second image are correctly labeled (i.e., have accurate label information) and which pixels are incorrectly labeled (i.e., have inaccurate label information), so that the prediction neural network can then train differently on the correctly labeled pixels and the incorrectly labeled pixels of the second image; for details, see the description below.
Furthermore, the correctly labeled pixels in the second image predicted by the auxiliary neural network may be called pixels of the first class; the pixels of the first class include the pixels in the second image that the auxiliary neural network predicts to have correct label information. Similarly, the pixels with incorrect label information in the second image predicted by the auxiliary neural network may be called pixels of the second class; the pixels of the second class include the pixels in the second image that the auxiliary neural network predicts to have incorrect label information.
The process by which the auxiliary neural network predicts whether each pixel in the second image is correctly labeled or incorrectly labeled may include:
The computer device may call the auxiliary neural network to generate, from the second image, the region center features of the second feature region (which can be understood as the object center features of the target object in the second image) and the pixel features of each pixel in the second image.
The region center features may include the target center feature of the second feature region and the background center feature of the second feature region. The target center feature is used to characterize the structural features of the target object in the second image (that is, the structural features of the image within the second feature region of the second image, such as the texture structure, color structure, and edge structure of the target object in the second image); in other words, the target center feature may be used to represent the features of the target object in the second image. The background center feature is used to characterize the structural features of the background image of the target object in the second image (that is, the structural features of the background image of the second feature region in the second image, such as the texture structure, color structure, and edge structure of the background image of the target object in the second image); in other words, the background center feature may be used to represent the features of the background image of the target object in the second image.
The target center feature of the target object is obtained by aggregating the pixel features of the pixels in the second image that are predicted to belong to the target object, and the background center feature of the target object is obtained by aggregating the pixel features of the pixels in the second image that are predicted to belong to the background image of the target object. The pixel features of each pixel in the second image generated by the auxiliary neural network may be relatively accurate intermediate features generated within the auxiliary neural network (whose accuracy can be judged from experimental experience), and the pixel feature of any pixel in the second image generated by the auxiliary neural network may be used to represent the structural features of that pixel. For the specific generation process of the pixel features of each pixel in the second image and the region center features of the second feature region (including the target center feature and the background center feature), see also the description of the embodiment corresponding to Figure 5 below.
The computer device can then obtain the classification prediction result for the pixels of the second image through the region center features generated above, the pixel features of each pixel in the second image, and the label information of each pixel in the second image. The classification prediction result is used to indicate the pixels of the second image belonging to the first class (i.e., the correctly labeled pixels) and the pixels belonging to the second class (i.e., the incorrectly labeled pixels).
For the specific process of obtaining the classification prediction result for the pixels of the second image through the region center features, the pixel features of each pixel in the second image, and the label information of each pixel in the second image, see also the description of the embodiment corresponding to Figure 5 below.
From the above, it can be seen that the auxiliary neural network can divide the pixels of the second image into two classes: pixels of the first class and pixels of the second class. The pixels of the first class include the pixels in the second image that the auxiliary neural network predicts to have accurate label information (i.e., correctly labeled pixels), and the pixels of the second class include the pixels in the second image that the auxiliary neural network predicts not to have accurate label information (i.e., incorrectly labeled pixels).
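A minimal sketch of this prototype-based separation, under illustrative assumptions (random toy features, Euclidean distance, and a nearest-prototype rule), is given below; the embodiment's exact procedure is the one described with Figure 5.

```python
# A minimal sketch: build a target prototype and a background prototype by
# averaging pixel features, then call a pixel's label "correct" when its
# nearest prototype agrees with that label. All shapes and the decision rule
# are illustrative assumptions.
import torch

torch.manual_seed(0)
H, W, C = 4, 4, 8
features = torch.randn(H, W, C)             # pixel features from the teacher
pred_target = torch.rand(H, W) > 0.5        # teacher's target/background guess
labels = torch.randint(0, 2, (H, W))        # noisy 0/1 labels of second image

# Prototypes: mean feature over predicted target / background pixels.
target_proto = features[pred_target].mean(dim=0)
background_proto = features[~pred_target].mean(dim=0)

# Distance of every pixel feature to each prototype.
d_target = (features - target_proto).norm(dim=-1)
d_background = (features - background_proto).norm(dim=-1)

# A pixel looks like "target" if it is closer to the target prototype.
looks_like_target = (d_target < d_background).long()

first_class = looks_like_target == labels    # predicted correctly labeled
second_class = ~first_class                  # predicted incorrectly labeled
print(first_class.float().mean().item())     # fraction judged correctly labeled
```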
In addition, it should be noted that the embodiments of this application may continuously and iteratively train the prediction neural network with several first images and several second images. During the iterative training of the prediction neural network, the network parameters of the auxiliary neural network are also iteratively updated from the network parameters updated (i.e., optimized) for the prediction neural network. For the specific principle of iteratively updating the network parameters of the auxiliary neural network from the iteratively updated (i.e., iteratively optimized) network parameters of the prediction neural network, see the description in step S105 below.
Step S105: Optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result, and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used to perform image segmentation on a target image.
In some embodiments, the computer device may generate, according to the first prediction result, the second prediction result, and the classification prediction result, the final prediction deviation of the prediction neural network's predicted pixel information for each pixel (including each pixel of the first image and each pixel of the second image), which may be called the prediction deviation. The prediction deviation is used to represent the deviation between the predicted pixel information of each pixel predicted by the prediction neural network and the label information of that pixel; the prediction deviation can also be understood as the prediction loss of the prediction neural network.
计算机设备可以根据上述预测偏差进行网络参数优化,该网络参数优化就包括对预测神经网络进行参数优化、对辅助神经网络进行参数优化中的任意一种或者两种的组合。The computer equipment can perform network parameter optimization based on the above prediction deviation. The network parameter optimization includes any one of parameter optimization of the predictive neural network and parameter optimization of the auxiliary neural network, or a combination of both.
其中,由于可以对预测神经网络不断进行迭代更新(即对预测神经网络的网络参数进行迭代优化),因此,对预测神经网络的每一次迭代更新都会通过上述过程生成对应的预测偏差,通过预测神经网络每次迭代训练过程中所产生的预测偏差不断对预测神经网络的网络参数进行迭代更新(即迭代修正或优化),直到对预测神经网络的网络参数更新完成,即可得到训练好的预测神经网络(可以称之为是参数优化后的预测神经网络),该训练好的预测神经网络就包括修正完成(即优化完成)的网络参数。Among them, since the predictive neural network can be continuously iteratively updated (that is, the network parameters of the predictive neural network are iteratively optimized), each iterative update of the predictive neural network will generate the corresponding prediction deviation through the above process. The prediction deviation generated during each iterative training process of the network continuously updates the network parameters of the predictive neural network iteratively (i.e., iteratively corrects or optimizes). Until the network parameters of the predictive neural network are updated, the trained predictive neural network can be obtained. network (which can be called a predictive neural network after parameter optimization). The trained predictive neural network includes network parameters that have been corrected (that is, optimized).
在一些实施例中,对预测神经网络的网络参数更新完成,可以是指对预测神经网络的网络参数更新至收敛状态,或者,是指对预测神经网络的网络参数迭代更新的次数(即迭代训练次数)达到某个次数阈值,该次数阈值可以根据实际应用场景进行设置。In some embodiments, the completion of updating the network parameters of the predictive neural network may refer to updating the network parameters of the predictive neural network to a convergence state, or may refer to the number of iterative updates of the network parameters of the predictive neural network (ie, iterative training). times) reaches a certain number threshold, which can be set according to the actual application scenario.
请参见图4,图4是本申请实施例提供的一种网络训练的场景示意图。如图4所示,第一图像中包含多个像素点,第一图像具有第一监督数据,第一监督数据中包括第一图像中每个像素点的标记信息;同理,第二图像中也包含多个像素点,第二图像具有第二监督数据,第二监督数据中包括第二图像中每个像素点的标记信息。Please refer to Figure 4. Figure 4 is a schematic diagram of a network training scenario provided by an embodiment of the present application. As shown in Figure 4, the first image contains multiple pixels, the first image has first supervision data, and the first supervision data includes the label information of each pixel in the first image; similarly, in the second image Also containing a plurality of pixels, the second image has second supervision data, and the second supervision data includes label information of each pixel in the second image.
计算机设备可以调用辅助神经网络生成针对第二图像中各个像素点的分类预测结果,该分类预测结果包括第二图像中各像素点的分类结果,第二图像中任一个像素点的分类结果就指示了该像素点的标记信息是正确标记的还是错误标记的,即该像素点是属于第一分类的像素点还是属于第二分类的像素点。The computer device can call the auxiliary neural network to generate a classification prediction result for each pixel point in the second image. The classification prediction result includes a classification result for each pixel point in the second image. The classification result of any pixel point in the second image indicates It determines whether the marking information of the pixel is correctly marked or incorrectly marked, that is, whether the pixel belongs to the first category or the second category.
因此,可以将第二图像中正确标记的像素点作为第一分类的像素点,将第二图像中错误标记的像素点作为第二分类的像素点。因此,预测神经网络可以对第一图像中的像素点、第二图像中第一分类的像素点以及第二图像中第二分类的像素点进行区别训练,进而得到训练好的预测神经网络。并且,在对预测神经网络进行训练过程中,也会将预测神经网络的网络参数传递给辅助神经网络,使得在对预测神经网络进行训练过程中,不断进行参数优化的辅助神经网络也可以对第二图像中各个像素点的分类结果进行更准确的判定,通过对第二图像 中各个像素点的分类结果进行更准确的判定也可以实现对预测神经网络更准确的训练。Therefore, the correctly marked pixels in the second image can be regarded as the pixels of the first category, and the incorrectly marked pixels in the second image can be regarded as the pixels of the second category. Therefore, the prediction neural network can perform differential training on the pixels in the first image, the pixels of the first category in the second image, and the pixels of the second category in the second image, thereby obtaining a trained prediction neural network. Moreover, during the training process of the predictive neural network, the network parameters of the predictive neural network will also be passed to the auxiliary neural network, so that during the training process of the predictive neural network, the auxiliary neural network that continuously optimizes parameters can also The classification results of each pixel in the second image can be determined more accurately by classifying the second image More accurate determination of the classification results of each pixel in the image can also enable more accurate training of the predictive neural network.
通过上述可以知道,在对预测神经网络的网络参数进行迭代优化的过程中,也会通过预测神经网络的网络参数来迭代更新辅助神经网络的网络参数,预测神经网络可以理解为是学生网络(即学生模型),辅助神经网络可以理解为是教师网络(即教师模型)。本申请可以采用类似均值教师(Mean-Teacher,MT)架构的设计来通过预测神经网络的网络参数更新辅助神经网络的网络参数,因为MT的权重平均的自集成策略可以有效提高中间特征表示和最终预测的稳定性和平滑度,这非常适合基于特征原型(上述目标中心特征可以表征目标对象的特征原型)的标注分离策略(即区分第二图像中各个像素点的标记信息是正确标记还是错误标记),因为这能够得到更高稳定和平滑的特征空间(如由第二图像中各个像素点的像素特征构成的特征空间),该过程可以如下述公式(1)所示:
From the above, we can know that in the process of iterative optimization of the network parameters of the predictive neural network, the network parameters of the auxiliary neural network will also be iteratively updated through the network parameters of the predictive neural network. The predictive neural network can be understood as a student network (i.e. student model), the auxiliary neural network can be understood as a teacher network (i.e. teacher model). This application can use a design similar to the Mean-Teacher (MT) architecture to update the network parameters of the auxiliary neural network by predicting the network parameters of the neural network, because the weighted average self-integration strategy of MT can effectively improve the intermediate feature representation and final The stability and smoothness of the prediction, which is very suitable for the labeling separation strategy based on the feature prototype (the above-mentioned target center feature can characterize the feature prototype of the target object) (that is, distinguishing whether the labeling information of each pixel in the second image is correctly labeled or incorrectly labeled) ), because this can obtain a more stable and smooth feature space (such as a feature space composed of the pixel features of each pixel in the second image), the process can be shown as the following formula (1):
Here, t and t−1 both index the training iterations of the prediction neural network (which can also be understood as the number of iterative optimizations of the auxiliary neural network's parameters): t denotes the t-th training iteration and t−1 the (t−1)-th. θ̃_t denotes the network parameters of the auxiliary neural network after the t-th training iteration, θ̃_{t−1} denotes its parameters after the (t−1)-th iteration, and θ_t denotes the network parameters of the prediction neural network after the t-th iteration.

Further, α denotes the EMA (exponential moving average) decay rate and can be set to 0.99. Using an exponential-moving-average decay rate smooths the features and the network's predictions across adjacent training iterations, which facilitates generating the feature prototype of the target object (such as the target center feature) and thereby enables robust label separation.
Through the above process, it can be understood that as the network parameters of the prediction neural network are iteratively updated, the network parameters of the auxiliary neural network can be iteratively updated from the prediction network's parameters after each iteration. The features that the prediction neural network learns from the first image and the second image are thus passed to the auxiliary neural network, which in turn allows the auxiliary neural network to judge the classification results of the pixels in the second image more accurately at each training iteration.
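For illustration, the EMA update of formula (1) can be sketched in a few lines of PyTorch-style Python. This is a minimal sketch, not the implementation of this application; the names teacher, student, and alpha are assumptions introduced here.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, alpha: float = 0.99):
    """Update the teacher (auxiliary) network as an exponential moving
    average of the student (prediction) network, per formula (1)."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        # theta_tilde_t = alpha * theta_tilde_{t-1} + (1 - alpha) * theta_t
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```

Calling ema_update(teacher, student) once after each optimizer step on the student reproduces the per-iteration behavior described above.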
In the embodiments of this application, the trained prediction neural network can be used to segment the target object in image data: the trained network identifies the pixels in an image that belong to the target object, and from those identified pixels the image region where the target object is located (i.e., the feature region) can be segmented out of the image.
For example, the computer device can acquire a target image containing a target object; the image region of the target image where the target object is located can be called the target feature region. The target image can be any image from which the target object needs to be segmented.
The computer device can then invoke the trained prediction neural network (i.e., the prediction neural network after parameter optimization) to make predictions on the target image, that is, to predict the relationship between each pixel of the target image and the target object, obtaining a target prediction result for the target image.

The target prediction result is used to judge whether each pixel of the target image belongs to the target feature region, and it includes predicted pixel information for each pixel of the target image. The predicted pixel information of any pixel can include the target probability that the pixel belongs to the target object in the target image and the background probability that it does not, the background probability being the probability that the pixel belongs to the background image of the target object in the target image.
The computer device can therefore take the pixels of the target image whose target probability is greater than their background probability as the identified pixels belonging to the target object (i.e., the identified pixels of the target feature region), and segment those identified pixels out of the target image. This segments the target object from the target image, which is the segmentation of the image of the target feature region within the target image.
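As a concrete illustration of this inference step, the following is a minimal sketch assuming a model that outputs two per-pixel channels (channel 0 = background, channel 1 = target) which are softmaxed into probabilities; the function name, tensor shapes, and channel order are assumptions, not details fixed by this application.

```python
import torch

def segment_target(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Return a boolean mask of pixels whose predicted target probability
    exceeds their background probability. `image` is (C, H, W)."""
    model.eval()
    with torch.no_grad():
        probs = model(image.unsqueeze(0)).softmax(dim=1).squeeze(0)  # (2, H, W)
    return probs[1] > probs[0]  # True where the pixel is assigned to the target
```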
In some scenarios, high-quality annotated data (such as the first image) is hard to obtain and usually requires expert annotation, whereas low-quality annotated data (such as the second image) is comparatively easy to obtain. High-quality annotated data is therefore scarce while low-quality annotated data is plentiful. The method provided by the embodiments of this application addresses the inaccurate model (network) training that arises in this scenario: it can separate and learn from samples in a mixture consisting of a small amount of high-quality annotated data and a large amount of low-quality annotated data, accurately learn the true characteristics of the samples, and thereby train an accurate model (such as the prediction neural network).
In the embodiments of this application, since segmentation regions of the same class (i.e., regions where objects of the target object's class are located) tend to have highly correlated structural features, the feature prototype of the target object (such as the target center feature) allows the classification results of the pixels in the low-quality second image to be judged accurately. The prediction neural network is subsequently trained differently on the differently classified pixels of the second image, combined with the high-quality first image as supervised training data. Training on both improves the training effect and yields a more accurate prediction neural network, and the accurately trained prediction neural network in turn enables accurate segmentation of the target object in images.
The embodiments of this application can acquire a first image having a first feature region and a second image having a second feature region; predict the first image with the prediction neural network to obtain a first prediction result, which includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature region; predict the second image with the prediction neural network to obtain a second prediction result, which includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature region; and classify the pixels of the second image with the auxiliary neural network to obtain a classification prediction result, which indicates the pixels of the second image belonging to the first category and those belonging to the second category. Network parameter optimization is then performed according to the first prediction result, the second prediction result, and the classification prediction result; this optimization includes parameter optimization of the prediction neural network, parameter optimization of the auxiliary neural network, or a combination of the two. It can be seen that the method proposed by the embodiments of this application classifies the pixels of the second image through the auxiliary neural network and then uses those per-pixel classification results to optimize the parameters of the prediction neural network or the auxiliary neural network. This improves the accuracy of the parameter optimization of the prediction neural network, and the parameter-optimized prediction neural network can subsequently segment the feature regions in images accurately.
Please refer to Figure 5, which is a schematic flowchart of determining a classification prediction result provided by an embodiment of this application. The execution subject in this embodiment can be the same as that in Figure 3 above. As shown in Figure 5, the method can include:
Step S201: Generate, through the auxiliary neural network and based on the second image, the region center features of the second feature region and the pixel features of the pixels in the second image.
In some embodiments, the computer device can input the second image into the auxiliary neural network to perform feature learning on the second image and thereby generate the pixel feature of each pixel in the second image.
In some embodiments, the auxiliary neural network can contain multiple convolutional layers for feature learning on the second image. The pixel feature of each pixel can then be the smoothed feature of that pixel produced by the penultimate convolutional layer, since experiments show that the smoothed pixel features generated by the penultimate convolutional layer perform comparatively well.
Furthermore, the computer device can predict, through the auxiliary neural network, the mask region of each pixel in the second image; the mask region is the region used to select the main features of each pixel in the second image. The computer device can also generate a prediction accuracy index for the mask region of each pixel in the second image. This index reflects the uncertainty of the generated mask regions: as the name suggests, the prediction accuracy index of any pixel's mask region characterizes the accuracy of that pixel's mask region.
In some embodiments, this application can perform Bayesian approximation through Monte Carlo dropout to generate the prediction accuracy index of the mask region of each pixel in the second image. The process can be as follows.
The computer device can perform K rounds of random dropout on the network parameters (which can be called neurons) of the auxiliary neural network, obtaining K variant networks of the auxiliary neural network, where K is a positive integer whose value can be determined by the actual application scenario. Each round of parameter dropout is performed on the auxiliary neural network with its complete network parameters, and each variant network is obtained by one round of random dropout on the auxiliary network's parameters. Randomly dropping the auxiliary network's parameters can mean randomly setting some of them to 0; the parameters set to 0 are the dropped parameters and play no role in the subsequent prediction.
It should be understood that the random dropout of the auxiliary network's parameters to obtain its variant networks serves mainly to generate, through those variant networks, the prediction accuracy indices of the pixels' mask regions; the pixel features and the mask regions of the pixels described above are predicted by the auxiliary neural network without any parameter dropout.
Any pixel in the second image can be denoted as the target pixel. Since the prediction accuracy index of the mask region is obtained in the same way for every pixel, obtaining the index for the target pixel's mask region is described here as an example.
The computer device can use each variant network in turn to predict the predicted pixel information of the target pixel according to the target pixel's mask region (i.e., according to the image features at the target pixel's mask region in the second image). The predicted pixel information produced by a variant network can be called variant predicted pixel information, and each variant network predicts one item of variant predicted pixel information for the target pixel. This process can be understood as performing K forward stochastic inferences on the target pixel, letting the K variant networks make K softmax (logistic regression) predictions for the target pixel so as to obtain its items of variant predicted pixel information.
Each item of variant predicted pixel information can include the probability, predicted by the corresponding variant network, that the target pixel belongs to the second feature region of the second image (which can be called the first predicted probability, i.e., the target probability of belonging to the target object in the second image), and the probability, predicted by the corresponding variant network, that the target pixel does not belong to the second feature region of the second image (which can be called the second predicted probability, i.e., the probability of belonging to a pixel of the background image of the target object in the second image). The sum of the first predicted probability and the second predicted probability can be 1. The background image of the target object in the second image refers to the part of the second image other than the image of the target object.
The computer device can then determine the prediction accuracy index of the target pixel's mask region from the K items of variant predicted pixel information obtained by the K variant networks, as described below.
Since each item of variant predicted pixel information includes one first predicted probability, the K items contain K first predicted probabilities in total. The computer device can take the standard deviation of these K first predicted probabilities as the target prediction accuracy index of the target pixel; this index indicates the accuracy of predicting the target pixel as belonging to the target object.
Likewise, each item of variant predicted pixel information includes one second predicted probability, so the K items contain K second predicted probabilities in total. The computer device can take the standard deviation of these K second predicted probabilities as the background prediction accuracy index of the target pixel; this index indicates the accuracy of predicting the target pixel as belonging to the background image of the target object.
Both the target prediction accuracy index and the background prediction accuracy index of the target pixel can then be used as the prediction accuracy index of the target pixel's mask region. The computer device can obtain the prediction accuracy index of the mask region of every pixel in the second image in the same way as for the target pixel.
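The following is a minimal sketch of this Monte Carlo procedure. It approximates the parameter dropout described above with ordinary activation dropout kept active at inference (a common way to realize MC dropout) and assumes a model emitting two per-pixel channels; all names, shapes, and the choice K=8 are assumptions.

```python
import torch

def mc_dropout_uncertainty(model: torch.nn.Module, image: torch.Tensor, k: int = 8):
    """Run K stochastic forward passes and return the per-pixel standard
    deviations of the target and background probabilities, i.e., the
    target/background prediction accuracy indices described above."""
    model.train()  # keep dropout layers stochastic during inference
    with torch.no_grad():
        # K passes over the same image -> (K, 2, H, W)
        probs = torch.stack(
            [model(image.unsqueeze(0)).softmax(dim=1).squeeze(0) for _ in range(k)]
        )
    target_index = probs[:, 1].std(dim=0)      # std of the K first predicted probabilities
    background_index = probs[:, 0].std(dim=0)  # std of the K second predicted probabilities
    return target_index, background_index
```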
Furthermore, the computer device can use the auxiliary neural network (here meaning the network with its complete parameters; the K variant networks above can be obtained by randomly dropping its parameters) to predict, from the generated mask region of each pixel in the second image, a third prediction result for the second image. The third prediction result can include third predicted pixel information for each pixel of the second image, where each pixel's third predicted pixel information indicates whether that pixel belongs to the second feature region.

Each pixel's third predicted pixel information includes the probability, predicted by the auxiliary neural network, that the pixel belongs to the second feature region of the second image (which can be called the target probability) and the probability, predicted by the auxiliary neural network, that it does not (which can be called the background probability).
The computer device can then generate the region center features of the target object from the pixel features of the pixels in the second image generated by the auxiliary neural network, the prediction accuracy indices of the pixels' mask regions, and the third prediction result, as described below.
From the pixels of the second image, the computer device can take the pixels whose mask-region prediction accuracy index is greater than an index threshold (the threshold can be set according to the actual application scenario) as evaluation pixels; that is, an evaluation pixel is a pixel of the second image whose mask-region prediction accuracy index is greater than the index threshold. There can be at least one evaluation pixel.

Taking the target pixel as an example again: since the prediction accuracy index of the target pixel's mask region comprises the target prediction accuracy index and the background prediction accuracy index, the index being greater than the threshold can mean that both indices exceed the threshold. That is, only when the target pixel's target prediction accuracy index and its background prediction accuracy index are both greater than the index threshold can the target pixel be taken as an evaluation pixel. By this rule the computer device can collect, from the pixels of the second image, all pixels qualifying as evaluation pixels, thereby obtaining the at least one evaluation pixel.
The computer device can then generate the region center features of the target object from the pixel features of the at least one evaluation pixel and the third predicted pixel information of the at least one evaluation pixel:
For each evaluation pixel, if its third predicted pixel information indicates that it belongs to the second feature region (i.e., the target probability in the third predicted pixel information is greater than the background probability), the evaluation pixel is taken as a target evaluation pixel of the second feature region (i.e., a target evaluation pixel of the target object). A target evaluation pixel is thus a pixel that the auxiliary neural network predicts as belonging to the target object in the second image and whose mask-region prediction accuracy index is greater than the index threshold.
Similarly, for each evaluation pixel, if its third predicted pixel information indicates that it does not belong to the second feature region (i.e., the target probability in the third predicted pixel information is less than the background probability), the evaluation pixel is taken as a background evaluation pixel of the second feature region (i.e., a background evaluation pixel of the target object). A background evaluation pixel is thus a pixel that the auxiliary neural network predicts as not belonging to the target object in the second image and whose mask-region prediction accuracy index is greater than the index threshold.
The computer device can then generate the target center feature of the second feature region from the pixel features of the target evaluation pixels and the target probabilities in their third predicted pixel information (i.e., their probabilities of belonging to the target object). The target center feature is used to represent the structural features of the target object in the second image, that is, the structural features of the image of the second feature region in the second image.

Denoting the target center feature as qobj, it can be computed as shown in formula (2):

qobj = ( Σ_{a=1}^{A} p_a·e_a ) / ( Σ_{a=1}^{A} p_a )          (2)
It should be noted that the label information of each pixel in the second image (0 or 1, where 0 means not belonging to the target object and 1 means belonging to it) is usually represented in a single label vector, and the pixel features of the pixels of the second image generated in the auxiliary neural network can likewise be contained in a single feature matrix, in which one row represents one pixel's pixel feature. The auxiliary neural network can therefore generate the target center feature through operations on this feature matrix and this label vector. The dimensionality of the pixel features generated by the auxiliary neural network usually differs from that of the label vector, so when computing the target center feature from the target evaluation pixels' pixel features, those features can be upsampled by linear interpolation (trilinear interpolation if the pixels are three-dimensional) to raise their dimensionality to that of the label vector.
In formula (2), A denotes the total number of target evaluation pixels and a ≤ A; e_a denotes the pixel feature of the a-th target evaluation pixel after its dimensionality has been raised to that of the label vector (i.e., the upsampled pixel feature of the a-th target evaluation pixel); and p_a denotes the target probability in the third predicted pixel information of the a-th target evaluation pixel, i.e., its probability of belonging to the target object in the second image. Introducing the target probabilities of the target evaluation pixels when computing the target center feature reflects their different contributions: the larger a target evaluation pixel's target probability, the larger its contribution weight to generating the target center feature.
In the same way, the computer device can generate the background center feature of the second feature region from the pixel features of the background evaluation pixels and the background probabilities in their third predicted pixel information (i.e., their probabilities of not belonging to the target object). The background center feature is used to represent the structural features of the background image of the target object in the second image, that is, the structural features of the part of the second image other than the image of the second feature region.
Denoting the background center feature as qbg, it can be computed as shown in formula (3):

qbg = ( Σ_{b=1}^{B} p_b·e_b ) / ( Σ_{b=1}^{B} p_b )          (3)
In formula (3), B denotes the total number of background evaluation pixels and b ≤ B; likewise, e_b denotes the pixel feature of the b-th background evaluation pixel after its dimensionality has been raised to that of the label vector (i.e., the upsampled pixel feature of the b-th background evaluation pixel), and p_b denotes the background probability in the third predicted pixel information of the b-th background evaluation pixel, i.e., its probability of belonging to the background image of the target object in the second image. Introducing the background probabilities of the background evaluation pixels when computing the background center feature reflects their different contributions: the larger a background evaluation pixel's background probability, the larger its contribution weight to generating the background center feature.
The computer device can then take the target center feature (which can be understood as the target center feature of the target object in the second image) and the background center feature (which can be understood as the background center feature of the target object in the second image) together as the region center features of the second feature region; the region center features can be understood as the object center features of the target object.
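The prototype computation of formulas (2) and (3) can be sketched as follows, under assumed tensor shapes: features is (D, H, W) holding the upsampled per-pixel features, probs is (2, H, W) from the third prediction result (channel 0 = background, channel 1 = target), accuracy_index is (2, H, W) holding the background and target prediction accuracy indices, and tau is the index threshold; all of these names are assumptions.

```python
import torch

def center_features(features, probs, accuracy_index, tau):
    """Compute q_obj and q_bg as probability-weighted means over the
    evaluation pixels, per formulas (2) and (3)."""
    evaluation = (accuracy_index[0] > tau) & (accuracy_index[1] > tau)
    is_target = probs[1] > probs[0]  # third prediction: target vs. background

    def weighted_center(mask, p):
        w = p[mask]                   # contribution weights of the selected pixels
        f = features[:, mask]         # (D, num_selected)
        return (f * w).sum(dim=1) / w.sum().clamp_min(1e-8)

    q_obj = weighted_center(evaluation & is_target, probs[1])   # formula (2)
    q_bg = weighted_center(evaluation & ~is_target, probs[0])   # formula (3)
    return q_obj, q_bg
```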
Step S202: Determine the classification prediction result based on the region center features, the pixel features of the pixels in the second image, and the second supervision data.
In some embodiments, the computer device can obtain the classification result of each pixel in the second image from the generated region center features of the target object, the pixel features of the pixels in the second image, and the label information of each pixel in the second image (i.e., the label information of each pixel in the second supervision data). The classification result of any pixel in the second image can be either that the pixel belongs to the first category (i.e., its label information is correct) or that it belongs to the second category (i.e., its label information is incorrect).
Since the classification result is determined in the same way for every pixel in the second image, determining the classification result of the target pixel (any pixel of the second image) is again described as an example.
The computer device can obtain the feature distance between the target pixel's pixel feature and the target center feature, which can be called the first feature distance, and the feature distance between the target pixel's pixel feature and the background center feature, which can be called the second feature distance.
Denoting the first feature distance as dobj.m and the second feature distance as dbg.m, they can be computed as shown in formulas (4) and (5):

dobj.m = ‖em − qobj‖2          (4)

dbg.m = ‖em − qbg‖2          (5)
As above, em denotes the pixel feature of the target pixel after its dimensionality has been raised to that of the label vector (i.e., the upsampled pixel feature of the target pixel); qobj denotes the target center feature, qbg denotes the background center feature, and ‖·‖2 denotes the 2-norm.
If the first feature distance is greater than the second feature distance (indicating that the target pixel leans toward the background image of the target object in the second image) and the target pixel's label information in the second supervision data indicates that it belongs to the target object in the second image (i.e., to the second feature region), the target pixel's classification result can be determined to indicate that it belongs to the second category.

If the first feature distance is greater than the second feature distance (indicating that the target pixel leans toward the background image of the target object in the second image) and the target pixel's label information in the second supervision data indicates that it does not belong to the target object in the second image (i.e., it belongs to the background image of the target object), the target pixel's classification result can be determined to indicate that it belongs to the first category.

If the first feature distance is less than the second feature distance (indicating that the target pixel leans toward the target object in the second image) and the target pixel's label information in the second supervision data indicates that it belongs to the target object in the second image, the target pixel's classification result can be determined to indicate that it belongs to the first category.

If the first feature distance is less than the second feature distance (indicating that the target pixel leans toward the target object in the second image) and the target pixel's label information in the second supervision data indicates that it does not belong to the target object in the second image (i.e., it belongs to the background image of the target object), the target pixel's classification result can be determined to indicate that it belongs to the second category.
In summary, if the feature type toward which the target pixel's pixel feature leans (the feature type of the target object or that of the target object's background image) is inconsistent with the feature type indicated by the target pixel's label information (e.g., one is the target object's feature type and the other is the background image's), the target pixel's label information can be considered incorrectly labeled, i.e., its classification result indicates that it belongs to the second category in the second image. Conversely, if the two feature types are consistent (e.g., both are the target object's feature type, or both are the background image's), the target pixel's label information can be considered correctly labeled, i.e., its classification result indicates that it belongs to the first category in the second image.
The classification results of the pixels in the second image together constitute the classification prediction result for the second image; that is, the classification prediction result includes the classification result of each pixel in the second image.
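A minimal sketch of this label-separation rule, reusing the prototypes from the previous sketch; features is (D, H, W), labels is (H, W) with 1 = belongs to the target object and 0 = background, and all names are assumptions.

```python
import torch

def separate_labels(features, labels, q_obj, q_bg):
    """Mark each pixel as first category (True: its label agrees with the
    nearer prototype) or second category (False: its label disagrees)."""
    d_obj = (features - q_obj[:, None, None]).norm(dim=0)  # first feature distance
    d_bg = (features - q_bg[:, None, None]).norm(dim=0)    # second feature distance
    looks_like_target = d_obj < d_bg
    return looks_like_target == (labels == 1)
```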
Please refer to Figure 6, a schematic diagram of a scenario for determining classification results provided by an embodiment of this application. As shown in Figure 6, the pixels of the second image can comprise pixel 1 to pixel W, where W is a positive integer whose value depends on the actual application scenario. The computer device can obtain the feature distance between each pixel's pixel feature and the target center feature: the distance for pixel 1 (first feature distance 1), for pixel 2 (first feature distance 2), for pixel 3 (first feature distance 3), ..., and for pixel W (first feature distance W). It can likewise obtain the feature distance between each pixel's pixel feature and the background center feature: the distance for pixel 1 (second feature distance 1), for pixel 2 (second feature distance 2), for pixel 3 (second feature distance 3), ..., and for pixel W (second feature distance W).

The computer device can then obtain the classification result of pixel 1 from its first feature distance 1, its label information, and its second feature distance 1; the classification result of pixel 2 from its first feature distance 2, its label information, and its second feature distance 2; the classification result of pixel 3 from its first feature distance 3, its label information, and its second feature distance 3; ...; and the classification result of pixel W from its first feature distance W, its label information, and its second feature distance W.
Through the above process, in scenarios where the structural features of the local regions within a segmentation region (such as the regions of the individual pixels within the image region of the target object) are highly correlated (e.g., highly similar) and the noise tolerance is high, this application can use the region center features to judge the classification results of the pixels in the second image accurately.
The embodiments of this application focus on performing label separation with the assistance of the mean-teacher model (i.e., the auxiliary neural network described above), exploiting the property that feature prototypes (embodied by the region center features) make training more robust to noisy labels. Within the model framework of these embodiments, networks such as V-Net (an image segmentation network), U-Net (a semantic segmentation network), DenseNet (a densely connected network), or ResNet (a residual network) can be used for training and prediction.
Please refer to Figure 7, a schematic flowchart of determining prediction deviations provided by an embodiment of this application. The execution subject in this embodiment can be the same as that in Figure 3 above. As shown in Figure 7, the method can include:
Step S301: Generate the first prediction deviation of the prediction neural network from the first prediction result and the first supervision data of the first image.
In some embodiments, the computer device can generate the cross-entropy loss and the image segmentation loss (Dice loss) of the prediction neural network for the first image from the first predicted pixel information of each pixel of the first image in the first prediction result and the label information of each pixel in the first supervision data, and then obtain the prediction loss of the prediction neural network for the first image from these two losses; this prediction loss can be called the first prediction deviation.
The cross-entropy loss of the prediction neural network for the first image can be denoted Ls1 and computed as shown in formula (6):

Ls1 = −(1/N)·Σ_{i=1}^{N} [ ytrue.i·log(ypred.i) + (1 − ytrue.i)·log(1 − ypred.i) ]          (6)
Here, ytrue.i denotes the label information of the i-th pixel in the first image, i.e., the true label of the i-th pixel; i ≤ N, where N can be the total number of pixels in the first image. If the label information of the i-th pixel truly indicates that it belongs to the target object in the first image, ytrue.i can equal 1; otherwise, i.e., if the label information truly indicates that the i-th pixel does not belong to the target object in the first image, ytrue.i can equal 0. ypred.i denotes the probability, in the first predicted pixel information of the i-th pixel predicted by the prediction neural network, that the i-th pixel belongs to the target object (i.e., its target probability).
The image segmentation loss of the prediction neural network for the first image can be denoted LDice1 and computed as shown in formula (7):

LDice1 = 1 − ( 2·Σ_{i=1}^{N} ytrue.i·ypred.i ) / ( Σ_{i=1}^{N} ytrue.i + Σ_{i=1}^{N} ypred.i )          (7)
As above, ytrue.i denotes the label information of the i-th pixel in the first image (1 or 0), and ypred.i denotes the target probability in the first predicted pixel information of the i-th pixel, i.e., the probability that it belongs to the target object.
The first prediction deviation of the prediction neural network can be denoted LHQ; as shown in formula (8), it is the sum of the cross-entropy loss Ls1 and the image segmentation loss LDice1:

LHQ = Ls1 + LDice1          (8)
Since the pixels of the first image carry accurate label information, the first prediction deviation LHQ of the prediction neural network for the first image can provide positive supervised training for the prediction neural network.
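A minimal sketch of this supervised loss, formulas (6)-(8), over flattened tensors; pred holds the per-pixel target probabilities ypred.i, target holds the labels ytrue.i, and eps is an assumed numerical-stability constant not specified by this application.

```python
import torch

def supervised_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8):
    """L_HQ = cross-entropy (6) + Dice loss (7), per formula (8)."""
    ce = -(target * (pred + eps).log()
           + (1 - target) * (1 - pred + eps).log()).mean()                    # formula (6)
    dice = 1 - 2 * (target * pred).sum() / (target.sum() + pred.sum() + eps)  # formula (7)
    return ce + dice                                                          # formula (8)
```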
Step S302: Generate the second prediction deviation of the prediction neural network from the second predicted pixel information of the pixels of the second image that belong to the first category and the second supervision data of the second image.
In some embodiments, the label information of the first-category pixels in the second supervision data can be called the first label information; that is, the first label information includes the preset label information in the second supervision data marking whether the pixels of the second image that belong to the first category belong to the second feature region. The computer device can generate the prediction loss of the prediction neural network for the first-category pixels from the second predicted pixel information of the first-category pixels of the second image and this first label information; this prediction loss can be called the second prediction deviation.
Specifically, the computer device can generate the cross-entropy loss and the image segmentation loss (Dice loss) of the prediction neural network for the first-category pixels from the second predicted pixel information and the label information of each first-category pixel in the second image, and then obtain the second prediction deviation for the first-category pixels from these two losses.
The cross-entropy loss of the prediction neural network for the first-category pixels can be denoted Ls2 and computed as shown in formula (9):

Ls2 = −(1/M)·Σ_{j=1}^{M} [ ytrue.j·log(ypred.j) + (1 − ytrue.j)·log(1 − ypred.j) ]          (9)
Here, ytrue.j denotes the label information of the j-th pixel among the first-category pixels, i.e., its true label; j ≤ M, where M can be the total number of pixels belonging to the first category. If the label information of the j-th pixel truly indicates that it belongs to the target object in the second image, ytrue.j can equal 1; otherwise, i.e., if the label information truly indicates that the j-th pixel does not belong to the target object in the second image, ytrue.j can equal 0. ypred.j denotes the probability, in the second predicted pixel information of the j-th pixel, that the j-th pixel belongs to the target object (i.e., its target probability).
The image segmentation loss of the prediction neural network for the first-category pixels can be denoted LDice2 and computed as shown in formula (10):

LDice2 = 1 − ( 2·Σ_{j=1}^{M} ytrue.j·ypred.j ) / ( Σ_{j=1}^{M} ytrue.j + Σ_{j=1}^{M} ypred.j )          (10)
As above, ytrue.j denotes the label information of the j-th pixel (1 or 0), and ypred.j denotes the target probability in the second predicted pixel information of the j-th pixel, i.e., the probability that it belongs to the target object.
The second prediction deviation of the prediction neural network can be denoted Lls; as shown in formula (11), it is the sum of the cross-entropy loss Ls2 and the image segmentation loss LDice2:

Lls = Ls2 + LDice2          (11)
The second prediction deviation Lls obtained above is the prediction loss of the prediction neural network for the correctly labeled pixels (i.e., the first-category pixels).
Step S303: Generate the third prediction deviation of the prediction neural network from the second predicted pixel information of the pixels of the second image that belong to the second category and the second supervision data of the second image.
In some embodiments, the label information of the second-category pixels in the second supervision data can be called the second label information; that is, the second label information includes the preset label information in the second supervision data marking whether the pixels of the second image that belong to the second category belong to the second feature region. The computer device can generate the prediction loss of the prediction neural network for the second-category pixels from the second predicted pixel information of the second-category pixels of the second image and this second label information; this prediction loss can be called the third prediction deviation.
In the embodiments of this application, since the second-category pixels are the pixels predicted to be incorrectly labeled, the third prediction deviation of the prediction neural network for the second-category pixels can be obtained with an entropy minimization loss; that is, the second-category pixels can still contribute to training the prediction neural network, but through a third prediction deviation of smaller influence (smaller entropy).
The third prediction deviation can be denoted Lent and computed as shown in formula (12):

Lent = −(1/G)·Σ_{g=1}^{G} [ Fobj.g·log(Fobj.g) + Fbg.g·log(Fbg.g) ]          (12)
Here, Fobj.g denotes the probability, in the second predicted pixel information of the g-th pixel among the second-category pixels, that the g-th pixel belongs to the target object in the second image (i.e., its target probability), and Fbg.g denotes the probability, in the second predicted pixel information of the g-th pixel, that it does not belong to the target object in the second image (i.e., its background probability). G is the total number of second-category pixels, and g ≤ G.
As can be seen from the above, the second prediction deviation and the third prediction deviation are computed in different ways, which achieves the purpose of training the prediction neural network differently on the first-category pixels and the second-category pixels of the second image.
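A minimal sketch of the entropy minimization loss of formula (12); probs is assumed to be a (G, 2) tensor whose columns hold Fbg.g and Fobj.g for the G second-category pixels, and eps is an assumed stability constant.

```python
import torch

def entropy_loss(probs: torch.Tensor, eps: float = 1e-8):
    """L_ent = -(1/G) * sum_g [F_obj.g*log F_obj.g + F_bg.g*log F_bg.g]."""
    return -(probs * (probs + eps).log()).sum(dim=1).mean()
```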
Step S304: Generate the prediction deviation of the prediction neural network from the first prediction deviation, the second prediction deviation, and the third prediction deviation.
In some embodiments, the computer device can generate the final prediction loss of the prediction neural network from the first, second, and third prediction deviations obtained above; this prediction loss is the prediction deviation of the prediction neural network (i.e., its final prediction deviation). The prediction deviation refers to the deviation of the prediction neural network's predicted pixel information for the pixels (including the pixels of the first image and those of the second image).
Specifically, the computer device may obtain, based on the second prediction deviation of the prediction neural network for the pixels of the first classification and the third prediction deviation for the pixels of the second classification, the final prediction loss of the prediction neural network for the second image; this prediction loss may be referred to as the comprehensive prediction deviation of the prediction neural network for the second image.
The comprehensive prediction deviation can be denoted L_LQ; as shown in formula (13) below, the comprehensive prediction deviation L_LQ is the sum of the second prediction deviation L_ls and the third prediction deviation L_ent:

L_LQ = L_ls + L_ent          (13)
Furthermore, the computer device also obtains a weighting coefficient for the comprehensive prediction deviation, weights the comprehensive prediction deviation according to this coefficient to obtain a weighted comprehensive prediction deviation, and then generates the final prediction deviation of the prediction neural network (i.e., the final prediction loss value) from the first prediction deviation and the weighted comprehensive prediction deviation.
The weighting coefficient for the comprehensive prediction deviation can be formed from a Gaussian function that ramps up as training time (iteration count) increases. Since the prediction neural network can be trained over multiple iterations, the weighting coefficient for the comprehensive prediction deviation during the t-th training iteration can be denoted λ(t); as shown in formula (14) below, the weighting coefficient λ(t) can be:

λ(t) = e^(−5 · (1 − t/t_max)²)          (14)
Here, t_max denotes the preset maximum number of iterative training rounds for the prediction neural network, which may be referred to as the maximum number of iterations, and e denotes the natural constant.
For example, in this embodiment of the present application, the weighting coefficient for the comprehensive prediction deviation in the current training iteration of the prediction neural network may be obtained as follows: the computer device obtains the iteration count of the current iterative correction of the network parameters of the prediction neural network (which may be called the current iteration count) and the preset maximum number of iterations for iteratively correcting those network parameters; it then substitutes the current iteration count for t and the maximum number of iterations for t_max in the above formula (14), yielding the weighting coefficient for the comprehensive prediction deviation in the current training iteration.
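A minimal sketch of this coefficient, assuming the conventional Gaussian ramp-up form e^(−5(1 − t/t_max)²) (the scaling constant 5 is a common choice in the ramp-up literature and is an assumption here, as the source does not state it):

```python
import math

def rampup_weight(t: int, t_max: int) -> float:
    """Gaussian ramp-up coefficient lambda(t) as in formula (14).

    Assumes the conventional form exp(-5 * (1 - t / t_max) ** 2); the
    constant 5 is a common choice, not stated in the source.
    """
    t = min(t, t_max)  # clamp so the coefficient saturates at 1.0 after t_max
    return math.exp(-5.0 * (1.0 - t / t_max) ** 2)
```

With this form, λ(0) ≈ 0.0067 and λ(t_max) = 1, so the low-quality term is almost ignored early in training and fully weighted at the end.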
Therefore, the prediction loss (i.e., prediction deviation) of the prediction neural network can be denoted L_z; as shown in formula (15) below, the prediction deviation L_z can be:

L_z = L_HQ + λ(t) · L_LQ          (15)
It can be understood that each training iteration of the prediction neural network has a corresponding prediction loss: if the current iteration is the t-th, then L_HQ in formula (15) is the first prediction deviation in the t-th training iteration, L_LQ is the comprehensive prediction deviation in the t-th training iteration, and the resulting L_z is the prediction loss of the t-th training iteration. The network parameters of the prediction neural network can be iteratively optimized using the prediction deviation L_z obtained in each training iteration, so as to obtain the prediction neural network whose parameter optimization is complete (i.e., the trained prediction neural network).
From the above process, it can be understood that the smaller the iteration count t, the smaller the weighting coefficient of the comprehensive prediction deviation, and conversely, the larger t, the larger the coefficient. The purpose is that, at the start of training (when t is small), the training interference from the second image on the prediction neural network is reduced; as t grows, the prediction neural network becomes increasingly accurate, so a larger weighting coefficient can be used to strengthen the training effect of the second image on the prediction neural network, improving its training accuracy.
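Putting formulas (13)-(15) together, one parameter update could be sketched as follows; training_step and its arguments are illustrative placeholders (reusing the rampup_weight sketch above), not names from the source:

```python
def training_step(optimizer, l_hq, l_ls, l_ent, t, t_max):
    """One parameter update with the combined loss of formula (15).

    l_hq  : first prediction deviation, computed on the first (HQ) image
    l_ls  : second prediction deviation, on first-classification pixels
    l_ent : third prediction deviation, on second-classification pixels
    All three are scalar tensors produced by the prediction network's losses.
    """
    l_lq = l_ls + l_ent                            # formula (13)
    loss = l_hq + rampup_weight(t, t_max) * l_lq   # formula (15)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```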
Please refer to Figure 8, a schematic diagram of a model training scenario provided by an embodiment of the present application. As shown in Figure 8, the computer device may generate the first prediction deviation based on the first image through the prediction neural network. The computer device may also use the auxiliary neural network to perform label separation on the pixels in the second image, that is, to divide the pixels in the second image into pixels of the first classification and pixels of the second classification.
The computer device may then generate, through the prediction neural network, the second prediction deviation based on the pixels of the first classification and the third prediction deviation based on the pixels of the second classification. From the second and third prediction deviations it can generate the comprehensive prediction deviation for the second image, and weight that comprehensive prediction deviation according to the weighting coefficient to obtain the weighted comprehensive prediction deviation.
Finally, the computer device can obtain the final prediction loss of the prediction neural network (i.e., the prediction deviation described above) from the first prediction deviation and the weighted comprehensive prediction deviation, and the prediction neural network can correct its network parameters according to this prediction deviation to obtain the trained prediction neural network (i.e., the prediction neural network after parameter optimization).
In this embodiment of the present application, the first image, with its high-quality label information, can be used for supervised training of the prediction neural network. For the second image, with its low-quality label information, both the pixels predicted to be correctly labeled (the pixels of the first classification) and the pixels predicted to be mislabeled (the pixels of the second classification) can participate in training; the correctly labeled pixels simply exert a larger training effect on the prediction neural network, while the mislabeled pixels exert a smaller one. This makes full use of the second image for training, so a very accurate prediction neural network can be trained.
Moreover, the embodiments of the present application enable discriminative learning on mixed-quality sample data (including the first image and the second image), that is, label-isolated learning on mixed-quality sample data, so that the correct features of the sample data are fully learned and an accurate prediction neural network is obtained through training.
In addition, careful experiments were conducted on the provided method, using the left atrium (LA) segmentation dataset. The LA segmentation dataset provides 100 3D magnetic resonance images (which can be understood as three-dimensional first images) with expert labels (which can be understood as the label information of the pixels in the first image). The image resolution is 0.625×0.625×0.625 mm³. All images were cropped to the center of the heart region and normalized to zero mean and unit variance. To simulate practical scenarios, the embodiments of the present application study both an extreme setting and a milder, more common setting.
In the extreme setting, only 2 samples (the minimum HQ-labeled batch size in the code implementation) are used as HQ-labeled data (samples with high-quality label information, understood as first images), while the mild setting uses 8 samples (10%) as HQ-labeled data. The remaining samples are treated as non-expert low-quality labeled data (samples with low-quality label information, understood as second images); these are processed with commonly used simulated label-corruption schemes, including random erosion and dilation of 3-15 voxels.
The experimental framework runs on an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory and is implemented in Python and PyTorch. In all experiments, the same 3D V-Net (a 3D medical image segmentation network based on a fully convolutional design) is used as the backbone for a fair comparison. The network is trained with SGD (weight decay = 0.0001, momentum = 0.9). The batch size is set to 4, comprising 2 high-quality labeled images (first images) and 2 low-quality labeled images (second images). The maximum number of training steps is set to 8000 in all cases. The learning rate is initialized to 0.01 and decays with a power of 0.9 after each step. Patches of 112×112×80 voxels are randomly cropped as network input, standard data augmentation including random cropping, flipping and rotation is applied, and a sliding-window strategy with a stride of 18×18×4 voxels is used in the testing phase.
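A minimal sketch of this optimization setup, assuming the standard polynomial ("poly") schedule lr = lr₀ · (1 − step/max_steps)^0.9 implied by "decays with a power of 0.9 after each step" (the exact schedule form is an assumption):

```python
import torch

max_steps, base_lr = 8000, 0.01
# Stand-in for the 3D V-Net backbone; any nn.Module with parameters works here.
model = torch.nn.Conv3d(in_channels=1, out_channels=2, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=1e-4)

def poly_lr(step: int) -> float:
    # Learning rate decaying with power 0.9 over the 8000 training steps.
    return base_lr * (1.0 - step / max_steps) ** 0.9

for step in range(max_steps):
    for group in optimizer.param_groups:
        group["lr"] = poly_lr(step)
    # ... forward on a batch of 2 HQ + 2 LQ 112x112x80 patches, compute L_z,
    # loss.backward(), optimizer.step()
```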
On this basis, the embodiments of the present application use four metrics for comprehensive evaluation: Dice (an image segmentation overlap metric), Jaccard (a set-overlap metric), ASD (average surface distance) and 95HD (the 95th-percentile Hausdorff distance, a medical image segmentation metric). The experimental data of the embodiments of the present application in the medical scenario are shown in Table 1 below:
Table 1 (experimental comparison; the tabulated values are not reproduced in this text)
Two experiments were conducted in the embodiments of the present application: rows 2-9 of Table 1 report one experiment, and rows 10-17 report the other. Set-HQ denotes the number of high-quality labeled samples used for training, and Set-LQ the number of low-quality labeled samples. "HQ-LQ separation?" indicates whether the corresponding method trains the low-quality and high-quality labeled data separately. Higher Dice and Jaccard indicate better performance, while smaller ASD and 95HD indicate better performance; the values in parentheses in Table 1 are the standard deviations of the corresponding metrics.
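For reference, the two overlap metrics can be computed from binary masks as in the following standard sketch (a textbook formulation, not code from the source):

```python
import numpy as np

def dice_and_jaccard(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Dice and Jaccard overlap of two binary segmentation masks (assumed non-empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return float(dice), float(jaccard)
```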
H-Sup denotes supervised training with only high-quality labeled data; HL-Sup denotes supervised training with a mix of high- and low-quality labeled data; TriNet denotes a joint learning framework composed of three networks, in which the predictions of two networks are integrated to supervise the third; 2RnT denotes a two-stage method that improves label quality by estimating a confusion matrix for label correction; PNL denotes the introduction of an image-level label quality assessment module that identifies images with clean labels to adjust the network; KDEM denotes the introduction of knowledge distillation and an entropy minimization optimization term to train the network; and Decoupled denotes implicit decoupling via two separate decoders (one for high-quality labeled data and one for low-quality labeled data) to train the network.
The experimental data show that, in both experiments, the method provided by the embodiments of the present application achieves the best overall training performance, demonstrating the superiority and robustness of the provided method.
Please refer to Figure 9, a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running in a computer device; for example, the data processing apparatus may be application software and may be used to execute the corresponding steps in the methods provided by the embodiments of the present application. As shown in Figure 9, the data processing apparatus 1 may include: an acquisition module 11, a first processing module 12, a second processing module 13, a classification module 14 and an optimization module 15.
The acquisition module 11 is configured to acquire a first image and a second image containing a target object, where the image area in which the target object is located in the first image is a first feature area, and the image area in which the target object is located in the second image is a second feature area.
The first processing module 12 is configured to input the first image into a prediction neural network to obtain a first prediction result; the first prediction result includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area.
The second processing module 13 is configured to input the second image into the prediction neural network to obtain a second prediction result; the second prediction result includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area.
The classification module 14 is configured to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result; the classification prediction result indicates the pixels in the second image that belong to a first classification and the pixels that belong to a second classification, where the pixels of the first classification are pixels in the second image predicted by the auxiliary neural network to have correct label information, and the pixels of the second classification are pixels in the second image predicted by the auxiliary neural network to have incorrect label information.
The optimization module 15 is configured to optimize the network parameters of the prediction neural network according to the first prediction result, the second prediction result and the classification prediction result, to obtain a trained prediction neural network used for image segmentation of a target image.
According to some embodiments of the present application, the steps involved in the data processing method shown in Figure 3 may be performed by the modules of the data processing apparatus 1 shown in Figure 9. For example, step S101 shown in Figure 3 may be performed by the acquisition module 11 in Figure 9; step S102 by the first processing module 12; step S103 by the second processing module 13; step S104 by the classification module 14; and step S105 by the optimization module 15.
In this embodiment of the present application, a first image having a first feature area and a second image having a second feature area can be acquired; the first image is predicted through a prediction neural network to obtain a first prediction result, which includes first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area; the second image is predicted through the prediction neural network to obtain a second prediction result, which includes second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area; classification prediction is performed on the pixels of the second image through an auxiliary neural network to obtain a classification prediction result, which indicates the pixels of the second image belonging to the first classification and those belonging to the second classification; and the network parameters of the prediction neural network are optimized according to the first prediction result, the second prediction result and the classification prediction result. The apparatus proposed in this embodiment can thus classify the pixels in the second image through the auxiliary neural network and subsequently use the per-pixel classification results to optimize the parameters of the prediction neural network; this improves the accuracy of that parameter optimization, and the parameter-optimized prediction neural network can then accurately segment the feature areas in an image.
According to some embodiments of the present application, the modules of the data processing apparatus 1 shown in Figure 9 may be combined, separately or entirely, into one or several units, or one or more of the units may be further split into multiple functionally smaller subunits that implement the same operations, without affecting the technical effects of the embodiments of the present application. The above modules are divided based on logical functions; in practical applications, the function of one module may be implemented by multiple units, or the functions of multiple modules by one unit. In other embodiments of the present application, the data processing apparatus 1 may also include other units; in practical applications, these functions may likewise be implemented with the assistance of, or cooperatively by, multiple other units.
According to some embodiments of the present application, the data processing apparatus 1 shown in Figure 9 can be constructed, and the data processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in Figure 3 on a general-purpose computer device that includes processing and storage elements such as a central processing unit (CPU), a random access storage medium (RAM) and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through that medium, and run therein.
Please refer to Figure 10, a schematic structural diagram of a computer device provided by an embodiment of the present application. As shown in Figure 10, the computer device 1000 may include a processor 1001, a network interface 1004 and a memory 1005; in addition, it may include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and a keyboard, and optionally standard wired and wireless interfaces. The network interface 1004 may include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; it may also be at least one storage device located remotely from the processor 1001. As shown in Figure 10, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and a device control application program.
In the computer device 1000 shown in Figure 10, the network interface 1004 provides network communication functions, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 can call the device control application program stored in the memory 1005 to implement the data processing method described in the above embodiments.
It should be understood that the computer device 1000 described in this embodiment of the present application can carry out the description of the data processing method in the embodiment corresponding to Figure 3 above, and also the description of the data processing apparatus 1 in the embodiment corresponding to Figure 9 above; these will not be repeated here. Likewise, the description of the beneficial effects of the same method will not be repeated.
In addition, it should be pointed out that an embodiment of the present application further provides a computer-readable storage medium storing the computer program executed by the aforementioned data processing apparatus 1, where the computer program includes program instructions. When a processor executes the program instructions, it can carry out the description of the data processing method in the embodiment corresponding to Figure 3 above, so details are not repeated here, nor is the description of the beneficial effects of the same method. For technical details not disclosed in the computer storage medium embodiments of the present application, please refer to the description of the method embodiments of the present application.
As an example, the above program instructions may be deployed and executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected through a communication network; multiple computer devices distributed across multiple sites and interconnected through a communication network can form a blockchain network.
The above computer-readable storage medium may be the data processing apparatus provided in any of the foregoing embodiments or an internal storage unit of the above computer device, such as a hard disk or memory of the computer device. It may also be an external storage device of the computer device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card or flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit of the computer device and an external storage device. It is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been or is to be output.
An embodiment of the present application provides a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the description of the data processing method in the embodiment corresponding to Figure 3 above; details and beneficial-effect descriptions are therefore not repeated here. For technical details not disclosed in the computer-readable storage medium embodiments of the present application, please refer to the description of the method embodiments of the present application.
The terms "first", "second" and the like in the specification, claims and drawings of the embodiments of the present application are used to distinguish different objects, not to describe a specific order. In addition, the term "include" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product or device comprising a series of steps or units is not limited to the listed steps or modules, but optionally further includes steps or modules not listed, or other step units inherent to such processes, methods, apparatuses, products or devices.
Those of ordinary skill in the art can appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of the present application.
The methods and related apparatuses provided by the embodiments of the present application are described with reference to the method flowcharts and/or structural schematic diagrams provided by these embodiments; each process and/or block of the flowcharts and/or schematic diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce an apparatus for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the structural schematic diagram.
What is disclosed above is only a preferred embodiment of the present application and certainly cannot limit the scope of rights of the present application; therefore, equivalent changes made according to the claims of the present application still fall within the scope covered by the present application.

Claims (16)

  1. A data processing method, executed by a computer device, the method comprising:
    obtaining a first image and a second image containing a target object, wherein the image area in which the target object is located in the first image is a first feature area, and the image area in which the target object is located in the second image is a second feature area;
    inputting the first image into a prediction neural network to obtain a first prediction result, the first prediction result comprising first predicted pixel information respectively indicating whether each pixel of the first image belongs to the first feature area;
    inputting the second image into the prediction neural network to obtain a second prediction result, the second prediction result comprising second predicted pixel information respectively indicating whether each pixel of the second image belongs to the second feature area;
    performing classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate pixels in the second image belonging to a first classification and pixels belonging to a second classification, wherein the pixels of the first classification are pixels in the second image predicted by the auxiliary neural network to have correct label information, and the pixels of the second classification are pixels in the second image predicted by the auxiliary neural network to have incorrect label information; and
    performing network parameter optimization on the prediction neural network according to the first prediction result, the second prediction result and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used for image segmentation of a target image.
  2. The method of claim 1, wherein performing network parameter optimization on the prediction neural network comprises:
    obtaining a first prediction deviation of the prediction neural network according to the first prediction result and first supervision data of the first image, the first supervision data comprising label information indicating whether each pixel in the first image belongs to the first feature area;
    obtaining a second prediction deviation of the prediction neural network according to the second predicted pixel information of the pixels belonging to the first classification in the second image and second supervision data of the second image;
    obtaining a third prediction deviation of the prediction neural network according to the second predicted pixel information of the pixels belonging to the second classification in the second image and the second supervision data; and
    performing parameter optimization on the prediction neural network according to the first prediction deviation, the second prediction deviation and the third prediction deviation;
    wherein the second supervision data comprises label information indicating whether each pixel in the second image belongs to the second feature area; and
    the second prediction deviation and the third prediction deviation are obtained through different calculation methods.
  3. The method of claim 2, wherein performing parameter optimization on the prediction neural network through the first prediction deviation, the second prediction deviation and the third prediction deviation comprises:
    determining a comprehensive prediction deviation of the prediction neural network for the second image according to the second prediction deviation and the third prediction deviation;
    obtaining a weighting coefficient for the comprehensive prediction deviation, and weighting the comprehensive prediction deviation according to the weighting coefficient to obtain a weighted comprehensive prediction deviation; and
    performing parameter optimization on the prediction neural network according to the first prediction deviation and the weighted comprehensive prediction deviation.
  4. The method of claim 1, wherein performing classification prediction on each pixel of the second image through the auxiliary neural network to obtain the classification prediction result comprises:
    inputting the second image into the auxiliary neural network to obtain a region center feature of the second feature area and pixel features of the pixels in the second image; and
    based on the region center feature, the pixel features of the pixels in the second image and second supervision data of the second image, determining the pixels with correct label information in the second image as the first classification and the pixels with incorrect label information in the second image as the second classification, the second supervision data comprising label information indicating whether each pixel in the second image belongs to the second feature area.
  5. The method of claim 4, wherein inputting the second image into the auxiliary neural network to obtain the region center feature of the second feature area and the pixel features of the pixels in the second image comprises:
    inputting the second image into the auxiliary neural network to obtain the pixel features of the pixels in the second image;
    predicting a mask area of each pixel in the second image through the auxiliary neural network, and determining a prediction accuracy index of the mask area of each pixel in the second image;
    predicting the second image through the auxiliary neural network based on the mask areas of the pixels in the second image to obtain a third prediction result, the third prediction result comprising third predicted pixel information used to indicate whether each pixel of the second image belongs to the second feature area; and
    generating the region center feature according to the pixel features of the pixels in the second image, the prediction accuracy indices of the mask areas of the pixels in the second image, and the third predicted pixel information.
  6. The method of claim 5, wherein generating the region center feature according to the pixel features of the pixels in the second image, the prediction accuracy indices of the mask areas of the pixels in the second image, and the third predicted pixel information comprises:
    determining pixels in the second image whose corresponding mask-area prediction accuracy index is greater than an index threshold as evaluation pixels, to obtain at least one evaluation pixel; and
    generating the region center feature according to the pixel feature of the at least one evaluation pixel and the third predicted pixel information of the at least one evaluation pixel.
  7. The method of claim 6, wherein generating the region center feature according to the pixel feature of the at least one evaluation pixel and the third predicted pixel information of the at least one evaluation pixel comprises:
    generating a target center feature of the second feature area according to the pixel features of target evaluation pixels of the second feature area and the third predicted pixel information of the target evaluation pixels, the target center feature being used to represent structural features of the image of the second feature area, and the target evaluation pixels being the evaluation pixels, among the at least one evaluation pixel, whose third predicted pixel information indicates that they belong to the second feature area;
    generating a background center feature of the second feature area according to the pixel features of background evaluation pixels of the second feature area and the third predicted pixel information of the background evaluation pixels, the background center feature being used to represent structural features of the background image of the second feature area in the second image, and the background evaluation pixels of the second feature area comprising the evaluation pixels, among the at least one evaluation pixel, whose third predicted pixel information indicates that they do not belong to the second feature area; and
    determining the target center feature and the background center feature as the region center feature.
  8. The method of claim 7, wherein determining the classification prediction result based on the region center feature, the pixel features of the pixels in the second image and the second supervision data comprises:
    performing the following operations for each pixel in the second image:
    obtaining a first feature distance between the pixel feature of the pixel and the target center feature, and obtaining a second feature distance between the pixel feature of the pixel and the background center feature;
    if the first feature distance is greater than the second feature distance and the label information of the pixel in the second supervision data indicates that the pixel does not belong to the second feature area, determining that the pixel belongs to the first classification;
    if the first feature distance is greater than the second feature distance and the label information of the pixel in the second supervision data indicates that the pixel belongs to the second feature area, determining that the pixel belongs to the second classification;
    if the first feature distance is less than the second feature distance and the label information of the pixel in the second supervision data indicates that the pixel belongs to the second feature area, determining that the pixel belongs to the first classification; and
    if the first feature distance is less than the second feature distance and the label information of the pixel in the second supervision data indicates that the pixel does not belong to the second feature area, determining that the pixel belongs to the second classification.
  9. The method of claim 5, wherein determining the prediction accuracy index of the mask area of each pixel in the second image comprises:
    performing the following operations for each pixel in the second image:
    randomly dropping network parameters of the auxiliary neural network K times to obtain K deformed networks of the auxiliary neural network, K being a positive integer;
    obtaining, through each deformed network based on the mask area of the pixel, K pieces of deformation-predicted pixel information for the pixel, each piece of deformation-predicted pixel information comprising information used to indicate whether the pixel belongs to the second feature area; and
    determining the prediction accuracy index of the mask area of the pixel according to the K pieces of deformation-predicted pixel information.
  10. The method of claim 9, wherein each piece of deformation-predicted pixel information comprises a first predicted probability that the pixel belongs to the second feature area and a second predicted probability that the pixel belongs to the background image of the second feature area in the second image; and
    determining the prediction accuracy index of the mask area of the pixel according to the K pieces of deformation-predicted pixel information comprises:
    obtaining the standard deviation among the K first predicted probabilities of the K pieces of deformation-predicted pixel information as a target prediction accuracy index for the pixel;
    obtaining the standard deviation among the K second predicted probabilities of the K pieces of deformation-predicted pixel information as a background prediction accuracy index for the pixel; and
    determining the target prediction accuracy index and the background prediction accuracy index as the prediction accuracy index of the mask area of the pixel.
  11. The method of claim 10, wherein determining the pixels in the second image whose corresponding mask-area prediction accuracy index is greater than the index threshold as evaluation pixels comprises:
    if both the target prediction accuracy index and the background prediction accuracy index are greater than the index threshold, determining the pixel as an evaluation pixel.
  12. The method of claim 1, further comprising:
    obtaining a target image, the target image comprising a target feature area;
    inputting the target image into the trained prediction neural network to obtain a target prediction result, the target prediction result comprising target predicted pixel information indicating whether each pixel of the target image belongs to the target feature area; and
    segmenting the image of the target feature area in the target image based on the target predicted pixel information.
  13. A data processing apparatus, comprising:
    an acquisition module, configured to obtain a first image and a second image containing a target object, wherein the image area in which the target object is located in the first image is a first feature area, and the image area in which the target object is located in the second image is a second feature area;
    a first processing module, configured to input the first image into a prediction neural network to obtain a first prediction result, the first prediction result comprising first predicted pixel information indicating whether each pixel of the first image belongs to the first feature area;
    a second processing module, configured to input the second image into the prediction neural network to obtain a second prediction result, the second prediction result comprising second predicted pixel information indicating whether each pixel of the second image belongs to the second feature area;
    a classification module, configured to perform classification prediction on each pixel of the second image through an auxiliary neural network to obtain a classification prediction result, the classification prediction result being used to indicate pixels in the second image belonging to a first classification and pixels belonging to a second classification, wherein the pixels of the first classification are pixels in the second image predicted by the auxiliary neural network to have correct label information, and the pixels of the second classification are pixels in the second image predicted by the auxiliary neural network to have incorrect label information; and
    an optimization module, configured to perform network parameter optimization on the prediction neural network according to the first prediction result, the second prediction result and the classification prediction result to obtain a trained prediction neural network, the trained prediction neural network being used for image segmentation of a target image.
  14. A computer program product, comprising computer program instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-12.
  15. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1-12.
  16. A computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the method of any one of claims 1-12.
PCT/CN2023/081603 2022-04-29 2023-03-15 Data processing method and apparatus, program product, computer device, and medium WO2023207389A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210466331.9 2022-04-29
CN202210466331.9A CN115115828A (en) 2022-04-29 2022-04-29 Data processing method, apparatus, program product, computer device and medium

Publications (1)

Publication Number Publication Date
WO2023207389A1 true WO2023207389A1 (en) 2023-11-02

Family

ID=83326538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081603 WO2023207389A1 (en) 2022-04-29 2023-03-15 Data processing method and apparatus, program product, computer device, and medium

Country Status (2)

Country Link
CN (1) CN115115828A (en)
WO (1) WO2023207389A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115828A (en) * 2022-04-29 2022-09-27 腾讯医疗健康(深圳)有限公司 Data processing method, apparatus, program product, computer device and medium
CN115908823B (en) * 2023-03-09 2023-05-12 南京航空航天大学 Semantic segmentation method based on difficulty distillation

Also Published As

Publication number Publication date
CN115115828A (en) 2022-09-27

Similar Documents

Publication Title
WO2023207389A1 (en) Data processing method and apparatus, program product, computer device, and medium
CN108985334B (en) General object detection system and method for improving active learning based on self-supervision process
CN114332135B (en) Semi-supervised medical image segmentation method and device based on dual-model interactive learning
CN112116090B (en) Neural network structure searching method and device, computer equipment and storage medium
CN110503074A (en) Information labeling method, apparatus, equipment and the storage medium of video frame
US20230153622A1 (en) Method, Apparatus, and Computing Device for Updating AI Model, and Storage Medium
CN110245550B (en) Human face noise data set CNN training method based on total cosine distribution
CN110910391A (en) Video object segmentation method with dual-module neural network structure
CN109447096B (en) Glance path prediction method and device based on machine learning
CN113743474B (en) Digital picture classification method and system based on collaborative semi-supervised convolutional neural network
CN111508000A (en) Deep reinforcement learning target tracking method based on parameter space noise network
CN110826581A (en) Animal number identification method, device, medium and electronic equipment
CN113920170A (en) Pedestrian trajectory prediction method and system combining scene context and pedestrian social relationship and storage medium
CN118196410A (en) Remote sensing image semantic segmentation method, system, equipment and storage medium
CN115861333A (en) Medical image segmentation model training method and device based on doodling annotation and terminal
CN114742840A (en) Image segmentation method and device, terminal equipment and readable storage medium
CN117765432A (en) Motion boundary prediction-based middle school physical and chemical life experiment motion detection method
CN111429414B (en) Artificial intelligence-based focus image sample determination method and related device
CN112668710B (en) Model training, tubular object extraction and data recognition method and equipment
Wu et al. FedA3I: Annotation Quality-Aware Aggregation for Federated Medical Image Segmentation against Heterogeneous Annotation Noise
CN116484868A (en) Cross-domain named entity recognition method and device based on diffusion model generation
CN111160170A (en) Self-learning human behavior identification and anomaly detection method
CN112862840A (en) Image segmentation method, apparatus, device and medium
US20240160842A1 (en) Confidence-based interactable neural-symbolic visual question answering
CN117253097B (en) Semi-supervision domain adaptive image classification method, system, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23794825

Country of ref document: EP

Kind code of ref document: A1