WO2021060076A1

WO2021060076A1 - Information processing device, information processing system, information processing method, and program

Info

Publication number: WO2021060076A1
Application number: PCT/JP2020/034919
Authority: WO
Inventors: 原田 達也; 高濱 修輔; 黒瀬 優介; 喜連川 優; 深山 正久; 阿部 浩幸; 北川 昌伸; 吉澤 明彦
Original assignee: 国立大学法人東京大学 (The University of Tokyo)
Priority date: 2019-09-27
Filing date: 2020-09-15
Publication date: 2021-04-01
Also published as: JP2021056571A; JP7544338B2

Abstract

One embodiment of the present invention is an information processing device. This information processing device has an extraction unit, a feature amount acquisition unit, a generation unit, and an identification unit. The extraction unit extracts a plurality of partial images from an image. The feature amount acquisition unit inputs the plurality of partial images extracted by the extraction unit to a feature extraction model, and obtains feature amounts. The generation unit generates a feature amount map in which the feature amounts obtained by the feature amount acquisition unit are arranged on the basis of position information about the corresponding partial images. The identification unit inputs the feature amount map to a segmentation model, and identifies each of the plurality of partial images.

Description

Information processing device, information processing system, information processing method, and program

The present invention relates to an information processing device, an information processing system, an information processing method and a program.

A digitized image of a tissue fragment is called a Whole Slide Image (WSI). To assist doctors in diagnosis and reduce their workload, research is being conducted on automatic diagnosis of pathological images by applying deep learning to WSIs.
WSIs have very high resolution. To use them for training a deep model without reducing their resolution, Non-Patent Document 1 divides each WSI into small images called patches and inputs the patches into the model.

However, the method of Non-Patent Document 1 has the problem that it can take into account only local information, limited to the size of a patch.

In view of the above circumstances, an object of the present invention is to provide a technique for identifying an image in consideration of both local features and global features.

According to one aspect of the present invention, an information processing device is provided. This information processing device has a cutout unit, a feature amount acquisition unit, a generation unit, and an identification unit. The cutout unit cuts out a plurality of partial images from an image. The feature amount acquisition unit inputs the plurality of partial images cut out by the cutout unit into a feature extraction model and acquires feature amounts. The generation unit generates a feature amount map in which the feature amounts acquired by the feature amount acquisition unit are arranged based on the position information of the corresponding partial images. The identification unit inputs the feature amount map into a segmentation model and identifies each of the plurality of partial images.

According to the present invention, it is possible to provide a technique for identifying an image in consideration of both local features and global features.

FIG. 1 is a diagram showing an example of the system configuration of an information processing system.
FIG. 2 is a diagram showing an example of the hardware configuration of the server device.
FIG. 3 is a diagram showing an example of the functional configuration of the server device.
FIG. 4 is an activity diagram showing an example of information processing of the server device.
FIG. 5 is a diagram showing an example of a pipeline.
FIG. 6 is a diagram showing an example of the arrangement used when creating a feature map.
FIG. 7 is a diagram showing an example of a model equivalent to the first embodiment.
FIG. 8 is a diagram showing an example of performance evaluation.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The various features shown in the embodiments shown below can be combined with each other.

In this specification, a "unit" may include, for example, a combination of hardware resources implemented by circuitry in a broad sense and software information processing that can be concretely realized by these hardware resources. In the first embodiment, various kinds of information are handled. This information is represented by high and low signal values as bit sets of binary numbers consisting of 0s and 1s, and communication and computation on it can be executed on a circuit in a broad sense.

A circuit in a broad sense is a circuit realized by appropriately combining at least a circuit, circuitry, a processor, a memory, and the like. That is, it includes an application specific integrated circuit (ASIC), a programmable logic device (for example, a simple programmable logic device (SPLD), a complex programmable logic device (CPLD), and a field programmable gate array (FPGA)), and the like.

<Embodiment 1>
1. System Configuration FIG. 1 is a diagram showing an example of the system configuration of the information processing system. The information processing system includes a server device 100 and a client device 110. The server device 100 communicates with the client device 110 via a network 120. For simplicity, FIG. 1 shows one client device 110 connected to the server device 100 via the network 120; however, a plurality of client devices may be connected to the server device 100 via the network 120. The server device 100 may also be configured not as a single device but as a plurality of server devices, a so-called cloud. Further, the server device 100 may be connected to other server devices, systems, or the like via the network 120.
(Outline of processing)
When the server device 100 receives a WSI from the image system in response to a request from the client device 110 or the like, it cuts out a plurality of partial images from the WSI, inputs the cut-out partial images into the feature extraction model, and obtains a one-dimensional feature vector for each of the partial images. The server device 100 then generates a feature map in which the feature vectors are arranged based on the position information of the partial images in the WSI, inputs the generated feature map into the segmentation model, and outputs a prediction map. For example, the server device 100 transmits the prediction map to the requesting client device 110. As described above, a WSI is a digitized image of a tissue fragment and is an example of a pathological image.
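The cutout step can be pictured with a short sketch. It is a minimal illustration only, assuming a toy grayscale image stored as a list of pixel rows and non-overlapping square tiles; the function name `cut_patches` and the choice to drop incomplete tiles at the right and bottom edges are hypothetical, not details given in this document. Each tile keeps its grid position (i, j), which is the position information the later steps rely on.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image: a list of pixel rows


def cut_patches(image: Image, patch: int) -> List[Tuple[int, int, Image]]:
    """Tile `image` into non-overlapping `patch` x `patch` partial images.

    Returns (i, j, tile) triples, where i counts tiles from the left and
    j counts tiles from the top. Tiles that would run past the right or
    bottom edge are dropped (a simplifying assumption).
    """
    height, width = len(image), len(image[0])
    tiles = []
    for j in range(height // patch):        # tile rows, top to bottom
        for i in range(width // patch):     # tile columns, left to right
            tile = [row[i * patch:(i + 1) * patch]
                    for row in image[j * patch:(j + 1) * patch]]
            tiles.append((i, j, tile))
    return tiles


image = [[10 * r + c for c in range(6)] for r in range(4)]  # 4 x 6 pixels
tiles = cut_patches(image, 2)
print(len(tiles), tiles[-1])  # 6 (2, 1, [[24, 25], [34, 35]])
```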

2. Hardware Configuration FIG. 2 is a diagram showing an example of the hardware configuration of the server device 100. The server device 100 includes, as its hardware configuration, a control unit 201, a storage unit 202, and a communication unit 203. The control unit 201 is a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like, and controls the entire server device 100 or controls image processing. The storage unit 202 is an HDD (Hard Disk Drive), a ROM (Read Only Memory), a RAM (Random Access Memory), or the like, and stores a program and the data used when the control unit 201 executes processing based on the program. When the control unit 201, more specifically the GPU, executes processing based on the program stored in the storage unit 202, the functional configuration of the server device 100 in FIG. 3 and the processing of the activity diagram in FIG. 4, both described later, are realized. The communication unit 203 is a NIC (Network Interface Card) or the like, and connects the server device 100 to the network 120. The storage unit 202 is an example of a storage medium.

3. Functional configuration FIG. 3 is a diagram showing an example of the functional configuration of the server device 100. The server device 100 (information processing device) includes, as its functional configuration, a cutout unit 301, a feature amount acquisition unit 302, a generation unit 303, an identification unit 304, an output unit 305, and a learning unit 306. The cutout unit 301 cuts out partial images from a WSI; a WSI is an example of an image. The feature amount acquisition unit 302 inputs the plurality of partial images cut out by the cutout unit 301 into the feature extraction model and acquires feature amounts. The generation unit 303 generates a feature amount map in which the feature amounts acquired by the feature amount acquisition unit 302 are arranged based on the position information of the corresponding partial images. The identification unit 304 inputs the feature amount map into the segmentation model and identifies each of the plurality of partial images. The output unit 305 outputs the result of identification by the identification unit 304. The learning unit 306 trains the networks.

4. Information processing FIG. 4 is an activity diagram showing an example of information processing of the server device 100.
In A401, the cutout unit 301 cuts out partial images from the WSI received from the image system in response to a request from the client device 110 or the like.
In A402, the feature amount acquisition unit 302 inputs the plurality of partial images cut out by the cutout unit 301 into the feature extraction model and acquires feature amounts. An example of a feature extraction model is GoogLeNet.
In A403, the generation unit 303 generates a feature amount map in which the feature amounts acquired by the feature amount acquisition unit 302 are arranged based on the position information of the corresponding partial image.

In A404, the identification unit 304 inputs the feature amount map into the segmentation model and discriminates whether each of the plurality of partial images is normal or abnormal. An example of a segmentation model is U-Net.
In A405, the output unit 305 outputs, as the result of identification, a prediction map in which the partial images identified as normal by the identification unit 304 and those identified as abnormal are colored differently. In the first embodiment, the output unit 305 transmits the prediction map to the requesting client device 110. Alternatively, when the server device 100 has a display unit such as a display, the output unit 305 may output the prediction map to that display on request. Depending on the request, the output unit 305 may also output the prediction map to the storage unit 202, a storage unit of an external device, or the like.
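A minimal sketch of the coloring in A405, assuming the prediction is available as a grid of per-partial-image labels. The specific RGB values, the white fill for cells without a partial image, and the name `render_prediction_map` are hypothetical; the document only states that normal and abnormal partial images are colored differently.

```python
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]

# Hypothetical color scheme: the source only says the two classes differ in color.
CLASS_COLORS: Dict[str, RGB] = {"normal": (0, 128, 255), "abnormal": (255, 0, 0)}
NO_PATCH: RGB = (255, 255, 255)  # grid cells that hold no partial image


def render_prediction_map(pred: List[List[Optional[str]]]) -> List[List[RGB]]:
    """Turn a grid of per-partial-image labels into a grid of RGB colors."""
    return [[CLASS_COLORS[label] if label is not None else NO_PATCH for label in row]
            for row in pred]


grid = [["normal", None], ["abnormal", "normal"]]
print(render_prediction_map(grid)[1][0])  # (255, 0, 0)
```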

5. Pipeline FIG. 5 is a diagram showing an example of a pipeline.
The cutout unit 301 cuts out partial images from the WSI received from the image system in response to a request from the client device 110 or the like. The feature amount acquisition unit 302 inputs the plurality of partial images cut out by the cutout unit 301 into the feature extraction model and acquires feature amounts. The generation unit 303 generates a feature amount map in which the feature amounts acquired by the feature amount acquisition unit 302 are arranged based on the position information of the corresponding partial images. The identification unit 304 inputs the feature amount map into the segmentation model and determines whether each of the plurality of partial images is normal or abnormal. The output unit 305 outputs, as the result of identification, a prediction map in which the partial images identified as normal by the identification unit 304 and those identified as abnormal are colored differently.
As another example, the identification unit 304 and the output unit 305 may be integrated. In that case, the identification unit 304 inputs the feature amount map into the segmentation model, determines whether each of the plurality of partial images is normal or abnormal, and outputs a prediction map in which the partial images identified as normal and those identified as abnormal are colored differently.
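The flow of FIG. 5 can be sketched as a function that chains the steps through stand-in models. This is an illustrative sketch under stated assumptions: `run_pipeline`, `toy_extract`, and `toy_segment` are hypothetical names, the feature map here is a plain uncentered grid, and the toy segmentation model simply thresholds each cell, whereas a real segmentation model such as U-Net would also look at neighboring cells, which is where global context enters.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Feature = List[float]
Tile = Tuple[int, int, object]  # (i, j, pixels): i from the left, j from the top


def run_pipeline(
    tiles: Sequence[Tile],
    extract: Callable[[object], Feature],
    segment: Callable[[List[List[Feature]]], List[List[str]]],
    grid_w: int,
    grid_h: int,
    dim: int,
) -> Dict[Tuple[int, int], str]:
    # Feature amount acquisition: one feature vector per partial image.
    feats = {(i, j): extract(pixels) for i, j, pixels in tiles}
    # Generation: arrange vectors by position; cells without a tile stay zero.
    fmap = [[feats.get((i, j), [0.0] * dim) for i in range(grid_w)]
            for j in range(grid_h)]
    # Identification: the segmentation model labels every cell of the map.
    labels = segment(fmap)
    # Output: report labels only for cells that came from a real partial image.
    return {(i, j): labels[j][i] for (i, j) in feats}


# Stand-in models for illustration only (the "segmenter" ignores neighbors).
def toy_extract(pixels: object) -> Feature:
    return [float(pixels)]


def toy_segment(fmap: List[List[Feature]]) -> List[List[str]]:
    return [["abnormal" if cell[0] > 0.5 else "normal" for cell in row] for row in fmap]


result = run_pipeline([(0, 0, 0.9), (1, 0, 0.1), (0, 1, 0.7)], toy_extract, toy_segment, 2, 2, 1)
print(result)  # {(0, 0): 'abnormal', (1, 0): 'normal', (0, 1): 'abnormal'}
```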

6. Optimization method Two learning methods used by the learning unit 306 to optimize the network are described below.
(1) Individual learning The first optimization method is a method in which the learning unit 306 trains the feature extraction model and the segmentation model separately. In this method, the learning unit 306 first trains the feature extraction model in the first half. The input is a partial image cut out from the WSI, and the output is a two-dimensional vector representing the probabilities that the partial image is normal or abnormal. Each partial image used as training data is labeled normal or abnormal based on a doctor's annotation, and each element of the output two-dimensional vector takes a value from 0 to 1. The learning unit 306 trains the feature extraction model on this training data until its performance stabilizes. Here, the performance of the feature extraction model is measured by its generalization to test data prepared separately from the training data.
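The two-element output described above can be illustrated with a softmax over two scores, together with a cross-entropy loss against the doctor's label. The document does not name the output activation or the loss; softmax and cross-entropy are assumptions made here because they are a common way to obtain a two-element vector of probabilities in [0, 1], and the index order NORMAL = 0, ABNORMAL = 1 is hypothetical.

```python
import math
from typing import List, Sequence

NORMAL, ABNORMAL = 0, 1  # assumed index order of the two output elements


def softmax(logits: Sequence[float]) -> List[float]:
    """Map raw scores to probabilities in [0, 1] that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Loss for one partial image whose doctor-assigned label is `label`."""
    return -math.log(probs[label])


p = softmax([2.0, -1.0])          # two-element output for one partial image
loss = cross_entropy(p, ABNORMAL)  # large, because the model favors "normal"
```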

Next, the learning unit 306 fixes the weights of the trained feature extraction model, inputs the partial images of all training data and test data, and extracts the intermediate feature amount that precedes the identification result. Each partial image carries identification information of its WSI and coordinate information, and the learning unit 306 creates a feature map of the entire WSI by arranging the intermediate feature amounts based on this position information. All feature maps input to the segmentation model must have a fixed size, whereas the size of a WSI differs from image to image. Therefore, as shown in FIG. 6, a sufficiently large map filled with zeros is prepared in advance, and the learning unit 306 arranges the feature amounts so that the WSI is located at the center of the map. In FIG. 6 the WSI itself is drawn for clarity, but what is actually arranged is the feature vector of each partial image. As a more specific example, consider a WSI whose size, measured in partial images, is L_h vertically and L_w horizontally. The partial image that is i-th from the left and j-th from the top is referred to as partial image [i, j]. If the fixed-size feature map holds L × L partial-image feature amounts (L > L_h, L > L_w), the learning unit 306 places the feature amount of partial image [i, j] at position [i', j'], where

$$ i' = i + \left\lfloor \frac{L - L_w}{2} \right\rfloor, \qquad j' = j + \left\lfloor \frac{L - L_h}{2} \right\rfloor $$

so that the WSI as a whole is located at the center of the feature map.
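The centering rule for the feature map can be checked with a small sketch. It assumes integer (floor) division for the offsets and stores the map as nested lists of shape L × L × dim; `center_offsets` and `build_feature_map` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Feature = List[float]


def center_offsets(L: int, L_h: int, L_w: int) -> Tuple[int, int]:
    """Shifts (di, dj) that center an L_h x L_w grid of partial images in an L x L map."""
    if not (L > L_h and L > L_w):
        raise ValueError("feature map must be larger than the WSI grid")
    return (L - L_w) // 2, (L - L_h) // 2


def build_feature_map(feats: Dict[Tuple[int, int], Feature],
                      L: int, L_h: int, L_w: int, dim: int) -> List[List[Feature]]:
    """Place the feature of partial image [i, j] at [i', j'] in a zero-filled map."""
    di, dj = center_offsets(L, L_h, L_w)
    fmap = [[[0.0] * dim for _ in range(L)] for _ in range(L)]  # L x L x dim zeros
    for (i, j), f in feats.items():
        fmap[j + dj][i + di] = list(f)  # row index from j (top), column index from i (left)
    return fmap


fmap = build_feature_map({(0, 0): [1.0], (3, 1): [2.0]}, L=8, L_h=2, L_w=4, dim=1)
print(fmap[3][2], fmap[4][5])  # [1.0] [2.0]
```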

At the same time, the learning unit 306 creates a correct answer map in which the label information of the partial images is arranged in the same way. Assuming a dataset in which all or part of the tissue in each WSI has been annotated by a doctor, four classes are defined in the correct answer data: first, the normal class; second, the abnormal class; third, the unlabeled class, covering tissue regions without annotation; and fourth, the background class, covering blank areas of the feature map where there is no partial image. The teacher label given to each partial image is placed at the position corresponding to that partial image's feature amount.
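The correct answer map can be built with the same offsets as the feature map. In this sketch the numeric class ids, the helper name `build_label_map`, and the use of `None` for unannotated tissue are hypothetical choices; only the four classes themselves come from the description.

```python
from typing import Dict, List, Optional, Tuple

# Class ids for the correct answer map (the numbering is a hypothetical choice).
NORMAL, ABNORMAL, UNLABELED, BACKGROUND = 0, 1, 2, 3


def build_label_map(labels: Dict[Tuple[int, int], Optional[str]],
                    L: int, di: int, dj: int) -> List[List[int]]:
    """Arrange per-partial-image labels with the same offsets as the feature map.

    `labels` maps (i, j) to "normal", "abnormal", or None for tissue the doctor
    did not annotate. Cells with no partial image stay BACKGROUND.
    """
    to_class = {"normal": NORMAL, "abnormal": ABNORMAL, None: UNLABELED}
    lmap = [[BACKGROUND] * L for _ in range(L)]
    for (i, j), label in labels.items():
        lmap[j + dj][i + di] = to_class[label]
    return lmap


lmap = build_label_map({(0, 0): "normal", (1, 0): "abnormal", (0, 1): None}, L=4, di=1, dj=1)
```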

After that, the learning unit 306 trains the segmentation model. The learning unit 306 performs supervised learning using the generated feature map as input and the correct answer map as teacher data. Only two of the four classes, the normal class and the abnormal class, need to be identified correctly; the unlabeled class and the background class do not need to be learned. Therefore, the learning unit 306 calculates the error used to update the model only for the normal and abnormal classes, and sets the error to 0 for the other classes. The learning unit 306 continues training until performance stabilizes. Identification performance is evaluated on test data, and at that time the learning unit 306 outputs the identification prediction map for the test data. These test data are the same as those used to evaluate the feature extraction model in the first stage.
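The rule that only the normal and abnormal classes contribute to the error can be expressed as a masked loss. This sketch assumes a two-channel output [p_normal, p_abnormal] per cell and averages over the cells that carry a normal or abnormal label; both the averaging and the name `masked_cross_entropy` are assumptions, since the document only states that the error is set to 0 for the other classes.

```python
import math
from typing import Sequence

NORMAL, ABNORMAL, UNLABELED, BACKGROUND = 0, 1, 2, 3


def masked_cross_entropy(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean cross-entropy over cells labeled NORMAL or ABNORMAL only.

    probs[k] = [p_normal, p_abnormal] predicted for cell k of the map.
    UNLABELED and BACKGROUND cells contribute nothing to the loss
    (and therefore nothing to its gradient).
    """
    terms = [-math.log(p[t]) for p, t in zip(probs, targets) if t in (NORMAL, ABNORMAL)]
    return sum(terms) / len(terms) if terms else 0.0


loss = masked_cross_entropy([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]],
                            [NORMAL, BACKGROUND, UNLABELED])  # only the first cell counts
```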

Viewed as a whole, this pipeline has a different model structure but is essentially equivalent to segmenting the entire WSI. Specifically, the feature extraction model can be regarded as performing convolution and pooling to extract one feature amount per partial-image-sized region, so that a prediction map smaller than the original WSI is output. Segmenting the entire WSI directly would place heavy demands on the GPU memory and processing time of the server device 100. From this perspective, the optimization method by individual learning can be regarded, as shown in FIG. 7, as a state in which the first half of the segmentation encoder is trained first and then fixed. In other words, segmentation of the entire WSI is realized by fixing the gradient of part of the model.

(2) Batch learning The second optimization method is a method in which the learning unit 306 trains the models end-to-end, from the feature extraction model through the segmentation model. The structure and training method of each model are basically the same as those described for individual learning.
In general, when two or more models are arranged in series so that the output of the first model is the input of the second, the error can be propagated from the last output back to the first input in the same way as when training a single model. Under the conditions of the first embodiment, however, extracting the feature amounts of one WSI requires information from hundreds to thousands of partial images. That is, giving one input to the segmentation model in the second half often requires giving thousands of inputs to the feature extraction model in the first half. Inputting thousands of partial images at once would require storing all intermediate-layer outputs for error backpropagation. Specifically, let N be the number of partial images cut out from one WSI and M the memory the feature extraction model needs to train on one partial image. Training the segmentation model on one WSI then requires the feature extraction model to consume NM memory, which is almost the same as the memory needed to train a segmentation model directly on the entire WSI. Processing all partial images of one WSI at once is therefore unrealistic given the available memory capacity.

To solve this, the set of partial images constituting one WSI is split into batches of practical size, and the feature extraction model is updated over several passes. That is, the learning unit 306 divides the N partial images into r batches and trains on them in order. Updating in several passes in this way reduces memory consumption to NM/r. The learning unit 306 adjusts r so that NM/r is of a computable size. This makes it possible to train the feature extraction model.
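The choice of r can be illustrated with a little arithmetic. Assuming per-partial-image training memory M and a device budget B in the same (arbitrary) units, one batch holds ⌈N/r⌉ partial images, so the smallest workable r is ⌈N / ⌊B/M⌋⌉. The function name `min_batches` and the numbers in the usage line are hypothetical and are not measurements from this document.

```python
def min_batches(n_patches: int, mem_per_patch: int, budget: int) -> int:
    """Smallest number of batches r such that one batch fits in `budget`.

    One batch holds ceil(N / r) partial images, so its training memory is
    about ceil(N / r) * M, i.e. roughly N * M / r. Units are arbitrary.
    """
    if mem_per_patch > budget:
        raise ValueError("even a single partial image does not fit in memory")
    per_batch = budget // mem_per_patch                  # most images one batch may hold
    return (n_patches + per_batch - 1) // per_batch      # integer ceil(N / per_batch)


r = min_batches(n_patches=3000, mem_per_patch=50, budget=16000)
print(r)  # 10
```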
Because the feature extraction model is not updated in a single pass, a structure that temporarily holds information between the two models is required. In the forward calculation, the learning unit 306 holds the intermediate feature amounts output from the feature extraction model in the first half, together with their position information, in the same manner as in individual learning, and inputs them into the segmentation model once all feature amounts on the feature map are available. In backpropagation, the error is defined between the final output of the segmentation model and the teacher label, so the learning unit 306 must obtain the gradient of the feature extraction model in the first half by differentiating this final error. Updating the feature extraction model therefore requires retaining the error information computed from the segmentation model. To hold this error information with small memory consumption, the learning unit 306 adopts the following method.

When the error L is defined for the model, the update rule for a weight w in the feature extraction model is

$$ w \leftarrow w - \eta \frac{\partial L}{\partial w} $$

where η is the learning rate.
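The update rule written as code, for a weight vector updated element-wise. `sgd_step` is a hypothetical name, and plain gradient descent is assumed; the document does not state which optimizer is used.

```python
from typing import List, Sequence


def sgd_step(w: Sequence[float], grad: Sequence[float], eta: float) -> List[float]:
    """One gradient-descent update, w <- w - eta * dL/dw, applied element-wise."""
    return [wi - eta * gi for wi, gi in zip(w, grad)]


w_new = sgd_step([1.0, -2.0], [0.5, -1.0], eta=0.1)  # approximately [0.95, -1.9]
```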

If ∂L/∂w can be obtained, the weight can be updated and learned. However, ∂L/∂w cannot be computed directly for a weight w of the feature extraction model, because L is defined on the output of a different model, the segmentation model.
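Here, letting x denote the intermediate feature amount output by the feature extraction model in the first half, with components x_k, ∂L/∂w can be written by the chain rule as

$$ \frac{\partial L}{\partial w} = \sum_k \frac{\partial L}{\partial x_k} \frac{\partial x_k}{\partial w} $$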

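Since x is also the input of the segmentation model, ∂L/∂x can be calculated there. Using ∂L/∂x, the error L' of the feature extraction model is defined anew as

$$ L' = \sum_k \frac{\partial L}{\partial x_k} \, x_k $$

where x_k are the components of x and each ∂L/∂x_k is treated as a constant held from the segmentation model. Then

$$ \frac{\partial L'}{\partial w} = \sum_k \frac{\partial L}{\partial x_k} \frac{\partial x_k}{\partial w} = \frac{\partial L}{\partial w} $$

so ∂L/∂w is obtained as ∂L'/∂w.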

That is, the learning unit 306 calculates and holds ∂L/∂x on the segmentation-model side (the second half). When training the feature extraction model, it takes the inner product of the model output x and the held value, and this inner product can be treated as the error of the feature extraction model. In this way the learning unit 306 can update the feature extraction model.
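The identity ∂L'/∂w = ∂L/∂w can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions: a one-parameter feature extractor x_k = tanh(w·p_k), a fixed one-parameter stand-in for the segmentation model, and a squared-error loss, all hypothetical. It holds ∂L/∂x fixed, forms L' as an inner product with x(w), and compares central finite differences of L and L' with respect to w.

```python
import math
from typing import Callable, List, Sequence

# Toy stand-ins (hypothetical): one scalar weight w in the feature extractor,
# a fixed weight v in the "segmentation model", squared error as L.


def features(w: float, patches: Sequence[float]) -> List[float]:
    return [math.tanh(w * p) for p in patches]               # x_k = tanh(w * p_k)


def seg_loss(x: Sequence[float], v: float, t: Sequence[float]) -> float:
    return sum((v * xk - tk) ** 2 for xk, tk in zip(x, t))   # L = sum (v x_k - t_k)^2


def seg_grad_x(x: Sequence[float], v: float, t: Sequence[float]) -> List[float]:
    return [2 * v * (v * xk - tk) for xk, tk in zip(x, t)]   # dL/dx_k


def surrogate(w: float, patches: Sequence[float], held: Sequence[float]) -> float:
    # L' = sum_k held_k * x_k(w), with `held` (= dL/dx) frozen as constants
    return sum(g * xk for g, xk in zip(held, features(w, patches)))


def central_diff(f: Callable[[float], float], w: float, h: float = 1e-6) -> float:
    return (f(w + h) - f(w - h)) / (2 * h)


patches, t, v, w0 = [0.3, -1.2, 0.8], [1.0, 0.0, -0.5], 1.5, 0.7
held = seg_grad_x(features(w0, patches), v, t)               # computed once, then held
dL_dw = central_diff(lambda w: seg_loss(features(w, patches), v, t), w0)
dLp_dw = central_diff(lambda w: surrogate(w, patches, held), w0)
print(abs(dL_dw - dLp_dw) < 1e-6)  # True
```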

When the inner product of the segmentation model's gradient and the intermediate feature amounts of the feature extraction model is used as the error in this way, the feature extraction model only needs to output the intermediate feature amount representing each partial image. Therefore, the final layer of the feature extraction model is removed during batch learning.
The following is a summary of the batch learning optimization procedure.

Step 1: The learning unit 306 sequentially runs the feature extraction model on the partial images and extracts a feature amount for each partial image. The learning unit 306 arranges the extracted feature amounts based on the position information attached to each partial image and creates a feature map and a correct answer map of the entire WSI.
Step 2: The learning unit 306 learns and updates the segmentation model using the feature map and the correct answer map after the forward calculation of all partial images is completed.
Step 3: The learning unit 306 calculates [Equation 4] using the error L and the output x of the feature extraction model.
Step 4: The learning unit 306 learns the feature extraction model of the first half using the calculated L'.
By following this procedure, the two models can be trained end-to-end. Because the intermediate-layer outputs of the feature extraction model for all partial images are too large to retain, step 1 extracts only the feature amounts without retaining the output of each layer, and step 4 performs the forward calculation again, in order, in batches of practical size, together with error backpropagation. This suppresses memory consumption and makes learning at the scale of the entire WSI possible.
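Steps 1 to 4 can be walked through on a toy problem with hand-derived gradients. This is a sketch, not the document's implementation: the linear extractor, the three-neighbor 1-D "segmentation model", the squared-error loss, and all function names are hypothetical, and the held gradient is taken before the segmentation model is updated, an order the document does not specify. The point it demonstrates is that splitting the partial images into batches leaves the update unchanged. In an autograd framework the same pattern is usually written by running step 1 without gradient tracking (for example under PyTorch's `torch.no_grad()`) and, in step 4, re-running each batch and calling `x_batch.backward(held_batch)`, which back-propagates the held ∂L/∂x through that batch.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy models with hand-derived gradients (no autograd):
#   feature extractor  x_k = w * p_k                 (so dx_k/dw = p_k)
#   segmentation model y_k = v * (x_{k-1} + x_k + x_{k+1}), 1-D map, edges clipped
#   loss               L   = sum_k (y_k - t_k)^2


def neighbours(k: int, n: int) -> List[int]:
    return [m for m in (k - 1, k, k + 1) if 0 <= m < n]


def extract(w: float, batch: Sequence[float]) -> List[float]:
    return [w * p for p in batch]


def segment(x: Sequence[float], v: float) -> List[float]:
    return [v * sum(x[m] for m in neighbours(k, len(x))) for k in range(len(x))]


def total_loss(patches: Sequence[float], targets: Sequence[float], w: float, v: float) -> float:
    y = segment(extract(w, patches), v)
    return sum((yk - tk) ** 2 for yk, tk in zip(y, targets))


def batch_learning_step(patches: Sequence[float], targets: Sequence[float],
                        w: float, v: float, eta: float, batch_size: int) -> Tuple[float, float]:
    n = len(patches)
    # Step 1: run the extractor batch by batch, keeping only the features x.
    x: List[float] = []
    for s in range(0, n, batch_size):
        x.extend(extract(w, patches[s:s + batch_size]))
    # Step 2: forward/backward the segmentation model on the whole map; update v.
    r = [yk - tk for yk, tk in zip(segment(x, v), targets)]
    grad_v = sum(2 * r[k] * sum(x[m] for m in neighbours(k, n)) for k in range(n))
    # Step 3: hold dL/dx (taken before v changes); L' = sum_k held_k * x_k.
    held = [sum(2 * r[k] * v for k in neighbours(m, n)) for m in range(n)]
    v_new = v - eta * grad_v
    # Step 4: revisit the partial images batch by batch and accumulate dL'/dw.
    # For this linear extractor dx_k/dw = p_k, so no activations are needed;
    # a real network would re-run its forward pass on each batch here.
    grad_w = 0.0
    for s in range(0, n, batch_size):
        batch = patches[s:s + batch_size]
        grad_w += sum(g * p for g, p in zip(held[s:s + batch_size], batch))
    return w - eta * grad_w, v_new


patches = [0.5, -1.0, 2.0, 0.3, -0.7]
targets = [1.0, 0.0, 1.0, 0.0, -1.0]
w1, v1 = batch_learning_step(patches, targets, w=0.4, v=0.9, eta=0.01, batch_size=2)
```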

FIG. 8 shows the results of the processing described in the first embodiment.
The "Identifier only" row shows the performance of GoogLeNet, the feature extraction model, alone. The "Segmentation only" row shows the performance of the segmentation model alone. The "optimization method 1: individual learning" row shows the performance of the pipeline of the first embodiment trained by the individual learning described above, and the "optimization method 2: batch learning" row shows the performance of the same pipeline trained by the batch learning described above. The evaluation uses the identification accuracy (correct answer rate) and the area under the curve (AUC) of the precision-recall (PR) curve.
As shown in FIG. 8, "optimization method 1: individual learning" and "optimization method 2: batch learning" described in the first embodiment both achieve higher accuracy and PR-AUC than "Identifier only" and "Segmentation only".
That is, the first embodiment provides a technique that identifies an image in consideration of both local and global features within the hardware constraints of GPU memory, and thereby enables highly accurate identification.
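The two evaluation metrics can be sketched as follows. The document does not say how the area under the PR curve was computed; this sketch uses average precision, one common step-wise estimator, and treats "abnormal" as the positive class. Both choices and the function names are assumptions.

```python
from typing import Sequence


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of partial images whose predicted label matches the ground truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def average_precision(scores: Sequence[float], truth: Sequence[int]) -> float:
    """Step-wise estimate of the area under the PR curve (average precision).

    scores: predicted probability of "abnormal"; truth: 1 = abnormal, 0 = normal.
    Ranks by descending score (ties keep input order) and averages the
    precision observed at the rank of each abnormal example.
    """
    ranked = sorted(zip(scores, truth), key=lambda st: -st[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))                   # 0.75
print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 5/6, about 0.833
```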

<Modification example>
In the first embodiment, a pathological image was used as an example. However, the image is not limited to a pathological image. For example, the image may be an aerial image; by executing the processing described above, an aerial image can also be identified with high accuracy.
Further, in the first embodiment, the information processing system was described as including the server device 100 and the client device 110. However, for example, the client device 110 may itself have the functions of the server device 100 and operate as a standalone device.

It may be provided in each of the following aspects.
An information processing device that further includes an output unit, and the output unit outputs the result of identification by the identification unit.
The information processing device, wherein the identification unit inputs the feature amount map into a segmentation model and discriminates whether each partial image is normal or abnormal.
The information processing device, wherein the output unit outputs, as the result of the identification, a prediction map in which partial images identified as normal by the identification unit and partial images identified as abnormal by the identification unit are colored differently.
An information processing device that further includes a learning unit, and the learning unit separately learns the feature extraction model and the segmentation model.
An information processing device that further includes a learning unit, and the learning unit collectively learns from the feature extraction model to the segmentation model.
The information processing device, wherein the image is a pathological image.
The information processing device, wherein the image is an aerial image.
An information processing system comprising a cutout unit, a feature amount acquisition unit, a generation unit, an identification unit, and an output unit, wherein the cutout unit cuts out a plurality of partial images from an image, the feature amount acquisition unit inputs the plurality of partial images cut out by the cutout unit into a feature extraction model to acquire feature amounts, the generation unit generates a feature amount map in which the feature amounts acquired by the feature amount acquisition unit are arranged based on the position information of the corresponding partial images, the identification unit inputs the feature amount map into a segmentation model and identifies each partial image, and the output unit outputs the result of identification by the identification unit.
An information processing method executed by an information processing device, the method including a first step, a second step, a third step, and a fourth step, wherein in the first step, a plurality of partial images are cut out from an image; in the second step, the plurality of partial images cut out in the first step are input into a feature extraction model to acquire feature amounts; in the third step, a feature amount map is generated in which the feature amounts acquired in the second step are arranged based on the position information of the corresponding partial images; and in the fourth step, the feature amount map is input into a segmentation model to determine whether each partial image is normal or abnormal.
A program causing a computer to execute a first step, a second step, a third step, and a fourth step, wherein in the first step, a plurality of partial images are cut out from an image; in the second step, the plurality of partial images cut out in the first step are input into a feature extraction model to acquire feature amounts; in the third step, a feature amount map is generated in which the feature amounts acquired in the second step are arranged based on the position information of the corresponding partial images; and in the fourth step, the feature amount map is input into a segmentation model to determine whether each partial image is normal or abnormal.
Of course, the present invention is not limited to these aspects.
For example, the program described above may be provided as a computer-readable non-transitory storage medium storing the program.

Finally, various embodiments according to the present invention have been described, but these are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

100: Server device 110: Client device 120: Network 201: Control unit 202: Storage unit 203: Communication unit 301: Cutout unit 302: Feature amount acquisition unit 303: Generation unit 304: Identification unit 305: Output unit 306: Learning unit

Claims

An information processing device comprising a cutout unit, a feature amount acquisition unit, a generation unit, and an identification unit, wherein
the cutout unit cuts out a plurality of partial images from an image,
the feature amount acquisition unit inputs the plurality of partial images cut out by the cutout unit into a feature extraction model and acquires feature amounts,
the generation unit generates a feature amount map in which the feature amounts acquired by the feature amount acquisition unit are arranged based on position information of the corresponding partial images, and
the identification unit inputs the feature amount map into a segmentation model and identifies each of the plurality of partial images.
The information processing device according to claim 1, further comprising an output unit, wherein
the output unit outputs the result of identification by the identification unit.
The information processing device according to claim 2, wherein
the identification unit inputs the feature amount map into the segmentation model and determines whether each partial image is normal or abnormal.
The information processing device according to claim 3, wherein
the output unit outputs, as the result of the identification, a prediction map in which partial images identified as normal by the identification unit and partial images identified as abnormal by the identification unit are colored differently.
The information processing device according to any one of claims 1 to 4, further comprising a learning unit, wherein
the learning unit trains the feature extraction model and the segmentation model separately.
The information processing device according to any one of claims 1 to 4, further comprising a learning unit, wherein
the learning unit collectively trains the models from the feature extraction model through the segmentation model.
The information processing device according to any one of claims 1 to 6, wherein
the image is a pathological image.
The information processing device according to claim 1, wherein
the image is an aerial image.
An information processing system comprising a cutout unit, a feature amount acquisition unit, a generation unit, an identification unit, and an output unit, wherein
the cutout unit cuts out a plurality of partial images from an image,
the feature amount acquisition unit inputs the plurality of partial images cut out by the cutout unit into a feature extraction model and acquires feature amounts,
the generation unit generates a feature amount map in which the feature amounts acquired by the feature amount acquisition unit are arranged based on position information of the corresponding partial images,
the identification unit inputs the feature amount map into a segmentation model and identifies each partial image, and
the output unit outputs the result of identification by the identification unit.
An information processing method executed by an information processing device, the method comprising a first step, a second step, a third step, and a fourth step, wherein
in the first step, a plurality of partial images are cut out from an image,
in the second step, the plurality of partial images cut out in the first step are input into a feature extraction model and feature amounts are acquired,
in the third step, a feature amount map is generated in which the feature amounts acquired in the second step are arranged based on position information of the corresponding partial images, and
in the fourth step, the feature amount map is input into a segmentation model to determine whether each partial image is normal or abnormal.
A program causing a computer to execute a first step, a second step, a third step, and a fourth step, wherein
in the first step, a plurality of partial images are cut out from an image,
in the second step, the plurality of partial images cut out in the first step are input into a feature extraction model and feature amounts are acquired,
in the third step, a feature amount map is generated in which the feature amounts acquired in the second step are arranged based on position information of the corresponding partial images, and
in the fourth step, the feature amount map is input into a segmentation model to determine whether each partial image is normal or abnormal.