WO2022064901A1 - Trained model transformation method, inference method, trained model transformation device, trained model, and inference device - Google Patents

Trained model transformation method, inference method, trained model transformation device, trained model, and inference device

Info

Publication number
WO2022064901A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
inference
trained
convolution
trained model
Prior art date
Application number
PCT/JP2021/030212
Other languages
French (fr)
Japanese (ja)
Inventor
Shumpei Kamon (駿平 加門)
Original Assignee
FUJIFILM Corporation (富士フイルム株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJIFILM Corporation
Priority to JP2022551194A (published as JPWO2022064901A5)
Publication of WO2022064901A1
Priority to US 18/188,449 (published as US20230230369A1)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/08: Learning methods
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/443: Local feature extraction by matching or filtering
    • G06V 10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451: Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 10/94: Hardware or software architectures specially adapted for image or video understanding
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03: Recognition of patterns in medical or anatomical images
    • G06V 2201/031: Recognition of patterns in medical or anatomical images of internal organs

Definitions

  • the present invention relates to a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device.
  • Patent Document 1 describes using an equivalent neural network from which the bias of the convolution operation is removed. Patent Document 2 describes converting the features with one set of parameters per feature index when a batch normalization layer is placed in front of a convolution layer.
  • One embodiment of the present invention provides a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device that can reduce the processing cost by the regularization layer.
  • The trained model transformation method according to the first aspect of the present invention has: a convolutional layer generation step of generating, for a trained convolutional neural network including at least one regularization layer, a second convolutional layer based on the trained parameters of the regularization layer and the trained parameters of a first convolutional layer adjacent to the regularization layer; and a transformation model generation step of generating a transformation model, which is a transformed trained model, by replacing the regularization layer and the first convolutional layer with the second convolutional layer.
  • The trained model transformation method according to the second aspect is the first aspect in which, in the convolutional layer generation step, the second convolutional layer is generated so that the first processing unit, composed of the first convolutional layer and the regularization layer, and the second processing unit, composed only of the second convolutional layer, give equal inference processing results when the same feature quantity is input to each.
  • the trained model transformation method according to the third aspect is the first or second aspect, in which the regularization layer is a batch regularization layer.
  • The inference method according to the fourth aspect of the present invention has a data acquisition step of acquiring input data, and an inference step of inputting the input data to a transformation model obtained by the trained model transformation method according to any one of the first to third aspects and obtaining an inference result.
  • the inference method according to the fifth aspect is the fourth aspect, in which at least a part of the inference process is executed by the parallel computing apparatus.
  • the inference method according to the sixth aspect is the fourth or fifth aspect, in which time series data is acquired as input data in the data acquisition process.
  • the inference method according to the seventh aspect is the sixth aspect, in which a moving image of a subject is acquired as input data in the data acquisition step.
  • The trained model transformation device according to the eighth aspect of the present invention comprises a processor, and the processor executes, for a trained convolutional neural network including at least one regularization layer, a convolutional layer generation process of generating a second convolutional layer based on the trained parameters of the regularization layer and the trained parameters of a first convolutional layer adjacent to the regularization layer, and a transformation model generation process of generating a transformation model, which is a transformed trained model, by replacing the regularization layer and the first convolutional layer with the second convolutional layer.
  • The trained model according to the ninth aspect of the present invention is a trained model used by a computer to output an inference result for input data, the trained model being obtained by the processor of a trained model transformation device executing, for a trained convolutional neural network including at least one regularization layer, a convolutional layer generation process of generating a second convolutional layer based on the trained parameters of the regularization layer and the trained parameters of a first convolutional layer adjacent to the regularization layer, and a transformation model generation process of generating a transformation model, which is a transformed trained model, by replacing the regularization layer and the first convolutional layer with the second convolutional layer.
  • The inference device according to the tenth aspect of the present invention comprises a processor and the trained model according to the ninth aspect, and the processor executes a data acquisition process of acquiring input data and an inference process of inputting the input data to the trained model and obtaining an inference result.
  • the processor includes a parallel calculation processing device that executes at least a part of the inference processing.
  • FIG. 1 is a diagram showing a configuration of a trained model transformation device.
  • FIG. 2 is a diagram showing a state of training of a convolutional neural network and conversion of a trained model.
  • FIG. 3 is a diagram showing a configuration example of a convolutional neural network.
  • FIG. 4 is a diagram showing a state of the convolution process by the filter.
  • FIG. 5 is a diagram showing a state of formation of a convolutional layer.
  • FIG. 6 is another diagram showing how the convolutional layer is formed.
  • FIG. 7 is a diagram showing a configuration example of a converted convolutional neural network.
  • FIG. 8 is an external view of an endoscope system as an aspect of an inference device.
  • FIG. 9 is a block diagram showing a configuration of a main part of the endoscope system.
  • FIG. 10 is a functional block diagram of the image processing unit.
  • FIG. 11 is a diagram showing a state of inference using a transformation model.
  • FIG. 1 is a diagram showing a configuration of a trained model transformation device 500 (trained model transformation device).
  • The trained model transformation device 500 includes a processor 510 (processor, computer) including a learning control unit 512 and a conversion model generation unit 514, a ROM 520 (Read Only Memory; a non-transitory recording medium, a memory), and a RAM 530 (Random Access Memory).
  • The processor 510 can be configured using various processors and/or electric circuits, similarly to the main control unit 210 and the image processing unit 204 of the endoscope system 10 described later (see FIGS. 8 to 11 and the description related to these figures).
  • The ROM 520 stores computer-readable code of a trained model transformation program (a program that causes a computer to execute the trained model transformation method according to the present invention) and various data necessary for executing the trained model transformation method. The code and data may be stored in an EEPROM (Electrically Erasable and Programmable Read Only Memory) or a flash memory instead of the ROM 520.
  • the RAM 530 is used as a temporary storage area or a work area during processing.
  • the trained model conversion device 500 trains a convolutional neural network (CNN) and converts a trained model according to the above configuration.
  • Part (a) of FIG. 2 shows how the CNN 560 before training becomes the CNN 562, which is a trained model, under the control of the learning control unit 512, and part (b) of FIG. 2 shows how the CNN 563 (conversion model) is generated from the CNN 562 (trained model) under the control of the conversion model generation unit 514 (convolutional layer generation step, transformation model generation step).
  • the type of processing device (CPU, GPU, etc.) that converts the trained model is not particularly limited.
  • FIG. 3 is a diagram showing a configuration example of CNN562 (convolutional neural network; trained model) (CNN560 has a similar configuration).
  • the CNN 562 has an input layer 562A, an intermediate layer 562B, and an output layer 562C.
  • the input layer 562A inputs time-series data (for example, a moving image of a subject, but is not limited to this; input data) and outputs a feature amount.
  • the intermediate layer 562B includes a convolution layer 564 (first convolution layer) and a batch regularization layer 565 (regularization layer), and the feature amount output by the input layer 562A is input to calculate other feature amounts.
  • The convolution layer 564 has a structure in which a plurality of "nodes" are connected by "edges", and the weighting coefficients applied to the input image are stored in a weighting coefficient storage unit (not shown) in association with the nodes and edges. The values of the weighting coefficients change from the initial state (the values in the CNN 560) as learning progresses, and in the CNN 562 (trained model) the weighting coefficients obtained when learning has been completed are used.
  • the intermediate layer 562B calculates the feature amount by the convolution calculation.
  • The convolution operation performed in the convolution layer 564 acquires a feature map by a convolution using a filter, and plays the role of feature extraction such as edge extraction from an image. One channel (one sheet) of "feature map" is generated per filter. When the convolution performs downscaling, the size of the "feature map" becomes smaller as convolution proceeds through the layers.
  • the intermediate layer 562B can be composed of one or a plurality of layers that are subjected to the convolution process.
  • FIG. 4 is a diagram showing a state of the convolution process by the filter.
  • In FIG. 4, a convolution operation is performed between an image set (a learning image set during learning, an inference image set during inference) composed of a plurality of medical images (input data) and a filter F1.
  • the image set is composed of N images (N channels) having an image size of H in the vertical direction and W in the horizontal direction.
  • the images constituting the image set are images of three channels of R (red), G (green), and B (blue).
  • the filter size is 5 ⁇ 5 ⁇ N.
  • In the case of a size-3 (3 × 3) filter, for example, the filter F2 used in the second convolution layer has a filter size of 3 × 3 × M.
  • the second to nth convolution layers perform a convolution operation using the filters F2 to Fn.
  • the size of the "feature map" in the nth convolution layer is smaller than the size of the "feature map” in the second convolution layer because it is downscaled by the convolution layers up to the previous stage.
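  • As an illustration of the filter convolution described above (not part of the patent; a minimal NumPy sketch assuming stride 1 and no padding, with the function name conv2d_single_filter chosen only for this example), the following applies one k × k × N filter to an N-channel image and yields one channel of "feature map" per filter:

```python
import numpy as np

def conv2d_single_filter(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """image: (H, W, N), kernel: (k, k, N) -> one (H-k+1, W-k+1) feature map (stride 1, no padding)."""
    H, W, _ = image.shape
    k = kernel.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # one scalar per output position: sum over the k x k x N window
            out[y, x] = np.sum(image[y:y + k, x:x + k, :] * kernel)
    return out

image = np.random.rand(64, 64, 3)                            # H = 64, W = 64, N = 3 (R, G, B)
fmap = conv2d_single_filter(image, np.random.rand(5, 5, 3))  # one 5 x 5 x N filter
print(fmap.shape)                                            # (60, 60): one feature-map channel
```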
  • Low-order feature extraction is performed in the convolution layers of the intermediate layer 562B that are close to the input side, and higher-order feature extraction is performed in the layers closer to the output side.
  • ⁇ Regularization> In learning with a convolutional neural network, the internal covariate shift is suppressed by inserting a regularization layer, so improvement in convergence speed and accuracy can be expected.
  • In the regularization layer, statistics such as the mean and variance of the features are calculated, and the features are whitened. There are several types of regularization layers, and they differ in the range over which these statistics are calculated, as described below.
  • the feature value is defined as f (b, x, y, c).
  • b, x, y, and c are the indexes of the batch, X-axis, Y-axis, and channel, respectively.
  • the average and variance are calculated as the following formulas (9) and (10), respectively, and whitening is performed for each group of each batch.
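  • The patent's numbered formulas (9) and (10) are not reproduced in this text; as a hedged reconstruction, the whitening statistics take the following standard form, where the index set S over which the sums run depends on the type of regularization layer (for example, for batch regularization (batch normalization), S covers the batch and spatial indices for each channel c, while for group normalization it covers the spatial indices and the channels of one group within one batch element), and ε is a small constant for numerical stability:

    \mu = \frac{1}{|S|} \sum_{(b,x,y,c) \in S} f(b,x,y,c), \qquad \sigma^2 = \frac{1}{|S|} \sum_{(b,x,y,c) \in S} \bigl( f(b,x,y,c) - \mu \bigr)^2

    \hat{f}(b,x,y,c) = \frac{f(b,x,y,c) - \mu}{\sqrt{\sigma^2 + \varepsilon}}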
  • The layer structure of the CNN 562 is not limited to the case where the convolution layer 564 and the batch regularization layer 565 are repeated one by one; any one type of layer (for example, the convolution layer 564) may be included consecutively.
  • CNN562 may include a pooling layer.
  • The pooling process performed in the pooling layer reduces (or enlarges) the feature map output by the convolution operation to produce a new feature map, and plays the role of giving the extracted features robustness so that they are not affected by translation and the like.
  • The CNN 562 may include a fully connected layer 566, as in the example shown in part (b) of FIG. 3.
  • the batch regularization layer is often placed in front of, behind, or before and after the convolutional layer.
  • the batch regularization process can be integrated into adjacent convolution layers (convolution and batch regularization are combined into one convolution) only at the time of inference by the method described below.
  • The method of the present invention is highly effective when inference is performed on a parallel computing device (such as a GPU) in which the memory access cost tends to dominate the processing time. In the method of the present invention, the model is converted only for inference processing, and the model used during learning still includes the batch regularization layer. That is, the benefit of batch regularization for learning is retained, while its processing cost is eliminated at the time of inference with the trained model.
  • FIG. 5 is a diagram showing a state of conversion (pattern 1) of the trained model.
  • Pattern 1 is the processing for the case where the convolution layer 564 is on the input side and the batch regularization layer 565 is on the output side, as in the conversion target layer 561A (first processing unit) in FIG. 5.
  • In pattern 1, the convolution and batch regularization processes are expressed by the following equations (11) and (12).
  • W and b are trained parameters of the convolution layer 564 (first convolution layer), and ⁇ , ⁇ , ⁇ , and ⁇ are trained parameters of the batch regularization layer 565 (regularization layer).
  • processing from x to z can be realized by convolution having W tilde (weight parameter at the time of convolution in the second convolution layer 567) and b tilde (bias component in the second convolution layer 567) as parameters.
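  • Equations (11) and (12) and the resulting expressions for W-tilde and b-tilde are not reproduced in this text; the following is a hedged reconstruction using the standard batch-normalization parameters (scale γ, shift β, mean μ, variance σ², small constant ε), with the element-wise operations applied per output channel:

    y = Wx + b, \qquad z = \gamma \odot \frac{y - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta

    s = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}, \qquad \tilde{W} = \operatorname{diag}(s)\,W, \qquad \tilde{b} = s \odot (b - \mu) + \beta, \qquad z = \tilde{W} x + \tilde{b}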
  • In this way, the conversion model generation unit 514 can convert the convolution in the convolution layer 564 and the batch regularization applied to its result in the batch regularization layer 565 into a single convolution layer 567 (second convolution layer, second processing unit) (generation of the second convolution layer; convolutional layer generation step, convolutional layer generation process). That is, in the convolutional layer generation step, the conversion model generation unit 514 generates the convolution layer 567 (second convolution layer) so that the first processing unit, composed of the convolution layer 564 (first convolution layer) and the batch regularization layer 565 (regularization layer), and the second processing unit, composed only of the convolution layer 567, give the same inference processing result (for example, the above-mentioned z) when the same feature quantity (for example, x) is input.
  • Here, x, y, and z are vectors, W and W-tilde are weight coefficient matrices, and b and b-tilde are vectors representing the bias components.
  • The conversion model generation unit 514 replaces the conversion target layer 561A (the convolution layer 564 and the batch regularization layer 565; see FIG. 3) with the convolution layer 567 (second convolution layer, second processing unit) and generates a transformation model, which is the transformed trained model (transformation model generation step, transformation model generation process).
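  • As a concrete illustration of the pattern 1 conversion (a minimal PyTorch sketch, not the patent's implementation; it assumes an inference-mode BatchNorm2d whose running statistics play the role of μ and σ², and the function name fold_conv_bn is hypothetical), the following folds a Conv2d followed by a BatchNorm2d into a single Conv2d and then checks that the first processing unit and the second processing unit give the same inference result for the same input feature quantity:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fold_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return one Conv2d equivalent, at inference time, to conv followed by bn (pattern 1)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    s = bn.weight / torch.sqrt(bn.running_var + bn.eps)       # s = gamma / sqrt(var + eps)
    fused.weight.copy_(conv.weight * s.reshape(-1, 1, 1, 1))  # W~ = diag(s) W
    b = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_(s * (b - bn.running_mean) + bn.bias)     # b~ = s (b - mu) + beta
    return fused

# Equivalence check: same feature input x, same inference result z (up to floating-point error).
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
conv.eval(); bn.eval()
x = torch.randn(1, 3, 32, 32)
print(torch.allclose(bn(conv(x)), fold_conv_bn(conv, bn)(x), atol=1e-5))  # expected: True
```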
  • FIG. 6 is a diagram showing a state of conversion (pattern 2) of the trained model.
  • Pattern 2 is the processing for the case where the batch regularization layer 565 is on the input side and the convolution layer 564 is on the output side, as in the conversion target layer 561B (first processing unit) in FIG. 6.
  • In pattern 2, the batch regularization and convolution processes are expressed by the following equations (16) and (17).
  • processing from x to z can be realized by convolution having W tilde (weight parameter at the time of convolution in the second convolution layer 567) and b tilde (bias component in the second convolution layer 567) as parameters.
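  • Equations (16) and (17) are likewise not reproduced in this text; under the same assumptions as for pattern 1, a hedged reconstruction of the pattern 2 case (batch regularization followed by convolution) is:

    y = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta = s \odot x + (\beta - s \odot \mu), \qquad s = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}

    z = Wy + b = W \operatorname{diag}(s)\,x + \bigl( W(\beta - s \odot \mu) + b \bigr), \qquad \tilde{W} = W \operatorname{diag}(s), \qquad \tilde{b} = W(\beta - s \odot \mu) + b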
  • In this way, the conversion model generation unit 514 can convert the batch regularization in the batch regularization layer 565 and the convolution applied to its result in the convolution layer 564 into a single convolution layer 567 (second convolution layer, second processing unit) (generation of the second convolution layer; convolutional layer generation step, convolutional layer generation process). That is, in the convolutional layer generation step, the conversion model generation unit 514 generates the convolution layer 567 (second convolution layer) so that the first processing unit, composed of the batch regularization layer 565 (regularization layer) and the convolution layer 564 (first convolution layer), and the second processing unit, composed only of the convolution layer 567, give the same inference processing result (for example, the above-mentioned z) when the same feature quantity (for example, x) is input.
  • Here, x, y, and z are vectors, W and W-tilde are weight coefficient matrices, and b and b-tilde are vectors representing the bias components.
  • The conversion model generation unit 514 replaces the conversion target layer 561B (the batch regularization layer 565 and the convolution layer 564) with the convolution layer 567 (second convolution layer, second processing unit) and generates a transformation model, which is the transformed trained model (transformation model generation step, transformation model generation process).
  • FIG. 7 is a diagram showing a configuration example of a transformed convolutional neural network (transformed trained model, transformed model).
  • Part (a) of FIG. 7 shows the CNN 563 (conversion model; without a fully connected layer) corresponding to part (a) of FIG. 3, and part (b) of FIG. 7 shows the CNN 563 (conversion model; with a fully connected layer) corresponding to part (b) of FIG. 3.
  • CNN563 includes an input layer 563A, an intermediate layer 563B, and an output layer 563C.
  • FIG. 7 shows an example in which all the sets of the convolution layer 564 and the batch regularization layer 565 are converted and replaced with convolution layers 567, but the conversion and replacement may be performed for only some of the sets.
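  • A sketch of how such full or partial replacement might be automated for a simple sequential model (assumptions: PyTorch, only the pattern 1 order Conv2d followed by BatchNorm2d, and the hypothetical fold_conv_bn from the earlier sketch; a network with branches or the pattern 2 order would need model-specific handling):

```python
import torch.nn as nn

def fold_model(model: nn.Sequential) -> nn.Sequential:
    """Replace each (Conv2d, BatchNorm2d) pair with one fused Conv2d; other layers are kept as-is."""
    children = list(model.children())
    layers, i = [], 0
    while i < len(children):
        if (isinstance(children[i], nn.Conv2d) and i + 1 < len(children)
                and isinstance(children[i + 1], nn.BatchNorm2d)):
            layers.append(fold_conv_bn(children[i], children[i + 1]))  # see the earlier sketch
            i += 2
        else:
            layers.append(children[i])
            i += 1
    return nn.Sequential(*layers)

# Example: conv/BN/ReLU stacks in the trained model become conv/ReLU stacks in the conversion model.
cnn562 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                       nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
cnn563 = fold_model(cnn562.eval())
```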
  • FIG. 8 is an external view of the endoscope system 10 (endoscope system, medical image processing device, inference device) as one aspect of the inference device, and FIG. 9 is a block diagram showing the configuration of the main parts of the endoscope system 10.
  • The endoscope system 10 includes an endoscope scope 100 (image acquisition unit, endoscope scope), a medical image processing device 200 (medical image processing device, computer, processor, inference device), a light source device 300 (light source device), and a monitor 400 (display device, display).
  • the endoscope scope 100 includes a hand operation unit 102 and an insertion unit 104 connected to the hand operation unit 102.
  • the operator grips and operates the hand operation unit 102, inserts the insertion unit 104 into the body of the subject (living body), and observes it.
  • The hand operation unit 102 is provided with an air supply/water supply button 141, a suction button 142, a function button 143 to which various functions are assigned, and an imaging button 144 that receives imaging instruction operations (still image, moving image).
  • the hand operation unit 102 is provided with a scope information recording unit 139 that records individual information (individual information, scope information) of the endoscope scope 100.
  • The individual information includes, for example, the type of the endoscope scope 100 (direct view, side view, etc.), the model, the individual identification number, the characteristics of the optical system (viewing angle, distortion, etc.), and information on the instruments (treatment tools, etc.) used for treating the subject.
  • The scope information acquisition unit 204E (scope information acquisition unit, individual information acquisition unit; see FIG. 10) of the image processing unit 204 acquires this individual information, which is used in the processing performed by the medical image processing device 200 (image acquisition processing, inference processing, display control processing).
  • the scope information recording unit 139 may be provided in another portion such as inside the light guide connector 108.
  • the insertion portion 104 is composed of a flexible portion 112, a curved portion 114, and a hard tip portion 116 in this order from the hand operation portion 102 side. That is, the curved portion 114 is connected to the proximal end side of the hard tip portion 116, and the flexible portion 112 is connected to the proximal end side of the curved portion 114.
  • the hand operation unit 102 is connected to the base end side of the insertion unit 104. The user can bend the curved portion 114 and change the direction of the hard tip portion 116 up, down, left and right by operating the hand operation portion 102.
  • the hard tip 116 is provided with an imaging optical system 130, an illumination unit 123, a forceps opening 126, and the like (see FIGS. 8 and 9).
  • White light and/or narrow-band light (one or more of red narrow-band light, green narrow-band light, blue narrow-band light, and purple narrow-band light) can be emitted via the illumination lenses 123A and 123B of the illumination unit 123.
  • Cleaning water can be discharged from a water supply nozzle (not shown) to clean the photographing lens 132 (photographing lens, photographing unit) of the photographing optical system 130 and the illumination lenses 123A and 123B.
  • A conduit (not shown) communicates with the forceps opening 126 that opens at the hard tip portion 116; a treatment tool (not shown) for removing a tumor or the like can be inserted into this conduit and advanced and retracted as appropriate to perform the necessary treatment on the subject.
  • a photographing lens 132 (photographing portion) is arranged on the tip end surface 116A of the tip rigid portion 116.
  • A CMOS (Complementary Metal-Oxide Semiconductor) type image sensor 134 (image sensor, image acquisition unit), a drive circuit 136, and an AFE 138 (Analog Front End) are arranged behind the photographing lens 132, and these elements output an image signal.
  • The image pickup element 134 is a color image pickup element and is composed of a plurality of light receiving elements arranged in a matrix (two-dimensional arrangement) in a specific pattern arrangement (Bayer arrangement, X-Trans (registered trademark) arrangement, honeycomb arrangement, etc.).
  • Each pixel of the image sensor 134 includes a microlens, a red (R), green (G), or blue (B) color filter and a photoelectric conversion unit (photodiode or the like).
  • An image sensor in which the image sensor 134, the drive circuit 136, and the AFE 138 are included in one package may be used.
  • The photographing optical system 130 can generate a color image from the pixel signals of the three colors red, green, and blue, and can also generate an image from the pixel signals of any one or two of red, green, and blue.
  • the image sensor 134 may be an XY address type or a CCD (Charge Coupled Device) type.
  • each pixel of the image pickup element 134 may further include a purple color filter corresponding to a purple light source 310V and / or an infrared filter corresponding to an infrared light source.
  • An optical image of the subject is formed on the light receiving surface (imaging surface) of the image pickup element 134 by the photographing lens 132, converted into an electric signal, output to the medical image processing device 200 via a signal cable (not shown), and converted into a video signal.
  • the endoscopic image (observation image, medical image) of the subject is displayed on the screen on the monitor 400 connected to the medical image processing device 200.
  • the illumination lenses 123A and 123B of the illumination portion 123 are provided adjacent to the photographing lens 132.
  • An emission end of a light guide 170, which will be described later, is arranged behind the illumination lenses 123A and 123B; the light guide 170 runs through the insertion portion 104, the hand operation portion 102, and a universal cable 106, and its incident end is arranged within the light guide connector 108.
  • By performing imaging at a predetermined frame rate (under the control of the medical image acquisition unit 204A) while inserting or withdrawing the endoscope scope 100 (insertion unit 104) having the above-described configuration into or from the living body serving as the subject, the user can sequentially capture time-series images of the living body (subject).
  • the light source device 300 includes a light source 310 for illumination, a diaphragm 330, a condenser lens 340, a light source control unit 350, and the like, and causes observation light to enter the light guide 170.
  • The light source 310 includes a red light source 310R, a green light source 310G, a blue light source 310B, and a purple light source 310V that emit red, green, blue, and purple narrow-band light, respectively, and can irradiate red, green, blue, and purple narrow-band light.
  • the illuminance of the observation light by the light source 310 is controlled by the light source control unit 350, and the illuminance of the observation light can be changed (increased or decreased) and the illumination can be stopped as needed.
  • the light source 310 can emit red, green, blue, and purple narrow band light in any combination.
  • Red, green, blue, and purple narrow-band light can be emitted simultaneously to irradiate white light (normal light) as the observation light, or only one or two of them can be emitted to irradiate narrow-band light (special light).
  • the light source 310 may further include an infrared light source that irradiates infrared light (an example of narrow band light).
  • white light or narrow band light may be irradiated as observation light by a light source that irradiates white light and a filter that transmits white light and each narrow band light.
  • the light source 310 may be a light source having a white band or a light source having a plurality of wavelength bands as the light having a white band, or a light source having a specific wavelength band narrower than the white wavelength band.
  • the specific wavelength band may be a blue band or a green band in the visible region, or a red band in the visible region.
  • When the specific wavelength band is the blue band or green band in the visible region, it includes a wavelength band of 390 nm or more and 450 nm or less, or 530 nm or more and 550 nm or less, and the light may have a peak wavelength in the wavelength band of 390 nm or more and 450 nm or less or 530 nm or more and 550 nm or less. When the specific wavelength band is the red band in the visible region, it includes a wavelength band of 585 nm or more and 615 nm or less, or 610 nm or more and 730 nm or less, and the light of the specific wavelength band may have a peak wavelength in the wavelength band of 585 nm or more and 615 nm or less or 610 nm or more and 730 nm or less. Here, nm represents "nanometer".
  • the specific wavelength band includes a wavelength band of 400 ⁇ 10 nm, 440 ⁇ 10 nm, 470 ⁇ 10 nm, or 600 nm or more and 750 nm, and 400 ⁇ 10 nm, 440 ⁇ 10 nm, 470 ⁇ 10 nm, or 600 nm or more and 750 nm. It may have a peak wavelength in the following wavelength band.
  • the light generated by the light source 310 may include a wavelength band of 790 nm or more and 820 nm or less, or 905 nm or more and 970 nm or less, and may have a peak wavelength in a wavelength band of 790 nm or more and 820 nm or less or 905 nm or more and 970 nm or less.
  • the light source 310 may include a light source that irradiates excitation light having a peak of 390 nm or more and 470 nm or less.
  • It is preferable to configure the light source type of the light source 310 (laser light source, xenon light source, LED light source (LED: Light-Emitting Diode), etc.), its wavelength, the presence or absence of a filter, and the like according to the type of the subject, the part, the purpose of observation, and the like, and, when capturing a medical image (medical image, in-vivo image), to combine and/or switch the wavelengths of the observation light at the time of observation according to the type of the subject, the part, the purpose of observation, the use of a dye for the fluorescence method (fluorescein, acridine orange, etc.), and the like. When switching the wavelength, the wavelength of the emitted light may be switched, for example, by rotating a disk-shaped filter (rotary color filter) that is arranged in front of the light source and provided with filters that transmit or block light of specific wavelengths.
  • the image pickup element used in the endoscope system 10 is not limited to the color image pickup element in which the color filter is arranged for each pixel as in the image pickup element 134, and may be a monochrome image pickup element.
  • the wavelength of the observation light can be sequentially switched to perform surface-sequential (color-sequential) imaging.
  • In that case, the wavelength of the emitted observation light may be switched sequentially (among purple, blue, green, and red), or broadband light (white light) may be emitted and the wavelength of the observation light may be switched by a rotary color filter (red, green, blue, purple, etc.). Alternatively, one or a plurality of narrow-band lights (green, blue, purple, etc.) may be emitted and the wavelength of the observation light may be switched by a rotary color filter (green, blue, purple, etc.). The narrow-band light may be infrared light of two or more mutually different wavelengths (first narrow-band light, second narrow-band light).
  • The observation light emitted from the light source device 300 is transmitted to the illumination lenses 123A and 123B via the light guide 170 and is irradiated from the illumination lenses 123A and 123B onto the observation range.
  • the configuration of the medical image processing apparatus 200 will be described with reference to FIG.
  • The medical image processing device 200 receives the image signal output from the endoscope scope 100 via the image input controller 202, performs the necessary image processing in the image processing unit 204 (processor, computer), and outputs the result from the video output unit 206.
  • the observation image (medical image, endoscopic image, in-vivo image) is displayed on the monitor 400 (display device).
  • The communication control unit 205 controls communication for acquiring medical images with an in-hospital system (HIS: Hospital Information System), an in-hospital LAN (Local Area Network), and/or an external system or network (not shown).
  • FIG. 10 is a functional block diagram of the image processing unit 204.
  • The image processing unit 204 includes a medical image acquisition unit 204A (medical image acquisition unit), an inference unit 204B (inference unit, region-of-interest recognition unit), a display control unit 204C (display control unit), a recording control unit 204D (recording control unit), and a scope information acquisition unit 204E (scope information acquisition unit).
  • the inference unit 204B includes a conversion model (CNN563 or the like shown in FIG. 7) obtained by the above-mentioned method (learned model transformation method according to the present invention). The details of the processing using these functions will be described later.
  • Using the above-mentioned functions, the image processing unit 204 can perform recognition (inference) on medical images, calculate feature quantities, perform processing to emphasize or reduce components in a specific frequency band, and perform processing to emphasize or de-emphasize a specific target (a region of interest, a blood vessel at a desired depth, etc.).
  • The image processing unit 204 may include a special-light image acquisition unit that acquires a special-light image having information in a specific wavelength band based on a normal-light image obtained by irradiating light in the white band or light in a plurality of wavelength bands as the white-band light. In this case, the signal in the specific wavelength band can be obtained by a calculation based on the RGB (R: red, G: green, B: blue) or CMY (C: cyan, M: magenta, Y: yellow) color information contained in the normal-light image.
  • The image processing unit 204 may also include a feature-quantity image generation unit that generates a feature-quantity image by a calculation based on at least one of a normal-light image obtained by irradiating light in the white band or light in a plurality of wavelength bands as the white-band light and a special-light image obtained by irradiating light in a specific wavelength band, and may acquire and display the feature-quantity image as a medical image. The above processing is performed under the control of the main control unit 210.
  • the functions of the image processing unit 204 and the main control unit 210 described above can be realized by using various processors and recording media.
  • the various processors include, for example, a CPU (Central Processing Unit), which is a general-purpose processor that executes software (program) to realize various functions.
  • The various processors also include a GPU (Graphics Processing Unit), which is a processor specialized in image processing and one form of parallel computation processing device, and a programmable logic device (PLD) such as an FPGA (Field Programmable Gate Array), which is a processor whose circuit configuration can be changed after manufacture.
  • the above-mentioned various processors also include a dedicated electric circuit, which is a processor having a circuit configuration specially designed for executing a specific process such as an ASIC (Application Specific Integrated Circuit).
  • Each unit may be realized by one processor, or may be realized by a plurality of processors of the same type or different types (for example, a plurality of FPGAs, a combination of a CPU and an FPGA, or a combination of a CPU and a GPU). A plurality of functions may also be realized by one processor. As a first example of configuring a plurality of functions with one processor, as typified by a computer, one processor is configured by a combination of one or more CPUs and software, and this processor realizes the plurality of functions. As a second example, as typified by a System On Chip (SoC), a processor that realizes the functions of the entire system with a single chip is used.
  • various functions are configured by using one or more of the above-mentioned various processors as a hardware structure.
  • More specifically, the hardware structure of these various processors is an electric circuit (circuitry) in which circuit elements such as semiconductor elements are combined. These electric circuits may be electric circuits that realize the above-mentioned functions by using logical OR, logical AND, logical NOT, exclusive OR, and logical operations combining these.
  • When the above-mentioned processor or electric circuit executes software (a program), the computer-readable code of the software to be executed by the computer (for example, the various processors and electric circuits constituting the image processing unit 204, and/or combinations thereof) is stored in a non-transitory recording medium such as the ROM 211 (Read Only Memory) or a flash memory (not shown), and the computer refers to that software. The software stored in the non-transitory recording medium includes a program for executing the medical image processing method (method of operating the medical image processing device) according to the present invention and data used in its execution (such as data related to the acquisition of medical images). The code may be recorded on a non-transitory recording medium such as various magneto-optical recording devices or semiconductor memories instead of the ROM 211. When the software is executed, for example, the RAM 212 (Random Access Memory) is used as a temporary storage area, and data stored in, for example, an EEPROM (Electrically Erasable and Programmable Read Only Memory) can also be referred to. The recording unit 207 may be used as the "non-transitory recording medium".
  • The ROM 211 (Read Only Memory) is a non-volatile storage element (non-transitory recording medium) and stores computer-readable code of programs that cause the main control unit 210 and/or the image processing unit 204 (computer) to execute various image processing methods (including the medical image processing method according to the present invention).
  • the RAM 212 (RAM: Random Access Memory) is a storage element for temporary storage during various processes, and can also be used as a buffer for image acquisition.
  • the voice processing unit 209 outputs a message (voice) related to medical image processing, inference results of the region of interest, notification, etc. from the speaker 209A (notification unit, speaker) under the control of the main control unit 210 and the image processing unit 204.
  • The image processing unit 204 and/or the main control unit 210 can be configured using a GPU, which is one form of parallel computation processing device, and it is effective to execute at least part of the inference step (inference processing) described later on the GPU.
  • The operation unit 208 can be configured with devices such as a keyboard and a mouse (not shown), and the user can give an instruction to execute the medical image processing method (inference method) and set the conditions necessary for its execution via the operation unit 208.
  • the medical image acquisition unit 204A acquires an endoscopic image (moving image of a subject; observation image, medical image) as an example of time-series data (data acquisition step, data acquisition process).
  • the medical image acquisition unit 204A may acquire an endoscope image taken by the endoscope scope 100, or may acquire an endoscope image recorded by the recording unit 207.
  • the recording control unit 204D can record the acquired endoscopic image in the recording unit 207.
  • The inference unit 204B recognizes a region of interest from the observation image using the CNN 563 (trained model, conversion model) (inference step, inference processing). Recognition of a region of interest includes detection and discrimination.
  • FIG. 11 is a diagram showing the state of inference using the conversion model: the inference unit 204B inputs a moving image (time-series data) of the subject into the CNN 563 to obtain a detection result and a discrimination result (inference result). It is preferable that at least a part of the inference step (inference processing) be performed on a parallel computing device such as a GPU. Further, the inference unit 204B may refer to the individual information of the endoscope scope 100 in the above recognition (inference).
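  • A minimal sketch of that frame-by-frame inference flow (the names conversion_model and frames are hypothetical stand-ins for the CNN 563 and the moving image of the subject; in the real system both would come from the endoscope pipeline described above):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for illustration only.
conversion_model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
frames = [torch.randn(1, 3, 224, 224) for _ in range(4)]   # time-series input data (moving image)

device = "cuda" if torch.cuda.is_available() else "cpu"    # run at least part of inference on a GPU
conversion_model = conversion_model.eval().to(device)

with torch.no_grad():
    for frame in frames:
        inference_result = conversion_model(frame.to(device))  # e.g. detection / discrimination scores
```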
  • The display control unit 204C causes the display device to display the observation image (display control step). At this time, the display control unit 204C may display the region of interest in an identifiable manner (display of characters, figures, or symbols indicating the region of interest, coloring of the region of interest, etc.).
  • the main control unit 210 and the image processing unit 204 repeat the above-mentioned processing until the observation is completed.
  • The method of the present invention can be applied generally to systems that perform real-time processing using a convolutional neural network, not only to medical endoscopes. For example, it can be applied to medical equipment that handles time-series data (moving images of a subject) such as ultrasonic examination devices and X-ray fluoroscopy devices, to industrial endoscopes, to machine vision, to face recognition and security cameras using digital cameras, and to object recognition by cameras mounted on moving bodies such as automobiles and flying objects, thereby performing inference while reducing the memory access cost.
  • The medical image analysis processing unit detects a region of interest, which is a region to be noticed, based on the feature quantities of the pixels of the medical image.
  • the medical image analysis result acquisition unit is a medical image processing device (inference device) that acquires the analysis results of the medical image analysis processing unit.
  • the medical image analysis processing unit detects the presence or absence of a noteworthy object based on the feature amount of the pixel of the medical image.
  • the medical image analysis result acquisition unit is a medical image processing device that acquires the analysis results of the medical image analysis processing unit.
  • The medical image processing device (inference device) in which the medical image analysis result acquisition unit acquires the analysis result from a recording device that records the analysis results of medical images (input data, time-series data), and the analysis result is either or both of a region of interest, which is a region to be noticed included in the medical image, and the presence or absence of an object to be noticed.
  • a medical image is a medical image processing device that is a normal optical image obtained by irradiating light in a white band or light in a plurality of wavelength bands as light in the white band.
  • a medical image is an image obtained by irradiating light in a specific wavelength band.
  • a medical image processing device in which a specific wavelength band is narrower than the white wavelength band.
  • a medical image processing device in which a specific wavelength band is a blue or green band in the visible range.
  • The medical image processing device in which the specific wavelength band includes a wavelength band of 390 nm or more and 450 nm or less or 530 nm or more and 550 nm or less, and the light of the specific wavelength band has a peak wavelength in the wavelength band of 390 nm or more and 450 nm or less or 530 nm or more and 550 nm or less.
  • a specific wavelength band is a medical image processing device that is a red band in the visible range.
  • The medical image processing device in which the specific wavelength band includes a wavelength band of 585 nm or more and 615 nm or less or 610 nm or more and 730 nm or less, and the light of the specific wavelength band has a peak wavelength in the wavelength band of 585 nm or more and 615 nm or less or 610 nm or more and 730 nm or less.
  • The medical image processing device in which the specific wavelength band includes a wavelength band in which the absorption coefficient differs between oxidized hemoglobin and reduced hemoglobin, and the light of the specific wavelength band has a peak wavelength in a wavelength band in which the absorption coefficient differs between oxidized hemoglobin and reduced hemoglobin.
  • The medical image processing device in which the specific wavelength band includes a wavelength band of 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm or more and 750 nm or less, and the light of the specific wavelength band has a peak wavelength in the wavelength band of 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm or more and 750 nm or less.
  • a medical image is an in-vivo image that shows the inside of a living body.
  • An in-vivo image is a medical image processing device that has information on fluorescence emitted by a fluorescent substance in the living body.
  • The medical image processing device in which the fluorescence is obtained by irradiating the inside of the living body with excitation light having a peak of 390 nm or more and 470 nm or less.
  • a medical image is an in-vivo image that shows the inside of a living body.
  • a specific wavelength band is a medical image processing device that is a wavelength band of infrared light.
  • The medical image processing device in which the specific wavelength band includes a wavelength band of 790 nm or more and 820 nm or less or 905 nm or more and 970 nm or less, and the light of the specific wavelength band has a peak wavelength in the wavelength band of 790 nm or more and 820 nm or less or 905 nm or more and 970 nm or less.
  • The medical image processing device in which the medical image acquisition unit includes a special-light image acquisition unit that acquires a special-light image having information in a specific wavelength band based on a normal-light image obtained by irradiating light in the white band or light in a plurality of wavelength bands as the white-band light, and the medical image is the special-light image.
  • a medical image is a medical image processing device that is a feature image.
  • a diagnostic support device including the medical image processing device according to any one of Supplementary note 1 to 18.
  • a medical work support device including the medical image processing device according to any one of Supplementary note 1 to 18.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device that can reduce the processing cost caused by a regularization layer. The trained model transformation method according to one aspect of the present invention comprises: a convolutional layer generation process for generating a second convolution layer, for a trained convolutional neural network containing at least one regularization layer, on the basis of the trained parameters of the regularization layer and the trained parameters of the first convolutional layer adjacent to the regularization layer; and a transformation model generation process for replacing the regularization layer and the first convolution layer with the second convolution layer to generate a transformation model that is a transformed trained model.

Description

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-57072; Patent Document 2: Japanese Unexamined Patent Publication No. 2019-71080
One embodiment of the present invention provides a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device that can reduce the processing cost caused by regularization layers.
A trained model transformation method according to a first aspect of the present invention comprises: a convolutional layer generation step of generating, for a trained convolutional neural network including at least one regularization layer, a second convolutional layer based on trained parameters of the regularization layer and trained parameters of a first convolutional layer adjacent to the regularization layer; and a transformation model generation step of replacing the regularization layer and the first convolutional layer with the second convolutional layer to generate a transformation model that is a transformed trained model.
In a trained model transformation method according to a second aspect, in the first aspect, the convolutional layer generation step generates the second convolutional layer such that a first processing unit composed of the first convolutional layer and the regularization layer and a second processing unit composed only of the second convolutional layer produce equal inference results when the same feature quantity is input to each.
In a trained model transformation method according to a third aspect, in the first or second aspect, the regularization layer is a batch regularization layer.
An inference method according to a fourth aspect of the present invention comprises: a data acquisition step of acquiring input data; and an inference step of inputting the input data to a transformation model obtained by the trained model transformation method according to any one of the first to third aspects and obtaining an inference result.
In an inference method according to a fifth aspect, in the fourth aspect, at least a part of the inference step is executed by a parallel computation processing device.
In an inference method according to a sixth aspect, in the fourth or fifth aspect, time-series data is acquired as the input data in the data acquisition step.
In an inference method according to a seventh aspect, in the sixth aspect, a moving image of a subject is acquired as the input data in the data acquisition step.
A trained model transformation device according to an eighth aspect of the present invention comprises a processor, and the processor executes, for a trained convolutional neural network including at least one regularization layer, a convolutional layer generation process of generating a second convolutional layer based on trained parameters of the regularization layer and trained parameters of a first convolutional layer adjacent to the regularization layer, and a transformation model generation process of replacing the regularization layer and the first convolutional layer with the second convolutional layer to generate a transformation model that is a transformed trained model.
A trained model according to a ninth aspect of the present invention is a trained model used for causing a computer to output an inference result for input data, the trained model being obtained by a processor of a trained model transformation device executing, for a trained convolutional neural network including at least one regularization layer, a convolutional layer generation process of generating a second convolutional layer based on trained parameters of the regularization layer and trained parameters of a first convolutional layer adjacent to the regularization layer, and a transformation model generation process of replacing the regularization layer and the first convolutional layer with the second convolutional layer to generate a transformation model that is a transformed trained model.
An inference device according to a tenth aspect of the present invention comprises a processor and the trained model according to the ninth aspect, and the processor executes a data acquisition process of acquiring input data and an inference process of inputting the input data to the trained model and obtaining an inference result.
In an inference device according to an eleventh aspect, in the tenth aspect, the processor comprises a parallel computation processing device that executes at least a part of the inference process.
FIG. 1 is a diagram showing the configuration of a trained model transformation device.
FIG. 2 is a diagram showing how a convolutional neural network is trained and how the trained model is transformed.
FIG. 3 is a diagram showing a configuration example of a convolutional neural network.
FIG. 4 is a diagram showing convolution processing with filters.
FIG. 5 is a diagram showing how a convolutional layer is generated.
FIG. 6 is another diagram showing how a convolutional layer is generated.
FIG. 7 is a diagram showing a configuration example of a transformed convolutional neural network.
FIG. 8 is an external view of an endoscope system as one embodiment of an inference device.
FIG. 9 is a block diagram showing the main configuration of the endoscope system.
FIG. 10 is a functional block diagram of the image processing unit.
FIG. 11 is a diagram showing inference using a transformation model.
Hereinafter, embodiments of a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device according to the present invention will be described in detail with reference to the accompanying drawings.
<Configuration of the trained model transformation device>
FIG. 1 is a diagram showing the configuration of a trained model transformation device 500 (trained model transformation device). The trained model transformation device 500 includes a processor 510 (processor, computer) comprising a learning control unit 512 and a transformation model generation unit 514, a ROM 520 (ROM: Read Only Memory; non-transitory recording medium, memory), and a RAM 530 (RAM: Random Access Memory). Like the main control unit 210 and the image processing unit 204 of the endoscope system 10 described later (see FIGS. 8 to 11 and the related description), the processor 510 can be configured by various processors and/or electric circuits. The ROM 520 stores computer-readable code of a trained model transformation program (a program that causes a computer to execute the trained model transformation method according to the present invention) and various data required for executing the trained model transformation method. The code and data may be stored in an EEPROM (Electronically Erasable and Programmable Read Only Memory) or a flash memory instead of the ROM 520. The RAM 530 is used as a temporary storage area and a work area during processing.
With the above configuration, the trained model transformation device 500 trains a convolutional neural network (CNN: Convolutional Neural Network) and transforms the trained model. Part (a) of FIG. 2 shows how the untrained CNN 560 becomes the CNN 562, a trained model, under the control of the learning control unit 512, and part (b) of FIG. 2 shows how the CNN 562 (trained model) becomes the CNN 563 (transformation model) under the control of the transformation model generation unit 514 (convolutional layer generation step, transformation model generation step). The type of processing device (CPU, GPU, or the like) that transforms the trained model is not particularly limited.
<Configuration of the convolutional neural network>
FIG. 3 is a diagram showing a configuration example of the CNN 562 (convolutional neural network; trained model); the CNN 560 has a similar configuration. In the example shown in part (a) of FIG. 3, the CNN 562 has an input layer 562A, an intermediate layer 562B, and an output layer 562C. The input layer 562A receives time-series data (for example, but not limited to, a moving image of a subject; input data) and outputs feature quantities. The intermediate layer 562B includes a convolutional layer 564 (first convolutional layer) and a batch regularization layer 565 (regularization layer), receives the feature quantities output by the input layer 562A, and calculates other feature quantities. The convolutional layer 564 has a structure in which a plurality of "nodes" are connected by "edges", and the weight coefficients applied to the input image are associated with the nodes and edges and stored in a weight coefficient storage unit (not shown). The values of the weight coefficients change from their initial state (the values in the CNN 560) as training progresses, and the CNN 562 (trained model) uses the weight coefficients obtained when training has finished.
In FIG. 3, when the convolutional layer 564 is on the input side and the batch regularization layer 565 is on the output side, as in the transformation target layer 561A, the transformation described later with reference to FIG. 5 is performed (convolutional layer generation step, transformation model generation step); when the batch regularization layer 565 is on the input side and the convolutional layer 564 is on the output side, as in the transformation target layer 561B, the transformation described later with reference to FIG. 6 is performed (convolutional layer generation step, transformation model generation step).
<Processing in the intermediate layer>
<Convolution>
The intermediate layer 562B calculates feature quantities by convolution operations. The convolution operation performed in the convolutional layer 564 is a process of obtaining a feature map by a convolution operation using a filter, and plays the role of feature extraction such as edge extraction from an image. The convolution operation with one filter generates one channel (one sheet) of "feature map". When downscaling is performed by the convolution, the size of the "feature map" becomes smaller as the convolution proceeds through the layers. The intermediate layer 562B can be composed of one or more layers that perform convolution processing.
FIG. 4 is a diagram showing convolution processing with filters. In the first convolutional layer of the intermediate layer 562B, for example, a convolution operation is performed between an image set composed of a plurality of medical images (input data) (a training image set during training, an inference image set during inference) and a filter F1. The image set is composed of N images (N channels) each having an image size of H in the vertical direction and W in the horizontal direction. When normal-light images are input, the images constituting the image set are three-channel images of R (red), G (green), and B (blue). Since the image set convolved with the filter F1 has N channels (N sheets), the filter size is, for example, 5 × 5 × N in the case of a size-5 (5 × 5) filter. The convolution operation with the filter F1 generates one channel (one sheet) of "feature map" per filter F1. The filter F2 used in the second convolutional layer has, for example, a filter size of 3 × 3 × M in the case of a size-3 (3 × 3) filter.
As in the first convolutional layer, the second to n-th convolutional layers perform convolution operations using filters F2 to Fn. The size of the "feature map" in the n-th convolutional layer is smaller than that in the second convolutional layer because it has been downscaled by the preceding convolutional layers.
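The filter shapes and feature-map sizes described above can be checked with a minimal PyTorch sketch; the channel counts, image size, and stride below are assumed values chosen only for illustration and are not taken from the embodiment.

```python
import torch
import torch.nn as nn

N, H, W = 3, 64, 64                       # assumed: N-channel (RGB) input of size H x W
x = torch.randn(1, N, H, W)               # one image from the input image set

conv1 = nn.Conv2d(in_channels=N, out_channels=8, kernel_size=5, padding=2)
fmap1 = conv1(x)                          # each 5x5xN filter yields one feature map
print(fmap1.shape)                        # torch.Size([1, 8, 64, 64])

conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, stride=2, padding=1)
fmap2 = conv2(fmap1)                      # 3x3xM filters; stride 2 downscales the map
print(fmap2.shape)                        # torch.Size([1, 16, 32, 32])
```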
Among the layers of the intermediate layer 562B, the convolutional layers close to the input side perform low-order feature extraction (such as edge extraction), and the layers closer to the output side perform higher-order feature extraction (extraction of features related to the shape, structure, and the like of the recognition target).
<Regularization>
In training a convolutional neural network, inserting regularization layers suppresses internal covariate shift, so an improvement in convergence speed and accuracy can be expected. A regularization layer calculates statistics such as the mean and variance of the feature quantities and uses them to whiten the feature quantities. There are several types of regularization layers, and they differ in the range over which these statistics are calculated, as described below. The feature value is denoted by f(b, x, y, c), where b, x, y, and c are the batch, X-axis, Y-axis, and channel indices, respectively.
(1) Batch regularization
The mean and variance are calculated as in equations (1) and (2) below, respectively, and whitening is performed per channel.
\mu_c = \frac{1}{BXY} \sum_{b,x,y} f(b,x,y,c)   (1)
\sigma_c^2 = \frac{1}{BXY} \sum_{b,x,y} \left( f(b,x,y,c) - \mu_c \right)^2   (2)
Here, B, X, Y, and C denote the numbers of batch, X-axis, Y-axis, and channel indices, respectively.
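As a concrete illustration of equations (1) and (2), the following NumPy sketch (an addition for illustration, assuming a (B, X, Y, C) array layout and arbitrary sizes) computes the per-channel mean and variance and whitens the feature quantity:

```python
import numpy as np

f = np.random.randn(4, 32, 32, 16)              # feature f(b, x, y, c): B=4, X=Y=32, C=16
mu = f.mean(axis=(0, 1, 2), keepdims=True)      # eq. (1): average over b, x, y for each channel
var = f.var(axis=(0, 1, 2), keepdims=True)      # eq. (2): variance over b, x, y for each channel
eps = 1e-5
f_white = (f - mu) / np.sqrt(var + eps)         # per-channel whitening
```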
(2) Layer regularization
The mean and variance are calculated as in equations (3) and (4) below, respectively, and whitening is performed per batch index.
\mu_b = \frac{1}{XYC} \sum_{x,y,c} f(b,x,y,c)   (3)
\sigma_b^2 = \frac{1}{XYC} \sum_{x,y,c} \left( f(b,x,y,c) - \mu_b \right)^2   (4)
(3) Instance regularization
The mean and variance are calculated as in equations (5) and (6) below, respectively, and whitening is performed per batch index and channel.
\mu_{b,c} = \frac{1}{XY} \sum_{x,y} f(b,x,y,c)   (5)
\sigma_{b,c}^2 = \frac{1}{XY} \sum_{x,y} \left( f(b,x,y,c) - \mu_{b,c} \right)^2   (6)
(4) Group regularization
The channels are divided into N groups as in equations (7) and (8) below.
[Equations (7) and (8): definition of the partition of the C channels into N groups G_1, ..., G_N]
The mean and variance are calculated as in equations (9) and (10) below, respectively, and whitening is performed per group within each batch index.
\mu_{b,n} = \frac{1}{XY\,|G_n|} \sum_{x,y} \sum_{c \in G_n} f(b,x,y,c)   (9)
\sigma_{b,n}^2 = \frac{1}{XY\,|G_n|} \sum_{x,y} \sum_{c \in G_n} \left( f(b,x,y,c) - \mu_{b,n} \right)^2   (10)
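The four regularization types differ only in the set of axes over which the statistics are taken. The following NumPy sketch (an illustrative addition, assuming a (B, X, Y, C) layout and N = 4 equally sized groups) shows the group-regularization case of equations (9) and (10):

```python
import numpy as np

B, X, Y, C, N = 2, 16, 16, 32, 4                      # N groups of C // N channels each
f = np.random.randn(B, X, Y, C)
g = f.reshape(B, X, Y, N, C // N)                     # split the channel axis into groups
mu = g.mean(axis=(1, 2, 4), keepdims=True)            # eq. (9): per batch index and group
var = g.var(axis=(1, 2, 4), keepdims=True)            # eq. (10)
f_white = ((g - mu) / np.sqrt(var + 1e-5)).reshape(B, X, Y, C)
```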
The layer configuration of the CNN 562 is not limited to an alternation of one convolutional layer 564 and one batch regularization layer 565; either type of layer (for example, the convolutional layer 564) may appear several times in succession.
<Other configurations>
The CNN 562 may include a pooling layer. The pooling processing performed in the pooling layer reduces (or enlarges) the feature map output by the convolution operation to obtain a new feature map, and plays the role of giving robustness so that the extracted features are not affected by translation and the like. The CNN 562 may also include a fully connected layer 566, as in the example shown in part (b) of FIG. 3.
<Transformation of the trained model: omission of the regularization layer at inference time>
As described above, there are several types of regularization, but they differ only in the range of the feature quantity over which the statistics are calculated, and the discussion below applies to all of them in the same way. In the following, only batch regularization is described.
A batch regularization layer is usually placed before, after, or both before and after a convolutional layer. At inference time only, batch regularization processing can be integrated into the adjacent convolutional layer (the convolution and the batch regularization are combined into a single convolution) by the method described below. This reduces the number of memory accesses during inference processing and thus speeds up the computation. The present technique is particularly effective when inference is executed on a parallel computation processing device (such as a GPU), where the memory access cost is more dominant in the processing time. Furthermore, the present technique performs the transformation only for inference processing, and the model used for training still contains the batch regularization layers. That is, the benefit of batch regularization for training is retained, while its processing cost is eliminated during inference with the trained model.
<Transformation method (pattern 1)>
FIG. 5 is a diagram showing the transformation of the trained model (pattern 1). Pattern 1 is the processing for the case where the convolutional layer 564 is on the input side and the batch regularization layer 565 is on the output side, as in the transformation target layer 561A (first processing unit) in FIG. 3. As in part (a) of FIG. 5, let x and y be the input and output of the convolutional layer 564 and z be the output of the batch regularization layer 565; then the convolution and batch regularization processing can be formulated as in equations (11) and (12) below.
y = Wx + b   (11)
z = \gamma \frac{y - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta   (12)
Here, W and b are trained parameters of the convolutional layer 564 (first convolutional layer), and γ, μ, σ, ε, and β are trained parameters of the batch regularization layer 565 (regularization layer). Transforming the above expressions yields equation (13) below.
z = \tilde{W} x + \tilde{b}   (13)
That is, the processing from x to z can be realized by a single convolution having W-tilde (the weight parameter used for the convolution in the second convolutional layer 567) and b-tilde (the bias component of the second convolutional layer 567) as its parameters. W-tilde and b-tilde are defined by equations (14) and (15) below, respectively.
\tilde{W} = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}} W   (14)
\tilde{b} = \frac{\gamma (b - \mu)}{\sqrt{\sigma^2 + \varepsilon}} + \beta   (15)
As a result, as shown in part (b) of FIG. 5, the transformation model generation unit 514 can transform the convolution in the convolutional layer 564 and the processing of its result in the batch regularization layer 565 into a single convolutional layer 567 (second convolutional layer, second processing unit) (generation of the second convolutional layer; convolutional layer generation step, convolutional layer generation process). That is, in the convolutional layer generation step, the transformation model generation unit 514 can generate the convolutional layer 567 (second convolutional layer) such that the first processing unit composed of the convolutional layer 564 (first convolutional layer) and the batch regularization layer 565 (regularization layer) and the second processing unit composed only of the convolutional layer 567 (second convolutional layer) produce equal inference results (for example, z above) when the same feature quantity (for example, x above) is input to each.
When the input data is an image, x, y, and z are vectors, W and W-tilde (the transformed parameter in equations (13) and (14)) are weight coefficient matrices, and b and b-tilde (the transformed parameter in equations (13) and (15)) are also matrices representing bias components.
The transformation model generation unit 514 replaces the transformation target layer 561A (the convolutional layer 564 and the batch regularization layer 565; see FIG. 3) with the convolutional layer 567 (second convolutional layer, second processing unit) to generate a transformation model, which is the transformed trained model (transformation model generation step, transformation model generation process).
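As a minimal sketch of pattern 1 in PyTorch (an illustrative addition, not the implementation of the device itself), the following code folds a Conv2d followed by a BatchNorm2d into a single Conv2d by applying equations (14) and (15) to the trained parameters, and checks that both give the same output for the same input in inference (eval) mode:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)            # gamma / sqrt(sigma^2 + eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)  # eq. (14)
    b = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = scale * (b - bn.running_mean) + bn.bias.data     # eq. (15)
    return fused

conv, bn = nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16)
bn.eval()                                   # use the trained (running) statistics
x = torch.randn(1, 8, 20, 20)
fused = fuse_conv_bn(conv, bn)
assert torch.allclose(bn(conv(x)), fused(x), atol=1e-5)
```

Here the fused layer plays the role of the convolutional layer 567 of pattern 1.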
<Transformation method (pattern 2)>
FIG. 6 is a diagram showing the transformation of the trained model (pattern 2). Pattern 2 is the processing for the case where the batch regularization layer 565 is on the input side and the convolutional layer 564 is on the output side, as in the transformation target layer 561B (first processing unit) in FIG. 3. As in part (a) of FIG. 6, let x and y be the input and output of the batch regularization layer 565 and z be the output of the convolutional layer 564; then the batch regularization and convolution processing can be formulated as in equations (16) and (17) below.
y = \gamma \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta   (16)
z = Wy + b   (17)
Transforming the expressions in the same manner as in pattern 1 yields equation (18) below.
z = \tilde{W} x + \tilde{b}   (18)
That is, the processing from x to z can be realized by a single convolution having W-tilde (the weight parameter used for the convolution in the second convolutional layer 567) and b-tilde (the bias component of the second convolutional layer 567) as its parameters. W-tilde is defined by equation (19) below, and b-tilde is defined by equation (20) below.
\tilde{W} = W \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}   (19)
\tilde{b} = W \left( \beta - \frac{\gamma \mu}{\sqrt{\sigma^2 + \varepsilon}} \right) + b   (20)
As a result, as shown in part (b) of FIG. 6, the transformation model generation unit 514 can transform the processing in the batch regularization layer 565 and the convolution of its result in the convolutional layer 564 into a single convolutional layer 567 (second convolutional layer, second processing unit) (generation of the second convolutional layer; convolutional layer generation step, convolutional layer generation process). That is, in the convolutional layer generation step, the transformation model generation unit 514 can generate the convolutional layer 567 (second convolutional layer) such that the first processing unit composed of the convolutional layer 564 (first convolutional layer) and the batch regularization layer 565 (regularization layer) and the second processing unit composed only of the convolutional layer 567 (second convolutional layer) produce equal inference results (for example, z above) when the same feature quantity (for example, x above) is input to each.
When the input data is an image, x, y, and z are vectors, W and W-tilde (the transformed parameter in equations (18) and (19)) are weight coefficient matrices, and b and b-tilde (the transformed parameter in equations (18) and (20)) are also matrices representing bias components.
The transformation model generation unit 514 replaces the transformation target layer 561B (the batch regularization layer 565 and the convolutional layer 564; see FIG. 3) with the convolutional layer 567 (second convolutional layer, second processing unit) to generate a transformation model, which is the transformed trained model (transformation model generation step, transformation model generation process).
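For pattern 2, a corresponding PyTorch sketch (again an illustrative addition) applies equations (19) and (20). Zero padding is omitted here so that the algebraic identity holds over the entire output; with zero padding, pixels at the border would see padded values that are not shifted by the batch regularization parameters.

```python
import torch
import torch.nn as nn

def fuse_bn_conv(bn: nn.BatchNorm2d, conv: nn.Conv2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=0, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)              # gamma / sqrt(sigma^2 + eps)
    shift = bn.bias - bn.running_mean * scale                            # beta - gamma*mu / sqrt(...)
    fused.weight.data = conv.weight.data * scale.reshape(1, -1, 1, 1)    # eq. (19): scale per input channel
    b = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = conv.weight.data.sum(dim=(2, 3)) @ shift + b       # eq. (20)
    return fused

bn, conv = nn.BatchNorm2d(8), nn.Conv2d(8, 16, 3, padding=0)
bn.eval()
x = torch.randn(1, 8, 20, 20)
assert torch.allclose(conv(bn(x)), fuse_bn_conv(bn, conv)(x), atol=1e-5)
```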
<Trained model after transformation>
FIG. 7 is a diagram showing a configuration example of a transformed convolutional neural network (transformed trained model, transformation model). Part (a) of FIG. 7 shows the CNN 563 (transformation model; without a fully connected layer) corresponding to part (a) of FIG. 3, and part (b) of FIG. 7 shows the CNN 563 (transformation model; with a fully connected layer) corresponding to part (b) of FIG. 3. The CNN 563 includes an input layer 563A, an intermediate layer 563B, and an output layer 563C. Although FIG. 7 shows an example in which every set of a convolutional layer 564 and a batch regularization layer 565 is transformed and replaced with a convolutional layer 567, the transformation and replacement may be performed for only some of the sets.
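As a hedged sketch of the replacement step (reusing the fuse_conv_bn helper from the pattern-1 sketch above; the layer layout is an assumption chosen only for illustration), the following Python code walks through a small sequential model, replaces every Conv2d/BatchNorm2d pair with the fused layer, and verifies that the transformation model produces the same output as the original trained model:

```python
import torch
import torch.nn as nn

def convert_model(model: nn.Sequential) -> nn.Sequential:
    mods, layers, i = list(model), [], 0
    while i < len(mods):
        if (i + 1 < len(mods) and isinstance(mods[i], nn.Conv2d)
                and isinstance(mods[i + 1], nn.BatchNorm2d)):
            layers.append(fuse_conv_bn(mods[i], mods[i + 1]))   # pattern-1 fusion (defined above)
            i += 2
        else:
            layers.append(mods[i])
            i += 1
    return nn.Sequential(*layers)

trained = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                        nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
trained.eval()                          # inference mode: running statistics are used
converted = convert_model(trained)      # no batch regularization layers remain
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(trained(x), converted(x), atol=1e-5)
```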
<One embodiment of the inference device and inference method: application to an endoscope system>
FIG. 8 is an external view of an endoscope system 10 (endoscope system, medical image processing device, inference device) as one embodiment of the inference device, and FIG. 9 is a block diagram showing the main configuration of the endoscope system 10. As shown in FIGS. 8 and 9, the endoscope system 10 is composed of an endoscope scope 100 (image acquisition unit, endoscope scope), a medical image processing device 200 (medical image processing device, computer, processor, inference device), a light source device 300 (light source device), and a monitor 400 (display device, display).
<Configuration of the endoscope scope>
The endoscope scope 100 includes a handheld operation unit 102 and an insertion section 104 connected to the handheld operation unit 102. An operator (user) grips and operates the handheld operation unit 102 and inserts the insertion section 104 into the body of a subject (living body) for observation. The handheld operation unit 102 is provided with an air/water supply button 141, a suction button 142, a function button 143 to which various functions can be assigned, and an imaging button 144 that accepts imaging instruction operations (still images, moving images).
The handheld operation unit 102 is provided with a scope information recording unit 139 that records individual information (individual information, scope information) of the endoscope scope 100. The individual information includes, for example, the type of the endoscope scope 100 (forward-viewing, side-viewing, or the like), the model, the individual identification number, the characteristics of the optical system (viewing angle, distortion, and the like), and information on instruments (treatment tools and the like) used for treating the subject. The scope information acquisition unit 204E of the image processing unit 204 (scope information acquisition unit, individual information acquisition unit; see FIG. 10) acquires this individual information, which is used in the processing performed by the medical image processing device 200 (image acquisition processing, inference processing, display control processing). The scope information recording unit 139 may be provided in another part, such as inside the light guide connector 108.
The insertion section 104 is composed of a flexible portion 112, a bending portion 114, and a distal rigid portion 116 in this order from the handheld operation unit 102 side. That is, the bending portion 114 is connected to the proximal side of the distal rigid portion 116, and the flexible portion 112 is connected to the proximal side of the bending portion 114. The handheld operation unit 102 is connected to the proximal side of the insertion section 104. By operating the handheld operation unit 102, the user can bend the bending portion 114 and change the direction of the distal rigid portion 116 up, down, left, and right. The distal rigid portion 116 is provided with an imaging optical system 130, an illumination unit 123, a forceps port 126, and the like (see FIGS. 8 and 9).
During observation and treatment, white light and/or narrow-band light (one or more of red narrow-band light, green narrow-band light, blue narrow-band light, and violet narrow-band light) can be emitted from the illumination lenses 123A and 123B of the illumination unit 123 by operating the operation unit 208 (see FIG. 9). By operating the air/water supply button 141, washing water is discharged from a water supply nozzle (not shown) to wash the imaging lens 132 (imaging lens, imaging unit) of the imaging optical system 130 and the illumination lenses 123A and 123B. A conduit (not shown) communicates with the forceps port 126 that opens in the distal rigid portion 116, and a treatment tool (not shown) for tumor removal or the like is inserted through this conduit and advanced and retracted as appropriate to perform the necessary treatment on the subject.
As shown in FIGS. 8 and 9, the imaging lens 132 (imaging unit) is disposed on the distal end face 116A of the distal rigid portion 116. A CMOS (Complementary Metal-Oxide Semiconductor) imaging element 134 (imaging element, image acquisition unit), a drive circuit 136, and an AFE 138 (AFE: Analog Front End) are disposed behind the imaging lens 132, and these elements output an image signal. The imaging element 134 is a color imaging element and includes a plurality of pixels composed of a plurality of light receiving elements arranged in a matrix (two-dimensional array) in a specific pattern arrangement (Bayer arrangement, X-Trans (registered trademark) arrangement, honeycomb arrangement, or the like). Each pixel of the imaging element 134 includes a microlens, a red (R), green (G), or blue (B) color filter, and a photoelectric conversion unit (photodiode or the like). An image sensor in which the imaging element 134, the drive circuit 136, and the AFE 138 are integrated in a single package may also be used. The imaging optical system 130 can generate a color image from pixel signals of the three colors red, green, and blue, or can generate an image from pixel signals of any one or two of red, green, and blue. The imaging element 134 may be of the XY address type or the CCD (Charge Coupled Device) type. Each pixel of the imaging element 134 may further include a violet color filter corresponding to the violet light source 310V and/or an infrared filter corresponding to an infrared light source.
An optical image of the subject is formed on the light receiving surface (imaging surface) of the imaging element 134 by the imaging lens 132, converted into an electric signal, output to the medical image processing device 200 via a signal cable (not shown), and converted into a video signal. As a result, an endoscopic image (observation image, medical image) of the subject is displayed on the monitor 400 connected to the medical image processing device 200.
The illumination lenses 123A and 123B of the illumination unit 123 are provided on the distal end face 116A of the distal rigid portion 116, adjacent to the imaging lens 132. The exit end of a light guide 170, described later, is disposed behind the illumination lenses 123A and 123B; this light guide 170 runs through the insertion section 104, the handheld operation unit 102, and the universal cable 106, and the entrance end of the light guide 170 is disposed inside the light guide connector 108.
By performing imaging at a predetermined frame rate (which can be done under the control of the medical image acquisition unit 204A) while inserting or withdrawing the endoscope scope 100 (insertion section 104) into or from the living body as the subject, the user can sequentially capture time-series images of the inside of the living body (subject).
<Configuration of the light source device>
As shown in FIG. 9, the light source device 300 is composed of a light source 310 for illumination, a diaphragm 330, a condenser lens 340, a light source control unit 350, and the like, and causes observation light to enter the light guide 170. The light source 310 includes a red light source 310R, a green light source 310G, a blue light source 310B, and a violet light source 310V that emit red, green, blue, and violet narrow-band light, respectively, and can emit red, green, blue, and violet narrow-band light. The illuminance of the observation light from the light source 310 is controlled by the light source control unit 350, which can change (increase or decrease) the illuminance of the observation light and stop the illumination as necessary.
The light source 310 can emit red, green, blue, and violet narrow-band light in any combination. For example, red, green, blue, and violet narrow-band light can be emitted simultaneously to emit white light (normal light) as the observation light, or any one or two of them can be emitted to emit narrow-band light (special light). The light source 310 may further include an infrared light source that emits infrared light (an example of narrow-band light). Alternatively, white light or narrow-band light may be emitted as the observation light by a light source that emits white light and a filter that transmits the white light and each narrow-band light.
<Wavelength band of the light source>
The light source 310 may be a light source that generates light in the white band or light in a plurality of wavelength bands as the white-band light, or a light source that generates light in a specific wavelength band narrower than the white band. The specific wavelength band may be the blue band or green band of the visible range, or the red band of the visible range. When the specific wavelength band is the blue band or green band of the visible range, it may include a wavelength band of 390 nm to 450 nm or 530 nm to 550 nm and have a peak wavelength within the wavelength band of 390 nm to 450 nm or 530 nm to 550 nm. When the specific wavelength band is the red band of the visible range, it may include a wavelength band of 585 nm to 615 nm or 610 nm to 730 nm, and the light of the specific wavelength band may have a peak wavelength within the wavelength band of 585 nm to 615 nm or 610 nm to 730 nm. Here, nm denotes nanometers.
The light of the specific wavelength band described above may include a wavelength band in which the absorption coefficients of oxyhemoglobin and deoxyhemoglobin differ, and may have a peak wavelength in a wavelength band in which the absorption coefficients of oxyhemoglobin and deoxyhemoglobin differ. In this case, the specific wavelength band may include a wavelength band of 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm to 750 nm, and have a peak wavelength in the wavelength band of 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm to 750 nm.
The light generated by the light source 310 may include a wavelength band of 790 nm to 820 nm or 905 nm to 970 nm and have a peak wavelength in the wavelength band of 790 nm to 820 nm or 905 nm to 970 nm.
The light source 310 may also include a light source that emits excitation light having a peak at 390 nm to 470 nm. In this case, a medical image (medical image, in-vivo image) having information on the fluorescence emitted by a fluorescent substance in the subject (living body) can be acquired. When acquiring a fluorescence image, a dye for the fluorescence method (fluorescein, acridine orange, or the like) may be used.
The type of the light source 310 (laser light source, xenon light source, LED light source (LED: Light-Emitting Diode), or the like), its wavelength, the presence or absence of a filter, and so on are preferably configured according to the type of subject, the site, the purpose of observation, and the like, and during observation it is preferable to combine and/or switch the wavelengths of the observation light according to the type of subject, the site, the purpose of observation, and the like. When switching wavelengths, the wavelength of the emitted light may be switched, for example, by rotating a disc-shaped filter (rotary color filter) that is disposed in front of the light source and provided with filters that transmit or block light of specific wavelengths.
The imaging element used in the endoscope system 10 is not limited to a color imaging element in which a color filter is provided for each pixel, such as the imaging element 134, and may be a monochrome imaging element. When a monochrome imaging element is used, the wavelength of the observation light can be switched sequentially to perform frame-sequential (color-sequential) imaging. For example, the wavelength of the emitted observation light may be switched sequentially among violet, blue, green, and red, or broadband light (white light) may be emitted and the wavelength of the emitted observation light switched by a rotary color filter (red, green, blue, violet, and the like). Alternatively, one or more narrow-band lights (green, blue, violet, and the like) may be emitted and the wavelength of the emitted observation light switched by a rotary color filter (green, blue, violet, and the like). The narrow-band light may be infrared light of two or more different wavelengths (first narrow-band light, second narrow-band light).
By connecting the light guide connector 108 (see FIGS. 8 and 9) to the light source device 300, the observation light emitted from the light source device 300 is transmitted to the illumination lenses 123A and 123B via the light guide 170 and emitted from the illumination lenses 123A and 123B onto the observation range.
<Configuration of the medical image processing device>
The configuration of the medical image processing device 200 will be described with reference to FIG. 9. The medical image processing device 200 receives the image signal output from the endoscope scope 100 via the image input controller 202, performs the necessary image processing in the image processing unit 204 (processor, computer), and outputs the result from the video output unit 206. As a result, an observation image (medical image, endoscopic image, in-vivo image) is displayed on the monitor 400 (display device). These processes are performed under the control of the main control unit 210 (processor, computer). The communication control unit 205 controls communication for acquiring medical images and the like with an in-hospital system (HIS: Hospital Information System), an in-hospital LAN (Local Area Network), and/or an external system or network, none of which is shown.
<Functions of the image processing unit>
FIG. 10 is a functional block diagram of the image processing unit 204. The image processing unit 204 includes a medical image acquisition unit 204A (medical image acquisition unit), an inference unit 204B (inference unit, region-of-interest recognition unit), a display control unit 204C (display control unit), a recording control unit 204D (recording control unit), and a scope information acquisition unit 204E (scope information acquisition unit). The inference unit 204B includes a transformation model (such as the CNN 563 shown in FIG. 7) obtained by the method described above (the trained model transformation method according to the present invention).
The processing using these functions will be described in detail later.
With the functions described above, the image processing unit 204 can perform recognition (inference) of medical images, calculation of feature quantities, processing that emphasizes or reduces components of a specific frequency band, and processing that emphasizes or de-emphasizes a specific target (a region of interest, blood vessels at a desired depth, or the like). The image processing unit 204 may include a special-light image acquisition unit that acquires a special-light image having information on a specific wavelength band on the basis of a normal-light image obtained by emitting light in the white band or light in a plurality of wavelength bands as the white-band light. In this case, the signal of the specific wavelength band can be obtained by computation based on the RGB (R: red, G: green, B: blue) or CMY (C: cyan, M: magenta, Y: yellow) color information contained in the normal-light image. The image processing unit 204 may also include a feature-quantity image generation unit that generates a feature-quantity image by computation based on at least one of a normal-light image obtained by emitting light in the white band or light in a plurality of wavelength bands as the white-band light and a special-light image obtained by emitting light in a specific wavelength band, and may acquire and display the feature-quantity image as a medical image. The above processing is performed under the control of the main control unit 210.
 <各種のプロセッサによる機能の実現>
 上述した画像処理部204及び主制御部210の各部の機能は、各種のプロセッサ(processor)及び記録媒体を用いて実現できる。各種のプロセッサには、例えばソフトウェア(プログラム)を実行して各種の機能を実現する汎用的なプロセッサであるCPU(Central Processing Unit)が含まれる。また、上述した各種のプロセッサには、画像処理に特化したプロセッサであり並列計算処理装置の一態様であるGPU(Graphics Processing Unit)、FPGA(Field Programmable Gate Array)などの製造後に回路構成を変更可能なプロセッサであるプログラマブルロジックデバイス(Programmable Logic Device:PLD)も含まれる。さらに、ASIC(Application Specific Integrated Circuit)などの特定の処理を実行させるために専用に設計された回路構成を有するプロセッサである専用電気回路なども、上述した各種のプロセッサに含まれる。
<Realization of functions by various processors>
The functions of the image processing unit 204 and the main control unit 210 described above can be realized by using various processors and recording media. The various processors include, for example, a CPU (Central Processing Unit), which is a general-purpose processor that executes software (program) to realize various functions. In addition, for the various processors described above, the circuit configuration is changed after manufacturing GPU (Graphics Processing Unit), FPGA (Field Programmable Gate Array), etc., which are processors specialized in image processing and one aspect of parallel computing equipment. A programmable logic device (PLD), which is a possible processor, is also included. Further, the above-mentioned various processors also include a dedicated electric circuit, which is a processor having a circuit configuration specially designed for executing a specific process such as an ASIC (Application Specific Integrated Circuit).
 各部の機能は1つのプロセッサにより実現されてもよいし、同種または異種の複数のプロセッサ(例えば、複数のFPGA、あるいはCPUとFPGAの組み合わせ、またはCPUとGPUの組み合わせ)で実現されてもよい。また、複数の機能を1つのプロセッサで実現してもよい。複数の機能を1つのプロセッサで構成する例としては、第1に、コンピュータに代表されるように、1つ以上のCPUとソフトウェアの組合せで1つのプロセッサを構成し、このプロセッサが複数の機能として実現する形態がある。第2に、システムオンチップ(System On Chip:SoC)などに代表されるように、システム全体の機能を1つのIC(Integrated Circuit)チップで実現するプロセッサを使用する形態がある。このように、各種の機能は、ハードウェア的な構造として、上述した各種のプロセッサを1つ以上用いて構成される。さらに、これらの各種のプロセッサのハードウェア的な構造は、より具体的には、半導体素子などの回路素子を組み合わせた電気回路(circuitry)である。これらの電気回路は、論理和、論理積、論理否定、排他的論理和、及びこれらを組み合わせた論理演算を用いて上述した機能を実現する電気回路であってもよい。 The functions of each part may be realized by one processor, or may be realized by a plurality of processors of the same type or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA, or a combination of a CPU and a GPU). Further, a plurality of functions may be realized by one processor. As an example of configuring a plurality of functions with one processor, first, as represented by a computer, one processor is configured by a combination of one or more CPUs and software, and this processor is used as a plurality of functions. There is a form to be realized. Secondly, as typified by System On Chip (SoC), there is a form of using a processor that realizes the functions of the entire system with one IC (Integrated Circuit) chip. As described above, various functions are configured by using one or more of the above-mentioned various processors as a hardware structure. Further, the hardware-like structure of these various processors is, more specifically, an electric circuit (circuitry) in which circuit elements such as semiconductor elements are combined. These electric circuits may be electric circuits that realize the above-mentioned functions by using logical sum, logical product, logical denial, exclusive OR, and logical operations combining these.
 上述したプロセッサあるいは電気回路がソフトウェア(プログラム)を実行する際は、実行するソフトウェアのコンピュータ(例えば、画像処理部204を構成する各種のプロセッサや電気回路、及び/またはそれらの組み合わせ)で読み取り可能なコードをROM211(ROM:Read Only Memory)やフラッシュメモリ(不図示)等の非一時的記録媒体に記憶しておき、コンピュータがそのソフトウェアを参照する。非一時的記録媒体に記憶しておくソフトウェアは、本発明に係る医療画像処理方法(医療画像処理装置の作動方法)を実行するためのプログラム及び実行に際して用いられるデータ(医療画像の取得に関するデータ、生検状態等の定義や識別表示の態様設定に用いられるデータ、認識部で用いられるパラメータ等)を含む。ROM211ではなく各種の光磁気記録装置、半導体メモリ等の非一時的記録媒体にコードを記録してもよい。ソフトウェアを用いた処理の際には例えばRAM212(RAM:Random Access Memory)が一時的記憶領域として用いられ、また例えば不図示のEEPROM(Electronically Erasable and Programmable Read Only Memory)に記憶されたデータを参照することもできる。「非一時的記録媒体」として記録部207を用いてもよい。 When the above-mentioned processor or electric circuit executes software (program), it can be read by a computer of the software (for example, various processors and electric circuits constituting the image processing unit 204, and / or a combination thereof). The code is stored in a non-temporary recording medium such as ROM 211 (ROM: ReadOnlyMemory) or flash memory (not shown), and the computer refers to the software. The software stored in the non-temporary recording medium includes a program for executing the medical image processing method (method of operating the medical image processing device) according to the present invention and data used for executing the medical image processing method (data related to acquisition of medical images). Includes data used to define the biopsy state, etc., and to set the mode of identification display, parameters used in the recognition unit, etc.). The code may be recorded on a non-temporary recording medium such as various optical magnetic recording devices and semiconductor memories instead of the ROM 211. When processing using software, for example, RAM212 (RAM: Random Access Memory) is used as a temporary storage area, and for example, data stored in an EEPROM (Electronically Erasable and Programmable Read Only Memory) (not shown) is referred to. You can also do it. The recording unit 207 may be used as a “non-temporary recording medium”.
 また、ROM211(ROM:Read Only Memory)は不揮発性の記憶素子(非一時的記録媒体)であり、各種の画像処理方法(本発明に係る医療画像処理方法を含む)を主制御部210及び/または画像処理部204(コンピュータ)に実行させるプログラムのコンピュータ読み取り可能なコードが記憶されている。RAM212(RAM:Random Access Memory)は各種処理の際の一時記憶用の記憶素子であり、また画像取得時のバッファとしても使用することができる。音声処理部209は、主制御部210及び画像処理部204の制御により、医療画像処理、関心領域の推論結果や報知等に関するメッセージ(音声)をスピーカ209A(報知部、スピーカ)から出力する。 Further, the ROM 211 (ROM: ReadOnlyMemory) is a non-volatile storage element (non-temporary recording medium), and various image processing methods (including the medical image processing method according to the present invention) are used in the main control unit 210 and /. Alternatively, a computer-readable code of a program to be executed by the image processing unit 204 (computer) is stored. The RAM 212 (RAM: Random Access Memory) is a storage element for temporary storage during various processes, and can also be used as a buffer for image acquisition. The voice processing unit 209 outputs a message (voice) related to medical image processing, inference results of the region of interest, notification, etc. from the speaker 209A (notification unit, speaker) under the control of the main control unit 210 and the image processing unit 204.
 When image processing and recognition are performed as in the endoscope system 10, it is effective to configure the image processing unit 204 and/or the main control unit 210 using a GPU, which is one form of parallel computing device, and to execute at least a part of the inference step (inference processing) described later on the GPU.
 <Operation unit>
 The operation unit 208 can be configured by devices such as a keyboard and a mouse (not shown), and via the operation unit 208 the user can give instructions to execute the medical image processing method (inference method) and set conditions necessary for its execution.
 <Procedure of the medical image processing method>
 An example of the medical image processing method (recognition of a region of interest) using the endoscope system 10 will be described. It is assumed that training of the CNN 562 using the training data and conversion of the trained model (convolution layer generation step and convolution layer generation processing, conversion model generation step and conversion model generation processing; see FIG. 2 and the like) have already been executed.
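 The convolution layer generation step and conversion model generation step referred to above can be illustrated concretely. The following is a minimal sketch, in PyTorch, of folding a trained batch regularization (batch normalization) layer into the adjacent convolution layer, assuming the common convolution-then-normalization ordering; the function name fold_bn_into_conv and the choice of PyTorch are illustrative assumptions and are not prescribed by the embodiment.

import torch
import torch.nn as nn

@torch.no_grad()
def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Generates a single "second convolution layer" whose output equals
    # conv followed by bn, using only the trained parameters of the two layers.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # per-output-channel factor
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      kernel_size=conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, dilation=conv.dilation,
                      groups=conv.groups, bias=True)
    # W' = scale * W (broadcast over the output-channel axis)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    # b' = scale * (b - running_mean) + beta
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_(scale * (bias - bn.running_mean) + bn.bias)
    return fused

Replacing each adjacent pair of first convolution layer and regularization layer in the trained CNN 562 with the layer returned by such a function yields a conversion model of the kind used as the CNN 563 below.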
 <Acquisition of endoscopic images>
 The medical image acquisition unit 204A (processor) acquires an endoscopic image (a moving image of the subject; an observation image, a medical image) as an example of time-series data (data acquisition step, data acquisition processing). The medical image acquisition unit 204A may acquire an endoscopic image captured by the endoscope scope 100, or may acquire an endoscopic image recorded in the recording unit 207. The recording control unit 204D can record the acquired endoscopic image in the recording unit 207.
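 As an illustration of the data acquisition step only, the sketch below reads frames from a recorded movie file with OpenCV as a stand-in for the feed from the endoscope scope 100 or the recording unit 207; the file name is a hypothetical example and the actual acquisition path is not limited to this.

import cv2

def acquire_frames(source="recorded_exam.mp4"):
    # Yields observation frames (time-series data) from a camera index or movie file.
    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()  # frame: BGR image array of shape (H, W, 3)
            if not ok:
                break
            yield frame
    finally:
        cap.release()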
 <Inference (recognition of the region of interest)>
 The inference unit 204B (processor) recognizes the region of interest in the observation image using the CNN 563 (trained model, conversion model) (inference step, inference processing). Recognition of the region of interest includes detection and discrimination. FIG. 11 illustrates inference using the conversion model: the inference unit 204B inputs the moving image (time-series data) of the subject into the CNN 563 to obtain detection results and discrimination results (inference results). It is preferable to perform at least a part of the inference step (inference processing) on a parallel computing device such as a GPU. The inference unit 204B may also refer to the individual information of the endoscope scope 100 in the above recognition (inference).
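 A minimal sketch of the inference step under this arrangement: frames are preprocessed and fed one by one to the conversion model on a GPU used as the parallel computing device. The input size, the normalization, and the model object cnn563 are illustrative assumptions rather than values fixed by the embodiment.

import cv2
import torch

def run_inference(cnn563, frames, device="cuda"):
    # Feeds each observation frame to the conversion model and collects inference results.
    cnn563 = cnn563.to(device).eval()
    results = []
    with torch.no_grad():
        for frame in frames:
            x = cv2.resize(frame, (512, 512))                      # assumed input size
            x = torch.from_numpy(x).permute(2, 0, 1).float() / 255.0
            x = x.unsqueeze(0).to(device)                          # batch of one frame
            results.append(cnn563(x).cpu())                        # detection / discrimination output
    return results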
 <Display of the observation image>
 The display control unit 204C causes the display device to display the observation image (display control step). At this time, the display control unit 204C may display the region of interest in an identifiable manner (for example, by displaying characters, figures, or symbols indicating the region of interest, or by coloring the region of interest).
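 One possible form of the identification display is sketched below with OpenCV, drawing a rectangle and a text label over the region of interest on the observation image; the box format (x1, y1, x2, y2) and the label string are assumptions about how the inference result is represented, which the embodiment leaves open.

import cv2

def draw_region_of_interest(frame, box, label="ROI"):
    # Overlays a rectangle and a label identifying the region of interest.
    x1, y1, x2, y2 = box
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, label, (x1, max(y1 - 10, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame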
 The main control unit 210 and the image processing unit 204 repeat the above-described processing until the observation is completed.
 As described above, according to the endoscope system 10, by using the CNN 563, which is a conversion model, at the time of inference, inference can be performed while reducing the memory access cost.
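 The point that the conversion preserves the inference result (the condition that a first processing unit of convolution layer plus regularization layer and a second processing unit of only the generated convolution layer give equal outputs for the same input feature amount) can be checked numerically. A minimal sketch, reusing the hypothetical fold_bn_into_conv shown earlier:

import torch
import torch.nn as nn

# conv + bn form the first processing unit; both are put in inference (eval) mode
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1).eval()
bn = nn.BatchNorm2d(32).eval()

fused = fold_bn_into_conv(conv, bn)  # second processing unit (single convolution)

x = torch.randn(1, 16, 64, 64)       # the same feature amount is fed to both units
with torch.no_grad():
    same = torch.allclose(bn(conv(x)), fused(x), atol=1e-5)
print(same)  # expected: True, up to floating-point rounding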
 <Application to systems other than medical endoscopes>
 The techniques of the present invention (trained model conversion method, inference method, trained model conversion device, inference device, and trained model) can be applied not only to medical endoscopes but also to systems in general that perform real-time processing using a convolutional neural network. For example, they can be applied to medical devices that handle time-series data (moving images of a subject), such as ultrasonic diagnostic apparatuses and X-ray fluoroscopy apparatuses, as well as to industrial endoscopes, machine vision, face recognition with digital cameras, security cameras, and object recognition with cameras mounted on moving bodies such as automobiles and aircraft, thereby enabling inference while reducing the memory access cost.
 (Additional notes)
 In addition to the embodiments described above, the configurations described below are also included in the scope of the present invention.
 (Appendix 1)
 A medical image processing apparatus (inference apparatus) in which
 a medical image analysis processing unit (inference unit) detects a region of interest, which is a region to be noted, based on feature amounts of pixels of a medical image, and
 a medical image analysis result acquisition unit acquires an analysis result of the medical image analysis processing unit.
 (Appendix 2)
 A medical image processing apparatus in which
 the medical image analysis processing unit detects the presence or absence of an object to be noted based on feature amounts of pixels of a medical image, and
 the medical image analysis result acquisition unit acquires an analysis result of the medical image analysis processing unit.
 (Appendix 3)
 A medical image processing apparatus in which
 the medical image analysis result acquisition unit acquires the analysis result (input data, time-series data) of a medical image from a recording device that records the analysis result, and
 the analysis result is either or both of a region of interest, which is a region to be noted included in the medical image, and the presence or absence of an object to be noted.
 (Appendix 4)
 A medical image processing apparatus in which the medical image is a normal-light image obtained by irradiation with light in the white band, or with light in a plurality of wavelength bands as the light in the white band.
 (Appendix 5)
 A medical image processing apparatus in which
 the medical image is an image obtained by irradiation with light in a specific wavelength band, and
 the specific wavelength band is a band narrower than the white wavelength band.
 (Appendix 6)
 A medical image processing apparatus in which the specific wavelength band is the blue or green band in the visible range.
 (Appendix 7)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band of 390 nm or more and 450 nm or less, or 530 nm or more and 550 nm or less, and the light in the specific wavelength band has a peak wavelength within the wavelength band of 390 nm or more and 450 nm or less, or 530 nm or more and 550 nm or less.
 (Appendix 8)
 A medical image processing apparatus in which the specific wavelength band is the red band in the visible range.
 (Appendix 9)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band of 585 nm or more and 615 nm or less, or 610 nm or more and 730 nm or less, and the light in the specific wavelength band has a peak wavelength within the wavelength band of 585 nm or more and 615 nm or less, or 610 nm or more and 730 nm or less.
 (Appendix 10)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band in which the absorption coefficient differs between oxyhemoglobin and reduced hemoglobin, and the light in the specific wavelength band has a peak wavelength in the wavelength band in which the absorption coefficient differs between oxyhemoglobin and reduced hemoglobin.
 (Appendix 11)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band of 400±10 nm, 440±10 nm, 470±10 nm, or 600 nm or more and 750 nm or less, and the light in the specific wavelength band has a peak wavelength in the wavelength band of 400±10 nm, 440±10 nm, 470±10 nm, or 600 nm or more and 750 nm or less.
 (Appendix 12)
 A medical image processing apparatus in which
 the medical image is an in-vivo image of the inside of a living body, and
 the in-vivo image has information on fluorescence emitted by a fluorescent substance in the living body.
 (Appendix 13)
 A medical image processing apparatus in which the fluorescence is obtained by irradiating the inside of the living body with excitation light having a peak in the range of 390 nm or more and 470 nm or less.
 (Appendix 14)
 A medical image processing apparatus in which
 the medical image is an in-vivo image of the inside of a living body, and
 the specific wavelength band is a wavelength band of infrared light.
 (Appendix 15)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band of 790 nm or more and 820 nm or less, or 905 nm or more and 970 nm or less, and the light in the specific wavelength band has a peak wavelength in the wavelength band of 790 nm or more and 820 nm or less, or 905 nm or more and 970 nm or less.
 (Appendix 16)
 A medical image processing apparatus in which
 the medical image acquisition unit includes a special-light image acquisition unit that acquires a special-light image having information on the specific wavelength band based on a normal-light image obtained by irradiation with light in the white band, or with light in a plurality of wavelength bands as the light in the white band, and
 the medical image is the special-light image.
 (Appendix 17)
 A medical image processing apparatus in which a signal in the specific wavelength band is obtained by calculation based on RGB or CMY color information included in the normal-light image.
 (Appendix 18)
 A medical image processing apparatus including
 a feature-amount image generation unit that generates a feature-amount image by calculation based on at least one of a normal-light image obtained by irradiation with light in the white band, or with light in a plurality of wavelength bands as the light in the white band, and a special-light image obtained by irradiation with light in the specific wavelength band,
 in which the medical image is the feature-amount image.
 (Appendix 19)
 An endoscope apparatus (inference apparatus) including:
 the medical image processing apparatus according to any one of Appendices 1 to 18; and
 an endoscope that acquires images by irradiation with at least one of light in the white wavelength band and light in the specific wavelength band.
 (Appendix 20)
 A diagnosis support apparatus (inference apparatus) including the medical image processing apparatus according to any one of Appendices 1 to 18.
 (Appendix 21)
 A medical service support apparatus (inference apparatus) including the medical image processing apparatus according to any one of Appendices 1 to 18.
 Although the embodiments and other examples of the present invention have been described above, the present invention is not limited to the above-described aspects, and various modifications can be made without departing from the spirit of the present invention.
10   Endoscope system
100  Endoscope scope
102  Hand operation part
104  Insertion part
106  Universal cable
108  Light guide connector
112  Flexible part
114  Bending part
116  Distal end rigid part
116A Distal-end-side end face
123  Illumination unit
123A Illumination lens
123B Illumination lens
126  Forceps port
130  Imaging optical system
132  Imaging lens
134  Imaging element
136  Drive circuit
139  Scope information recording unit
141  Air/water supply button
142  Suction button
143  Function button
144  Imaging button
170  Light guide
200  Medical image processing apparatus
202  Image input controller
204  Image processing unit
204A Medical image acquisition unit
204B Inference unit
204C Display control unit
204D Recording control unit
204E Scope information acquisition unit
205  Communication control unit
206  Video output unit
207  Recording unit
208  Operation unit
209  Audio processing unit
209A Speaker
210  Main control unit
211  ROM
212  RAM
300  Light source device
310  Light source
310B Blue light source
310G Green light source
310R Red light source
310V Violet light source
330  Aperture
340  Condenser lens
350  Light source control unit
400  Monitor
500  Model conversion device
510  Processor
512  Learning control unit
514  Conversion model generation unit
520  ROM
530  RAM
561A Conversion target layer
561B Conversion target layer
562A Input layer
562B Intermediate layer
562C Output layer
563  CNN
563A Input layer
563B Intermediate layer
563C Output layer
564  Convolution layer
565  Batch regularization layer
566  Fully connected layer
567  Convolution layer
F1   Filter
F2   Filter

Claims (11)

  1.  A trained model conversion method comprising:
     a convolution layer generation step of generating, for a trained convolutional neural network including at least one regularization layer, a second convolution layer based on trained parameters of the regularization layer and trained parameters of a first convolution layer adjacent to the regularization layer; and
     a conversion model generation step of replacing the regularization layer and the first convolution layer with the second convolution layer to generate a conversion model, which is a converted trained model.
  2.  The trained model conversion method according to claim 1, wherein, in the convolution layer generation step, the second convolution layer is generated such that, when the same feature amount is input, the inference processing results of a first processing unit composed of the first convolution layer and the regularization layer and of a second processing unit composed only of the second convolution layer are equal.
  3.  The trained model conversion method according to claim 1 or 2, wherein the regularization layer is a batch regularization layer.
  4.  An inference method comprising:
     a data acquisition step of acquiring input data; and
     an inference step of inputting the input data into the conversion model obtained by the trained model conversion method according to any one of claims 1 to 3 to obtain an inference result.
  5.  The inference method according to claim 4, wherein at least a part of the inference step is executed by a parallel computing device.
  6.  The inference method according to claim 4 or 5, wherein time-series data is acquired as the input data in the data acquisition step.
  7.  The inference method according to claim 6, wherein a moving image of a subject is acquired as the input data in the data acquisition step.
  8.  A trained model conversion device comprising a processor, wherein the processor executes:
     convolution layer generation processing of generating, for a trained convolutional neural network including at least one regularization layer, a second convolution layer based on trained parameters of the regularization layer and trained parameters of a first convolution layer adjacent to the regularization layer; and
     conversion model generation processing of replacing the regularization layer and the first convolution layer with the second convolution layer to generate a conversion model, which is a converted trained model.
  9.  A trained model used by a computer to output an inference result for input data, the trained model being obtained by a processor of a trained model conversion device executing:
     convolution layer generation processing of generating, for a trained convolutional neural network including at least one regularization layer, a second convolution layer based on trained parameters of the regularization layer and trained parameters of a first convolution layer adjacent to the regularization layer; and
     conversion model generation processing of replacing the regularization layer and the first convolution layer with the second convolution layer to generate a conversion model, which is a converted trained model.
  10.  An inference device comprising:
     a processor; and
     the trained model according to claim 9,
     wherein the processor executes:
     data acquisition processing of acquiring input data; and
     inference processing of inputting the input data into the trained model to obtain an inference result.
  11.  The inference device according to claim 10, wherein the processor includes a parallel computing device that executes at least a part of the inference processing.
PCT/JP2021/030212 2020-09-28 2021-08-18 Trained model transformation method, inference method, trained model transformation device, trained model, and inference device WO2022064901A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022551194A JPWO2022064901A5 (en) 2021-08-18 Reasoning method and reasoning device
US18/188,449 US20230230369A1 (en) 2020-09-28 2023-03-22 Trained model conversion method, inference method, trained model conversion apparatus, trained model, and inference apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020162403 2020-09-28
JP2020-162403 2020-09-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/188,449 Continuation US20230230369A1 (en) 2020-09-28 2023-03-22 Trained model conversion method, inference method, trained model conversion apparatus, trained model, and inference apparatus

Publications (1)

Publication Number Publication Date
WO2022064901A1 true WO2022064901A1 (en) 2022-03-31

Family

ID=80845091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/030212 WO2022064901A1 (en) 2020-09-28 2021-08-18 Trained model transformation method, inference method, trained model transformation device, trained model, and inference device

Country Status (2)

Country Link
US (1) US20230230369A1 (en)
WO (1) WO2022064901A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020082263A1 (en) * 2018-10-24 2020-04-30 Alibaba Group Holding Limited Fast computation of convolutional neural network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020082263A1 (en) * 2018-10-24 2020-04-30 Alibaba Group Holding Limited Fast computation of convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DUAN JIE; ZHANG RUIXIN; HUANG JIAHU; ZHU QIUYU: "The Speed Improvement by Merging Batch Normalization into Previously Linear Layer in CNN", 2018 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), IEEE, 16 July 2018 (2018-07-16), pages 67 - 72, XP033398373, DOI: 10.1109/ICALIP.2018.8455587 *

Also Published As

Publication number Publication date
JPWO2022064901A1 (en) 2022-03-31
US20230230369A1 (en) 2023-07-20

Similar Documents

Publication Publication Date Title
JP7430287B2 (en) Medical image processing equipment and endoscope systems
JP7062068B2 (en) Image processing method and image processing device
JP7048732B2 (en) Image processing equipment, endoscope system, and image processing method
JP6941233B2 (en) Image processing equipment, endoscopic system, and image processing method
US11200460B2 (en) Image learning device, image learning method, neural network, and image classification device
US20210343011A1 (en) Medical image processing apparatus, endoscope system, and medical image processing method
WO2021149552A1 (en) Medical image processing device, method for operating medical image processing device, and endoscope system
WO2020170809A1 (en) Medical image processing device, endoscope system, and medical image processing method
JP7091349B2 (en) Diagnosis support system, endoscopy system, processor, and how to operate the diagnosis support system
US20220285010A1 (en) Medical image processing apparatus, medical image processing method, and program
JP2022159496A (en) Endoscope system, endoscopic image learning method, and program
WO2021157487A1 (en) Medical image processing device, endoscope system, medical image processing method, and program
US20200383553A1 (en) Image processing device, endoscope system, and image processing method
WO2022064901A1 (en) Trained model transformation method, inference method, trained model transformation device, trained model, and inference device
US20230157768A1 (en) Medical image processing apparatus, medical image processing method, endoscope system, and medical image processing program
JP6931425B2 (en) Medical image learning device, medical image learning method, and program
WO2021153471A1 (en) Medical image processing device, medical image processing method, and program
WO2022181748A1 (en) Medical image processing device, endoscope system, medical image processing method, and medical image processing program
JP2023041458A (en) Image processing device, image processing method, and program
JP2023039245A (en) Image processing device, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21872033

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022551194

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21872033

Country of ref document: EP

Kind code of ref document: A1