WO2022064901A1 - Trained model transformation method, inference method, trained model transformation device, trained model, and inference device - Google Patents

Trained model transformation method, inference method, trained model transformation device, trained model, and inference device

Info

Publication number
WO2022064901A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
inference
trained
convolution
trained model
Prior art date
Application number
PCT/JP2021/030212
Other languages
French (fr)
Japanese (ja)
Inventor
Shumpei Kamon (駿平 加門)
Original Assignee
FUJIFILM Corporation (富士フイルム株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJIFILM Corporation
Priority to JP2022551194A (published as JPWO2022064901A5)
Publication of WO2022064901A1
Priority to US 18/188,449 (published as US20230230369A1)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/08: Learning methods
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/443: Local feature extraction by matching or filtering
    • G06V 10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451: Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 10/94: Hardware or software architectures specially adapted for image or video understanding
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03: Recognition of patterns in medical or anatomical images
    • G06V 2201/031: Recognition of patterns in medical or anatomical images of internal organs

Definitions

  • the present invention relates to a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device.
  • Patent Document 1 describes using an equivalent neural network from which the bias of the convolution operation is removed. Patent Document 2 describes converting the features with one set of parameters per feature index when a batch normalization layer is placed in front of a convolution layer.
  • One embodiment of the present invention provides a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device that can reduce the processing cost by the regularization layer.
  • The trained model transformation method according to the first aspect of the present invention has: a convolutional layer generation step of generating, for a trained convolutional neural network including at least one regularization layer, a second convolutional layer based on the trained parameters of the regularization layer and the trained parameters of a first convolutional layer adjacent to the regularization layer; and a transformation model generation step of generating a transformation model, which is a transformed trained model, by replacing the regularization layer and the first convolutional layer with the second convolutional layer.
  • The trained model transformation method according to the second aspect is the first aspect in which, in the convolutional layer generation step, the second convolutional layer is generated so that the first processing unit, composed of the first convolutional layer and the regularization layer, and the second processing unit, composed only of the second convolutional layer, give equal inference processing results when the same feature quantity is input to each.
  • the trained model transformation method according to the third aspect is the first or second aspect, in which the regularization layer is a batch regularization layer.
  • The inference method according to the fourth aspect of the present invention has a data acquisition step of acquiring input data, and an inference step of inputting the input data to a transformation model obtained by the trained model transformation method according to any one of the first to third aspects and obtaining an inference result.
  • the inference method according to the fifth aspect is the fourth aspect, in which at least a part of the inference process is executed by the parallel computing apparatus.
  • the inference method according to the sixth aspect is the fourth or fifth aspect, in which time series data is acquired as input data in the data acquisition process.
  • the inference method according to the seventh aspect is the sixth aspect, in which a moving image of a subject is acquired as input data in the data acquisition step.
  • The trained model transformation device according to the eighth aspect of the present invention comprises a processor, and the processor executes, for a trained convolutional neural network including at least one regularization layer, a convolutional layer generation process of generating a second convolutional layer based on the trained parameters of the regularization layer and the trained parameters of a first convolutional layer adjacent to the regularization layer, and a transformation model generation process of generating a transformation model, which is a transformed trained model, by replacing the regularization layer and the first convolutional layer with the second convolutional layer.
  • The trained model according to the ninth aspect of the present invention is a trained model used by a computer to output an inference result for input data, the trained model being obtained by the processor of a trained model transformation device executing, for a trained convolutional neural network including at least one regularization layer, a convolutional layer generation process of generating a second convolutional layer based on the trained parameters of the regularization layer and the trained parameters of a first convolutional layer adjacent to the regularization layer, and a transformation model generation process of generating a transformation model, which is a transformed trained model, by replacing the regularization layer and the first convolutional layer with the second convolutional layer.
  • The inference device according to the tenth aspect of the present invention comprises a processor and the trained model according to the ninth aspect, and the processor executes a data acquisition process of acquiring input data and an inference process of inputting the input data to the trained model and obtaining an inference result.
  • the processor includes a parallel calculation processing device that executes at least a part of the inference processing.
  • FIG. 1 is a diagram showing a configuration of a trained model transformation device.
  • FIG. 2 is a diagram showing a state of training of a convolutional neural network and conversion of a trained model.
  • FIG. 3 is a diagram showing a configuration example of a convolutional neural network.
  • FIG. 4 is a diagram showing a state of the convolution process by the filter.
  • FIG. 5 is a diagram showing a state of formation of a convolutional layer.
  • FIG. 6 is another diagram showing how the convolutional layer is formed.
  • FIG. 7 is a diagram showing a configuration example of a converted convolutional neural network.
  • FIG. 8 is an external view of an endoscope system as an aspect of an inference device.
  • FIG. 9 is a block diagram showing a configuration of a main part of the endoscope system.
  • FIG. 10 is a functional block diagram of the image processing unit.
  • FIG. 11 is a diagram showing a state of inference using a transformation model.
  • FIG. 1 is a diagram showing a configuration of a trained model transformation device 500 (trained model transformation device).
  • The trained model transformation device 500 includes a processor 510 (processor, computer) including a learning control unit 512 and a conversion model generation unit 514, a ROM 520 (Read Only Memory; a non-transitory recording medium, a memory), and a RAM 530 (Random Access Memory).
  • The processor 510 can be configured using various processors and/or electric circuits, similarly to the main control unit 210 and the image processing unit 204 of the endoscope system 10 described later (see FIGS. 8 to 11 and the description related to these figures).
  • The ROM 520 stores computer-readable code of a trained model transformation program (a program that causes a computer to execute the trained model transformation method according to the present invention) and various data necessary for executing the trained model transformation method. The code and data may be stored in an EEPROM (Electrically Erasable and Programmable Read Only Memory) or a flash memory instead of the ROM 520.
  • the RAM 530 is used as a temporary storage area or a work area during processing.
  • the trained model conversion device 500 trains a convolutional neural network (CNN) and converts a trained model according to the above configuration.
  • Part (a) of FIG. 2 shows how the CNN 560 before training becomes the CNN 562, which is a trained model, under the control of the learning control unit 512, and part (b) of FIG. 2 shows how the CNN 563 (conversion model) is generated from the CNN 562 (trained model) under the control of the conversion model generation unit 514 (convolutional layer generation step, transformation model generation step).
  • the type of processing device (CPU, GPU, etc.) that converts the trained model is not particularly limited.
  • FIG. 3 is a diagram showing a configuration example of CNN562 (convolutional neural network; trained model) (CNN560 has a similar configuration).
  • the CNN 562 has an input layer 562A, an intermediate layer 562B, and an output layer 562C.
  • the input layer 562A inputs time-series data (for example, a moving image of a subject, but is not limited to this; input data) and outputs a feature amount.
  • the intermediate layer 562B includes a convolution layer 564 (first convolution layer) and a batch regularization layer 565 (regularization layer), and the feature amount output by the input layer 562A is input to calculate other feature amounts.
  • The convolution layer 564 has a structure in which a plurality of "nodes" are connected by "edges", and the weighting coefficients applied to the input image are stored in a weighting coefficient storage unit (not shown) in association with the nodes and edges. The values of the weighting coefficients change from the initial state (the values in the CNN 560) as learning progresses, and in the CNN 562 (trained model) the weighting coefficients obtained when learning has been completed are used.
  • the intermediate layer 562B calculates the feature amount by the convolution calculation.
  • The convolution operation performed in the convolution layer 564 acquires a feature map by a convolution using a filter, and plays the role of feature extraction such as edge extraction from an image. One channel (one sheet) of "feature map" is generated per filter. When the convolution performs downscaling, the size of the "feature map" becomes smaller as convolution proceeds through the layers.
  • the intermediate layer 562B can be composed of one or a plurality of layers that are subjected to the convolution process.
  • FIG. 4 is a diagram showing a state of the convolution process by the filter.
  • In FIG. 4, a convolution operation is performed between an image set (a learning image set during learning, an inference image set during inference) composed of a plurality of medical images (input data) and a filter F1.
  • the image set is composed of N images (N channels) having an image size of H in the vertical direction and W in the horizontal direction.
  • the images constituting the image set are images of three channels of R (red), G (green), and B (blue).
  • the filter size is 5 ⁇ 5 ⁇ N.
  • In the case of a size-3 (3 × 3) filter, for example, the filter F2 used in the second convolution layer has a filter size of 3 × 3 × M.
  • the second to nth convolution layers perform a convolution operation using the filters F2 to Fn.
  • the size of the "feature map" in the nth convolution layer is smaller than the size of the "feature map” in the second convolution layer because it is downscaled by the convolution layers up to the previous stage.
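  • As an illustration of the filter convolution described above (not part of the patent; a minimal NumPy sketch assuming stride 1 and no padding, with the function name conv2d_single_filter chosen only for this example), the following applies one k × k × N filter to an N-channel image and yields one channel of "feature map" per filter:

```python
import numpy as np

def conv2d_single_filter(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """image: (H, W, N), kernel: (k, k, N) -> one (H-k+1, W-k+1) feature map (stride 1, no padding)."""
    H, W, _ = image.shape
    k = kernel.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # one scalar per output position: sum over the k x k x N window
            out[y, x] = np.sum(image[y:y + k, x:x + k, :] * kernel)
    return out

image = np.random.rand(64, 64, 3)                            # H = 64, W = 64, N = 3 (R, G, B)
fmap = conv2d_single_filter(image, np.random.rand(5, 5, 3))  # one 5 x 5 x N filter
print(fmap.shape)                                            # (60, 60): one feature-map channel
```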
  • Low-order feature extraction is performed in the convolution layers of the intermediate layer 562B that are close to the input side, and higher-order feature extraction is performed in the layers closer to the output side.
  • ⁇ Regularization> In learning with a convolutional neural network, the internal covariate shift is suppressed by inserting a regularization layer, so improvement in convergence speed and accuracy can be expected.
  • In the regularization layer, statistics such as the mean and variance of the features are calculated, and the features are whitened. There are several types of regularization layers, and they differ in the range over which these statistics are calculated, as described below.
  • the feature value is defined as f (b, x, y, c).
  • b, x, y, and c are the indexes of the batch, X-axis, Y-axis, and channel, respectively.
  • the average and variance are calculated as the following formulas (9) and (10), respectively, and whitening is performed for each group of each batch.
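  • The patent's numbered formulas (9) and (10) are not reproduced in this text; as a hedged reconstruction, the whitening statistics take the following standard form, where the index set S over which the sums run depends on the type of regularization layer (for example, for batch regularization (batch normalization), S covers the batch and spatial indices for each channel c, while for group normalization it covers the spatial indices and the channels of one group within one batch element), and ε is a small constant for numerical stability:

    \mu = \frac{1}{|S|} \sum_{(b,x,y,c) \in S} f(b,x,y,c), \qquad \sigma^2 = \frac{1}{|S|} \sum_{(b,x,y,c) \in S} \bigl( f(b,x,y,c) - \mu \bigr)^2

    \hat{f}(b,x,y,c) = \frac{f(b,x,y,c) - \mu}{\sqrt{\sigma^2 + \varepsilon}}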
  • The layer structure of the CNN 562 is not limited to the case where the convolution layer 564 and the batch regularization layer 565 are repeated one by one; any one type of layer (for example, the convolution layer 564) may be included consecutively.
  • CNN562 may include a pooling layer.
  • The pooling process performed in the pooling layer reduces (or enlarges) the feature map output by the convolution operation to produce a new feature map, and plays the role of giving the extracted features robustness so that they are not affected by translation and the like.
  • The CNN 562 may include a fully connected layer 566, as in the example shown in part (b) of FIG. 3.
  • the batch regularization layer is often placed in front of, behind, or before and after the convolutional layer.
  • the batch regularization process can be integrated into adjacent convolution layers (convolution and batch regularization are combined into one convolution) only at the time of inference by the method described below.
  • The method of the present invention is highly effective when inference is performed on a parallel computing device (such as a GPU) in which the memory access cost tends to dominate the processing time. In the method of the present invention, the model is converted only for inference processing, and the model used during learning still includes the batch regularization layer. That is, the benefit of batch regularization for learning is retained, while its processing cost is eliminated at the time of inference with the trained model.
  • FIG. 5 is a diagram showing a state of conversion (pattern 1) of the trained model.
  • Pattern 1 is the processing for the case where the convolution layer 564 is on the input side and the batch regularization layer 565 is on the output side, as in the conversion target layer 561A (first processing unit) in FIG. 5.
  • In pattern 1, the convolution and batch regularization processes are expressed by the following equations (11) and (12).
  • W and b are trained parameters of the convolution layer 564 (first convolution layer), and ⁇ , ⁇ , ⁇ , and ⁇ are trained parameters of the batch regularization layer 565 (regularization layer).
  • processing from x to z can be realized by convolution having W tilde (weight parameter at the time of convolution in the second convolution layer 567) and b tilde (bias component in the second convolution layer 567) as parameters.
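  • Equations (11) and (12) and the resulting expressions for W-tilde and b-tilde are not reproduced in this text; the following is a hedged reconstruction using the standard batch-normalization parameters (scale γ, shift β, mean μ, variance σ², small constant ε), with the element-wise operations applied per output channel:

    y = Wx + b, \qquad z = \gamma \odot \frac{y - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta

    s = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}, \qquad \tilde{W} = \operatorname{diag}(s)\,W, \qquad \tilde{b} = s \odot (b - \mu) + \beta, \qquad z = \tilde{W} x + \tilde{b}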
  • In this way, the conversion model generation unit 514 can convert the convolution in the convolution layer 564 and the batch regularization applied to its result in the batch regularization layer 565 into a single convolution layer 567 (second convolution layer, second processing unit) (generation of the second convolution layer; convolutional layer generation step, convolutional layer generation process). That is, in the convolutional layer generation step, the conversion model generation unit 514 generates the convolution layer 567 (second convolution layer) so that the first processing unit, composed of the convolution layer 564 (first convolution layer) and the batch regularization layer 565 (regularization layer), and the second processing unit, composed only of the convolution layer 567, give the same inference processing result (for example, the above-mentioned z) when the same feature quantity (for example, x) is input.
  • Here, x, y, and z are vectors, W and W-tilde are weight coefficient matrices, and b and b-tilde are vectors representing the bias components.
  • The conversion model generation unit 514 replaces the conversion target layer 561A (the convolution layer 564 and the batch regularization layer 565; see FIG. 3) with the convolution layer 567 (second convolution layer, second processing unit) and generates a transformation model, which is the transformed trained model (transformation model generation step, transformation model generation process).
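  • As a concrete illustration of the pattern 1 conversion (a minimal PyTorch sketch, not the patent's implementation; it assumes an inference-mode BatchNorm2d whose running statistics play the role of μ and σ², and the function name fold_conv_bn is hypothetical), the following folds a Conv2d followed by a BatchNorm2d into a single Conv2d and then checks that the first processing unit and the second processing unit give the same inference result for the same input feature quantity:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fold_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return one Conv2d equivalent, at inference time, to conv followed by bn (pattern 1)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    s = bn.weight / torch.sqrt(bn.running_var + bn.eps)       # s = gamma / sqrt(var + eps)
    fused.weight.copy_(conv.weight * s.reshape(-1, 1, 1, 1))  # W~ = diag(s) W
    b = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_(s * (b - bn.running_mean) + bn.bias)     # b~ = s (b - mu) + beta
    return fused

# Equivalence check: same feature input x, same inference result z (up to floating-point error).
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
conv.eval(); bn.eval()
x = torch.randn(1, 3, 32, 32)
print(torch.allclose(bn(conv(x)), fold_conv_bn(conv, bn)(x), atol=1e-5))  # expected: True
```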
  • FIG. 6 is a diagram showing a state of conversion (pattern 2) of the trained model.
  • Pattern 2 is the processing for the case where the batch regularization layer 565 is on the input side and the convolution layer 564 is on the output side, as in the conversion target layer 561B (first processing unit) in FIG. 6.
  • In pattern 2, the batch regularization and convolution processes are expressed by the following equations (16) and (17).
  • processing from x to z can be realized by convolution having W tilde (weight parameter at the time of convolution in the second convolution layer 567) and b tilde (bias component in the second convolution layer 567) as parameters.
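  • Equations (16) and (17) are likewise not reproduced in this text; under the same assumptions as for pattern 1, a hedged reconstruction of the pattern 2 case (batch regularization followed by convolution) is:

    y = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta = s \odot x + (\beta - s \odot \mu), \qquad s = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}

    z = Wy + b = W \operatorname{diag}(s)\,x + \bigl( W(\beta - s \odot \mu) + b \bigr), \qquad \tilde{W} = W \operatorname{diag}(s), \qquad \tilde{b} = W(\beta - s \odot \mu) + b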
  • In this way, the conversion model generation unit 514 can convert the batch regularization in the batch regularization layer 565 and the convolution applied to its result in the convolution layer 564 into a single convolution layer 567 (second convolution layer, second processing unit) (generation of the second convolution layer; convolutional layer generation step, convolutional layer generation process). That is, in the convolutional layer generation step, the conversion model generation unit 514 generates the convolution layer 567 (second convolution layer) so that the first processing unit, composed of the batch regularization layer 565 (regularization layer) and the convolution layer 564 (first convolution layer), and the second processing unit, composed only of the convolution layer 567, give the same inference processing result (for example, the above-mentioned z) when the same feature quantity (for example, x) is input.
  • Here, x, y, and z are vectors, W and W-tilde are weight coefficient matrices, and b and b-tilde are vectors representing the bias components.
  • The conversion model generation unit 514 replaces the conversion target layer 561B (the batch regularization layer 565 and the convolution layer 564) with the convolution layer 567 (second convolution layer, second processing unit) and generates a transformation model, which is the transformed trained model (transformation model generation step, transformation model generation process).
  • FIG. 7 is a diagram showing a configuration example of a transformed convolutional neural network (transformed trained model, transformed model).
  • Part (a) of FIG. 7 shows the CNN 563 (conversion model; without a fully connected layer) corresponding to part (a) of FIG. 3, and part (b) of FIG. 7 shows the CNN 563 (conversion model; with a fully connected layer) corresponding to part (b) of FIG. 3.
  • CNN563 includes an input layer 563A, an intermediate layer 563B, and an output layer 563C.
  • FIG. 7 shows an example in which all the sets of the convolution layer 564 and the batch regularization layer 565 are converted and replaced with convolution layers 567, but the conversion and replacement may be performed for only some of the sets.
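  • A sketch of how such full or partial replacement might be automated for a simple sequential model (assumptions: PyTorch, only the pattern 1 order Conv2d followed by BatchNorm2d, and the hypothetical fold_conv_bn from the earlier sketch; a network with branches or the pattern 2 order would need model-specific handling):

```python
import torch.nn as nn

def fold_model(model: nn.Sequential) -> nn.Sequential:
    """Replace each (Conv2d, BatchNorm2d) pair with one fused Conv2d; other layers are kept as-is."""
    children = list(model.children())
    layers, i = [], 0
    while i < len(children):
        if (isinstance(children[i], nn.Conv2d) and i + 1 < len(children)
                and isinstance(children[i + 1], nn.BatchNorm2d)):
            layers.append(fold_conv_bn(children[i], children[i + 1]))  # see the earlier sketch
            i += 2
        else:
            layers.append(children[i])
            i += 1
    return nn.Sequential(*layers)

# Example: conv/BN/ReLU stacks in the trained model become conv/ReLU stacks in the conversion model.
cnn562 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                       nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
cnn563 = fold_model(cnn562.eval())
```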
  • FIG. 8 is an external view of the endoscope system 10 (endoscope system, medical image processing device, inference device) as one aspect of the inference device, and FIG. 9 is a block diagram showing the configuration of the main parts of the endoscope system 10.
  • The endoscope system 10 includes an endoscope scope 100 (image acquisition unit, endoscope scope), a medical image processing device 200 (medical image processing device, computer, processor, inference device), a light source device 300 (light source device), and a monitor 400 (display device, display).
  • the endoscope scope 100 includes a hand operation unit 102 and an insertion unit 104 connected to the hand operation unit 102.
  • the operator grips and operates the hand operation unit 102, inserts the insertion unit 104 into the body of the subject (living body), and observes it.
  • The hand operation unit 102 is provided with an air supply/water supply button 141, a suction button 142, a function button 143 to which various functions are assigned, and an imaging button 144 that receives imaging instruction operations (still image, moving image).
  • the hand operation unit 102 is provided with a scope information recording unit 139 that records individual information (individual information, scope information) of the endoscope scope 100.
  • The individual information includes, for example, the type of the endoscope scope 100 (direct view, side view, etc.), the model, the individual identification number, the characteristics of the optical system (viewing angle, distortion, etc.), and information on the instruments (treatment tools, etc.) used for treating the subject.
  • The scope information acquisition unit 204E (scope information acquisition unit, individual information acquisition unit; see FIG. 10) of the image processing unit 204 acquires this individual information, which is used in the processing performed by the medical image processing device 200 (image acquisition processing, inference processing, display control processing).
  • the scope information recording unit 139 may be provided in another portion such as inside the light guide connector 108.
  • the insertion portion 104 is composed of a flexible portion 112, a curved portion 114, and a hard tip portion 116 in this order from the hand operation portion 102 side. That is, the curved portion 114 is connected to the proximal end side of the hard tip portion 116, and the flexible portion 112 is connected to the proximal end side of the curved portion 114.
  • the hand operation unit 102 is connected to the base end side of the insertion unit 104. The user can bend the curved portion 114 and change the direction of the hard tip portion 116 up, down, left and right by operating the hand operation portion 102.
  • the hard tip 116 is provided with an imaging optical system 130, an illumination unit 123, a forceps opening 126, and the like (see FIGS. 8 and 9).
  • White light and/or narrow-band light (one or more of red narrow-band light, green narrow-band light, blue narrow-band light, and purple narrow-band light) can be emitted via the illumination lenses 123A and 123B of the illumination unit 123.
  • Cleaning water can be discharged from a water supply nozzle (not shown) to clean the photographing lens 132 (photographing lens, photographing unit) of the photographing optical system 130 and the illumination lenses 123A and 123B.
  • A conduit (not shown) communicates with the forceps opening 126 that opens at the hard tip portion 116; a treatment tool (not shown) for removing a tumor or the like can be inserted into this conduit and advanced and retracted as appropriate to perform the necessary treatment on the subject.
  • a photographing lens 132 (photographing portion) is arranged on the tip end surface 116A of the tip rigid portion 116.
  • A CMOS (Complementary Metal-Oxide Semiconductor) type image sensor 134 (image sensor, image acquisition unit), a drive circuit 136, and an AFE 138 (Analog Front End) are arranged behind the photographing lens 132, and these elements output an image signal.
  • The image pickup element 134 is a color image pickup element and is composed of a plurality of light receiving elements arranged in a matrix (two-dimensional arrangement) in a specific pattern arrangement (Bayer arrangement, X-Trans (registered trademark) arrangement, honeycomb arrangement, etc.).
  • Each pixel of the image sensor 134 includes a microlens, a red (R), green (G), or blue (B) color filter and a photoelectric conversion unit (photodiode or the like).
  • An image sensor in which the image sensor 134, the drive circuit 136, and the AFE 138 are included in one package may be used.
  • The photographing optical system 130 can generate a color image from the pixel signals of the three colors red, green, and blue, and can also generate an image from the pixel signals of any one or two of red, green, and blue.
  • the image sensor 134 may be an XY address type or a CCD (Charge Coupled Device) type.
  • each pixel of the image pickup element 134 may further include a purple color filter corresponding to a purple light source 310V and / or an infrared filter corresponding to an infrared light source.
  • An optical image of the subject is formed on the light receiving surface (imaging surface) of the image pickup element 134 by the photographing lens 132, converted into an electric signal, output to the medical image processing device 200 via a signal cable (not shown), and converted into a video signal.
  • the endoscopic image (observation image, medical image) of the subject is displayed on the screen on the monitor 400 connected to the medical image processing device 200.
  • the illumination lenses 123A and 123B of the illumination portion 123 are provided adjacent to the photographing lens 132.
  • An emission end of a light guide 170, which will be described later, is arranged behind the illumination lenses 123A and 123B; the light guide 170 runs through the insertion portion 104, the hand operation portion 102, and a universal cable 106, and its incident end is arranged within the light guide connector 108.
  • By performing imaging at a predetermined frame rate (under the control of the medical image acquisition unit 204A) while inserting or withdrawing the endoscope scope 100 (insertion unit 104) having the above-described configuration into or from the living body serving as the subject, the user can sequentially capture time-series images of the living body (subject).
  • the light source device 300 includes a light source 310 for illumination, a diaphragm 330, a condenser lens 340, a light source control unit 350, and the like, and causes observation light to enter the light guide 170.
  • The light source 310 includes a red light source 310R, a green light source 310G, a blue light source 310B, and a purple light source 310V that emit red, green, blue, and purple narrow-band light, respectively, and can irradiate red, green, blue, and purple narrow-band light.
  • the illuminance of the observation light by the light source 310 is controlled by the light source control unit 350, and the illuminance of the observation light can be changed (increased or decreased) and the illumination can be stopped as needed.
  • the light source 310 can emit red, green, blue, and purple narrow band light in any combination.
  • Red, green, blue, and purple narrow-band light can be emitted simultaneously to irradiate white light (normal light) as the observation light, or only one or two of them can be emitted to irradiate narrow-band light (special light).
  • the light source 310 may further include an infrared light source that irradiates infrared light (an example of narrow band light).
  • white light or narrow band light may be irradiated as observation light by a light source that irradiates white light and a filter that transmits white light and each narrow band light.
  • the light source 310 may be a light source having a white band or a light source having a plurality of wavelength bands as the light having a white band, or a light source having a specific wavelength band narrower than the white wavelength band.
  • the specific wavelength band may be a blue band or a green band in the visible region, or a red band in the visible region.
  • When the specific wavelength band is the blue band or green band in the visible region, it includes a wavelength band of 390 nm or more and 450 nm or less, or 530 nm or more and 550 nm or less, and the light may have a peak wavelength in the wavelength band of 390 nm or more and 450 nm or less or 530 nm or more and 550 nm or less. When the specific wavelength band is the red band in the visible region, it includes a wavelength band of 585 nm or more and 615 nm or less, or 610 nm or more and 730 nm or less, and the light of the specific wavelength band may have a peak wavelength in the wavelength band of 585 nm or more and 615 nm or less or 610 nm or more and 730 nm or less. Here, nm represents "nanometer".
  • the specific wavelength band includes a wavelength band of 400 ⁇ 10 nm, 440 ⁇ 10 nm, 470 ⁇ 10 nm, or 600 nm or more and 750 nm, and 400 ⁇ 10 nm, 440 ⁇ 10 nm, 470 ⁇ 10 nm, or 600 nm or more and 750 nm. It may have a peak wavelength in the following wavelength band.
  • the light generated by the light source 310 may include a wavelength band of 790 nm or more and 820 nm or less, or 905 nm or more and 970 nm or less, and may have a peak wavelength in a wavelength band of 790 nm or more and 820 nm or less or 905 nm or more and 970 nm or less.
  • the light source 310 may include a light source that irradiates excitation light having a peak of 390 nm or more and 470 nm or less.
  • It is preferable to configure the light source type of the light source 310 (laser light source, xenon light source, LED light source (LED: Light-Emitting Diode), etc.), its wavelength, the presence or absence of a filter, and the like according to the type of the subject, the part, the purpose of observation, and the like, and, when capturing a medical image (medical image, in-vivo image), to combine and/or switch the wavelengths of the observation light at the time of observation according to the type of the subject, the part, the purpose of observation, the use of a dye for the fluorescence method (fluorescein, acridine orange, etc.), and the like. When switching the wavelength, the wavelength of the emitted light may be switched, for example, by rotating a disk-shaped filter (rotary color filter) that is arranged in front of the light source and provided with filters that transmit or block light of specific wavelengths.
  • the image pickup element used in the endoscope system 10 is not limited to the color image pickup element in which the color filter is arranged for each pixel as in the image pickup element 134, and may be a monochrome image pickup element.
  • the wavelength of the observation light can be sequentially switched to perform surface-sequential (color-sequential) imaging.
  • In that case, the wavelength of the emitted observation light may be switched sequentially (among purple, blue, green, and red), or broadband light (white light) may be emitted and the wavelength of the observation light may be switched by a rotary color filter (red, green, blue, purple, etc.). Alternatively, one or a plurality of narrow-band lights (green, blue, purple, etc.) may be emitted and the wavelength of the observation light may be switched by a rotary color filter (green, blue, purple, etc.). The narrow-band light may be infrared light of two or more mutually different wavelengths (first narrow-band light, second narrow-band light).
  • The observation light emitted from the light source device 300 is transmitted to the illumination lenses 123A and 123B via the light guide 170 and is irradiated from the illumination lenses 123A and 123B onto the observation range.
  • the configuration of the medical image processing apparatus 200 will be described with reference to FIG.
  • The medical image processing device 200 receives the image signal output from the endoscope scope 100 via the image input controller 202, performs the necessary image processing in the image processing unit 204 (processor, computer), and outputs the result from the video output unit 206.
  • the observation image (medical image, endoscopic image, in-vivo image) is displayed on the monitor 400 (display device).
  • The communication control unit 205 controls communication for acquiring medical images with an in-hospital system (HIS: Hospital Information System), an in-hospital LAN (Local Area Network), and/or an external system or network (not shown).
  • FIG. 10 is a functional block diagram of the image processing unit 204.
  • The image processing unit 204 includes a medical image acquisition unit 204A (medical image acquisition unit), an inference unit 204B (inference unit, region-of-interest recognition unit), a display control unit 204C (display control unit), a recording control unit 204D (recording control unit), and a scope information acquisition unit 204E (scope information acquisition unit).
  • the inference unit 204B includes a conversion model (CNN563 or the like shown in FIG. 7) obtained by the above-mentioned method (learned model transformation method according to the present invention). The details of the processing using these functions will be described later.
  • Using the above-mentioned functions, the image processing unit 204 can perform recognition (inference) on medical images, calculate feature quantities, perform processing to emphasize or reduce components in a specific frequency band, and perform processing to emphasize or de-emphasize a specific target (a region of interest, a blood vessel at a desired depth, etc.).
  • The image processing unit 204 may include a special-light image acquisition unit that acquires a special-light image having information in a specific wavelength band based on a normal-light image obtained by irradiating light in the white band or light in a plurality of wavelength bands as the white-band light. In this case, the signal in the specific wavelength band can be obtained by a calculation based on the RGB (R: red, G: green, B: blue) or CMY (C: cyan, M: magenta, Y: yellow) color information contained in the normal-light image.
  • The image processing unit 204 may also include a feature-quantity image generation unit that generates a feature-quantity image by a calculation based on at least one of a normal-light image obtained by irradiating light in the white band or light in a plurality of wavelength bands as the white-band light and a special-light image obtained by irradiating light in a specific wavelength band, and may acquire and display the feature-quantity image as a medical image. The above processing is performed under the control of the main control unit 210.
  • the functions of the image processing unit 204 and the main control unit 210 described above can be realized by using various processors and recording media.
  • the various processors include, for example, a CPU (Central Processing Unit), which is a general-purpose processor that executes software (program) to realize various functions.
  • The various processors also include a GPU (Graphics Processing Unit), which is a processor specialized in image processing and one form of parallel computation processing device, and a programmable logic device (PLD) such as an FPGA (Field Programmable Gate Array), which is a processor whose circuit configuration can be changed after manufacture.
  • the above-mentioned various processors also include a dedicated electric circuit, which is a processor having a circuit configuration specially designed for executing a specific process such as an ASIC (Application Specific Integrated Circuit).
  • Each unit may be realized by one processor, or may be realized by a plurality of processors of the same type or different types (for example, a plurality of FPGAs, a combination of a CPU and an FPGA, or a combination of a CPU and a GPU). A plurality of functions may also be realized by one processor. As a first example of configuring a plurality of functions with one processor, as typified by a computer, one processor is configured by a combination of one or more CPUs and software, and this processor realizes the plurality of functions. As a second example, as typified by a System On Chip (SoC), a processor that realizes the functions of the entire system with a single chip is used.
  • various functions are configured by using one or more of the above-mentioned various processors as a hardware structure.
  • More specifically, the hardware structure of these various processors is an electric circuit (circuitry) in which circuit elements such as semiconductor elements are combined. These electric circuits may be electric circuits that realize the above-mentioned functions by using logical OR, logical AND, logical NOT, exclusive OR, and logical operations combining these.
  • When the above-mentioned processor or electric circuit executes software (a program), the computer-readable code of the software to be executed by the computer (for example, the various processors and electric circuits constituting the image processing unit 204, and/or combinations thereof) is stored in a non-transitory recording medium such as the ROM 211 (Read Only Memory) or a flash memory (not shown), and the computer refers to that software. The software stored in the non-transitory recording medium includes a program for executing the medical image processing method (method of operating the medical image processing device) according to the present invention and data used in its execution (such as data related to the acquisition of medical images). The code may be recorded on a non-transitory recording medium such as various magneto-optical recording devices or semiconductor memories instead of the ROM 211. When the software is executed, for example, the RAM 212 (Random Access Memory) is used as a temporary storage area, and data stored in, for example, an EEPROM (Electrically Erasable and Programmable Read Only Memory) can also be referred to. The recording unit 207 may be used as the "non-transitory recording medium".
  • The ROM 211 (Read Only Memory) is a non-volatile storage element (non-transitory recording medium) and stores computer-readable code of programs that cause the main control unit 210 and/or the image processing unit 204 (computer) to execute various image processing methods (including the medical image processing method according to the present invention).
  • the RAM 212 (RAM: Random Access Memory) is a storage element for temporary storage during various processes, and can also be used as a buffer for image acquisition.
  • the voice processing unit 209 outputs a message (voice) related to medical image processing, inference results of the region of interest, notification, etc. from the speaker 209A (notification unit, speaker) under the control of the main control unit 210 and the image processing unit 204.
  • The image processing unit 204 and/or the main control unit 210 can be configured using a GPU, which is one form of parallel computation processing device, and it is effective to execute at least part of the inference step (inference processing) described later on the GPU.
  • The operation unit 208 can be configured with devices such as a keyboard and a mouse (not shown), and the user can give an instruction to execute the medical image processing method (inference method) and set the conditions necessary for its execution via the operation unit 208.
  • the medical image acquisition unit 204A acquires an endoscopic image (moving image of a subject; observation image, medical image) as an example of time-series data (data acquisition step, data acquisition process).
  • the medical image acquisition unit 204A may acquire an endoscope image taken by the endoscope scope 100, or may acquire an endoscope image recorded by the recording unit 207.
  • the recording control unit 204D can record the acquired endoscopic image in the recording unit 207.
  • The inference unit 204B recognizes a region of interest from the observation image using the CNN 563 (trained model, conversion model) (inference step, inference processing). Recognition of a region of interest includes detection and discrimination.
  • FIG. 11 is a diagram showing the state of inference using the conversion model: the inference unit 204B inputs a moving image (time-series data) of the subject into the CNN 563 to obtain a detection result and a discrimination result (inference result). It is preferable that at least a part of the inference step (inference processing) be performed on a parallel computing device such as a GPU. Further, the inference unit 204B may refer to the individual information of the endoscope scope 100 in the above recognition (inference).
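  • A minimal sketch of that frame-by-frame inference flow (the names conversion_model and frames are hypothetical stand-ins for the CNN 563 and the moving image of the subject; in the real system both would come from the endoscope pipeline described above):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for illustration only.
conversion_model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
frames = [torch.randn(1, 3, 224, 224) for _ in range(4)]   # time-series input data (moving image)

device = "cuda" if torch.cuda.is_available() else "cpu"    # run at least part of inference on a GPU
conversion_model = conversion_model.eval().to(device)

with torch.no_grad():
    for frame in frames:
        inference_result = conversion_model(frame.to(device))  # e.g. detection / discrimination scores
```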
  • The display control unit 204C causes the display device to display the observation image (display control step). At this time, the display control unit 204C may display the region of interest in an identifiable manner (display of characters, figures, or symbols indicating the region of interest, coloring of the region of interest, etc.).
  • the main control unit 210 and the image processing unit 204 repeat the above-mentioned processing until the observation is completed.
  • The method of the present invention can be applied generally to systems that perform real-time processing using a convolutional neural network, not only to medical endoscopes. For example, it can be applied to medical equipment that handles time-series data (moving images of a subject) such as ultrasonic examination devices and X-ray fluoroscopy devices, to industrial endoscopes, to machine vision, to face recognition and security cameras using digital cameras, and to object recognition by cameras mounted on moving bodies such as automobiles and flying objects, thereby performing inference while reducing the memory access cost.
  • The medical image analysis processing unit detects a region of interest, which is a region to be noticed, based on the feature quantities of the pixels of the medical image.
  • the medical image analysis result acquisition unit is a medical image processing device (inference device) that acquires the analysis results of the medical image analysis processing unit.
  • the medical image analysis processing unit detects the presence or absence of a noteworthy object based on the feature amount of the pixel of the medical image.
  • the medical image analysis result acquisition unit is a medical image processing device that acquires the analysis results of the medical image analysis processing unit.
  • The medical image processing device (inference device) in which the medical image analysis result acquisition unit acquires the analysis result from a recording device that records the analysis results of medical images (input data, time-series data), and the analysis result is either or both of a region of interest, which is a region to be noticed included in the medical image, and the presence or absence of an object to be noticed.
  • a medical image is a medical image processing device that is a normal optical image obtained by irradiating light in a white band or light in a plurality of wavelength bands as light in the white band.
  • a medical image is an image obtained by irradiating light in a specific wavelength band.
  • a medical image processing device in which a specific wavelength band is narrower than the white wavelength band.
  • a medical image processing device in which a specific wavelength band is a blue or green band in the visible range.
  • The medical image processing device in which the specific wavelength band includes a wavelength band of 390 nm or more and 450 nm or less or 530 nm or more and 550 nm or less, and the light of the specific wavelength band has a peak wavelength in the wavelength band of 390 nm or more and 450 nm or less or 530 nm or more and 550 nm or less.
  • a specific wavelength band is a medical image processing device that is a red band in the visible range.
  • The medical image processing device in which the specific wavelength band includes a wavelength band of 585 nm or more and 615 nm or less or 610 nm or more and 730 nm or less, and the light of the specific wavelength band has a peak wavelength in the wavelength band of 585 nm or more and 615 nm or less or 610 nm or more and 730 nm or less.
  • The medical image processing device in which the specific wavelength band includes a wavelength band in which the absorption coefficient differs between oxidized hemoglobin and reduced hemoglobin, and the light of the specific wavelength band has a peak wavelength in a wavelength band in which the absorption coefficient differs between oxidized hemoglobin and reduced hemoglobin.
  • The medical image processing device in which the specific wavelength band includes a wavelength band of 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm or more and 750 nm or less, and the light of the specific wavelength band has a peak wavelength in the wavelength band of 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm or more and 750 nm or less.
  • a medical image is an in-vivo image that shows the inside of a living body.
  • An in-vivo image is a medical image processing device that has information on fluorescence emitted by a fluorescent substance in the living body.
  • The medical image processing device in which the fluorescence is obtained by irradiating the inside of the living body with excitation light having a peak of 390 nm or more and 470 nm or less.
  • a medical image is an in-vivo image that shows the inside of a living body.
  • a specific wavelength band is a medical image processing device that is a wavelength band of infrared light.
  • The medical image processing device in which the specific wavelength band includes a wavelength band of 790 nm or more and 820 nm or less or 905 nm or more and 970 nm or less, and the light of the specific wavelength band has a peak wavelength in the wavelength band of 790 nm or more and 820 nm or less or 905 nm or more and 970 nm or less.
  • The medical image processing device in which the medical image acquisition unit includes a special-light image acquisition unit that acquires a special-light image having information in a specific wavelength band based on a normal-light image obtained by irradiating light in the white band or light in a plurality of wavelength bands as the white-band light, and the medical image is the special-light image.
  • a medical image is a medical image processing device that is a feature image.
  • a diagnostic support device including the medical image processing device according to any one of Supplementary note 1 to 18.
  • a medical work support device including the medical image processing device according to any one of Supplementary note 1 to 18.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device that can reduce the processing cost caused by a regularization layer. The trained model transformation method according to one aspect of the present invention comprises: a convolutional layer generation process for generating a second convolution layer, for a trained convolutional neural network containing at least one regularization layer, on the basis of the trained parameters of the regularization layer and the trained parameters of the first convolutional layer adjacent to the regularization layer; and a transformation model generation process for replacing the regularization layer and the first convolution layer with the second convolution layer to generate a transformation model that is a transformed trained model.

Description

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-57072; Patent Document 2: Japanese Unexamined Patent Publication No. 2019-71080
One embodiment of the present invention provides a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device that can reduce the processing cost caused by regularization layers.
A trained model transformation method according to a first aspect of the present invention comprises: a convolutional layer generation step of generating, for a trained convolutional neural network including at least one regularization layer, a second convolutional layer based on trained parameters of the regularization layer and trained parameters of a first convolutional layer adjacent to the regularization layer; and a transformation model generation step of replacing the regularization layer and the first convolutional layer with the second convolutional layer to generate a transformation model that is a transformed trained model.
In a trained model transformation method according to a second aspect, in the first aspect, the convolutional layer generation step generates the second convolutional layer such that a first processing unit composed of the first convolutional layer and the regularization layer and a second processing unit composed only of the second convolutional layer produce equal inference results when the same feature quantity is input to each.
In a trained model transformation method according to a third aspect, in the first or second aspect, the regularization layer is a batch regularization layer.
An inference method according to a fourth aspect of the present invention comprises: a data acquisition step of acquiring input data; and an inference step of inputting the input data to a transformation model obtained by the trained model transformation method according to any one of the first to third aspects and obtaining an inference result.
In an inference method according to a fifth aspect, in the fourth aspect, at least a part of the inference step is executed by a parallel computation processing device.
In an inference method according to a sixth aspect, in the fourth or fifth aspect, time-series data is acquired as the input data in the data acquisition step.
In an inference method according to a seventh aspect, in the sixth aspect, a moving image of a subject is acquired as the input data in the data acquisition step.
A trained model transformation device according to an eighth aspect of the present invention comprises a processor, and the processor executes, for a trained convolutional neural network including at least one regularization layer, a convolutional layer generation process of generating a second convolutional layer based on trained parameters of the regularization layer and trained parameters of a first convolutional layer adjacent to the regularization layer, and a transformation model generation process of replacing the regularization layer and the first convolutional layer with the second convolutional layer to generate a transformation model that is a transformed trained model.
A trained model according to a ninth aspect of the present invention is a trained model used for causing a computer to output an inference result for input data, the trained model being obtained by a processor of a trained model transformation device executing, for a trained convolutional neural network including at least one regularization layer, a convolutional layer generation process of generating a second convolutional layer based on trained parameters of the regularization layer and trained parameters of a first convolutional layer adjacent to the regularization layer, and a transformation model generation process of replacing the regularization layer and the first convolutional layer with the second convolutional layer to generate a transformation model that is a transformed trained model.
An inference device according to a tenth aspect of the present invention comprises a processor and the trained model according to the ninth aspect, and the processor executes a data acquisition process of acquiring input data and an inference process of inputting the input data to the trained model and obtaining an inference result.
In an inference device according to an eleventh aspect, in the tenth aspect, the processor comprises a parallel computation processing device that executes at least a part of the inference process.
FIG. 1 is a diagram showing the configuration of a trained model transformation device.
FIG. 2 is a diagram showing how a convolutional neural network is trained and how the trained model is transformed.
FIG. 3 is a diagram showing a configuration example of a convolutional neural network.
FIG. 4 is a diagram showing convolution processing with filters.
FIG. 5 is a diagram showing how a convolutional layer is generated.
FIG. 6 is another diagram showing how a convolutional layer is generated.
FIG. 7 is a diagram showing a configuration example of a transformed convolutional neural network.
FIG. 8 is an external view of an endoscope system as one embodiment of an inference device.
FIG. 9 is a block diagram showing the main configuration of the endoscope system.
FIG. 10 is a functional block diagram of the image processing unit.
FIG. 11 is a diagram showing inference using a transformation model.
Hereinafter, embodiments of a trained model transformation method, an inference method, a trained model transformation device, a trained model, and an inference device according to the present invention will be described in detail with reference to the accompanying drawings.
<Configuration of the trained model transformation device>
FIG. 1 is a diagram showing the configuration of a trained model transformation device 500 (trained model transformation device). The trained model transformation device 500 includes a processor 510 (processor, computer) comprising a learning control unit 512 and a transformation model generation unit 514, a ROM 520 (ROM: Read Only Memory; non-transitory recording medium, memory), and a RAM 530 (RAM: Random Access Memory). Like the main control unit 210 and the image processing unit 204 of the endoscope system 10 described later (see FIGS. 8 to 11 and the related description), the processor 510 can be configured by various processors and/or electric circuits. The ROM 520 stores computer-readable code of a trained model transformation program (a program that causes a computer to execute the trained model transformation method according to the present invention) and various data required for executing the trained model transformation method. The code and data may be stored in an EEPROM (Electronically Erasable and Programmable Read Only Memory) or a flash memory instead of the ROM 520. The RAM 530 is used as a temporary storage area and a work area during processing.
With the above configuration, the trained model transformation device 500 trains a convolutional neural network (CNN: Convolutional Neural Network) and transforms the trained model. Part (a) of FIG. 2 shows how the untrained CNN 560 becomes the CNN 562, a trained model, under the control of the learning control unit 512, and part (b) of FIG. 2 shows how the CNN 562 (trained model) becomes the CNN 563 (transformation model) under the control of the transformation model generation unit 514 (convolutional layer generation step, transformation model generation step). The type of processing device (CPU, GPU, or the like) that transforms the trained model is not particularly limited.
<Configuration of the convolutional neural network>
FIG. 3 is a diagram showing a configuration example of the CNN 562 (convolutional neural network; trained model); the CNN 560 has a similar configuration. In the example shown in part (a) of FIG. 3, the CNN 562 has an input layer 562A, an intermediate layer 562B, and an output layer 562C. The input layer 562A receives time-series data (for example, but not limited to, a moving image of a subject; input data) and outputs feature quantities. The intermediate layer 562B includes a convolutional layer 564 (first convolutional layer) and a batch regularization layer 565 (regularization layer), receives the feature quantities output by the input layer 562A, and calculates other feature quantities. The convolutional layer 564 has a structure in which a plurality of "nodes" are connected by "edges", and the weight coefficients applied to the input image are associated with the nodes and edges and stored in a weight coefficient storage unit (not shown). The values of the weight coefficients change from their initial state (the values in the CNN 560) as training progresses, and the CNN 562 (trained model) uses the weight coefficients obtained when training has finished.
In FIG. 3, when the convolutional layer 564 is on the input side and the batch regularization layer 565 is on the output side, as in the transformation target layer 561A, the transformation described later with reference to FIG. 5 is performed (convolutional layer generation step, transformation model generation step); when the batch regularization layer 565 is on the input side and the convolutional layer 564 is on the output side, as in the transformation target layer 561B, the transformation described later with reference to FIG. 6 is performed (convolutional layer generation step, transformation model generation step).
<Processing in the intermediate layer>
<Convolution>
The intermediate layer 562B calculates feature quantities by convolution operations. The convolution operation performed in the convolutional layer 564 is a process of obtaining a feature map by a convolution operation using a filter, and plays the role of feature extraction such as edge extraction from an image. The convolution operation with one filter generates one channel (one sheet) of "feature map". When downscaling is performed by the convolution, the size of the "feature map" becomes smaller as the convolution proceeds through the layers. The intermediate layer 562B can be composed of one or more layers that perform convolution processing.
FIG. 4 is a diagram showing convolution processing with filters. In the first convolutional layer of the intermediate layer 562B, for example, a convolution operation is performed between an image set composed of a plurality of medical images (input data) (a training image set during training, an inference image set during inference) and a filter F1. The image set is composed of N images (N channels) each having an image size of H in the vertical direction and W in the horizontal direction. When normal-light images are input, the images constituting the image set are three-channel images of R (red), G (green), and B (blue). Since the image set convolved with the filter F1 has N channels (N sheets), the filter size is, for example, 5 × 5 × N in the case of a size-5 (5 × 5) filter. The convolution operation with the filter F1 generates one channel (one sheet) of "feature map" per filter F1. The filter F2 used in the second convolutional layer has, for example, a filter size of 3 × 3 × M in the case of a size-3 (3 × 3) filter.
As in the first convolutional layer, the second to n-th convolutional layers perform convolution operations using filters F2 to Fn. The size of the "feature map" in the n-th convolutional layer is smaller than that in the second convolutional layer because it has been downscaled by the preceding convolutional layers.
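The filter shapes and feature-map sizes described above can be checked with a minimal PyTorch sketch; the channel counts, image size, and stride below are assumed values chosen only for illustration and are not taken from the embodiment.

```python
import torch
import torch.nn as nn

N, H, W = 3, 64, 64                       # assumed: N-channel (RGB) input of size H x W
x = torch.randn(1, N, H, W)               # one image from the input image set

conv1 = nn.Conv2d(in_channels=N, out_channels=8, kernel_size=5, padding=2)
fmap1 = conv1(x)                          # each 5x5xN filter yields one feature map
print(fmap1.shape)                        # torch.Size([1, 8, 64, 64])

conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, stride=2, padding=1)
fmap2 = conv2(fmap1)                      # 3x3xM filters; stride 2 downscales the map
print(fmap2.shape)                        # torch.Size([1, 16, 32, 32])
```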
Among the layers of the intermediate layer 562B, the convolutional layers close to the input side perform low-order feature extraction (such as edge extraction), and the layers closer to the output side perform higher-order feature extraction (extraction of features related to the shape, structure, and the like of the recognition target).
<Regularization>
In training a convolutional neural network, inserting regularization layers suppresses internal covariate shift, so an improvement in convergence speed and accuracy can be expected. A regularization layer calculates statistics such as the mean and variance of the feature quantities and uses them to whiten the feature quantities. There are several types of regularization layers, and they differ in the range over which these statistics are calculated, as described below. The feature value is denoted by f(b, x, y, c), where b, x, y, and c are the batch, X-axis, Y-axis, and channel indices, respectively.
(1) Batch regularization
The mean and variance are calculated as in equations (1) and (2) below, respectively, and whitening is performed per channel.
\mu_c = \frac{1}{BXY} \sum_{b,x,y} f(b,x,y,c)   (1)
\sigma_c^2 = \frac{1}{BXY} \sum_{b,x,y} \left( f(b,x,y,c) - \mu_c \right)^2   (2)
Here, B, X, Y, and C denote the numbers of batch, X-axis, Y-axis, and channel indices, respectively.
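As a concrete illustration of equations (1) and (2), the following NumPy sketch (an addition for illustration, assuming a (B, X, Y, C) array layout and arbitrary sizes) computes the per-channel mean and variance and whitens the feature quantity:

```python
import numpy as np

f = np.random.randn(4, 32, 32, 16)              # feature f(b, x, y, c): B=4, X=Y=32, C=16
mu = f.mean(axis=(0, 1, 2), keepdims=True)      # eq. (1): average over b, x, y for each channel
var = f.var(axis=(0, 1, 2), keepdims=True)      # eq. (2): variance over b, x, y for each channel
eps = 1e-5
f_white = (f - mu) / np.sqrt(var + eps)         # per-channel whitening
```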
(2) Layer regularization
The mean and variance are calculated as in equations (3) and (4) below, respectively, and whitening is performed per batch index.
\mu_b = \frac{1}{XYC} \sum_{x,y,c} f(b,x,y,c)   (3)
\sigma_b^2 = \frac{1}{XYC} \sum_{x,y,c} \left( f(b,x,y,c) - \mu_b \right)^2   (4)
(3) Instance regularization
The mean and variance are calculated as in equations (5) and (6) below, respectively, and whitening is performed per batch index and channel.
\mu_{b,c} = \frac{1}{XY} \sum_{x,y} f(b,x,y,c)   (5)
\sigma_{b,c}^2 = \frac{1}{XY} \sum_{x,y} \left( f(b,x,y,c) - \mu_{b,c} \right)^2   (6)
(4) Group regularization
The channels are divided into N groups as in equations (7) and (8) below.
[Equations (7) and (8): definition of the partition of the C channels into N groups G_1, ..., G_N]
The mean and variance are calculated as in equations (9) and (10) below, respectively, and whitening is performed per group within each batch index.
\mu_{b,n} = \frac{1}{XY\,|G_n|} \sum_{x,y} \sum_{c \in G_n} f(b,x,y,c)   (9)
\sigma_{b,n}^2 = \frac{1}{XY\,|G_n|} \sum_{x,y} \sum_{c \in G_n} \left( f(b,x,y,c) - \mu_{b,n} \right)^2   (10)
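The four regularization types differ only in the set of axes over which the statistics are taken. The following NumPy sketch (an illustrative addition, assuming a (B, X, Y, C) layout and N = 4 equally sized groups) shows the group-regularization case of equations (9) and (10):

```python
import numpy as np

B, X, Y, C, N = 2, 16, 16, 32, 4                      # N groups of C // N channels each
f = np.random.randn(B, X, Y, C)
g = f.reshape(B, X, Y, N, C // N)                     # split the channel axis into groups
mu = g.mean(axis=(1, 2, 4), keepdims=True)            # eq. (9): per batch index and group
var = g.var(axis=(1, 2, 4), keepdims=True)            # eq. (10)
f_white = ((g - mu) / np.sqrt(var + 1e-5)).reshape(B, X, Y, C)
```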
The layer configuration of the CNN 562 is not limited to an alternation of one convolutional layer 564 and one batch regularization layer 565; either type of layer (for example, the convolutional layer 564) may appear several times in succession.
<Other configurations>
The CNN 562 may include a pooling layer. The pooling processing performed in the pooling layer reduces (or enlarges) the feature map output by the convolution operation to obtain a new feature map, and plays the role of giving robustness so that the extracted features are not affected by translation and the like. The CNN 562 may also include a fully connected layer 566, as in the example shown in part (b) of FIG. 3.
<Transformation of the trained model: omission of the regularization layer at inference time>
As described above, there are several types of regularization, but they differ only in the range of the feature quantity over which the statistics are calculated, and the discussion below applies to all of them in the same way. In the following, only batch regularization is described.
A batch regularization layer is usually placed before, after, or both before and after a convolutional layer. At inference time only, batch regularization processing can be integrated into the adjacent convolutional layer (the convolution and the batch regularization are combined into a single convolution) by the method described below. This reduces the number of memory accesses during inference processing and thus speeds up the computation. The present technique is particularly effective when inference is executed on a parallel computation processing device (such as a GPU), where the memory access cost is more dominant in the processing time. Furthermore, the present technique performs the transformation only for inference processing, and the model used for training still contains the batch regularization layers. That is, the benefit of batch regularization for training is retained, while its processing cost is eliminated during inference with the trained model.
<Transformation method (pattern 1)>
FIG. 5 is a diagram showing the transformation of the trained model (pattern 1). Pattern 1 is the processing for the case where the convolutional layer 564 is on the input side and the batch regularization layer 565 is on the output side, as in the transformation target layer 561A (first processing unit) in FIG. 3. As in part (a) of FIG. 5, let x and y be the input and output of the convolutional layer 564 and z be the output of the batch regularization layer 565; then the convolution and batch regularization processing can be formulated as in equations (11) and (12) below.
y = Wx + b   (11)
z = \gamma \frac{y - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta   (12)
Here, W and b are trained parameters of the convolutional layer 564 (first convolutional layer), and γ, μ, σ, ε, and β are trained parameters of the batch regularization layer 565 (regularization layer). Transforming the above expressions yields equation (13) below.
z = \tilde{W} x + \tilde{b}   (13)
That is, the processing from x to z can be realized by a single convolution having W-tilde (the weight parameter used for the convolution in the second convolutional layer 567) and b-tilde (the bias component of the second convolutional layer 567) as its parameters. W-tilde and b-tilde are defined by equations (14) and (15) below, respectively.
\tilde{W} = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}} W   (14)
\tilde{b} = \frac{\gamma (b - \mu)}{\sqrt{\sigma^2 + \varepsilon}} + \beta   (15)
As a result, as shown in part (b) of FIG. 5, the transformation model generation unit 514 can transform the convolution in the convolutional layer 564 and the processing of its result in the batch regularization layer 565 into a single convolutional layer 567 (second convolutional layer, second processing unit) (generation of the second convolutional layer; convolutional layer generation step, convolutional layer generation process). That is, in the convolutional layer generation step, the transformation model generation unit 514 can generate the convolutional layer 567 (second convolutional layer) such that the first processing unit composed of the convolutional layer 564 (first convolutional layer) and the batch regularization layer 565 (regularization layer) and the second processing unit composed only of the convolutional layer 567 (second convolutional layer) produce equal inference results (for example, z above) when the same feature quantity (for example, x above) is input to each.
When the input data is an image, x, y, and z are vectors, W and W-tilde (the transformed parameter in equations (13) and (14)) are weight coefficient matrices, and b and b-tilde (the transformed parameter in equations (13) and (15)) are also matrices representing bias components.
The transformation model generation unit 514 replaces the transformation target layer 561A (the convolutional layer 564 and the batch regularization layer 565; see FIG. 3) with the convolutional layer 567 (second convolutional layer, second processing unit) to generate a transformation model, which is the transformed trained model (transformation model generation step, transformation model generation process).
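As a minimal sketch of pattern 1 in PyTorch (an illustrative addition, not the implementation of the device itself), the following code folds a Conv2d followed by a BatchNorm2d into a single Conv2d by applying equations (14) and (15) to the trained parameters, and checks that both give the same output for the same input in inference (eval) mode:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)            # gamma / sqrt(sigma^2 + eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)  # eq. (14)
    b = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = scale * (b - bn.running_mean) + bn.bias.data     # eq. (15)
    return fused

conv, bn = nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16)
bn.eval()                                   # use the trained (running) statistics
x = torch.randn(1, 8, 20, 20)
fused = fuse_conv_bn(conv, bn)
assert torch.allclose(bn(conv(x)), fused(x), atol=1e-5)
```

Here the fused layer plays the role of the convolutional layer 567 of pattern 1.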
<Transformation method (pattern 2)>
FIG. 6 is a diagram showing the transformation of the trained model (pattern 2). Pattern 2 is the processing for the case where the batch regularization layer 565 is on the input side and the convolutional layer 564 is on the output side, as in the transformation target layer 561B (first processing unit) in FIG. 3. As in part (a) of FIG. 6, let x and y be the input and output of the batch regularization layer 565 and z be the output of the convolutional layer 564; then the batch regularization and convolution processing can be formulated as in equations (16) and (17) below.
y = \gamma \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta   (16)
z = Wy + b   (17)
Transforming the expressions in the same manner as in pattern 1 yields equation (18) below.
z = \tilde{W} x + \tilde{b}   (18)
That is, the processing from x to z can be realized by a single convolution having W-tilde (the weight parameter used for the convolution in the second convolutional layer 567) and b-tilde (the bias component of the second convolutional layer 567) as its parameters. W-tilde is defined by equation (19) below, and b-tilde is defined by equation (20) below.
\tilde{W} = W \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}   (19)
\tilde{b} = W \left( \beta - \frac{\gamma \mu}{\sqrt{\sigma^2 + \varepsilon}} \right) + b   (20)
As a result, as shown in part (b) of FIG. 6, the transformation model generation unit 514 can transform the processing in the batch regularization layer 565 and the convolution of its result in the convolutional layer 564 into a single convolutional layer 567 (second convolutional layer, second processing unit) (generation of the second convolutional layer; convolutional layer generation step, convolutional layer generation process). That is, in the convolutional layer generation step, the transformation model generation unit 514 can generate the convolutional layer 567 (second convolutional layer) such that the first processing unit composed of the convolutional layer 564 (first convolutional layer) and the batch regularization layer 565 (regularization layer) and the second processing unit composed only of the convolutional layer 567 (second convolutional layer) produce equal inference results (for example, z above) when the same feature quantity (for example, x above) is input to each.
When the input data is an image, x, y, and z are vectors, W and W-tilde (the transformed parameter in equations (18) and (19)) are weight coefficient matrices, and b and b-tilde (the transformed parameter in equations (18) and (20)) are also matrices representing bias components.
The transformation model generation unit 514 replaces the transformation target layer 561B (the batch regularization layer 565 and the convolutional layer 564; see FIG. 3) with the convolutional layer 567 (second convolutional layer, second processing unit) to generate a transformation model, which is the transformed trained model (transformation model generation step, transformation model generation process).
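For pattern 2, a corresponding PyTorch sketch (again an illustrative addition) applies equations (19) and (20). Zero padding is omitted here so that the algebraic identity holds over the entire output; with zero padding, pixels at the border would see padded values that are not shifted by the batch regularization parameters.

```python
import torch
import torch.nn as nn

def fuse_bn_conv(bn: nn.BatchNorm2d, conv: nn.Conv2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=0, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)              # gamma / sqrt(sigma^2 + eps)
    shift = bn.bias - bn.running_mean * scale                            # beta - gamma*mu / sqrt(...)
    fused.weight.data = conv.weight.data * scale.reshape(1, -1, 1, 1)    # eq. (19): scale per input channel
    b = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = conv.weight.data.sum(dim=(2, 3)) @ shift + b       # eq. (20)
    return fused

bn, conv = nn.BatchNorm2d(8), nn.Conv2d(8, 16, 3, padding=0)
bn.eval()
x = torch.randn(1, 8, 20, 20)
assert torch.allclose(conv(bn(x)), fuse_bn_conv(bn, conv)(x), atol=1e-5)
```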
<Trained model after transformation>
FIG. 7 is a diagram showing a configuration example of a transformed convolutional neural network (transformed trained model, transformation model). Part (a) of FIG. 7 shows the CNN 563 (transformation model; without a fully connected layer) corresponding to part (a) of FIG. 3, and part (b) of FIG. 7 shows the CNN 563 (transformation model; with a fully connected layer) corresponding to part (b) of FIG. 3. The CNN 563 includes an input layer 563A, an intermediate layer 563B, and an output layer 563C. Although FIG. 7 shows an example in which every set of a convolutional layer 564 and a batch regularization layer 565 is transformed and replaced with a convolutional layer 567, the transformation and replacement may be performed for only some of the sets.
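As a hedged sketch of the replacement step (reusing the fuse_conv_bn helper from the pattern-1 sketch above; the layer layout is an assumption chosen only for illustration), the following Python code walks through a small sequential model, replaces every Conv2d/BatchNorm2d pair with the fused layer, and verifies that the transformation model produces the same output as the original trained model:

```python
import torch
import torch.nn as nn

def convert_model(model: nn.Sequential) -> nn.Sequential:
    mods, layers, i = list(model), [], 0
    while i < len(mods):
        if (i + 1 < len(mods) and isinstance(mods[i], nn.Conv2d)
                and isinstance(mods[i + 1], nn.BatchNorm2d)):
            layers.append(fuse_conv_bn(mods[i], mods[i + 1]))   # pattern-1 fusion (defined above)
            i += 2
        else:
            layers.append(mods[i])
            i += 1
    return nn.Sequential(*layers)

trained = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                        nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
trained.eval()                          # inference mode: running statistics are used
converted = convert_model(trained)      # no batch regularization layers remain
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(trained(x), converted(x), atol=1e-5)
```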
<One embodiment of the inference device and inference method: application to an endoscope system>
FIG. 8 is an external view of an endoscope system 10 (endoscope system, medical image processing device, inference device) as one embodiment of the inference device, and FIG. 9 is a block diagram showing the main configuration of the endoscope system 10. As shown in FIGS. 8 and 9, the endoscope system 10 is composed of an endoscope scope 100 (image acquisition unit, endoscope scope), a medical image processing device 200 (medical image processing device, computer, processor, inference device), a light source device 300 (light source device), and a monitor 400 (display device, display).
<Configuration of the endoscope scope>
The endoscope scope 100 includes a handheld operation unit 102 and an insertion section 104 connected to the handheld operation unit 102. An operator (user) grips and operates the handheld operation unit 102 and inserts the insertion section 104 into the body of a subject (living body) for observation. The handheld operation unit 102 is provided with an air/water supply button 141, a suction button 142, a function button 143 to which various functions can be assigned, and an imaging button 144 that accepts imaging instruction operations (still images, moving images).
The handheld operation unit 102 is provided with a scope information recording unit 139 that records individual information (individual information, scope information) of the endoscope scope 100. The individual information includes, for example, the type of the endoscope scope 100 (forward-viewing, side-viewing, or the like), the model, the individual identification number, the characteristics of the optical system (viewing angle, distortion, and the like), and information on instruments (treatment tools and the like) used for treating the subject. The scope information acquisition unit 204E of the image processing unit 204 (scope information acquisition unit, individual information acquisition unit; see FIG. 10) acquires this individual information, which is used in the processing performed by the medical image processing device 200 (image acquisition processing, inference processing, display control processing). The scope information recording unit 139 may be provided in another part, such as inside the light guide connector 108.
The insertion section 104 is composed of a flexible portion 112, a bending portion 114, and a distal rigid portion 116 in this order from the handheld operation unit 102 side. That is, the bending portion 114 is connected to the proximal side of the distal rigid portion 116, and the flexible portion 112 is connected to the proximal side of the bending portion 114. The handheld operation unit 102 is connected to the proximal side of the insertion section 104. By operating the handheld operation unit 102, the user can bend the bending portion 114 and change the direction of the distal rigid portion 116 up, down, left, and right. The distal rigid portion 116 is provided with an imaging optical system 130, an illumination unit 123, a forceps port 126, and the like (see FIGS. 8 and 9).
During observation and treatment, white light and/or narrow-band light (one or more of red narrow-band light, green narrow-band light, blue narrow-band light, and violet narrow-band light) can be emitted from the illumination lenses 123A and 123B of the illumination unit 123 by operating the operation unit 208 (see FIG. 9). By operating the air/water supply button 141, washing water is discharged from a water supply nozzle (not shown) to wash the imaging lens 132 (imaging lens, imaging unit) of the imaging optical system 130 and the illumination lenses 123A and 123B. A conduit (not shown) communicates with the forceps port 126 that opens in the distal rigid portion 116, and a treatment tool (not shown) for tumor removal or the like is inserted through this conduit and advanced and retracted as appropriate to perform the necessary treatment on the subject.
As shown in FIGS. 8 and 9, the imaging lens 132 (imaging unit) is disposed on the distal end face 116A of the distal rigid portion 116. A CMOS (Complementary Metal-Oxide Semiconductor) imaging element 134 (imaging element, image acquisition unit), a drive circuit 136, and an AFE 138 (AFE: Analog Front End) are disposed behind the imaging lens 132, and these elements output an image signal. The imaging element 134 is a color imaging element and includes a plurality of pixels composed of a plurality of light receiving elements arranged in a matrix (two-dimensional array) in a specific pattern arrangement (Bayer arrangement, X-Trans (registered trademark) arrangement, honeycomb arrangement, or the like). Each pixel of the imaging element 134 includes a microlens, a red (R), green (G), or blue (B) color filter, and a photoelectric conversion unit (photodiode or the like). An image sensor in which the imaging element 134, the drive circuit 136, and the AFE 138 are integrated in a single package may also be used. The imaging optical system 130 can generate a color image from pixel signals of the three colors red, green, and blue, or can generate an image from pixel signals of any one or two of red, green, and blue. The imaging element 134 may be of the XY address type or the CCD (Charge Coupled Device) type. Each pixel of the imaging element 134 may further include a violet color filter corresponding to the violet light source 310V and/or an infrared filter corresponding to an infrared light source.
An optical image of the subject is formed on the light receiving surface (imaging surface) of the imaging element 134 by the imaging lens 132, converted into an electric signal, output to the medical image processing device 200 via a signal cable (not shown), and converted into a video signal. As a result, an endoscopic image (observation image, medical image) of the subject is displayed on the monitor 400 connected to the medical image processing device 200.
The illumination lenses 123A and 123B of the illumination unit 123 are provided on the distal end face 116A of the distal rigid portion 116, adjacent to the imaging lens 132. The exit end of a light guide 170, described later, is disposed behind the illumination lenses 123A and 123B; this light guide 170 runs through the insertion section 104, the handheld operation unit 102, and the universal cable 106, and the entrance end of the light guide 170 is disposed inside the light guide connector 108.
By performing imaging at a predetermined frame rate (which can be done under the control of the medical image acquisition unit 204A) while inserting or withdrawing the endoscope scope 100 (insertion section 104) into or from the living body as the subject, the user can sequentially capture time-series images of the inside of the living body (subject).
<Configuration of the light source device>
As shown in FIG. 9, the light source device 300 is composed of a light source 310 for illumination, a diaphragm 330, a condenser lens 340, a light source control unit 350, and the like, and causes observation light to enter the light guide 170. The light source 310 includes a red light source 310R, a green light source 310G, a blue light source 310B, and a violet light source 310V that emit red, green, blue, and violet narrow-band light, respectively, and can emit red, green, blue, and violet narrow-band light. The illuminance of the observation light from the light source 310 is controlled by the light source control unit 350, which can change (increase or decrease) the illuminance of the observation light and stop the illumination as necessary.
The light source 310 can emit red, green, blue, and violet narrow-band light in any combination. For example, red, green, blue, and violet narrow-band light can be emitted simultaneously to emit white light (normal light) as the observation light, or any one or two of them can be emitted to emit narrow-band light (special light). The light source 310 may further include an infrared light source that emits infrared light (an example of narrow-band light). Alternatively, white light or narrow-band light may be emitted as the observation light by a light source that emits white light and a filter that transmits the white light and each narrow-band light.
<Wavelength band of the light source>
The light source 310 may be a light source that generates light in the white band or light in a plurality of wavelength bands as the white-band light, or a light source that generates light in a specific wavelength band narrower than the white band. The specific wavelength band may be the blue band or green band of the visible range, or the red band of the visible range. When the specific wavelength band is the blue band or green band of the visible range, it may include a wavelength band of 390 nm to 450 nm or 530 nm to 550 nm and have a peak wavelength within the wavelength band of 390 nm to 450 nm or 530 nm to 550 nm. When the specific wavelength band is the red band of the visible range, it may include a wavelength band of 585 nm to 615 nm or 610 nm to 730 nm, and the light of the specific wavelength band may have a peak wavelength within the wavelength band of 585 nm to 615 nm or 610 nm to 730 nm. Here, nm denotes nanometers.
The light of the specific wavelength band described above may include a wavelength band in which the absorption coefficients of oxyhemoglobin and deoxyhemoglobin differ, and may have a peak wavelength in a wavelength band in which the absorption coefficients of oxyhemoglobin and deoxyhemoglobin differ. In this case, the specific wavelength band may include a wavelength band of 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm to 750 nm, and have a peak wavelength in the wavelength band of 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm to 750 nm.
The light generated by the light source 310 may include a wavelength band of 790 nm to 820 nm or 905 nm to 970 nm and have a peak wavelength in the wavelength band of 790 nm to 820 nm or 905 nm to 970 nm.
The light source 310 may also include a light source that emits excitation light having a peak at 390 nm to 470 nm. In this case, a medical image (medical image, in-vivo image) having information on the fluorescence emitted by a fluorescent substance in the subject (living body) can be acquired. When acquiring a fluorescence image, a dye for the fluorescence method (fluorescein, acridine orange, or the like) may be used.
The type of the light source 310 (laser light source, xenon light source, LED light source (LED: Light-Emitting Diode), or the like), its wavelength, the presence or absence of a filter, and so on are preferably configured according to the type of subject, the site, the purpose of observation, and the like, and during observation it is preferable to combine and/or switch the wavelengths of the observation light according to the type of subject, the site, the purpose of observation, and the like. When switching wavelengths, the wavelength of the emitted light may be switched, for example, by rotating a disc-shaped filter (rotary color filter) that is disposed in front of the light source and provided with filters that transmit or block light of specific wavelengths.
The imaging element used in the endoscope system 10 is not limited to a color imaging element in which a color filter is provided for each pixel, such as the imaging element 134, and may be a monochrome imaging element. When a monochrome imaging element is used, the wavelength of the observation light can be switched sequentially to perform frame-sequential (color-sequential) imaging. For example, the wavelength of the emitted observation light may be switched sequentially among violet, blue, green, and red, or broadband light (white light) may be emitted and the wavelength of the emitted observation light switched by a rotary color filter (red, green, blue, violet, and the like). Alternatively, one or more narrow-band lights (green, blue, violet, and the like) may be emitted and the wavelength of the emitted observation light switched by a rotary color filter (green, blue, violet, and the like). The narrow-band light may be infrared light of two or more different wavelengths (first narrow-band light, second narrow-band light).
By connecting the light guide connector 108 (see FIGS. 8 and 9) to the light source device 300, the observation light emitted from the light source device 300 is transmitted to the illumination lenses 123A and 123B via the light guide 170 and emitted from the illumination lenses 123A and 123B onto the observation range.
<Configuration of the medical image processing device>
The configuration of the medical image processing device 200 will be described with reference to FIG. 9. The medical image processing device 200 receives the image signal output from the endoscope scope 100 via the image input controller 202, performs the necessary image processing in the image processing unit 204 (processor, computer), and outputs the result from the video output unit 206. As a result, an observation image (medical image, endoscopic image, in-vivo image) is displayed on the monitor 400 (display device). These processes are performed under the control of the main control unit 210 (processor, computer). The communication control unit 205 controls communication for acquiring medical images and the like with an in-hospital system (HIS: Hospital Information System), an in-hospital LAN (Local Area Network), and/or an external system or network, none of which is shown.
<Functions of the image processing unit>
FIG. 10 is a functional block diagram of the image processing unit 204. The image processing unit 204 includes a medical image acquisition unit 204A (medical image acquisition unit), an inference unit 204B (inference unit, region-of-interest recognition unit), a display control unit 204C (display control unit), a recording control unit 204D (recording control unit), and a scope information acquisition unit 204E (scope information acquisition unit). The inference unit 204B includes a transformation model (such as the CNN 563 shown in FIG. 7) obtained by the method described above (the trained model transformation method according to the present invention).
The processing using these functions will be described in detail later.
With the functions described above, the image processing unit 204 can perform recognition (inference) of medical images, calculation of feature quantities, processing that emphasizes or reduces components of a specific frequency band, and processing that emphasizes or de-emphasizes a specific target (a region of interest, blood vessels at a desired depth, or the like). The image processing unit 204 may include a special-light image acquisition unit that acquires a special-light image having information on a specific wavelength band on the basis of a normal-light image obtained by emitting light in the white band or light in a plurality of wavelength bands as the white-band light. In this case, the signal of the specific wavelength band can be obtained by computation based on the RGB (R: red, G: green, B: blue) or CMY (C: cyan, M: magenta, Y: yellow) color information contained in the normal-light image. The image processing unit 204 may also include a feature-quantity image generation unit that generates a feature-quantity image by computation based on at least one of a normal-light image obtained by emitting light in the white band or light in a plurality of wavelength bands as the white-band light and a special-light image obtained by emitting light in a specific wavelength band, and may acquire and display the feature-quantity image as a medical image. The above processing is performed under the control of the main control unit 210.
 <各種のプロセッサによる機能の実現>
 上述した画像処理部204及び主制御部210の各部の機能は、各種のプロセッサ(processor)及び記録媒体を用いて実現できる。各種のプロセッサには、例えばソフトウェア(プログラム)を実行して各種の機能を実現する汎用的なプロセッサであるCPU(Central Processing Unit)が含まれる。また、上述した各種のプロセッサには、画像処理に特化したプロセッサであり並列計算処理装置の一態様であるGPU(Graphics Processing Unit)、FPGA(Field Programmable Gate Array)などの製造後に回路構成を変更可能なプロセッサであるプログラマブルロジックデバイス(Programmable Logic Device:PLD)も含まれる。さらに、ASIC(Application Specific Integrated Circuit)などの特定の処理を実行させるために専用に設計された回路構成を有するプロセッサである専用電気回路なども、上述した各種のプロセッサに含まれる。
<Realization of functions by various processors>
The functions of the image processing unit 204 and the main control unit 210 described above can be realized by using various processors and recording media. The various processors include, for example, a CPU (Central Processing Unit), which is a general-purpose processor that executes software (program) to realize various functions. In addition, for the various processors described above, the circuit configuration is changed after manufacturing GPU (Graphics Processing Unit), FPGA (Field Programmable Gate Array), etc., which are processors specialized in image processing and one aspect of parallel computing equipment. A programmable logic device (PLD), which is a possible processor, is also included. Further, the above-mentioned various processors also include a dedicated electric circuit, which is a processor having a circuit configuration specially designed for executing a specific process such as an ASIC (Application Specific Integrated Circuit).
 各部の機能は1つのプロセッサにより実現されてもよいし、同種または異種の複数のプロセッサ(例えば、複数のFPGA、あるいはCPUとFPGAの組み合わせ、またはCPUとGPUの組み合わせ)で実現されてもよい。また、複数の機能を1つのプロセッサで実現してもよい。複数の機能を1つのプロセッサで構成する例としては、第1に、コンピュータに代表されるように、1つ以上のCPUとソフトウェアの組合せで1つのプロセッサを構成し、このプロセッサが複数の機能として実現する形態がある。第2に、システムオンチップ(System On Chip:SoC)などに代表されるように、システム全体の機能を1つのIC(Integrated Circuit)チップで実現するプロセッサを使用する形態がある。このように、各種の機能は、ハードウェア的な構造として、上述した各種のプロセッサを1つ以上用いて構成される。さらに、これらの各種のプロセッサのハードウェア的な構造は、より具体的には、半導体素子などの回路素子を組み合わせた電気回路(circuitry)である。これらの電気回路は、論理和、論理積、論理否定、排他的論理和、及びこれらを組み合わせた論理演算を用いて上述した機能を実現する電気回路であってもよい。 The functions of each part may be realized by one processor, or may be realized by a plurality of processors of the same type or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA, or a combination of a CPU and a GPU). Further, a plurality of functions may be realized by one processor. As an example of configuring a plurality of functions with one processor, first, as represented by a computer, one processor is configured by a combination of one or more CPUs and software, and this processor is used as a plurality of functions. There is a form to be realized. Secondly, as typified by System On Chip (SoC), there is a form of using a processor that realizes the functions of the entire system with one IC (Integrated Circuit) chip. As described above, various functions are configured by using one or more of the above-mentioned various processors as a hardware structure. Further, the hardware-like structure of these various processors is, more specifically, an electric circuit (circuitry) in which circuit elements such as semiconductor elements are combined. These electric circuits may be electric circuits that realize the above-mentioned functions by using logical sum, logical product, logical denial, exclusive OR, and logical operations combining these.
 上述したプロセッサあるいは電気回路がソフトウェア(プログラム)を実行する際は、実行するソフトウェアのコンピュータ(例えば、画像処理部204を構成する各種のプロセッサや電気回路、及び/またはそれらの組み合わせ)で読み取り可能なコードをROM211(ROM:Read Only Memory)やフラッシュメモリ(不図示)等の非一時的記録媒体に記憶しておき、コンピュータがそのソフトウェアを参照する。非一時的記録媒体に記憶しておくソフトウェアは、本発明に係る医療画像処理方法(医療画像処理装置の作動方法)を実行するためのプログラム及び実行に際して用いられるデータ(医療画像の取得に関するデータ、生検状態等の定義や識別表示の態様設定に用いられるデータ、認識部で用いられるパラメータ等)を含む。ROM211ではなく各種の光磁気記録装置、半導体メモリ等の非一時的記録媒体にコードを記録してもよい。ソフトウェアを用いた処理の際には例えばRAM212(RAM:Random Access Memory)が一時的記憶領域として用いられ、また例えば不図示のEEPROM(Electronically Erasable and Programmable Read Only Memory)に記憶されたデータを参照することもできる。「非一時的記録媒体」として記録部207を用いてもよい。 When the above-mentioned processor or electric circuit executes software (program), it can be read by a computer of the software (for example, various processors and electric circuits constituting the image processing unit 204, and / or a combination thereof). The code is stored in a non-temporary recording medium such as ROM 211 (ROM: ReadOnlyMemory) or flash memory (not shown), and the computer refers to the software. The software stored in the non-temporary recording medium includes a program for executing the medical image processing method (method of operating the medical image processing device) according to the present invention and data used for executing the medical image processing method (data related to acquisition of medical images). Includes data used to define the biopsy state, etc., and to set the mode of identification display, parameters used in the recognition unit, etc.). The code may be recorded on a non-temporary recording medium such as various optical magnetic recording devices and semiconductor memories instead of the ROM 211. When processing using software, for example, RAM212 (RAM: Random Access Memory) is used as a temporary storage area, and for example, data stored in an EEPROM (Electronically Erasable and Programmable Read Only Memory) (not shown) is referred to. You can also do it. The recording unit 207 may be used as a “non-temporary recording medium”.
 また、ROM211(ROM:Read Only Memory)は不揮発性の記憶素子(非一時的記録媒体)であり、各種の画像処理方法(本発明に係る医療画像処理方法を含む)を主制御部210及び/または画像処理部204(コンピュータ)に実行させるプログラムのコンピュータ読み取り可能なコードが記憶されている。RAM212(RAM:Random Access Memory)は各種処理の際の一時記憶用の記憶素子であり、また画像取得時のバッファとしても使用することができる。音声処理部209は、主制御部210及び画像処理部204の制御により、医療画像処理、関心領域の推論結果や報知等に関するメッセージ(音声)をスピーカ209A(報知部、スピーカ)から出力する。 Further, the ROM 211 (ROM: ReadOnlyMemory) is a non-volatile storage element (non-temporary recording medium), and various image processing methods (including the medical image processing method according to the present invention) are used in the main control unit 210 and /. Alternatively, a computer-readable code of a program to be executed by the image processing unit 204 (computer) is stored. The RAM 212 (RAM: Random Access Memory) is a storage element for temporary storage during various processes, and can also be used as a buffer for image acquisition. The voice processing unit 209 outputs a message (voice) related to medical image processing, inference results of the region of interest, notification, etc. from the speaker 209A (notification unit, speaker) under the control of the main control unit 210 and the image processing unit 204.
 When image processing and recognition are performed as in the endoscope system 10, it is effective to configure the image processing unit 204 and/or the main control unit 210 using a GPU, which is one form of parallel computing device, and to execute at least a part of the inference step (inference processing) described later on the GPU.
 <Operation unit>
 The operation unit 208 can be configured by devices such as a keyboard and a mouse (not shown), and via the operation unit 208 the user can give instructions to execute the medical image processing method (inference method) and set conditions necessary for its execution.
 <Procedure of the medical image processing method>
 An example of the medical image processing method (recognition of a region of interest) using the endoscope system 10 will be described. It is assumed that training of the CNN 562 using the training data and conversion of the trained model (convolution layer generation step and convolution layer generation processing, conversion model generation step and conversion model generation processing; see FIG. 2 and the like) have already been executed.
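 The convolution layer generation step and conversion model generation step referred to above can be illustrated concretely. The following is a minimal sketch, in PyTorch, of folding a trained batch regularization (batch normalization) layer into the adjacent convolution layer, assuming the common convolution-then-normalization ordering; the function name fold_bn_into_conv and the choice of PyTorch are illustrative assumptions and are not prescribed by the embodiment.

import torch
import torch.nn as nn

@torch.no_grad()
def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Generates a single "second convolution layer" whose output equals
    # conv followed by bn, using only the trained parameters of the two layers.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # per-output-channel factor
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      kernel_size=conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, dilation=conv.dilation,
                      groups=conv.groups, bias=True)
    # W' = scale * W (broadcast over the output-channel axis)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    # b' = scale * (b - running_mean) + beta
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_(scale * (bias - bn.running_mean) + bn.bias)
    return fused

Replacing each adjacent pair of first convolution layer and regularization layer in the trained CNN 562 with the layer returned by such a function yields a conversion model of the kind used as the CNN 563 below.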
 <Acquisition of endoscopic images>
 The medical image acquisition unit 204A (processor) acquires an endoscopic image (a moving image of the subject; an observation image, a medical image) as an example of time-series data (data acquisition step, data acquisition processing). The medical image acquisition unit 204A may acquire an endoscopic image captured by the endoscope scope 100, or may acquire an endoscopic image recorded in the recording unit 207. The recording control unit 204D can record the acquired endoscopic image in the recording unit 207.
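 As an illustration of the data acquisition step only, the sketch below reads frames from a recorded movie file with OpenCV as a stand-in for the feed from the endoscope scope 100 or the recording unit 207; the file name is a hypothetical example and the actual acquisition path is not limited to this.

import cv2

def acquire_frames(source="recorded_exam.mp4"):
    # Yields observation frames (time-series data) from a camera index or movie file.
    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()  # frame: BGR image array of shape (H, W, 3)
            if not ok:
                break
            yield frame
    finally:
        cap.release()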
 <Inference (recognition of the region of interest)>
 The inference unit 204B (processor) recognizes the region of interest in the observation image using the CNN 563 (trained model, conversion model) (inference step, inference processing). Recognition of the region of interest includes detection and discrimination. FIG. 11 illustrates inference using the conversion model: the inference unit 204B inputs the moving image (time-series data) of the subject into the CNN 563 to obtain detection results and discrimination results (inference results). It is preferable to perform at least a part of the inference step (inference processing) on a parallel computing device such as a GPU. The inference unit 204B may also refer to the individual information of the endoscope scope 100 in the above recognition (inference).
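 A minimal sketch of the inference step under this arrangement: frames are preprocessed and fed one by one to the conversion model on a GPU used as the parallel computing device. The input size, the normalization, and the model object cnn563 are illustrative assumptions rather than values fixed by the embodiment.

import cv2
import torch

def run_inference(cnn563, frames, device="cuda"):
    # Feeds each observation frame to the conversion model and collects inference results.
    cnn563 = cnn563.to(device).eval()
    results = []
    with torch.no_grad():
        for frame in frames:
            x = cv2.resize(frame, (512, 512))                      # assumed input size
            x = torch.from_numpy(x).permute(2, 0, 1).float() / 255.0
            x = x.unsqueeze(0).to(device)                          # batch of one frame
            results.append(cnn563(x).cpu())                        # detection / discrimination output
    return results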
 <Display of the observation image>
 The display control unit 204C causes the display device to display the observation image (display control step). At this time, the display control unit 204C may display the region of interest in an identifiable manner (for example, by displaying characters, figures, or symbols indicating the region of interest, or by coloring the region of interest).
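 One possible form of the identification display is sketched below with OpenCV, drawing a rectangle and a text label over the region of interest on the observation image; the box format (x1, y1, x2, y2) and the label string are assumptions about how the inference result is represented, which the embodiment leaves open.

import cv2

def draw_region_of_interest(frame, box, label="ROI"):
    # Overlays a rectangle and a label identifying the region of interest.
    x1, y1, x2, y2 = box
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, label, (x1, max(y1 - 10, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame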
 The main control unit 210 and the image processing unit 204 repeat the above-described processing until the observation is completed.
 As described above, according to the endoscope system 10, by using the CNN 563, which is a conversion model, at the time of inference, inference can be performed while reducing the memory access cost.
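 The point that the conversion preserves the inference result (the condition that a first processing unit of convolution layer plus regularization layer and a second processing unit of only the generated convolution layer give equal outputs for the same input feature amount) can be checked numerically. A minimal sketch, reusing the hypothetical fold_bn_into_conv shown earlier:

import torch
import torch.nn as nn

# conv + bn form the first processing unit; both are put in inference (eval) mode
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1).eval()
bn = nn.BatchNorm2d(32).eval()

fused = fold_bn_into_conv(conv, bn)  # second processing unit (single convolution)

x = torch.randn(1, 16, 64, 64)       # the same feature amount is fed to both units
with torch.no_grad():
    same = torch.allclose(bn(conv(x)), fused(x), atol=1e-5)
print(same)  # expected: True, up to floating-point rounding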
 <Application to systems other than medical endoscopes>
 The techniques of the present invention (trained model conversion method, inference method, trained model conversion device, inference device, and trained model) can be applied not only to medical endoscopes but also to systems in general that perform real-time processing using a convolutional neural network. For example, they can be applied to medical devices that handle time-series data (moving images of a subject), such as ultrasonic diagnostic apparatuses and X-ray fluoroscopy apparatuses, as well as to industrial endoscopes, machine vision, face recognition with digital cameras, security cameras, and object recognition with cameras mounted on moving bodies such as automobiles and aircraft, thereby enabling inference while reducing the memory access cost.
 (Additional notes)
 In addition to the embodiments described above, the configurations described below are also included in the scope of the present invention.
 (Appendix 1)
 A medical image processing apparatus (inference apparatus) in which
 a medical image analysis processing unit (inference unit) detects a region of interest, which is a region to be noted, based on feature amounts of pixels of a medical image, and
 a medical image analysis result acquisition unit acquires an analysis result of the medical image analysis processing unit.
 (Appendix 2)
 A medical image processing apparatus in which
 the medical image analysis processing unit detects the presence or absence of an object to be noted based on feature amounts of pixels of a medical image, and
 the medical image analysis result acquisition unit acquires an analysis result of the medical image analysis processing unit.
 (Appendix 3)
 A medical image processing apparatus in which
 the medical image analysis result acquisition unit acquires the analysis result (input data, time-series data) of a medical image from a recording device that records the analysis result, and
 the analysis result is either or both of a region of interest, which is a region to be noted included in the medical image, and the presence or absence of an object to be noted.
 (Appendix 4)
 A medical image processing apparatus in which the medical image is a normal-light image obtained by irradiation with light in the white band, or with light in a plurality of wavelength bands as the light in the white band.
 (Appendix 5)
 A medical image processing apparatus in which
 the medical image is an image obtained by irradiation with light in a specific wavelength band, and
 the specific wavelength band is a band narrower than the white wavelength band.
 (Appendix 6)
 A medical image processing apparatus in which the specific wavelength band is the blue or green band in the visible range.
 (Appendix 7)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band of 390 nm or more and 450 nm or less, or 530 nm or more and 550 nm or less, and the light in the specific wavelength band has a peak wavelength within the wavelength band of 390 nm or more and 450 nm or less, or 530 nm or more and 550 nm or less.
 (Appendix 8)
 A medical image processing apparatus in which the specific wavelength band is the red band in the visible range.
 (Appendix 9)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band of 585 nm or more and 615 nm or less, or 610 nm or more and 730 nm or less, and the light in the specific wavelength band has a peak wavelength within the wavelength band of 585 nm or more and 615 nm or less, or 610 nm or more and 730 nm or less.
 (Appendix 10)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band in which the absorption coefficient differs between oxyhemoglobin and reduced hemoglobin, and the light in the specific wavelength band has a peak wavelength in the wavelength band in which the absorption coefficient differs between oxyhemoglobin and reduced hemoglobin.
 (Appendix 11)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band of 400±10 nm, 440±10 nm, 470±10 nm, or 600 nm or more and 750 nm or less, and the light in the specific wavelength band has a peak wavelength in the wavelength band of 400±10 nm, 440±10 nm, 470±10 nm, or 600 nm or more and 750 nm or less.
 (Appendix 12)
 A medical image processing apparatus in which
 the medical image is an in-vivo image of the inside of a living body, and
 the in-vivo image has information on fluorescence emitted by a fluorescent substance in the living body.
 (Appendix 13)
 A medical image processing apparatus in which the fluorescence is obtained by irradiating the inside of the living body with excitation light having a peak in the range of 390 nm or more and 470 nm or less.
 (Appendix 14)
 A medical image processing apparatus in which
 the medical image is an in-vivo image of the inside of a living body, and
 the specific wavelength band is a wavelength band of infrared light.
 (Appendix 15)
 A medical image processing apparatus in which the specific wavelength band includes a wavelength band of 790 nm or more and 820 nm or less, or 905 nm or more and 970 nm or less, and the light in the specific wavelength band has a peak wavelength in the wavelength band of 790 nm or more and 820 nm or less, or 905 nm or more and 970 nm or less.
 (Appendix 16)
 A medical image processing apparatus in which
 the medical image acquisition unit includes a special-light image acquisition unit that acquires a special-light image having information on the specific wavelength band based on a normal-light image obtained by irradiation with light in the white band, or with light in a plurality of wavelength bands as the light in the white band, and
 the medical image is the special-light image.
 (Appendix 17)
 A medical image processing apparatus in which a signal in the specific wavelength band is obtained by calculation based on RGB or CMY color information included in the normal-light image.
 (Appendix 18)
 A medical image processing apparatus including
 a feature-amount image generation unit that generates a feature-amount image by calculation based on at least one of a normal-light image obtained by irradiation with light in the white band, or with light in a plurality of wavelength bands as the light in the white band, and a special-light image obtained by irradiation with light in the specific wavelength band,
 in which the medical image is the feature-amount image.
 (Appendix 19)
 An endoscope apparatus (inference apparatus) including:
 the medical image processing apparatus according to any one of Appendices 1 to 18; and
 an endoscope that acquires images by irradiation with at least one of light in the white wavelength band and light in the specific wavelength band.
 (Appendix 20)
 A diagnosis support apparatus (inference apparatus) including the medical image processing apparatus according to any one of Appendices 1 to 18.
 (Appendix 21)
 A medical service support apparatus (inference apparatus) including the medical image processing apparatus according to any one of Appendices 1 to 18.
 Although the embodiments and other examples of the present invention have been described above, the present invention is not limited to the above-described aspects, and various modifications can be made without departing from the spirit of the present invention.
10   Endoscope system
100  Endoscope scope
102  Hand operation part
104  Insertion part
106  Universal cable
108  Light guide connector
112  Flexible part
114  Bending part
116  Distal end rigid part
116A Distal-end-side end face
123  Illumination unit
123A Illumination lens
123B Illumination lens
126  Forceps port
130  Imaging optical system
132  Imaging lens
134  Imaging element
136  Drive circuit
139  Scope information recording unit
141  Air/water supply button
142  Suction button
143  Function button
144  Imaging button
170  Light guide
200  Medical image processing apparatus
202  Image input controller
204  Image processing unit
204A Medical image acquisition unit
204B Inference unit
204C Display control unit
204D Recording control unit
204E Scope information acquisition unit
205  Communication control unit
206  Video output unit
207  Recording unit
208  Operation unit
209  Audio processing unit
209A Speaker
210  Main control unit
211  ROM
212  RAM
300  Light source device
310  Light source
310B Blue light source
310G Green light source
310R Red light source
310V Violet light source
330  Aperture
340  Condenser lens
350  Light source control unit
400  Monitor
500  Model conversion device
510  Processor
512  Learning control unit
514  Conversion model generation unit
520  ROM
530  RAM
561A Conversion target layer
561B Conversion target layer
562A Input layer
562B Intermediate layer
562C Output layer
563  CNN
563A Input layer
563B Intermediate layer
563C Output layer
564  Convolution layer
565  Batch regularization layer
566  Fully connected layer
567  Convolution layer
F1   Filter
F2   Filter

Claims (11)

  1.  A trained model conversion method comprising:
     a convolution layer generation step of generating, for a trained convolutional neural network including at least one regularization layer, a second convolution layer based on trained parameters of the regularization layer and trained parameters of a first convolution layer adjacent to the regularization layer; and
     a conversion model generation step of replacing the regularization layer and the first convolution layer with the second convolution layer to generate a conversion model, which is a converted trained model.
  2.  The trained model conversion method according to claim 1, wherein, in the convolution layer generation step, the second convolution layer is generated such that, when the same feature amount is input, the inference processing results of a first processing unit composed of the first convolution layer and the regularization layer and of a second processing unit composed only of the second convolution layer are equal.
  3.  The trained model conversion method according to claim 1 or 2, wherein the regularization layer is a batch regularization layer.
  4.  An inference method comprising:
     a data acquisition step of acquiring input data; and
     an inference step of inputting the input data into the conversion model obtained by the trained model conversion method according to any one of claims 1 to 3 to obtain an inference result.
  5.  The inference method according to claim 4, wherein at least a part of the inference step is executed by a parallel computing device.
  6.  The inference method according to claim 4 or 5, wherein time-series data is acquired as the input data in the data acquisition step.
  7.  The inference method according to claim 6, wherein a moving image of a subject is acquired as the input data in the data acquisition step.
  8.  A trained model conversion device comprising a processor, wherein the processor executes:
     convolution layer generation processing of generating, for a trained convolutional neural network including at least one regularization layer, a second convolution layer based on trained parameters of the regularization layer and trained parameters of a first convolution layer adjacent to the regularization layer; and
     conversion model generation processing of replacing the regularization layer and the first convolution layer with the second convolution layer to generate a conversion model, which is a converted trained model.
  9.  A trained model used by a computer to output an inference result for input data, the trained model being obtained by a processor of a trained model conversion device executing:
     convolution layer generation processing of generating, for a trained convolutional neural network including at least one regularization layer, a second convolution layer based on trained parameters of the regularization layer and trained parameters of a first convolution layer adjacent to the regularization layer; and
     conversion model generation processing of replacing the regularization layer and the first convolution layer with the second convolution layer to generate a conversion model, which is a converted trained model.
  10.  An inference device comprising:
     a processor; and
     the trained model according to claim 9,
     wherein the processor executes:
     data acquisition processing of acquiring input data; and
     inference processing of inputting the input data into the trained model to obtain an inference result.
  11.  The inference device according to claim 10, wherein the processor includes a parallel computing device that executes at least a part of the inference processing.
PCT/JP2021/030212 2020-09-28 2021-08-18 Trained model transformation method, inference method, trained model transformation device, trained model, and inference device WO2022064901A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022551194A JPWO2022064901A5 (en) 2021-08-18 Reasoning method and reasoning device
US18/188,449 US20230230369A1 (en) 2020-09-28 2023-03-22 Trained model conversion method, inference method, trained model conversion apparatus, trained model, and inference apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020162403 2020-09-28
JP2020-162403 2020-09-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/188,449 Continuation US20230230369A1 (en) 2020-09-28 2023-03-22 Trained model conversion method, inference method, trained model conversion apparatus, trained model, and inference apparatus

Publications (1)

Publication Number Publication Date
WO2022064901A1 true WO2022064901A1 (en) 2022-03-31

Family

ID=80845091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/030212 WO2022064901A1 (en) 2020-09-28 2021-08-18 Trained model transformation method, inference method, trained model transformation device, trained model, and inference device

Country Status (2)

Country Link
US (1) US20230230369A1 (en)
WO (1) WO2022064901A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020082263A1 (en) * 2018-10-24 2020-04-30 Alibaba Group Holding Limited Fast computation of convolutional neural network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020082263A1 (en) * 2018-10-24 2020-04-30 Alibaba Group Holding Limited Fast computation of convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DUAN JIE; ZHANG RUIXIN; HUANG JIAHU; ZHU QIUYU: "The Speed Improvement by Merging Batch Normalization into Previously Linear Layer in CNN", 2018 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), IEEE, 16 July 2018 (2018-07-16), pages 67 - 72, XP033398373, DOI: 10.1109/ICALIP.2018.8455587 *

Also Published As

Publication number Publication date
JPWO2022064901A1 (en) 2022-03-31
US20230230369A1 (en) 2023-07-20

Similar Documents

Publication Publication Date Title
JP7430287B2 (en) Medical image processing equipment and endoscope systems
JP7062068B2 (en) Image processing method and image processing device
JP7048732B2 (en) Image processing equipment, endoscope system, and image processing method
JP6941233B2 (en) Image processing equipment, endoscopic system, and image processing method
US11200460B2 (en) Image learning device, image learning method, neural network, and image classification device
US20210343011A1 (en) Medical image processing apparatus, endoscope system, and medical image processing method
WO2021149552A1 (en) Medical image processing device, method for operating medical image processing device, and endoscope system
WO2020170809A1 (en) Medical image processing device, endoscope system, and medical image processing method
JP7091349B2 (en) Diagnosis support system, endoscopy system, processor, and how to operate the diagnosis support system
US20220285010A1 (en) Medical image processing apparatus, medical image processing method, and program
JP2022159496A (en) Endoscope system, endoscopic image learning method, and program
WO2021157487A1 (en) Medical image processing device, endoscope system, medical image processing method, and program
US20200383553A1 (en) Image processing device, endoscope system, and image processing method
WO2022064901A1 (en) Trained model transformation method, inference method, trained model transformation device, trained model, and inference device
US20230157768A1 (en) Medical image processing apparatus, medical image processing method, endoscope system, and medical image processing program
JP6931425B2 (en) Medical image learning device, medical image learning method, and program
WO2021153471A1 (en) Medical image processing device, medical image processing method, and program
WO2022181748A1 (en) Medical image processing device, endoscope system, medical image processing method, and medical image processing program
JP2023041458A (en) Image processing device, image processing method, and program
JP2023039245A (en) Image processing device, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21872033

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022551194

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21872033

Country of ref document: EP

Kind code of ref document: A1