US20230011053A1 - Learning data generating system and learning data generating method - Google Patents

Learning data generating system and learning data generating method

Info

Publication number
US20230011053A1
Authority
US
United States
Prior art keywords
image
feature map
neural network
learning data
data generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/902,009
Other languages
English (en)
Inventor
Jun Ando
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Olympus Corp
Original Assignee
Olympus Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Olympus Corp filed Critical Olympus Corp
Assigned to OLYMPUS CORPORATION reassignment OLYMPUS CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ANDO, JUN
Publication of US20230011053A1 publication Critical patent/US20230011053A1/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/02: Neural networks
              • G06N 3/04: Architecture, e.g. interconnection topology
                • G06N 3/045: Combinations of networks
                • G06N 3/0464: Convolutional networks [CNN, ConvNet]
              • G06N 3/08: Learning methods
                • G06N 3/09: Supervised learning
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00: Image analysis

Definitions

  • two different images are input to a convolutional neural network (CNN) to extract feature maps that are the output of an intermediate layer of the CNN, the feature map of the first image and the feature map of the second image are subjected to addition with weighting to combine the feature maps, and the combined feature map is input to the next intermediate layer.
  • CNN: convolutional neural network
  • learning is performed with the feature maps combined in the intermediate layer. As a result, the learning data is padded out (augmented).
  • a learning data generating system comprising a processor, the processor being configured to implement:
  • a learning data generating method comprising:
  • FIG. 1 is an explanatory diagram of Manifold Mixup.
  • FIG. 2 illustrates a first configuration example of a learning data generating system.
  • FIG. 3 is a diagram illustrating processing performed in the learning data generating system.
  • FIG. 4 is a flowchart of processes performed by a processing section in the first configuration example.
  • FIG. 5 is a diagram schematically illustrating the processes performed by the processing section in the first configuration example.
  • FIG. 6 illustrates simulation results of image recognition with respect to lesions.
  • FIG. 7 illustrates a second configuration example of the learning data generating system.
  • FIG. 8 is a flowchart of processes performed by the processing section in the second configuration example.
  • FIG. 9 is a diagram schematically illustrating the processes performed by the processing section in the second configuration example.
  • FIG. 10 illustrates an overall configuration example of a CNN.
  • FIG. 11 illustrates an example of a convolutional process.
  • FIG. 12 illustrates an example of a recognition result output by the CNN.
  • FIG. 13 illustrates a system configuration example when an ultrasonic image is input to the learning data generating system.
  • FIG. 14 illustrates a configuration example of a neural network in an ultrasonic diagnostic system.
  • first element is described as being “connected” or “coupled” to a second element, such description includes embodiments in which the first and second elements are directly connected or coupled to each other, and also includes embodiments in which the first and second elements are indirectly connected or coupled to each other with one or more other intervening elements in between.
  • a neural network 5 is a convolutional neural network (CNN) that performs image recognition through a convolutional process. In image recognition after learning, the neural network 5 outputs one score map with respect to one input image. On the other hand, during learning, two input images are input to the neural network 5 , and feature maps are combined in an intermediate layer to thereby pad learning data out.
  • In FIG. 1 , two input images IMA 1 and IMA 2 are input.
  • a convolutional layer of the CNN outputs image data called a feature map.
  • MAPA 1 is a feature map generated by applying the layers of the CNN from the input layer to a certain intermediate layer to the input image IMA 1 .
  • the feature map MAPA 1 has a plurality of channels, each of which constitutes one piece of image data. The same applies to MAPA 2 .
  • FIG. 1 illustrates an example where the feature map has three channels.
  • the channels are denoted with ch 1 , ch 2 , and ch 3 .
  • the channel ch 1 of the feature map MAPA 1 and the channel ch 1 of the feature map MAPA 2 are subjected to addition with weighting to generate a channel ch 1 of a combined feature map SMAPA.
  • the channels ch 2 and ch 3 are similarly subjected to addition with weighting to generate channels ch 2 and ch 3 of the combined feature map SMAPA.
  • the combined feature map SMAPA is input to an intermediate layer next to the intermediate layer from which the feature maps MAPA 1 and MAPA 2 are extracted.
  • the neural network 5 outputs a score map as output information NNQA, and the neural network 5 is updated on the basis of the score map and correct information.
  • In each channel of the feature map, various features are extracted in accordance with the filter weight coefficients of the convolutional process.
  • In Manifold Mixup, the channels of the feature maps MAPA 1 and MAPA 2 are subjected to addition with weighting. Therefore, the texture information of the respective feature maps is mixed. Accordingly, there is a risk that a subtle difference in texture is not learned appropriately. For example, when a subtle difference in lesion texture must be recognized, as in lesion discrimination from ultrasonic endoscope images, a sufficient learning effect may not be obtained.
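For reference, the following is a minimal NumPy sketch of the Manifold Mixup combination illustrated in FIG. 1; the array shapes and the names map_a1, map_a2, and lam are illustrative assumptions rather than notation from this disclosure.

```python
import numpy as np

def manifold_mixup(map_a1: np.ndarray, map_a2: np.ndarray, lam: float) -> np.ndarray:
    """Combine two intermediate feature maps by per-channel addition with weighting.

    map_a1, map_a2: feature maps of shape (channels, height, width) extracted from
    the same intermediate layer for two different input images.
    lam: mixing weight in [0, 1].
    """
    assert map_a1.shape == map_a2.shape
    # Every channel is mixed, so texture information from both images is blended
    # in every channel of the combined map.
    return lam * map_a1 + (1.0 - lam) * map_a2

# Example with three channels, as in FIG. 1.
smap_a = manifold_mixup(np.random.rand(3, 32, 32), np.random.rand(3, 32, 32), lam=0.7)
```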
  • FIG. 2 illustrates a first configuration example of a learning data generating system 10 according to the present embodiment.
  • the learning data generating system 10 includes an acquisition section 110 , a first neural network 121 , a second neural network 122 , a feature map combining section 130 , an output error calculation section 140 , and a neural network updating section 150 .
  • FIG. 3 is a diagram illustrating processing performed in the learning data generating system 10 .
  • the acquisition section 110 acquires a first image IM 1 , a second image IM 2 , first correct information TD 1 corresponding to the first image IM 1 , and second correct information TD 2 corresponding to the second image IM 2 .
  • the first neural network 121 receives input of the first image IM 1 to generate a first feature map MAP 1 , and receives input of the second image IM 2 to generate a second feature map MAP 2 .
  • the feature map combining section 130 replaces a part of the first feature map MAP 1 with a part of the second feature map MAP 2 to generate a combined feature map SMAP.
  • the second neural network 122 generates output information NNQ on the basis of the combined feature map SMAP.
  • the output error calculation section 140 calculates an output error ERQ on the basis of the output information NNQ, the first correct information TD 1 , and the second correct information TD 2 .
  • the neural network updating section 150 updates the first neural network 121 and the second neural network 122 on the basis of the output error ERQ.
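A schematic sketch of one learning iteration in this configuration is shown below, written with PyTorch for concreteness. The layer definitions, the combine_feature_maps helper, the loss function, and the tensor shapes are illustrative assumptions; they are not the reference implementation of this disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# First neural network 121: layers from the input layer to a chosen intermediate layer.
first_nn = nn.Sequential(nn.Conv2d(1, 6, 3, padding=1), nn.BatchNorm2d(6), nn.ReLU())
# Second neural network 122: remaining layers up to the output layer (two-category score map).
second_nn = nn.Sequential(nn.Conv2d(6, 2, 3, padding=1))
optimizer = torch.optim.SGD(
    list(first_nn.parameters()) + list(second_nn.parameters()), lr=1e-3)

def combine_feature_maps(map1, map2, replaced_channels=(1, 2)):
    """Replace the listed channels of the first feature map with those of the second."""
    mask = torch.zeros(map1.shape[1], dtype=torch.bool)
    mask[list(replaced_channels)] = True
    smap = torch.where(mask.view(1, -1, 1, 1), map2, map1)
    rate = 1.0 - len(replaced_channels) / map1.shape[1]   # share of SMAP taken from MAP1
    return smap, rate

def train_step(im1, im2, td1, td2):
    """One learning iteration: im1/im2 are input images, td1/td2 are correct-information
    mask maps with the same spatial size as the score map (values 0 or 1 per category)."""
    map1 = first_nn(im1)                      # first feature map MAP1
    map2 = first_nn(im2)                      # second feature map MAP2
    smap, rate = combine_feature_maps(map1, map2)
    nnq = second_nn(smap)                     # output information NNQ (score map logits)
    err1 = F.binary_cross_entropy_with_logits(nnq, td1)   # first output error ERR1
    err2 = F.binary_cross_entropy_with_logits(nnq, td2)   # second output error ERR2
    erq = rate * err1 + (1.0 - rate) * err2               # output error ERQ
    optimizer.zero_grad()
    erq.backward()
    optimizer.step()
    return erq.item()

loss = train_step(torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64),
                  torch.randint(0, 2, (4, 2, 64, 64)).float(),
                  torch.randint(0, 2, (4, 2, 64, 64)).float())
```

Replacing channels through a boolean mask keeps the combination differentiable, so the output error ERQ can be back-propagated through both the second and the first neural network.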
  • replace means deleting a part of channels or regions in the first feature map MAP 1 and disposing a part of channels or regions of the second feature map MAP 2 in place of the deleted part of channels or regions. From the viewpoint of the combined feature map SMAP, it can also be said that a part of the combined feature map SMAP is selected from the first feature map MAP 1 and a remaining part of the combined feature map SMAP is selected from the second feature map MAP 2 .
  • a part of the first feature map MAP 1 is replaced with a part of the second feature map MAP 2 . Consequently, texture of the feature maps is preserved in the combined feature map SMAP without addition with weighting.
  • the feature maps are combined with information of texture being favorably preserved. Consequently, it is possible to improve accuracy of image recognition using AI.
  • the padding method through image combination can be used even when a subtle difference in lesion texture must be recognized, as in lesion discrimination from ultrasonic endoscope images, and high recognition performance can be obtained even with a small amount of learning data.
  • the learning data generating system 10 includes a processing section 100 and a storage section 200 .
  • the processing section 100 includes the acquisition section 110 , the neural network 120 , the feature map combining section 130 , the output error calculation section 140 , and the neural network updating section 150 .
  • the learning data generating system 10 is an information processing device such as a personal computer (PC), for example.
  • the learning data generating system 10 may be configured by a terminal device and the information processing device.
  • the terminal device may include the storage section 200 , a display section (not shown), an operation section (not shown), and the like
  • the information processing device may include the processing section 100
  • the terminal device and the information processing device may be connected to each other via a network.
  • the learning data generating system 10 may be a cloud system in which a plurality of information processing devices connected via a network performs distributed processing.
  • the storage section 200 stores training data used for learning in the neural network 120 .
  • the training data is configured by training images and correct information attached to the training images.
  • the correct information is also called a training label.
  • the storage section 200 is a storage device such as a memory, a hard disc drive, an optical drive, or the like.
  • the memory is a semiconductor memory, which is a volatile memory such as a RAM or a non-volatile memory such as an EPROM.
  • the processing section 100 is a processing circuit or a processing device including one or a plurality of circuit components.
  • the processing section 100 includes a processor such as a central processing unit (CPU), a graphical processing unit (GPU), a digital signal processor (DSP), or the like.
  • the processor may be an integrated circuit device such as a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or the like.
  • the processing section 100 may include a plurality of processors.
  • the processor executes a program stored in the storage section 200 to implement a function of the processing section 100 .
  • the program includes description of functions of the acquisition section 110 , the neural network 120 , the feature map combining section 130 , the output error calculation section 140 , and the neural network updating section 150 .
  • the storage section 200 stores a learning model of the neural network 120 .
  • the learning model includes a description of the algorithm of the neural network 120 and the parameters used for the learning model.
  • the parameters include a weighted coefficient between nodes, and the like.
  • the processor uses the learning model to execute an inference process of the neural network 120 , and uses the parameters that have been updated through learning to update the parameters stored in the storage section 200 .
  • FIG. 4 is a flowchart of processes performed by the processing section 100 in the first configuration example
  • FIG. 5 is a diagram schematically illustrating the processes.
  • In step S 101 , the processing section 100 initializes the neural network 120 .
  • In steps S 102 and S 103 , the first image IM 1 and the second image IM 2 are input to the processing section 100 .
  • In steps S 104 and S 105 , the first correct information TD 1 and the second correct information TD 2 are input to the processing section 100 .
  • Steps S 102 to S 105 may be executed in random order without being limited to the execution order illustrated in FIG. 4 , or may be executed in a parallel manner.
  • the acquisition section 110 includes an image acquisition section 111 that acquires the first image IM 1 and the second image IM 2 from the storage section 200 and a correct information acquisition section 112 that acquires the first correct information TD 1 and the second correct information TD 2 from the storage section 200 .
  • the acquisition section 110 is, for example, an access control section that controls access to the storage section 200 .
  • a recognition target TG 1 appears in the first image IM 1
  • a recognition target TG 2 in a classification category different from that of the recognition target TG 1 appears in the second image IM 2
  • the storage section 200 stores a first training image group and a second training image group that are in different classification categories in image recognition.
  • the classification categories include classifications of organs, parts in an organ, lesions, or the like.
  • the image acquisition section 111 acquires an arbitrary image from the first training image group as the first image IM 1 , and acquires an arbitrary image from the second training image group as the second image IM 2 .
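As a minimal sketch, acquiring the image pair from two training image groups of different classification categories could look as follows; the group lists and the helper name acquire_pair are hypothetical.

```python
import random

def acquire_pair(first_group: list, second_group: list):
    """Acquire an arbitrary image from the first training image group as the first image IM1
    and an arbitrary image from the second training image group as the second image IM2."""
    return random.choice(first_group), random.choice(second_group)
```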
  • In step S 108 , the processing section 100 applies the first neural network 121 to the first image IM 1 , and the first neural network 121 outputs a first feature map MAP 1 . Furthermore, the processing section 100 applies the first neural network 121 to the second image IM 2 , and the first neural network 121 outputs a second feature map MAP 2 .
  • In step S 109 , the feature map combining section 130 combines the first feature map MAP 1 with the second feature map MAP 2 and outputs the combined feature map SMAP.
  • In step S 110 , the processing section 100 applies the second neural network 122 to the combined feature map SMAP, and the second neural network 122 outputs the output information NNQ.
  • the neural network 120 is a CNN
  • the CNN divided at an intermediate layer corresponds to the first neural network 121 and the second neural network 122 .
  • layers from an input layer to the above-mentioned intermediate layer constitute the first neural network
  • layers from an intermediate layer next to the above-mentioned intermediate layer to an output layer constitute the second neural network 122 .
  • the CNN has a convolutional layer, a normalization layer, an activation layer, and a pooling layer. Any one of these layers may be used as a border to divide the CNN into the first neural network 121 and the second neural network 122 .
  • when a plurality of intermediate layers exists, the intermediate layer at which the division is performed may be varied for each image input.
  • FIG. 5 illustrates an example where the first neural network 121 outputs a feature map having six channels.
  • Each channel of the feature map is image data having pixels to which output values of nodes are allocated, respectively.
  • the feature map combining section 130 replaces the channels ch 2 and ch 3 of the first feature map MAP 1 with the channels ch 2 and ch 3 of the second feature map MAP 2 .
  • channels ch 1 , ch 4 , ch 5 , and ch 6 of the first feature map MAP 1 are allocated to a part of channels ch 1 , ch 4 , ch 5 , and ch 6 of the combined feature map SMAP
  • channels ch 2 and ch 3 of the second feature map MAP 2 are allocated to a remaining part of channels ch 2 and ch 3 of the combined feature map SMAP.
  • a rate of each feature map in the combined feature map SMAP is referred to as a replacement rate.
  • the replacement rate of the first feature map MAP 1 is 4/6 ≈ 0.7
  • the replacement rate of the second feature map MAP 2 is 2/6 ≈ 0.3.
  • the number of channels of the feature maps is not limited to six.
  • a channel to be replaced and the number of channels to be replaced are not limited to the example of FIG. 5 .
  • the channel and the number may be set at random for each image input.
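The channel replacement with a randomly chosen channel set and the resulting replacement rate can be sketched as follows (NumPy, names illustrative); choosing at least one channel from each feature map is an assumption made for the sketch.

```python
import numpy as np

def replace_channels(map1: np.ndarray, map2: np.ndarray, rng: np.random.Generator):
    """Replace a randomly chosen subset of channels of map1 with the same channels of map2.

    map1, map2: feature maps of shape (channels, height, width).
    Returns the combined feature map SMAP and the replacement rate of map1, i.e. the
    share of channels kept from map1 (4/6 when 2 of 6 channels are taken from map2).
    """
    num_ch = map1.shape[0]
    num_replaced = int(rng.integers(1, num_ch))                  # how many channels to replace
    replaced = rng.choice(num_ch, size=num_replaced, replace=False)
    smap = map1.copy()
    smap[replaced] = map2[replaced]
    rate_map1 = 1.0 - num_replaced / num_ch
    return smap, rate_map1

rng = np.random.default_rng(0)
smap, rate = replace_channels(np.random.rand(6, 16, 16), np.random.rand(6, 16, 16), rng)
```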
  • the output information NNQ to be output by the second neural network 122 is data called a score map.
  • the score map has a plurality of channels, and an individual channel corresponds to an individual classification category.
  • FIG. 5 illustrates an example where two classification categories exist.
  • Each channel of the score map is image data having pixels to which estimation values are allocated.
  • the estimation value is a value indicating probability that the recognition target has been detected in the pixel.
  • the neural network updating section 150 updates the neural network 120 on the basis of the output error ERQ. Updating the neural network 120 means updating parameters such as a weighted coefficient between nodes. As an updating method, a variety of publicly-known methods such as a back propagation method can be adopted.
  • the processing section 100 determines whether or not termination conditions of learning are satisfied.
  • the termination conditions include the output error ERQ becoming equal to or lower than a predetermined output error, completion of learning for a predetermined number of images, and the like. The processing section 100 terminates the processes of this flow when the termination conditions are satisfied, whereas the processing section 100 returns to step S 102 when the termination conditions are not satisfied.
  • FIG. 6 illustrates simulation results of image recognition with respect to lesions.
  • the horizontal axis represents a correct rate with respect to lesions of all classification categories as recognition targets.
  • the vertical axis represents a correct rate with respect to minor lesions among the classification categories as the recognition targets.
  • DA represents a simulation result of a conventional method of padding the learning data out merely from a single image.
  • DB represents a simulation result of Manifold Mixup.
  • DC represents a simulation result of the method according to the present embodiment. Three points are plotted for each result, corresponding to simulations performed with different offsets with respect to detection of minor lesions.
  • the first feature map MAP 1 includes a first plurality of channels
  • the second feature map MAP 2 includes a second plurality of channels.
  • the feature map combining section 130 replaces the whole of a part of the first plurality of channels with the whole of a part of the second plurality of channels.
  • a part of the first feature map MAP 1 can be replaced with a part of the second feature map MAP 2 .
  • texture is mixed in such a manner that the first image IM 1 is selected for certain texture and the second image IM 2 is selected for another texture.
  • the feature map combining section 130 may replace a partial region of a channel included in the first plurality of channels with a partial region of a channel included in the second plurality of channels.
  • the partial region of the channel instead of the whole of the channel can be replaced.
  • when the partial region contains the recognition target, it is possible to generate a combined feature map in which the recognition target of one feature map appears to be fitted into the background of the other feature map.
  • when the partial region contains a part of the recognition target, it is possible to generate a combined feature map that appears to combine the recognition targets of the two feature maps.
  • the feature map combining section 130 may replace a band-like region of a channel included in the first plurality of channels with a band-like region of a channel included in the second plurality of channels.
  • a method for replacing the partial region of the channel is not limited to the above.
  • the feature map combining section 130 may replace a region set to be periodic in a channel included in the first plurality of channels with a region set to be periodic in a channel included in the second plurality of channels.
  • the region set to be periodic is, for example, a striped region, a checkered-pattern region, or the like.
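The following NumPy sketch illustrates region-wise replacement within a single channel using a band-like mask and two periodic masks (striped and checkered); the mask sizes and periods are arbitrary illustrative choices.

```python
import numpy as np

def region_replace(ch1: np.ndarray, ch2: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Replace the region of channel ch1 selected by mask with the same region of ch2."""
    return np.where(mask, ch2, ch1)

h, w = 16, 16
# Band-like region: a horizontal band covering rows 4 to 7.
band = np.zeros((h, w), dtype=bool)
band[4:8, :] = True
# Striped region (periodic): alternating groups of two rows.
stripes = np.broadcast_to((np.arange(h) // 2 % 2 == 0)[:, None], (h, w))
# Checkered-pattern region (periodic): alternating 4 x 4 blocks.
checker = (np.arange(h)[:, None] // 4 + np.arange(w)[None, :] // 4) % 2 == 0

ch1, ch2 = np.random.rand(h, w), np.random.rand(h, w)
combined = region_replace(ch1, ch2, checker)
```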
  • the feature map combining section 130 may determine a size of the partial region to be replaced in the channel included in the first plurality of channels on the basis of classification categories of the first image and the second image.
  • the feature map is replaced in a region having a size corresponding to the classification category of the image. For example, when a size specific to a recognition target such as a lesion in a classification category is predefined, the feature map is replaced in a region having the specific size. As a result, it is possible to generate, for example, a combined feature map in which the recognition target of one feature map appears to be fitted into the background of the other feature map.
  • the first image IM 1 and the second image IM 2 are ultrasonic images. Note that a system for performing learning based on the ultrasonic images will be described later referring to FIG. 13 and the like.
  • the ultrasonic image is normally a monochrome image, in which texture is an important element for image recognition.
  • the present embodiment enables highly-accurate image recognition based on a subtle difference in texture, and makes it possible to generate an image recognition system appropriate for ultrasonic diagnostic imaging.
  • the application target of the present embodiment is not limited to the ultrasonic image, and application to various medical images is allowed.
  • the method of the present embodiment is also applicable to medical images acquired by an endoscope system that captures images using an image sensor.
  • the first image IM 1 and the second image IM 2 are classified into different classification categories.
  • the first feature map MAP 1 and the second feature map MAP 2 are combined, and learning is performed. Consequently, a boundary between the classification category of the first image IM 1 and the classification category of the second image IM 2 is learned.
  • combination is performed without losing a subtle difference in texture of the feature maps, and the boundary of the classification categories is appropriately learned.
  • the classification category of the first image IM 1 and the classification category of the second image IM 2 are a combination that is difficult to discriminate in an image recognition process. By learning the boundary of such classification categories using the method of the present embodiment, the recognition accuracy for classification categories that are difficult to discriminate improves.
  • the first image IM 1 and the second image IM 2 may be classified into the same classification category. By combining recognition targets whose classification categories are the same but whose features are different, it is possible to generate image data having greater diversity in the same category.
  • the output error calculation section 140 calculates the first output error ERR 1 on the basis of the output information NNQ and the first correct information TD 1 , calculates the second output error ERR 2 on the basis of the output information NNQ and the second correct information TD 2 , and calculates a weighted sum of the first output error ERR 1 and the second output error ERR 2 as the output error ERQ.
  • the output information NNQ constitutes information in which an estimation value to the classification category of the first image IM 1 and an estimation value to the classification category of the second image IM 2 are subjected to addition with weighting.
  • a weighted sum of the first output error ERR 1 and the second output error ERR 2 is calculated to thereby obtain the output error ERQ corresponding to the output information NNQ.
  • the feature map combining section 130 replaces a part of the first feature map MAP 1 with a part of the second feature map MAP 2 at a first rate.
  • the first rate corresponds to the replacement rate of 0.7 described with reference to FIG. 5 .
  • the output error calculation section 140 calculates a weighted sum of the first output error ERR 1 and the second output error ERR 2 by weighting based on the first rate, and the calculated weighted sum is defined as the output error ERQ.
  • the above-mentioned weighting of the estimation values in the output information NNQ is weighting according to the first rate.
  • the weighting based on the first rate is used to calculate the weighted sum of the first output error ERR 1 and the second output error ERR 2 , to thereby obtain the output error ERQ corresponding to the output information NNQ.
  • the output error calculation section 140 calculates the weighted sum of the first output error ERR 1 and the second output error ERR 2 at a rate same as the first rate.
  • the above-mentioned weighting of the estimation values in the output information NNQ is expected to be a rate same as the first rate.
  • the weighted sum of the first output error ERR 1 and the second output error ERR 2 is calculated at the rate same as the first rate, thereby weighting of the estimation values in the output information NNQ is fed back so as to become the first rate as an expected value.
  • the output error calculation section 140 may calculate the weighted sum of the first output error ERR 1 and the second output error ERR 2 at a rate different from the first rate.
  • the weighting may be performed so that the estimation value of a minor category such as a rare lesion is offset in a forward direction.
  • when, for example, the first image IM 1 belongs to a minor category such as a rare lesion, the weighting of the first output error ERR 1 is made larger than the first rate.
  • feedback is thereby performed so as to facilitate detection of the minor category, for which recognition accuracy is difficult to improve.
  • the output error calculation section 140 may generate a correct probability distribution from the first correct information TD 1 and the second correct information TD 2 and define the KL divergence calculated from the output information NNQ and the correct probability distribution as the output error ERQ.
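Both error formulations can be sketched as follows for a two-category case (NumPy). The cross-entropy form of ERR1 and ERR2 and the mixing of TD1 and TD2 at the first rate to form the correct probability distribution are assumptions made for the sketch; the disclosure itself only states that the distribution is generated from TD1 and TD2.

```python
import numpy as np

def cross_entropy(p_correct: np.ndarray, p_out: np.ndarray, eps: float = 1e-12) -> float:
    """Cross entropy between a correct distribution and the output distribution."""
    return float(-np.sum(p_correct * np.log(p_out + eps)))

def weighted_sum_error(p_out, td1, td2, rate):
    """Output error ERQ as a weighted sum of ERR1 and ERR2 at the first rate."""
    err1 = cross_entropy(td1, p_out)            # first output error ERR1
    err2 = cross_entropy(td2, p_out)            # second output error ERR2
    return rate * err1 + (1.0 - rate) * err2

def kl_divergence_error(p_out, td1, td2, rate, eps: float = 1e-12) -> float:
    """Output error ERQ as the KL divergence between a correct probability distribution
    (here mixed from TD1 and TD2 at the first rate) and the output distribution."""
    p_correct = rate * td1 + (1.0 - rate) * td2
    return float(np.sum(p_correct * np.log((p_correct + eps) / (p_out + eps))))

# Two classification categories; the correct information is one-hot per category.
td1, td2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
p_out = np.array([0.6, 0.4])                    # estimation values from the output information NNQ
erq = weighted_sum_error(p_out, td1, td2, rate=4 / 6)
```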
  • FIG. 7 illustrates a second configuration example of the learning data generating system 10 .
  • the image acquisition section 111 includes a data augmentation section 160 .
  • FIG. 8 is a flowchart of processes performed by the processing section 100 in the second configuration example
  • FIG. 9 is a diagram schematically illustrating the processes. Note that components and steps described in the first configuration example are denoted with the same reference numerals and description about the components and the steps is omitted as appropriate.
  • the storage section 200 stores a first input image IM 1 ′ and a second input image IM 2 ′.
  • the image acquisition section 111 reads the first input image IM 1 ′ and the second input image IM 2 ′ from the storage section 200 .
  • the data augmentation section 160 performs at least one of a first augmentation process of subjecting the first input image IM 1 ′ to data augmentation to generate the first image IM 1 and a second augmentation process of subjecting the second input image IM 2 ′ to data augmentation to generate the second image IM 2 .
  • the data augmentation is image processing with respect to input images of the neural network 120 .
  • the data augmentation is a process of converting input images into images suitable for learning, image processing for generating images with different appearance of a recognition target to improve accuracy of learning, or the like.
  • at least one of the first input image IM 1 ′ and the second input image IM 2 ′ is subjected to data augmentation to enable effective learning.
  • the data augmentation section 160 performs, in step S 106 , data augmentation of the first input image IM 1 ′ and performs, in step S 107 , data augmentation of the second input image IM 2 ′. Note that both of steps S 106 and S 107 , or only one of them, may be performed.
  • FIG. 9 illustrates an example of executing merely the second augmentation process of augmenting data of the second input image IM 2 ′.
  • the second augmentation process includes a process of performing position correction of the second recognition target TG 2 with respect to the second input image IM 2 ′ on the basis of a positional relationship between the first recognition target TG 1 appearing in the first input image IM 1 ′ and the second recognition target TG 2 appearing in the second input image IM 2 ′.
  • the position correction is affine transformation including parallel movement.
  • the data augmentation section 160 determines the position of the first recognition target TG 1 from the first correct information TD 1 and the position of the second recognition target TG 2 from the second correct information TD 2 , and performs correction so as to make the positions conform to each other. For example, the data augmentation section 160 performs position correction so as to make a barycentric position of the first recognition target TG 1 and a barycentric position of the second recognition target TG 2 conform to each other.
  • the first augmentation process includes a process of performing position correction of the first recognition target TG 1 with respect to the first input image IM 1 ′ on the basis of a positional relationship between the first recognition target TG 1 appearing in the first input image IM 1 ′ and the second recognition target TG 2 appearing in the second input image IM 2 ′.
  • the position of the first recognition target TG 1 in the first image IM 1 and the position of the second recognition target TG 2 in the second image IM 2 conform to each other.
  • the position of the first recognition target TG 1 and the position of the second recognition target TG 2 conform to each other also in the combined feature map SMAP in which the feature maps have been replaced, and therefore it is possible to appropriately learn the boundary of the classification categories.
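A minimal sketch of the barycenter-based position correction is given below; it assumes that the correct information is available as binary masks and uses scipy.ndimage.shift for the parallel-movement part of the affine transformation.

```python
import numpy as np
from scipy.ndimage import shift

def barycenter(mask: np.ndarray) -> np.ndarray:
    """Barycentric position of a recognition target given its correct-information mask."""
    ys, xs = np.nonzero(mask)
    return np.array([ys.mean(), xs.mean()])

def align_second_image(im2: np.ndarray, mask1: np.ndarray, mask2: np.ndarray) -> np.ndarray:
    """Translate the second input image IM2' so that the barycenter of its recognition
    target TG2 conforms to the barycenter of TG1 in the first input image IM1'."""
    dy, dx = barycenter(mask1) - barycenter(mask2)
    return shift(im2, (dy, dx), order=1, mode="nearest")
```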
  • the first augmentation process and the second augmentation process are not limited to the above-mentioned position correction.
  • the data augmentation section 160 may perform at least one of the first augmentation process and the second augmentation process by at least one process selected from color correction, brightness correction, a smoothing process, a sharpening process, noise addition, and affine transformation.
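A compact sketch of some of these augmentation processes (brightness correction, noise addition, and a smoothing process) applied to a monochrome image with values in [0, 1] could look as follows; the parameter ranges are illustrative.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply brightness correction, noise addition, and a 3 x 3 box-filter smoothing."""
    out = np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0)            # brightness correction
    out = np.clip(out + rng.normal(0.0, 0.01, out.shape), 0.0, 1.0)   # noise addition
    padded = np.pad(out, 1, mode="edge")                              # smoothing process
    h, w = out.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
```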
  • the neural network 120 is a CNN.
  • a basic configuration of the CNN will be described.
  • FIG. 10 illustrates an overall configuration example of the CNN.
  • the input layer of the CNN is a convolutional layer followed by a normalization layer and an activation layer. Next, a pooling layer, a convolutional layer, a normalization layer, and an activation layer constitute one set, and the same sets are repeated.
  • the output layer of the CNN is a convolutional layer.
  • the convolutional layer outputs a feature map by performing a convolutional process with respect to input. In the convolutional layers of the later stages, the number of channels of the feature map tends to increase while the image size of each channel tends to decrease.
  • Each layer of the CNN includes nodes, and each node is connected to the nodes of the next layer by weighted coefficients.
  • the weighted coefficient of the internode is updated based on the output error, and consequently learning of the neural network 120 is performed.
  • FIG. 11 illustrates an example of the convolutional process.
  • The following describes an example where an output map of two channels is generated from an input map of three channels and the filter size of the weighted coefficients is 3 × 3.
  • In the first convolutional layer, the input map corresponds to an input image; in the final convolutional layer, the output map corresponds to a score map; in the intermediate convolutional layers, both the input map and the output map are feature maps.
  • The convolutional process computes each value of the output map as $y^{oc}_{n,m} = \sum_{ic} \sum_{j} \sum_{i} w^{oc,ic}_{j,i}\, x^{ic}_{n+j,\,m+i}$, where $y^{oc}_{n,m}$ is the value arranged in the n-th row and m-th column of a channel oc in the output map, $w^{oc,ic}_{j,i}$ is the value arranged in the j-th row and i-th column of a channel ic of a set oc in the weighted coefficient filter, and $x^{ic}_{n+j,m+i}$ is the value arranged in the (n+j)-th row and (m+i)-th column of the channel ic in the input map.
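A minimal NumPy sketch of this convolutional process for the example above (three input channels, two output channels, 3 × 3 filters, no padding or bias) is shown below; the function name and shapes are illustrative.

```python
import numpy as np

def convolve(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Convolutional process: x has shape (in_channels, H, W) and w has shape
    (out_channels, in_channels, 3, 3). Returns y with shape (out_channels, H-2, W-2)."""
    in_ch, H, W = x.shape
    out_ch = w.shape[0]
    y = np.zeros((out_ch, H - 2, W - 2))
    for oc in range(out_ch):
        for n in range(H - 2):
            for m in range(W - 2):
                # y[oc, n, m] = sum over ic, j, i of w[oc, ic, j, i] * x[ic, n + j, m + i]
                y[oc, n, m] = np.sum(w[oc] * x[:, n:n + 3, m:m + 3])
    return y

# Output map of two channels generated from an input map of three channels.
y = convolve(np.random.rand(3, 8, 8), np.random.rand(2, 3, 3, 3))
```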
  • FIG. 12 illustrates an example of a recognition result output by the CNN.
  • the output information which indicates the recognition result output from the CNN, is a score map in which estimation values are allocated to respective positions (u, v).
  • the estimation value indicates probability that the recognition target has been detected at that position.
  • the correct information is mask information that indicates an ideal recognition result in which a value 1 is allocated to a position (u, v) where the recognition target exists.
  • the above-mentioned weighted coefficient is updated so as to make the error between the correct information and the output information smaller.
  • FIG. 13 illustrates a system configuration example when an ultrasonic image is input to the learning data generating system 10 .
  • the system illustrated in FIG. 13 includes an ultrasonic diagnostic system 20 , a training data generating system 30 , the learning data generating system 10 , and an ultrasonic diagnostic system 40 . Note that those systems are not necessarily in always-on connection, and may be connected as appropriate at each stage of operation.
  • the ultrasonic diagnostic system 20 captures an ultrasonic image as a training image, and transfers the captured ultrasonic image to the training data generating system 30 .
  • the training data generating system 30 displays the ultrasonic image on a display, accepts input of correct information from a user, associates the ultrasonic image with the correct information to generate training data, and transfers the training data to the learning data generating system 10 .
  • the learning data generating system 10 performs learning of the neural network 120 on the basis of the training data and transfers a learned model to the ultrasonic diagnostic system 40 .
  • the ultrasonic diagnostic system 40 may be the same system as the ultrasonic diagnostic system 20 , or may be a different system.
  • the ultrasonic diagnostic system 40 includes a probe 41 and a processing section 42 .
  • the probe 41 detects ultrasonic echoes from a subject.
  • the processing section 42 generates an ultrasonic image on the basis of the ultrasonic echoes.
  • the processing section 42 includes a neural network 50 that performs an image recognition process based on the learned model to the ultrasonic image.
  • the processing section 42 displays a result of the image recognition process on the display.
  • FIG. 14 is a configuration example of the neural network 50 .
  • the neural network 50 has the same algorithm as the neural network 120 of the learning data generating system 10 and uses the parameters, such as the weighted coefficients, included in the learned model to thereby perform an image recognition process reflecting the learning result in the learning data generating system 10 .
  • a first neural network 51 and a second neural network 52 correspond to the first neural network 121 and the second neural network 122 of the learning data generating system 10 , respectively.
  • a single image IM is input to the first neural network 51 and a feature map MAP corresponding to the image IM is output from the first neural network 51 . In the ultrasonic diagnostic system 40 , combination of feature maps is not performed.
  • the feature map MAP output by the first neural network 51 serves as input of the second neural network 52 .
  • Although FIG. 14 illustrates the first neural network 51 and the second neural network 52 for comparison with the learning data generating system 10 , the neural network 50 is not divided in an actual process.
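For contrast with the learning-time processing, the following PyTorch-style sketch shows the inference path in the ultrasonic diagnostic system 40: a single image passes through the whole network and no feature maps are combined. The layer definitions and shapes are illustrative assumptions, not the actual learned model.

```python
import torch
import torch.nn as nn

# Same layer sequence as the networks 51 and 52, to be loaded with the learned parameters.
first_nn = nn.Sequential(nn.Conv2d(1, 6, 3, padding=1), nn.BatchNorm2d(6), nn.ReLU())
second_nn = nn.Sequential(nn.Conv2d(6, 2, 3, padding=1))
first_nn.eval()
second_nn.eval()

def recognize(image: torch.Tensor) -> torch.Tensor:
    """Image recognition for a single ultrasonic image IM: the feature map MAP output by the
    first neural network is fed directly to the second neural network, with no combination."""
    with torch.no_grad():
        feature_map = first_nn(image)          # feature map MAP corresponding to IM
        score_map = second_nn(feature_map)     # score map with one channel per category
    return torch.sigmoid(score_map)

score = recognize(torch.rand(1, 1, 64, 64))
```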


Applications Claiming Priority (1)

PCT/JP2020/009215 (WO2021176605A1, ja) - Priority Date: 2020-03-04 - Filing Date: 2020-03-04 - Title: Learning data creation system and learning data creation method (学習データ作成システム及び学習データ作成方法)

Related Parent Applications (1)

PCT/JP2020/009215 (WO2021176605A1, ja) - Relation: Continuation - Priority Date: 2020-03-04 - Filing Date: 2020-03-04 - Title: Learning data creation system and learning data creation method (学習データ作成システム及び学習データ作成方法)

Publications (1)

Publication Number: US20230011053A1 (en) - Publication Date: 2023-01-12

Family

ID=77613164

Family Applications (1)

US 17/902,009 - Title: Learning data generating system and learning data generating method - Priority Date: 2020-03-04 - Filing Date: 2022-09-02 - Status: Pending - Publication: US20230011053A1 (en)

Country Status (4)

US (1): US20230011053A1
JP (1): JP7298010B2
CN (1): CN115210751A
WO (1): WO2021176605A1


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022250071A1 (ja) * 2021-05-27 2022-12-01 Panasonic Intellectual Property Corporation of America - Learning method, learning device, and program
WO2023243397A1 (ja) * 2022-06-13 2023-12-21 Konica Minolta, Inc. - Recognition device, recognition system, and computer program

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7300811B2 (ja) * 2018-06-11 2023-06-30 Canon Medical Systems Corporation - Medical information processing apparatus, medical information processing method, and program
JP2020017229A (ja) * 2018-07-27 2020-01-30 The University of Tokyo - Image processing apparatus, image processing method, and image processing program

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210334580A1 (en) * 2020-04-23 2021-10-28 Hitachi, Ltd. Image processing device, image processing method and image processing system
US11954600B2 (en) * 2020-04-23 2024-04-09 Hitachi, Ltd. Image processing device, image processing method and image processing system
US20220004827A1 (en) * 2020-07-02 2022-01-06 Samsung Electronics Co., Ltd. Method and appartaus for data efficient semantic segmentation
US11687780B2 (en) * 2020-07-02 2023-06-27 Samsung Electronics Co., Ltd Method and apparatus for data efficient semantic segmentation

Also Published As

Publication number Publication date
CN115210751A (zh) 2022-10-18
JP7298010B2 (ja) 2023-06-26
JPWO2021176605A1 (ja) 2021-09-10
WO2021176605A1 (ja) 2021-09-10


Legal Events

Date Code Title Description
AS Assignment

Owner name: OLYMPUS CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ANDO, JUN;REEL/FRAME:060975/0001

Effective date: 20220823

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION