JP2021527859A

JP2021527859A - Segmenting irregular shapes in images using deep region growing

Info

Publication number: JP2021527859A
Application number: JP2020556276A
Authority: JP
Inventors: デュフォール、ポール
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2018-06-21
Filing date: 2019-05-13
Publication date: 2021-10-14
Also published as: CN112189217A; WO2019243910A1; GB2589478A; GB202019774D0; DE112019001959T5; GB2589478B

Abstract

A system for determining a region of interest in an image. The system includes a memory and an electronic processor. The electronic processor is connected to the memory and is configured to initialize the internal states of the nodes of a spatial grid. Each node of the spatial grid corresponds to a pixel of the image and is connected to at least one node that represents a neighboring pixel of the image. The electronic processor is also configured to use a neural network to iteratively update the internal state of each node in the spatial grid using spatially gated propagation, and to identify the region of interest in the image based on the internal states of the nodes at convergence of the spatial grid. In one embodiment, the electronic processor is configured to create an image pyramid of the image.

Description

The embodiments described herein relate to the segmentation of images, such as biomedical images, and in particular to image segmentation that uses a neural network to gate the propagation of data in both time and space.

The embodiments described herein relate to a new type of neural network unit that combines principles used in recurrent neural networks (RNNs) and convolutional neural networks (CNNs). An RNN receives an input sequence and reads and processes one element of the sequence at a time. As the RNN processes each element, it updates its knowledge about the sequence, and this knowledge is stored in the internal state of the RNN. After reading the entire input sequence, the RNN uses some or all of its internal state to output a second sequence or to make a single prediction. An example of an RNN is a Long Short-Term Memory (LSTM) neural network, which contains one or more LSTM cells. Each LSTM cell stores the previous state of the cell, and this previous state may be provided to other components of the LSTM neural network. Each LSTM cell includes an input gate, a forget gate, and an output gate. LSTMs were introduced to solve the vanishing-gradient problem that arises when training RNNs.
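To make the three gates concrete, the following is a minimal sketch of one step of a scalar LSTM cell in its standard textbook form. It is not taken from the patent; the weight dictionary `w` and its key names are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function; squashes a gate pre-activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell (textbook form).

    `w` is a hypothetical dict of scalar weights: for each of the gates
    i (input), f (forget), o (output) and the candidate g it holds an
    input weight "w?", a recurrent weight "u?", and a bias "b?".
    """
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old cell state, admit part of the new
    h = o * math.tanh(c)     # expose part of the cell state as the output
    return h, c


# With all weights zero, every gate is exactly half open and the candidate is 0.
zero_weights = {k: 0.0 for k in
                ("wi", "ui", "bi", "wf", "uf", "bf",
                 "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=zero_weights)
```

With zero weights the forget gate halves the previous cell state, which shows how gating controls what is carried forward in time.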

A CNN applies filters (kernels) to an input (for example, an image) and makes a prediction about the input. In one example, the prediction is which of a set of categories the image belongs to. The filters correspond to features that may be detected in the input image. For example, when an image is input to a CNN, a filter is applied to blocks of adjacent pixels in the input image to produce an intermediate image, which indicates how strongly each feature is represented at each position in the image. The content of a feature is given by the weights of the filter associated with that feature. These weights are multiplied by the pixels contained in each block of adjacent pixels. For example, if the input to the CNN is a handwritten digit, the CNN classifies the handwritten digit as belonging to one of a plurality of categories (in this case, the categories are the digits 1 to 9). The CNN's classification of the handwritten digit is based on the image features that the CNN has detected as being associated with digits, and on how strongly those features indicate that the handwritten digit is one of the digits 1 to 9.
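The filtering step can be sketched as follows. This is a minimal illustration, not code from the patent: the function name `apply_filter` and the vertical-edge kernel are assumptions, and a real CNN learns its filter weights rather than using hand-set ones.

```python
def apply_filter(image, kernel):
    """Slide `kernel` over every fully covered block of `image` and return
    the intermediate image (feature map) of weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            # Multiply each filter weight by the pixel it covers and sum.
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A dark-to-bright vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical hand-set filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
response = apply_filter(image, vertical_edge)
# response == [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The response is strongest in the middle column, where the block of pixels under the filter straddles the edge.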

The embodiments described herein relate to the segmentation of biomedical images. Segmenting a biomedical image involves identifying the boundaries of objects within the image (in particular, a medical image). Previously, region growing was used to identify objects in an image. In region growing, a seed pixel is placed somewhere inside the object of interest. Once placed in the image, the seed pixel is repeatedly spread to adjacent pixels of similar intensity or brightness. The growth stops when the boundary of the object is reached. In region growing, the boundary may be defined by a drop below an intensity or brightness threshold.

One problem with region growing is that even a slight connection to adjacent bright pixels in a medical image can cause the region to grow outside the object of interest. For example, as shown in FIG. 1, when two bright tissue regions (one located inside the lung and one located outside the lung) are connected by a small fragment of bright tissue, region growing incorrectly indicates that the two bright tissue regions belong to the same mass or object. Region growing is therefore often discarded in favor of more sophisticated methods such as level sets, conditional random fields (CRFs), active contours, and graph cuts.
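The leak described above can be reproduced with a minimal sketch of classical seeded region growing (region expansion). The function name `region_grow`, the 4-neighbor connectivity, and the toy intensities are assumptions made for illustration only.

```python
from collections import deque


def region_grow(image, seed, threshold):
    """Spread from `seed` to 4-connected neighbors whose intensity is at
    least `threshold`; return the set of (row, col) positions reached."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and image[nr][nc] >= threshold):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region


# Two bright blobs (left and right) joined by a single bright pixel at (1, 2).
image = [
    [9, 9, 0, 0, 0],
    [9, 9, 8, 9, 9],
    [0, 0, 0, 9, 9],
]
grown = region_grow(image, seed=(0, 0), threshold=5)
# The region leaks across the one-pixel bridge and swallows both blobs.
```

Although the seed sits in the left blob, the result contains all nine bright pixels, including those of the right blob, because a single bright pixel connects them.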

CNNs discard the primacy of pixel adjacency. Instead, CNNs identify objects that have regularity. A CNN can be trained on objects that have regularity so that it classifies an object as belonging to a certain type. However, CNNs may be unable to accurately recognize and segment irregular shapes such as masses and lesions. CNNs therefore often cannot accurately determine the boundaries of irregular shapes in medical images, such as shapes that vary in arrangement, intensity, and the like.

Accordingly, the embodiments described herein provide a technical solution to the problems described above that are associated with previous solutions for identifying the boundaries of irregularly shaped objects of interest. Specifically, the embodiments described herein combine the spatial connectivity of CNNs with the temporal gating used in RNNs to provide a higher-performing method for segmenting irregular structures in an image. In particular, the embodiments described herein provide a new type of unit that classifies a pixel in an image based on the previous internal state and on the current values of the nodes representing the pixels adjacent to the pixel being classified. This new type of unit is referred to herein as a gated spatiotemporal unit; it is a gated recurrent unit that includes the spatial awareness normally associated with CNNs. For example, at each time step, each node determines whether to update its internal state with the value of its own previous internal state or with the value of the internal state of one of its neighboring nodes.

Accordingly, the methods and systems described herein provide a neural network that propagates information across both time and space. Compared with gating the flow of information over time alone, gating across both time and space allows a recurrent unit to make decisions about the internal state of a pixel based on that internal state and on the values of the surrounding pixels in the image. In addition, in some embodiments, the neural network can propagate information between image resolutions across both time and space.

As described in detail below, the embodiments described herein use machine learning to learn an algorithm. Specifically, the network is updated until the values associated with its internal state converge. A single-pass network, by contrast, learns a function. As noted above, the embodiments herein provide a gated spatiotemporal unit that controls how much information spreads between pixels. As described in detail below, in some embodiments an image is input to the system, and the system creates an image pyramid that includes multiple layers. Each layer of the image pyramid contains a different number of variables representing the input image. The base of the pyramid contains a large number of values representing the image (in other words, the base layer represents the image at high resolution). At each successive level of the pyramid, progressively fewer values are used to represent the image (in other words, each successive layer represents the image at a lower resolution than the layer before it). The image pyramid allows information from one part of the image to be propagated to a lower resolution and then back to a higher resolution in a different part of the image, in fewer iterations than if the system did not use an image pyramid. This is useful, for example, when an image containing thousands of pixels is input to the system. Such an input may require thousands of iterations before the system generates a prediction. The system performs a convolution using the internal state of the system from the previous time step and the representation of the image in the image pyramid. The gated spatiotemporal unit uses the results of the convolutional layers to determine the values to include in the current internal state of the nodes in the network. Iterations are performed on the gated spatiotemporal unit until the internal states of the nodes in the network converge. When the internal states of the nodes in the system have converged, the probability that each pixel belongs to the object of interest is computed. Specifically, the embodiments described herein provide a network for segmenting irregular structures in medical images; the network is intelligent about how data flows over the grid and learns other factors, such as homogeneity, to determine how to spread pixels. However, these embodiments may be applicable in areas other than medical image segmentation, including, for example, weather prediction and oil and gas modeling.
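The layered structure of an image pyramid can be sketched as below. The text does not say how each lower-resolution layer is derived, so the 2x2 block averaging used here, along with the names `downsample` and `build_pyramid`, is an assumption for illustration.

```python
def downsample(layer):
    """Halve the resolution by averaging each 2x2 block.

    Assumes the layer has an even number of rows and columns.
    """
    return [
        [
            (layer[r][c] + layer[r][c + 1]
             + layer[r + 1][c] + layer[r + 1][c + 1]) / 4.0
            for c in range(0, len(layer[0]), 2)
        ]
        for r in range(0, len(layer), 2)
    ]


def build_pyramid(image, levels):
    """Return [base layer, coarser layer, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


image = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [3, 3, 7, 7],
    [3, 3, 7, 7],
]
pyramid = build_pyramid(image, levels=3)
sizes = [(len(layer), len(layer[0])) for layer in pyramid]
# sizes == [(4, 4), (2, 2), (1, 1)]
```

On the base layer, information that moves one node per iteration needs on the order of the image width in iterations to cross the image. On a coarser layer the same distance spans fewer nodes, which is the intuition behind the reduced iteration count described above.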

For example, one embodiment provides a method for identifying an object of interest in a medical image. The method includes initializing the internal states of the nodes of a spatial grid. Each node in the spatial grid corresponds to a pixel of the medical image and is connected to at least one node that represents a neighboring pixel of the medical image. The method also includes using a neural network to iteratively update the internal states of the nodes in the spatial grid using spatially gated propagation. At each iteration, each node updates its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of a neighboring node from the previous iteration, and a new value for the node. The method further includes identifying the object of interest in the medical image based on the values of the nodes at convergence of the spatial grid.
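The iterate-until-convergence loop can be sketched as follows. This is a toy under stated assumptions, not the patented network: in the embodiments the gates come from learned convolutional layers, whereas here the gate logits are a hand-set, hypothetical function of intensity similarity, the "new value" is simply 1.0 at a seed pixel and 0.0 elsewhere, and the final 0.05 cut-off is arbitrary.

```python
import math

NEIGHBOR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def softmax(logits):
    """Turn gate logits into non-negative weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_step(state, image, seed):
    """One iteration: every node blends three kinds of candidate values."""
    rows, cols = len(image), len(image[0])
    new_state = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Candidate 1: the node's own value from the previous iteration.
            values, logits = [state[r][c]], [0.0]
            # Candidate 2: each neighbor's value from the previous iteration.
            for dr, dc in NEIGHBOR_OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    values.append(state[nr][nc])
                    # Hypothetical gate: similar intensities open the gate,
                    # dissimilar intensities nearly close it.
                    logits.append(-abs(image[r][c] - image[nr][nc]))
            # Candidate 3: a new value (hypothetical: 1.0 at the seed, else 0.0).
            values.append(1.0 if (r, c) == seed else 0.0)
            logits.append(0.0)
            gates = softmax(logits)
            new_state[r][c] = sum(g * v for g, v in zip(gates, values))
    return new_state


def run_to_convergence(image, seed, tol=1e-6, max_iters=500):
    """Repeat gated_step until no node changes by more than `tol`."""
    state = [[0.0] * len(image[0]) for _ in image]
    for step in range(1, max_iters + 1):
        new_state = gated_step(state, image, seed)
        delta = max(abs(n - o)
                    for new_row, old_row in zip(new_state, state)
                    for n, o in zip(new_row, old_row))
        state = new_state
        if delta < tol:
            return state, step
    return state, max_iters


# Bright object on the left, dark background on the right.
image = [
    [9, 9, 0, 0],
    [9, 9, 0, 0],
]
state, steps = run_to_convergence(image, seed=(0, 0))
mask = [[value > 0.05 for value in row] for row in state]
```

Because each update is a weighted average in which the new-value candidate always keeps some weight, successive states move closer together and the loop settles well before `max_iters`. The nearly closed gates between bright and dark pixels keep the seed's influence inside the bright block.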

Another embodiment also provides a method for identifying an object of interest in a medical image. The method provided by this embodiment, however, includes creating an image pyramid of the medical image. The created image pyramid includes multiple layers, each layer includes multiple values, and each value represents a block of one or more pixels in the medical image. Each successive layer in the image pyramid contains fewer values than the layer before it. The method also includes, for each layer of the image pyramid, initializing the internal states of the nodes of a spatial grid. Each node in the spatial grid represents a block of one or more pixels in the medical image and is connected to at least one node that represents a neighboring block of one or more pixels in the medical image. The method also includes, for each layer of the image pyramid, using a neural network to iteratively update the internal states of the nodes in the spatial grid using spatially gated propagation. At each iteration, each node updates its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of a neighboring node from the previous iteration, and a new value for the node. The method further includes identifying the object of interest in the medical image based on the values of the nodes at convergence of the spatial grid whose nodes represent the values contained in the first layer of the image pyramid.

One embodiment provides a system for determining a region of interest in an image. The system includes a memory and an electronic processor. The electronic processor is connected to the memory and is configured to initialize the internal states of the nodes of a spatial grid. Each node of the spatial grid corresponds to a pixel of the image and is connected to at least one node that represents a neighboring pixel of the image. The electronic processor is also configured to use a neural network to iteratively update the internal state of each node in the spatial grid using spatially gated propagation, and to identify the region of interest in the image based on the internal states of the nodes at convergence of the spatial grid.

Another embodiment also provides a system for determining a region of interest in an image. Like the system of the embodiment described above, the system described in this embodiment includes a memory and an electronic processor connected to the memory. The electronic processor of the system provided by this embodiment, however, is configured to create an image pyramid of the image. The image pyramid includes multiple layers. For each layer of the image pyramid, the electronic processor is configured to initialize the internal states of the nodes of a spatial grid and to use a neural network to iteratively update the internal state of each node in the spatial grid using spatially gated propagation. Each node in the spatial grid represents a block of one or more pixels in the image and is connected to at least one node that represents a neighboring block of one or more pixels in the image. The electronic processor is also configured to identify the region of interest in the image based on the internal states of the nodes at convergence of the spatial grid whose nodes represent the values contained in the first layer of the image pyramid.

One embodiment provides a non-transitory computer-readable medium containing instructions that are executable by an electronic processor to perform a set of functions. The set of functions includes initializing the internal states of the nodes of a spatial grid. Each node represents a pixel of an image and is connected to at least one neighboring pixel of the image. The set of functions also includes using a neural network to iteratively update the internal states of the nodes in the spatial grid using spatially gated propagation. At each iteration, each node updates its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of a neighboring node from the previous iteration, or a new value for the node. The set of functions further includes identifying an object of interest in the image based on the values of the nodes at convergence of the spatial grid.

Another embodiment also provides a non-transitory computer-readable medium containing instructions that are executable by an electronic processor to perform a set of functions. Unlike the set of functions in the embodiment described above, however, the set of functions performed by the electronic processor in this embodiment includes creating an image pyramid of an image. The created image pyramid includes multiple layers, each layer includes multiple values, and each value represents a block of one or more pixels in the image. Each successive layer in the image pyramid contains fewer values than the layer before it. The set of functions also includes, for each layer of the image pyramid, initializing the internal states of the nodes of a spatial grid. Each node of the image pyramid represents a block of one or more pixels in the image and is connected to at least one node that represents a neighboring block of one or more pixels in the image. The set of functions also includes, for each layer of the image pyramid, using a neural network to iteratively update the internal states of the nodes in the spatial grid using spatially gated propagation. At each iteration, each node updates its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of a neighboring node from the previous iteration, or a new value for the node. The set of functions further includes identifying an object of interest in the image based on the values of the nodes at convergence of the spatial grid whose nodes represent the values contained in the first layer of the image pyramid.

FIG. 1 shows a medical image to which region growing has been applied to identify an object of interest.
FIG. 2 shows a system for determining a region of interest in an image.
FIG. 3 shows a neural network included in the system of FIG. 2.
FIG. 4 shows an example of the inputs to a node in a gated spatiotemporal unit.
FIG. 5 shows an example of a medical image that the neural network of FIG. 4 receives as input.
FIG. 6 shows an example of a region of interest that the neural network of FIG. 4 detects in the medical image of FIG. 5.

One or more embodiments are described and illustrated in the following description and accompanying drawings. These embodiments are not limited to the specific details provided herein and may be modified in various ways. Other embodiments not described herein may also exist. In addition, functions described herein as being performed by one component may be performed by multiple components in a distributed manner. Likewise, functions performed by multiple components may be consolidated and performed by a single component. Similarly, a component described as performing a particular function may also perform additional functions not described herein. For example, a device or structure that is "configured" in a certain way is configured in at least that way, but may also be configured in ways that are not listed. Furthermore, some embodiments described herein may include one or more electronic processors configured to perform the described functions by executing instructions stored on a non-transitory computer-readable medium. Similarly, the embodiments described herein may be implemented as a non-transitory computer-readable medium storing instructions that are executable by one or more electronic processors to perform the described functions. As used in this application, "non-transitory computer-readable medium" includes all computer-readable media but does not consist of a transitory, propagating signal. Accordingly, a non-transitory computer-readable medium may include, for example, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a RAM (Random Access Memory), register memory, a processor cache, or any combination thereof.

In addition, the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. For example, the use of "including," "containing," "comprising," "having," and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof, as well as additional items. The terms "connected" and "coupled" are used broadly and encompass both direct and indirect connecting and coupling. Further, "connected" and "coupled" are not restricted to physical or mechanical connections or couplings and can include electrical connections or couplings, whether direct or indirect. In addition, electronic communications and notifications may be performed using wired connections, wireless connections, or a combination thereof, and may be transmitted directly or through one or more intermediary devices over various types of networks, communication channels, and connections. Moreover, relational terms such as first and second, top and bottom, and the like may be used herein solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions.

As noted above, biomedical image segmentation seeks to identify the pixels in an image that represent an object of interest so that various calculations and data processing (for example, volume calculations) can be performed on the object. However, many techniques for performing image segmentation rely on identifying consistent shapes and contexts. For example, as noted above, CNNs are good at recognizing shapes and objects in an image that they have been trained to recognize, but they are poor at recognizing irregular shapes in an image. Techniques that rely on identifying consistent shapes and contexts may therefore be ineffective at identifying irregular objects such as masses and lesions.

Other techniques rely on spreading pixels to determine the boundary of an object of interest in an image. As noted above, region growing does not rely on regularity; it spreads a seed pixel to adjacent pixels until a boundary is identified. The shape of the object of interest therefore does not affect the performance of region growing. However, as shown in FIG. 1, when an object does not have a clearly defined boundary (for example, when the object is connected to adjacent bright tissue, even by a small connection), region growing may improperly grow the object beyond its true boundary.

To address the deficiencies of the techniques described above, the embodiments described herein combine the advantages of CNNs and RNNs in a spatiotemporal unit to improve the identification of irregular objects in an image. Specifically, as described in detail below, the embodiments described herein employ spatially gated propagation. Gating involves one part of the network that generates a new state for the system (based on the previous state and newly received information) and another part of the network that gates this new state, determining whether the new state is used and propagated forward in time. As described herein, the immediately preceding internal states of a pixel and of its nearest neighbors are gated and used to determine the internal state of the pixel at the current time step. The systems and methods described herein therefore propagate values across both space and time. In addition, creating the image pyramid described above allows values to propagate across different image resolutions.

FIG. 2 shows a system 200 for implementing a neural network. A neural network is a machine learning model that employs one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an input layer and an output layer. The output of each hidden layer is used as input to the next layer in the network (the next hidden layer or the output layer). Each layer of the network generates an output from a received input according to the current values of its respective set of parameters.

As illustrated in FIG. 2, the system 200 includes a computing device 202 that includes an electronic processor 204 and a memory 206. The electronic processor 204 and the memory 206 communicate wirelessly, over wired communication channels or buses, or a combination thereof. The computing device 202 may include components in addition to those illustrated in FIG. 2, in various configurations. For example, in some embodiments, the computing device 202 includes multiple electronic processors, multiple memory modules, or a combination thereof. Also, in some embodiments, the computing device 202 includes one or more input/output interfaces that allow the computing device 202 to communicate with networks, peripheral devices, and the like.

It should be understood that the functionality described herein as being performed by the computing device 202 may be performed in a distributed manner by multiple computing devices located in various geographic locations. For example, the functionality described herein as being performed by the computing device 202 may be performed by multiple computing devices 202 included in a cloud computing environment. The electronic processor 204 may be a microprocessor, an application-specific integrated circuit (ASIC), or the like. The electronic processor 204 is generally configured to execute software instructions to perform a set of functions, including the functions described herein. The memory 206 includes a non-transitory computer-readable medium and stores data, including instructions executable by the electronic processor 204. For example, as illustrated in FIG. 2, the memory 206 stores a neural network 208, which includes a computer program executed by the electronic processor 204.

FIG. 3 illustrates a visual representation of an example of the neural network 208 that the electronic processor 204 executes to perform the methods described herein. As illustrated in FIG. 3, when executed by the electronic processor 204, the neural network 208 provides a machine learning system that receives an input and generates an output 305. As one example, the input includes an image, such as a biomedical image (input image 300), or another type of multidimensional data, and the output 305 likewise includes an image or another type of multidimensional data.

As illustrated in FIG. 3, the input image 300 is input to a first layer 310 of the neural network 208. Although the first layer 310 is illustrated as a single layer, this is for illustrative purposes only, and it should be understood that the first layer 310 may include any number of layers. In the first layer 310, the neural network 208 may perform multiple convolutions on values representing the brightness of each pixel. In other embodiments, as described below, the neural network 208 performs multiple convolutions within the first layer 310 to create an image pyramid 315 from the input image 300 (I_0).

The image pyramid 315 is a sequence of tensors (I_1 through I_l) convolved from the input image 300. The tensor generated for l = 1 has the same spatial dimensions as the input image 300 (I_0), but each subsequent convolution/reduction halves the size of the tensor. Thus, the tensor has a different resolution for each value of l, with tensor I_1 having the highest resolution and tensor I_l having the lowest resolution. The following equations illustrate the process, performed in the first layer 310, of creating the image pyramid 315.

[Equations (1) to (3): construction of the image pyramid 315]

The operator * represents a convolution operation. For example, the expression A * B represents a convolution between an input B and a kernel A.

I_0 is a variable representing the original input image 300. I_0 has dimensions N_0 × N_0 × 1. In other words, the input image 300 has N_0 rows, N_0 columns, and one channel (because, in this example embodiment, the input image 300 is a grayscale image).

I_l is a variable representing an intermediate form of the image data (a tensor) generated after one or more reductions have been performed on the input image 300 (I_0). As noted above, when l > 1, I_l has a lower resolution than the input image 300 (I_0). I_l has dimensions N_l × N_l × C, where N_l = 2^{-(l-1)} N_0 and C is the number of channels.

K is a variable representing a convolution operator (kernel) that preserves the dimensions of the input image data. The input image data has dimensions N_l × N_l × C_I, while the output image data has dimensions N_l × N_l × C_0. K may represent a combination of multiple successive convolution operations, arranged for example as in AlexNet, DenseNet, or a range of other architectures, together with the learnable parameters of the convolution operator.

D is a variable representing a convolution operator that halves the dimensions of the input image data. The input image data has dimensions N_{l-1} × N_{l-1} × C_I, while the output image data has dimensions N_l × N_l × C_0. Like K, D may represent multiple successive convolution operations, arranged for example as in AlexNet, DenseNet, or a range of other architectures, together with the learnable parameters of the convolution operator. However, the convolution operator D also represents a max pooling layer or a strided convolution layer that halves the dimensions of the input image data.
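The pyramid construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the learnable size-preserving kernel K is replaced by a plain copy, the halving operator D is replaced by fixed 2x2 max pooling (one of the options the text mentions), and N_0 is assumed to be divisible by 2 at every level. The names max_pool_2x2 and build_pyramid are hypothetical.

```python
def max_pool_2x2(image):
    """Halve both spatial dimensions by taking the maximum of each 2x2 block."""
    n = len(image)
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, n, 2)]
        for r in range(0, n, 2)
    ]


def build_pyramid(image, levels):
    """Return [I_1, ..., I_levels]; I_1 keeps the input size, each later level is halved."""
    pyramid = [[row[:] for row in image]]  # I_1: a copy stands in for the kernel K (assumption)
    for _ in range(levels - 1):
        pyramid.append(max_pool_2x2(pyramid[-1]))  # max pooling stands in for D (assumption)
    return pyramid


# Toy 8x8 single-channel (grayscale) input, so N_0 = 8.
image = [[(r * 8 + c) % 11 for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image, levels=3)
sizes = [len(level) for level in pyramid]  # follows N_l = 2^{-(l-1)} * N_0
```

With N_0 = 8 and three levels, the side lengths follow N_l = 2^{-(l-1)} N_0, so the highest-resolution level has the same size as the input and each further level has half the side length of the previous one.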

The tensors computed for each level of the image pyramid 315 are supplied to a second layer 320. The operations performed in the second layer 320 are described by equation (4), whose terms are defined below.

Again, the operator * represents a convolution operation, and I_l is a variable representing an intermediate form of the image data (a tensor) generated after one or more reductions have been performed on the input image 300 (I_0).

[A, B] denotes a concatenation operation between tensors (for example, A and B). A concatenation performed on two tensors joins the channels contained in each of the tensors. For example, if tensor A has dimensions M × M × C_1 and tensor B has dimensions M × M × C_2, the output of [A, B] has dimensions M × M × (C_1 + C_2).
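As a small illustration of the concatenation [A, B], the following Python sketch joins two tensors along the channel axis. Tensors are represented as nested lists indexed as [row][column][channel], which is an assumption made for the example; the helper names concat_channels and shape are hypothetical.

```python
def concat_channels(a, b):
    """Concatenate two M x M x C tensors (nested lists) along the channel axis."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("spatial dimensions must match")
    # For every pixel, append the channels of b after the channels of a.
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def shape(t):
    """Return (rows, columns, channels) of a nested-list tensor."""
    return (len(t), len(t[0]), len(t[0][0]))


a = [[[1.0, 2.0] for _ in range(3)] for _ in range(3)]       # 3 x 3 x 2
b = [[[7.0, 8.0, 9.0] for _ in range(3)] for _ in range(3)]  # 3 x 3 x 3
ab = concat_channels(a, b)                                    # 3 x 3 x (2 + 3)
```

The spatial dimensions M × M are unchanged; only the channel count grows to C_1 + C_2.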

The internal-state variable in equation (4) is a tensor 322 that holds an internal state for each node in the spatial grid at resolution l and time step t. As noted above, the internal state of each node in the spatial grid is updated at each time step. The tensor 322 has dimensions N_l × N_l × C_H. Thus, there are C_H variables representing each block of one or more pixels of the image at resolution l.

The kernel K in equation (4) is a variable representing a convolution operator that preserves the dimensions of the input image data. The input image data has dimensions N_l × N_l × C_I, while the output image data has dimensions N_l × N_l × C_0. K may represent a combination of multiple successive convolution operations, arranged for example as in AlexNet, DenseNet, or a range of other architectures, together with the learnable parameters of the convolution operator.

The result 323 of executing equation (4) is held in a variable that has dimensions N_l × N_l × C_X and is input to a third layer 325 of the neural network 208.

In summary, equation (4) concatenates the tensor I_l with the tensor 322 (performing a first concatenation), applies a convolution operator to this concatenation (performing a first convolution on the current layer of the image pyramid), and stores the result 323 in a tensor.

The operations performed in the third layer 325 are described by equation (5), whose terms are defined below.

Again, as noted above, the operator * represents a convolution operation, and [A, B] denotes a concatenation operation between tensors (for example, tensors A and B). Similarly, equation (5) uses a variable representing a convolution operator (kernel) that preserves the dimensions of the input image data, a variable representing a convolution operator (kernel) that halves the dimensions of the input image data, and a variable representing the result of equation (4) computed from the tensor I_l, the corresponding internal state, and a kernel.

Two further variables represent the results of equation (4) at the adjacent resolutions: one is computed from the tensor I_{l+1}, the corresponding internal state, and a kernel, and the other is computed from the tensor I_{l-1}, the corresponding internal state, and a kernel.

U is a variable representing a convolution operator (kernel) that upsamples the input image data by doubling its dimensions. For example, the input image data has dimensions N_{l+1} × N_{l+1} × C_I and the output image data has dimensions N_l × N_l × C_0. Like the convolution operator K, the convolution operator U may represent a combination of multiple successive convolution operations, arranged as in AlexNet, DenseNet, or a range of other architectures, together with the learnable parameters of the convolution operations. However, the convolution operator U may also represent a transposed convolution layer for doubling the dimensions of the image.

The result of executing equation (5) is a tensor containing the information passed to the gated spatiotemporal unit. In summary, equation (5) includes downsampling the result of equation (4) computed using the tensor that represents the input image 300 at the higher resolution (I_{l-1}) (the layer of the image pyramid directly below the current layer of the image pyramid), and upsampling the result of equation (4) computed from the tensor that represents the input image 300 at the lower resolution (I_{l+1}) (the layer of the image pyramid directly above the current layer of the image pyramid). Equation (5) also includes concatenating the upsampling result with the downsampling result and with the result of equation (4) computed from the tensor that represents the input image 300 at resolution I_l (performing a second concatenation). The equation then convolves the result of the concatenation with a kernel (performing a second convolution) and stores the result in a variable.
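The cross-resolution exchange summarized above can be sketched as follows. This is a simplified illustration, not the patented computation: each level carries a single channel, 2x2 max pooling stands in for the learnable downsampling operator, nearest-neighbor doubling stands in for the learnable upsampling operator, the channel order of the concatenation is chosen arbitrarily, and the final learnable convolution is omitted. All function names are hypothetical.

```python
def downsample(x):
    """2x2 max pooling; an assumed stand-in for the halving operator D."""
    n = len(x)
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, n, 2)]
        for r in range(0, n, 2)
    ]


def upsample(x):
    """Nearest-neighbor doubling; an assumed stand-in for the upsampling operator U."""
    n = 2 * len(x)
    return [[x[r // 2][c // 2] for c in range(n)] for r in range(n)]


def exchange(finer, current, coarser):
    """Stack [D(finer), current, U(coarser)] as three channels per node of the current level."""
    down, up = downsample(finer), upsample(coarser)
    n = len(current)
    return [[[down[r][c], current[r][c], up[r][c]] for c in range(n)] for r in range(n)]


finer = [[float(r + c) for c in range(8)] for r in range(8)]  # level l-1, 8x8
current = [[1.0] * 4 for _ in range(4)]                       # level l,   4x4
coarser = [[5.0, 6.0], [7.0, 8.0]]                            # level l+1, 2x2
stacked = exchange(finer, current, coarser)                   # 4 x 4 x 3
```

After the exchange, every node at the current level sees one value from each of the three resolutions, which is what allows values to propagate across resolutions as well as across space and time.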

In the first iteration of the neural network 208, the output from the third layer 325 is used simply to initialize the internal state of each node included in the spatial grid of the gated spatiotemporal unit 330. Each node includes a vector of values representing the internal state of the node, as well as values derived within the image pyramid from the brightness of the block of one or more pixels centered on that node. In each successive iteration, the internal state of each node from the previous iteration is input to the second layer 320 of the neural network 208 via the tensor 322. The process described above is then repeated, starting from the second layer 320.

As noted above, the neural network 208 includes a gated spatiotemporal unit 330 in which multiple nodes are arranged in a spatial grid. Each node in the grid corresponds to a pixel in the input image 300. The gated spatiotemporal unit 330 performs data processing at each of multiple time steps. At each time step, the gated spatiotemporal unit 330 receives multiple values. Based on the received values and on values representing the gated internal state of each node at the previous time step, the gated spatiotemporal unit 330 determines how to update the internal state of each node at the current time step. As described in detail below, the gated spatiotemporal unit 330 determines how to update the internal state of each node by deciding, for each node in the grid, whether to keep the node's internal state from the previous time step, set the node's internal state to a value representing the internal state of a neighboring node from the previous time step, or generate a new internal state for the node.

Equation (6) is an example of the computation used to determine the internal state 327 of the nodes included in the grid of the gated spatiotemporal unit 330 at the current time step. For ease of interpretation, it is split across seven lines (labeled I through VII).

σ(A) denotes the element-wise application of the sigmoid function 1/(1 + e^{-a}) to every element a of the tensor A. The sigmoid function is sometimes called a "squashing" function. It takes any input value between -∞ and +∞ and squashes it to an output value between 0 and 1.

tanh is also a squashing function. The tanh function takes any input value between -∞ and +∞, but squashes it to an output value between -1 and 1.
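The two squashing functions can be illustrated with a short Python sketch. The sigmoid is written in a numerically stable form so that large negative inputs do not overflow; the helper apply_elementwise is a hypothetical name for the element-wise application that σ(A) denotes.

```python
import math


def sigmoid(a):
    """Sigmoid 1 / (1 + e^-a), written so that large |a| does not overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def apply_elementwise(fn, tensor):
    """Apply fn to every scalar in an arbitrarily nested list."""
    if isinstance(tensor, list):
        return [apply_elementwise(fn, t) for t in tensor]
    return fn(tensor)


values = [[-4.0, 0.0], [0.5, 4.0]]
gates = apply_elementwise(sigmoid, values)     # every entry lies strictly between 0 and 1
states = apply_elementwise(math.tanh, values)  # every entry lies strictly between -1 and 1
```

The sigmoid output range of 0 to 1 suits gate values, while the tanh output range of -1 to 1 suits internal-state values.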

The Hadamard product operator represents an element-wise product. For example, given the Hadamard product of A and B, a Hadamard product operation is performed between input B and input A. The Hadamard product is the element-wise multiplication of each pair of elements from two inputs of the same size.
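A minimal Python sketch of the Hadamard product on two equally sized two-dimensional grids follows. The grid representation (nested lists) and the function name hadamard are assumptions made for the example.

```python
def hadamard(a, b):
    """Element-wise product of two equally sized 2-D grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("inputs must have the same size")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


gate = [[1.0, 0.0], [0.5, 0.25]]   # gate values between 0 and 1
state = [[2.0, 3.0], [4.0, 8.0]]   # values being gated
gated = hadamard(gate, state)      # a gate of 1 passes a value, a gate of 0 blocks it
```

Multiplying a state by a gate value in this way is how a gate of 1 lets a value through unchanged, a gate of 0 suppresses it, and intermediate gate values pass a fraction of it.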

The result 326 of the computation performed in the third layer 325 at resolution l and time step t is held in a tensor with dimensions N_l × N_l × 7 × C_H. Each of the seven elements of the third dimension has a specific role in the spatiotemporal gating process. The seven indexed variables in the equation refer to the tensors obtained when one of the seven elements is selected. Each tensor associated with each of the seven elements has dimensions N_l × N_l × C_H.

The tensor 322 holds an internal state for each node in the spatial grid at resolution l and time step t. As noted above, the internal state is dynamically updated at each time step. The tensor has dimensions N_l × N_l × C_H. Thus, there are C_H variables representing each node at resolution l.

S_{Δx,Δy} is a spatial shifting convolution operator. This operator contains no learnable parameters. S_{Δx,Δy} allows information from the internal states of the nearest neighboring nodes to be taken into account when determining the current internal state of a node.
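A parameter-free spatial shift can be sketched in Python as below. Two details are assumptions made for illustration and are not stated in the text: the direction convention (the output at a node takes the value of the node offset by dx columns and dy rows) and the use of a constant fill value for nodes whose neighbor falls outside the grid. The function name shift is hypothetical.

```python
def shift(grid, dx, dy, fill=0.0):
    """Return a grid whose node (row, col) holds the value of node (row + dy, col + dx).

    Nodes whose source position falls outside the grid receive `fill` (assumed border rule).
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[r + dy][c + dx] if 0 <= r + dy < rows and 0 <= c + dx < cols else fill
         for c in range(cols)]
        for r in range(rows)
    ]


grid = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
from_right = shift(grid, 1, 0)   # every node receives the value of its right-hand neighbor
from_above = shift(grid, 0, -1)  # every node receives the value of the node one row up
```

Because the shift only moves values and has nothing to learn, all decisions about whether a neighbor's state is actually used are left to the gates.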

Returning to the equation above, the result 326 stored in the tensor is divided into seven parts. Each part represents, as noted above, an element in the third dimension of the tensor. Line (I) of equation (6) applies the tanh squashing function to the sum of lines II through VII of equation (6) to determine the internal state of the node at the current iteration. Line (II) of equation (6) corresponds to the possibility of copying the internal state of the node from the previous time step to the current time step, depending on the gated value. The next four lines (III through VI) each correspond to the possibility of copying the internal state of one of the nearest neighbors from the previous iteration to the internal state of the node at the current iteration. The last line (VII) corresponds to generating an entirely new value and possibly setting the internal state of the node at the current iteration to the new value.
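The line-by-line description above suggests an update of the following general shape. Since equation (6) itself is not reproduced in this text, the Python sketch below is a guess at its structure and not the patent's formula: it assumes one scalar state per node (C_H = 1), one sigmoid gate for the node's own previous state, one sigmoid gate for each of the four shifted neighbor states, and a sigmoid gate applied to a tanh candidate for the new value, which together account for the seven parts. The ordering of the seven parts, the shift convention, and zero fill at the border are likewise assumptions.

```python
import math


def sigmoid(a):
    """Numerically stable sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def shift(grid, dx, dy):
    """Node (r, c) receives the value of node (r + dy, c + dx); zero outside the grid."""
    n = len(grid)
    return [[grid[r + dy][c + dx] if 0 <= r + dy < n and 0 <= c + dx < n else 0.0
             for c in range(n)] for r in range(n)]


def gated_update(h_prev, g):
    """One assumed update step; g is a list of seven n x n grids (the seven parts)."""
    n = len(h_prev)
    # Sources for lines II to VI: own state, then left, right, upper and lower neighbors.
    sources = [h_prev, shift(h_prev, -1, 0), shift(h_prev, 1, 0),
               shift(h_prev, 0, -1), shift(h_prev, 0, 1)]
    h_new = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            total = sum(sigmoid(g[k][r][c]) * sources[k][r][c] for k in range(5))
            total += sigmoid(g[5][r][c]) * math.tanh(g[6][r][c])  # line VII: gated new value
            h_new[r][c] = math.tanh(total)                        # line I: squash the sum
    return h_new


n = 3
h_prev = [[0.0] * n for _ in range(n)]
h_prev[1][1] = 0.9                                   # only the center node is active
g = [[[-20.0] * n for _ in range(n)] for _ in range(7)]
g[2][1][0] = 20.0                                    # open "copy from right neighbor" at node (1, 0)
h_new = gated_update(h_prev, g)
```

In this toy run, the node at row 1, column 0 picks up the state of its right-hand neighbor (the center node), while every other gate stays closed, so the value effectively moves one step across the grid in a single iteration.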

FIG. 4 illustrates the connections between a node 400, whose internal state is being determined in the current iteration, and nodes whose internal states were determined in the previous iteration. Each node is connected to nodes having internal states computed in the immediately preceding iteration of the gated spatiotemporal unit 330. Specifically, each node is connected to a node representing its own internal state in the immediately preceding iteration of the gated spatiotemporal unit 330, as well as to nodes representing the internal states of its neighboring nodes in that iteration. In FIG. 4, the nodes in group 405 are nodes associated with the internal states determined in the immediately preceding iteration of the neural network 208. The nodes in group 410 are nodes associated with the internal states determined in the current iteration of the neural network 208. As noted above, each node corresponds to a pixel (or a block of one or more pixels) in the input image 300. Each neighbor of a node represents a pixel (or block of one or more pixels) adjacent to the pixel (or block of one or more pixels) represented by that node. For example, if node 400 represents the pixel at coordinates (i, j) in the image, node 400 is connected to node 415, representing the pixel at coordinates (i-1, j) (immediately to the left of the pixel at (i, j)); node 420, representing the pixel at coordinates (i+1, j) (immediately to the right); node 425, representing the pixel at coordinates (i, j+1) (immediately above); and node 430, representing the pixel at coordinates (i, j-1) (immediately below). Each of the nodes described above as connected to node 400 is a neighbor of node 400. The gated spatiotemporal unit 330 thus determines whether to set the internal state of node 400 to the internal state of one of the nodes in group 405.
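The four-connected neighborhood described for node 400 can be written as a small Python helper. The coordinate convention follows the text ((i-1, j) left, (i+1, j) right, (i, j+1) above, (i, j-1) below); dropping neighbors that fall outside the image is an assumption made for the example, and the function name neighbors is hypothetical.

```python
def neighbors(i, j, width, height):
    """Return the in-bounds 4-connected neighbors of the node at (i, j)."""
    candidates = [(i - 1, j), (i + 1, j), (i, j + 1), (i, j - 1)]  # left, right, above, below
    return [(x, y) for x, y in candidates if 0 <= x < width and 0 <= y < height]


interior = neighbors(1, 1, width=3, height=3)  # all four neighbors exist
corner = neighbors(0, 0, width=3, height=3)    # only the right and upper neighbors exist
```

Together with the node's own previous state, these neighbors are the candidate sources from group 405 that the gated spatiotemporal unit 330 chooses among.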

When the internal states of the nodes in the spatial grid of the gated spatiotemporal unit 330 have converged (changed by less than a predetermined amount), the internal states of the nodes representing the input image 300 at the highest resolution are output to a final layer 335 of the neural network 208. The final layer 335 uses one value included in the internal state of each node to compute the probability (for example, a value between 0 and 1) that the pixel the node represents belongs to the object of interest in the input image 300. Equation (7) represents the operation performed in the final layer 335 to determine the probability that each pixel is part of the object of interest.

Y^t is a variable representing the output 305 of the neural network 208 at time step t and has dimensions N_0 × N_0 × 1 (the same dimensions as the input).

The internal-state tensor in equation (7) holds an internal state for each node in the spatial grid at resolution 1 and time step t.

The kernel K in equation (7) is a variable representing a convolution operator that preserves the dimensions of the input image data. The input image data has dimensions N_0 × N_0 × C_I, while the output image data has dimensions N_0 × N_0 × 1. K may represent a combination of multiple successive convolution operations, arranged for example as in AlexNet, DenseNet, or a range of other architectures, together with the learnable parameters of the convolution operator.

In summary, equation (7) applies a final convolution, using a kernel, to the highest-resolution internal state, thereby reducing the number of input channels (C_H) to one output channel. The equation then applies the sigmoid function to the result of that convolution, thereby squashing each value contained in Y^t to a value in the range of 0 to 1. Each value in the range of 0 to 1 contained in Y^t corresponds to the probability that a pixel of the image lies within the object of interest in the input image 300. For example, if the value generated by the sigmoid function for a single pixel is 0.5, there is a 50% probability that the pixel lies within the object of interest.
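The channel reduction and squashing performed in the final layer can be sketched as below. For simplicity the final convolution is assumed to be a 1x1 convolution, that is, a per-node weighted sum over the C_H channels plus a bias; the text allows more elaborate kernels. The weights, the bias, and the name final_layer are hypothetical.

```python
import math


def sigmoid(a):
    """Numerically stable sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def final_layer(h, weights, bias=0.0):
    """Map an N x N x C_H state tensor to an N x N probability map (assumed 1x1 convolution)."""
    return [[sigmoid(sum(w * v for w, v in zip(weights, node)) + bias) for node in row]
            for row in h]


# Toy 2 x 2 grid with C_H = 2 internal-state channels per node.
h = [[[0.0, 0.0], [2.0, -1.0]],
     [[-3.0, 1.0], [1.0, 1.0]]]
probs = final_layer(h, weights=[1.0, 1.0])
```

A node whose weighted sum is exactly zero maps to a probability of 0.5, matching the 50% example in the text.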

The neural network 208 may store the generated output 305 (the probability computed for each node) in an output data repository (for example, the memory 206) or may provide the generated output 305 for use or consumption, such as by displaying it to a user on a display device. In either case, the electronic processor 204 compares, for each pixel, the probability that the pixel is included in the object of interest with a predetermined threshold. If the probability that a pixel is part of the object of interest is higher than the predetermined threshold, the electronic processor 204 determines that the pixel is part of the object of interest.
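The thresholding step amounts to a per-pixel comparison, sketched here in Python. The threshold value of 0.5 is only an illustrative default; the text says merely that the threshold is predetermined. The function name segment is hypothetical.

```python
def segment(probabilities, threshold=0.5):
    """Mark a pixel as part of the object when its probability exceeds the threshold."""
    return [[p > threshold for p in row] for row in probabilities]


probabilities = [[0.1, 0.7],
                 [0.5, 0.95]]
mask = segment(probabilities)  # a probability equal to the threshold is not "higher than" it
```

The resulting boolean mask is the identified region of interest.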

It should be understood that, in some embodiments, equations (4) through (6) are executed within the neural network 208 described above for each level (I_1 through I_l) of the representation of the input image 300 included in the image pyramid 315. Although the neural network 208 is described above as propagating values across time, space, and resolution, it should also be understood that the neural network 208 may be modified to propagate values across time and space only.

It should also be understood that the gate values used to determine the internal state of each node at each iteration need not be either 0 or 1 and may be any value in the range of 0 to 1 (see equation (6) above). Thus, in some embodiments, the updated internal state of a node may be a mixture (or, more mathematically, a linear combination) of two or more of the options described above (the node's value from the previous iteration, the values of one or more neighboring nodes from the previous iteration, and a new value for the node).

FIGS. 5 and 6 illustrate an example of a practical application of the neural network 208. FIG. 5 illustrates an example of a medical image 500 that the neural network 208 may receive as input. The object of interest in the image 500 is a tumor 505 in the left lung 510. FIG. 6 illustrates the region of the medical image 500 that the neural network 208 identifies as the object of interest (the tumor 505). Unlike when a region growing technique is used (see FIG. 1), the boundary of the object of interest does not extend outside the left lung 510.

Accordingly, the embodiments described herein provide a neural network that includes a spatiotemporal unit. The spatiotemporal unit is a spatially extended grid of nodes in which, for example, each node corresponds to a pixel in the image. The neural network determines an initial internal state for each node and then iteratively updates that state, producing a new internal state at each iteration by propagating values over time, space, or both and by computing new values to represent the internal state of each node. Thus, in contrast to other types of RNN, such as long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks, which iterate over a one-dimensional sequence of characters or words, the embodiments described herein take the decisions of neighboring nodes into account when updating the internal state of each node. Specifically, the embodiments described herein apply both a spatial dimension and a temporal dimension. Although the time dimension iterates only forward, spatial gating allows spatial information to reverberate backward and forward across the spatial grid for as long as necessary: a new conclusion reached in one part of the image is propagated to other parts of the image, where it informs decision-making. Furthermore, in some embodiments described herein, values within the neural network 208 may be propagated between different resolutions of the image.
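One iteration of such a spatially gated update can be sketched as follows. This is a simplified illustration under stated assumptions, not the trained neural network 208: each node holds a single scalar state, the gates are computed by a per-node linear map rather than a convolution, and the weights `Wg`, `bg`, `wc`, `bc` are random placeholders. The names `neighbours`, `softmax`, and `gated_step` are hypothetical.

```python
import numpy as np

def neighbours(h):
    """Return the states of the up, down, left and right neighbours.

    Border nodes reuse their own value (edge replication), an assumption
    made here to keep the grid shape fixed.
    """
    p = np.pad(h, 1, mode="edge")
    up = p[:-2, 1:-1]
    down = p[2:, 1:-1]
    left = p[1:-1, :-2]
    right = p[1:-1, 2:]
    return up, down, left, right

def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_step(x, h, Wg, bg, wc, bc):
    """One update: each node blends its previous value, its neighbours'
    previous values, and a newly computed candidate value."""
    up, down, left, right = neighbours(h)
    feats = np.stack([x, h, up, down, left, right], axis=-1)   # (H, W, 6)
    gates = softmax(feats @ Wg + bg)                           # (H, W, 6), sums to 1 per node
    cand = np.tanh(feats @ wc + bc)                            # squashed new value, (H, W)
    options = np.stack([h, up, down, left, right, cand], axis=-1)
    return (gates * options).sum(axis=-1)

# Demo with placeholder weights (not trained parameters).
rng = np.random.default_rng(0)
x = rng.random((4, 5))            # pixel brightness values in [0, 1)
h0 = np.zeros((4, 5))             # initial internal state
Wg, bg = rng.normal(size=(6, 6)), np.zeros(6)
wc, bc = rng.normal(size=6), 0.0
h1 = gated_step(x, h0, Wg, bg, wc, bc)
```

Because the gates form a convex combination, a node can keep its value, copy a neighbor's value, or adopt the candidate, which is how a conclusion reached in one part of the grid can spread to adjacent nodes over successive iterations.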

The embodiments described herein are closed. Specifically, the neural network 208 described herein is given all of its information about the outside world as an initial input (the image to be processed). From that point on, the neural network 208 evolves over time according only to its own internal state and rules, without receiving further information from outside. Iteration continues in this way until the internal state stops changing, that is, until it converges. This makes the neural network 208 more like an algorithm than a function. In contrast, an RNN is given one new piece of the problem (for example, one word) at each time step, so it continues to iterate only while new information is available.
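The closed, iterate-until-convergence behavior can be sketched as a fixed-point loop. The update rule `toy_update` below is a stand-in chosen because it is guaranteed to converge; it is not the gated update of the neural network 208. The names `run_to_convergence`, `tol`, and `max_iters` are likewise assumptions for illustration.

```python
import numpy as np

def toy_update(x, h):
    """Stand-in update rule (an assumption, not the patent's network).

    Each node combines its fixed input with the states of its four
    neighbours. The 0.2 factor keeps the map a contraction, so the
    iteration is guaranteed to settle.
    """
    p = np.pad(h, 1, mode="edge")
    nbr_sum = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    return np.tanh(x + 0.2 * nbr_sum)

def run_to_convergence(x, update, tol=1e-6, max_iters=500):
    """Feed the image once, then iterate on internal state alone until
    the largest per-node change falls below `tol`."""
    h = np.zeros_like(x, dtype=float)    # initial internal state
    for it in range(1, max_iters + 1):
        h_new = update(x, h)             # no new external input is read here
        if np.max(np.abs(h_new - h)) < tol:
            return h_new, it
        h = h_new
    return h, max_iters

# Toy 3x3 "image" (hypothetical brightness values).
x = np.array([[0.0, 0.1, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.2, 0.0]])
h_final, n_iters = run_to_convergence(x, toy_update)
```

The image `x` is supplied once and never changes inside the loop; only the internal state evolves, and the loop stops when a further iteration would leave that state essentially unchanged.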

Various features and advantages of some embodiments are set forth in the following claims.

Claims

A method for identifying an object of interest in a medical image, the method comprising:
initializing an internal state of nodes in a spatial grid, wherein each node corresponds to a pixel in the medical image and is connected to at least one node representing an adjacent pixel in the medical image;
using a neural network to iteratively update the internal state of the nodes in the spatial grid using spatially gated propagation, wherein, at each iteration, each node updates its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of an adjacent node from the previous iteration, and a new value for the node; and
identifying the object of interest in the medical image based on the values of the nodes at convergence of the spatial grid.

The method of claim 1, wherein iteratively updating the internal state of the nodes using the neural network comprises updating a value in a vector of values associated with the internal state of the node.

The method of claim 2, wherein the value in the vector of values includes a value representing the brightness of the pixel corresponding to the node and a value representing the internal state of the node.

The method of claim 1, wherein a convolution that includes the previous internal state of the node is performed at each iteration.

The method of claim 1, wherein the method further comprises performing a convolution on each value representing the brightness of each pixel in the first iteration.

The method of claim 1, wherein identifying the object of interest in the medical image based on the values of the nodes at convergence of the spatial grid comprises:
using a final layer of the neural network to calculate, based on the values contained in the vector of values associated with each pixel, a probability that each pixel is included in the object of interest; and
determining, for each pixel, whether the calculated probability is higher than a predetermined threshold.

The method of claim 1, wherein each node uses a squashing function to update its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of an adjacent node from the previous iteration, and a new value for the node.

The method of claim 1, wherein the adjacent node is a node selected from a group of nodes representing pixels directly above, below, to the right, and to the left of the pixel represented by the node.

The method of claim 1, wherein the method further comprises generating an image pyramid comprising a plurality of layers, wherein each successive layer represents the medical image using fewer values.

The method of claim 9, wherein the method further comprises concatenating values from multiple layers of the image pyramid at each iteration.

A system for determining a region of interest in an image, the system comprising:
a memory; and
an electronic processor connected to the memory, the electronic processor being configured to:
initialize an internal state of nodes of a spatial grid, wherein each node corresponds to a pixel in the image and is connected to at least one node representing an adjacent pixel in the image;
use a neural network to iteratively update the internal state of each node in the spatial grid using spatially gated propagation; and
identify the region of interest in the image based on the internal states of the nodes at convergence of the spatial grid.

The system of claim 11, wherein the electronic processor is configured to update the internal state of a node by updating, at each iteration, the internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of an adjacent node from the previous iteration, or a new value for the node.

The system of claim 11, wherein the electronic processor is configured to iteratively update the internal state of the nodes by using the neural network to update values in a vector of values associated with the internal state of the node.

The system of claim 13, wherein the value in the vector of values includes a value representing the brightness of the pixel corresponding to the node and a value representing the internal state of the node.

The system of claim 11, wherein the electronic processor is further configured to perform, at each iteration, a convolution that includes the previous internal state of the node.

The system of claim 11, wherein the electronic processor is further configured to perform a convolution on each value representing the brightness of each pixel in the first iteration.

The system of claim 11, wherein the electronic processor is configured to identify the object of interest in the image based on the values of the nodes at convergence of the spatial grid by:
using a final layer of the neural network to calculate, based on the vector associated with each pixel, a probability that each pixel is included in the object of interest in the image; and
determining, for each pixel, whether the calculated probability is higher than a predetermined threshold.

The system of claim 12, wherein the electronic processor is configured to use a squashing function to update the internal state based on one selected from the group consisting of the value of the node from the previous iteration, the value of an adjacent node from the previous iteration, or a new value for the node.

The system of claim 12, wherein the adjacent node is a node selected from a group of nodes representing pixels directly above, below, to the right, and to the left of the pixel represented by the node.

A non-transitory computer-readable medium storing instructions that, when executed by an electronic processor, perform a set of functions, the set of functions comprising:
initializing an internal state of nodes in a spatial grid, wherein each node represents a pixel in an image and is connected to at least one adjacent pixel in the image;
using a neural network to iteratively update the internal state of the nodes in the spatial grid using spatially gated propagation, wherein, at each iteration, each node updates its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of an adjacent node from the previous iteration, or a new value for the node; and
identifying an object of interest in the image based on the values of the nodes at convergence of the spatial grid.

The non-transitory computer-readable medium of claim 20, wherein iteratively updating the internal state of the nodes using the neural network comprises updating a value in a vector of values associated with the internal state of the node.

The non-transitory computer-readable medium of claim 20, wherein identifying the object of interest in the image based on the values of the nodes at convergence of the spatial grid comprises:
using a final layer of the neural network to calculate, based on the vector associated with each pixel, a probability that each pixel is included in the object of interest; and
determining, for each pixel, whether the calculated probability is higher than a predetermined threshold.

A method for identifying an object of interest in a medical image, the method comprising:
creating an image pyramid of the medical image, wherein the image pyramid includes a plurality of layers, each layer includes a plurality of values, each value represents a block of one or more pixels in the medical image, and each successive layer includes fewer values than the preceding layer;
for each layer of the image pyramid:
initializing an internal state of nodes in a spatial grid, wherein each node in the spatial grid represents a block of one or more pixels in the medical image and is connected to at least one node representing an adjacent block of one or more pixels in the medical image; and
using a neural network to iteratively update the internal state of the nodes in the spatial grid using spatially gated propagation, wherein, at each iteration, each node updates its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of an adjacent node from the previous iteration, and a new value for the node; and
identifying the object of interest in the medical image based on the values of the nodes at convergence of the spatial grid containing the nodes that represent the values included in the first layer of the image pyramid.

The method of claim 23, wherein iteratively updating the internal state of the nodes using the neural network comprises updating a value in a vector of values associated with the internal state of the node.

The method of claim 23, further comprising:
at each iteration for each layer of the image pyramid, performing a first convolution that includes a first concatenation of the values included in the layer of the image pyramid and the previous internal states of the nodes representing the values included in the layer of the image pyramid; and
storing a result of the first convolution.

The method of claim 25, further comprising, at each iteration for each layer of the image pyramid, performing a second convolution that includes a second concatenation of the results of performing the first convolution on the current layer of the image pyramid, the layer of the image pyramid directly above the current layer, and the layer of the image pyramid directly below the current layer.

The method of claim 23, wherein creating the image pyramid comprises performing a convolution on each value representing the brightness of each block of one or more pixels in the medical image, wherein each convolution includes a reduction of the dimension of the input medical image data and produces values that are used to represent the medical image in the next layer of the image pyramid.

The method of claim 23, wherein each value representing the medical image in the first layer of the image pyramid corresponds to a pixel in the medical image.

The method of claim 28, wherein identifying the object of interest in the medical image based on the values of the nodes at convergence of the spatial grid containing the nodes that represent the values included in the first layer of the image pyramid comprises:
using a final layer of the neural network to calculate, based on the values contained in each vector of values associated with a node representing a value included in the first layer of the image pyramid, a probability that each pixel in the medical image is included in the object of interest; and
determining, for each pixel, whether the calculated probability is higher than a predetermined threshold.

The method of claim 26, wherein each node updating its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of an adjacent node from the previous iteration, and a new value for the node comprises using a squashing function and the result of performing the second convolution.

The method of claim 23, wherein the adjacent node is a node selected from a group of nodes representing the blocks of one or more pixels directly above, below, to the right of, and to the left of the block of one or more pixels represented by the node.

The method of claim 23, wherein a lower-resolution medical image is created by representing the medical image using fewer values.

A system for determining a region of interest in an image, the system comprising:
a memory; and
an electronic processor connected to the memory, the electronic processor being configured to:
create an image pyramid of the image, wherein the image pyramid includes a plurality of layers;
for each layer of the image pyramid:
initialize an internal state of nodes in a spatial grid, wherein each node represents a block of one or more pixels in the image and is connected to at least one node representing an adjacent block of one or more pixels in the image; and
use a neural network to iteratively update the internal state of the nodes in the spatial grid using spatially gated propagation; and
identify the region of interest in the image based on the internal states of the nodes at convergence of the spatial grid containing the nodes that represent the values included in the first layer of the image pyramid.

The system of claim 33, wherein each successive layer of the plurality of layers included in the image pyramid represents the image at a lower resolution than the image represented by the immediately preceding layer of the image pyramid.

The system of claim 34, wherein the electronic processor is configured to represent the image at a lower resolution by representing the image using fewer values.

The system of claim 33, wherein the electronic processor is configured to update the internal state of a node by determining, for each node, whether to keep the value of the node from the previous iteration, set the value of the node to the value of an adjacent node from the previous iteration, or set a new value for the node.

The system of claim 33, wherein the electronic processor is configured to iteratively update the internal state of the nodes by using the neural network to update values in a vector of values associated with the internal state of the node.

The system of claim 35, wherein the electronic processor is configured to, at each iteration for each layer of the image pyramid, perform a first convolution that includes a first concatenation of the values included in the layer of the image pyramid and the previous internal states of the nodes representing the values included in the layer of the image pyramid, and to store a result of the first convolution.

The system of claim 38, wherein the electronic processor is configured to, at each iteration for each layer of the image pyramid, perform a second convolution that includes a second concatenation of the results of performing the first convolution on the current layer of the image pyramid, the layer of the image pyramid directly above the current layer, and the layer of the image pyramid directly below the current layer.

The system of claim 34, wherein the electronic processor is further configured to perform, in the first iteration, a convolution on each value representing the brightness of each block of one or more pixels in the image, wherein each convolution includes a reduction of the dimension of the input image data and produces values used to represent the image in the next layer of the image pyramid.

The system of claim 33, wherein the electronic processor is configured to identify the object of interest in the image, based on the values of the nodes at convergence of the spatial grid containing the nodes that represent the values included in the first layer of the image pyramid, by:
using a final layer of the neural network to calculate, based on each vector associated with a node representing a value included in the first layer of the image pyramid, a probability that each pixel in the image is included in the object of interest in the image; and
determining, for each pixel, whether the calculated probability is higher than a predetermined threshold.

The system of claim 39, wherein the electronic processor is configured to use a squashing function and the result of the second convolution to update the internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of an adjacent node from the previous iteration, or a new value for the node.

The system of claim 36, wherein the adjacent node is a node selected from the group consisting of nodes representing the blocks of one or more pixels in the image directly above, below, to the right of, and to the left of the block of one or more pixels in the image represented by the node.

A non-transitory computer-readable medium storing instructions that, when executed by an electronic processor, perform a set of functions, the set of functions comprising:
creating an image pyramid of an image, wherein the image pyramid includes a plurality of layers, each layer includes a plurality of values, each value represents a block of one or more pixels in the image, and each successive layer includes fewer values than the preceding layer;
for each layer of the image pyramid:
initializing an internal state of nodes in a spatial grid, wherein each node represents a block of one or more pixels in the image and is connected to at least one node representing an adjacent block of one or more pixels in the image; and
using a neural network to iteratively update the internal state of the nodes in the spatial grid using spatially gated propagation, wherein, at each iteration, each node updates its internal state based on at least one selected from the group consisting of the value of the node from the previous iteration, the value of an adjacent node from the previous iteration, or a new value for the node; and
identifying an object of interest in the image based on the values of the nodes at convergence of the spatial grid containing the nodes that represent the values included in the first layer of the image pyramid.

The non-transitory computer-readable medium of claim 44, wherein iteratively updating the internal state of the nodes using the neural network comprises updating a value in a vector of values associated with the internal state of the node.

The non-transitory computer-readable medium of claim 44, wherein identifying the object of interest in the image based on the values of the nodes at convergence of the spatial grid containing the nodes that represent the values included in the first layer of the image pyramid comprises:
using a final layer of the neural network to calculate, based on the vector associated with each node representing a value included in the first layer of the image pyramid, a probability that each pixel in the image is included in the object of interest; and
determining, for each pixel, whether the calculated probability is higher than a predetermined threshold.